android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Kuniyuki Iwashima	d1e5e6408b	tcp: Introduce optional per-netns ehash. The more sockets we have in the hash table, the longer we spend looking up the socket. While running a number of small workloads on the same host, they penalise each other and cause performance degradation. The root cause might be a single workload that consumes much more resources than the others. It often happens on a cloud service where different workloads share the same computing resource. On EC2 c5.24xlarge instance (196 GiB memory and 524288 (1Mi / 2) ehash entries), after running iperf3 in different netns, creating 24Mi sockets without data transfer in the root netns causes about 10% performance regression for the iperf3's connection. thash_entries sockets length Gbps 524288 1 1 50.7 24Mi 48 45.1 It is basically related to the length of the list of each hash bucket. For testing purposes to see how performance drops along the length, I set 131072 (1Mi / 8) to thash_entries, and here's the result. thash_entries sockets length Gbps 131072 1 1 50.7 1Mi 8 49.9 2Mi 16 48.9 4Mi 32 47.3 8Mi 64 44.6 16Mi 128 40.6 24Mi 192 36.3 32Mi 256 32.5 40Mi 320 27.0 48Mi 384 25.0 To resolve the socket lookup degradation, we introduce an optional per-netns hash table for TCP, but it's just ehash, and we still share the global bhash, bhash2 and lhash2. With a smaller ehash, we can look up non-listener sockets faster and isolate such noisy neighbours. In addition, we can reduce lock contention. We can control the ehash size by a new sysctl knob. However, depending on workloads, it will require very sensitive tuning, so we disable the feature by default (net.ipv4.tcp_child_ehash_entries == 0). Moreover, we can fall back to using the global ehash in case we fail to allocate enough memory for a new ehash. The maximum size is 16Mi, which is large enough that even if we have 48Mi sockets, the average list length is 3, and regression would be less than 1%. We can check the current ehash size by another read-only sysctl knob, net.ipv4.tcp_ehash_entries. A negative value means the netns shares the global ehash (per-netns ehash is disabled or failed to allocate memory). # dmesg \| cut -d ' ' -f 5- \| grep "established hash" TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage) # sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 524288 # can be changed by thash_entries # sysctl net.ipv4.tcp_child_ehash_entries net.ipv4.tcp_child_ehash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = -524288 # share the global ehash # sysctl -w net.ipv4.tcp_child_ehash_entries=100 net.ipv4.tcp_child_ehash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.tcp_ehash_entries net.ipv4.tcp_ehash_entries = 128 # own a per-netns ehash with 2^n buckets When more than two processes in the same netns create per-netns ehash concurrently with different sizes, we need to guarantee the size in one of the following ways: 1) Share the global ehash and create per-netns ehash First, unshare() with tcp_child_ehash_entries==0. It creates dedicated netns sysctl knobs where we can safely change tcp_child_ehash_entries and clone()/unshare() to create a per-netns ehash. 2) Control write on sysctl by BPF We can use BPF_PROG_TYPE_CGROUP_SYSCTL to allow/deny read/write on sysctl knobs. Note that the global ehash allocated at the boot time is spread over available NUMA nodes, but inet_pernet_hashinfo_alloc() will allocate pages for each per-netns ehash depending on the current process's NUMA policy. By default, the allocation is done in the local node only, so the per-netns hash table could fully reside on a random node. Thus, depending on the NUMA policy the netns is created with and the CPU the current thread is running on, we could see some performance differences for highly optimised networking applications. Note also that the default values of two sysctl knobs depend on the ehash size and should be tuned carefully: tcp_max_tw_buckets : tcp_child_ehash_entries / 2 tcp_max_syn_backlog : max(128, tcp_child_ehash_entries / 128) As a bonus, we can dismantle netns faster. Currently, while destroying netns, we call inet_twsk_purge(), which walks through the global ehash. It can be potentially big because it can have many sockets other than TIME_WAIT in all netns. Splitting ehash changes that situation, where it's only necessary for inet_twsk_purge() to clean up TIME_WAIT sockets in each netns. With regard to this, we do not free the per-netns ehash in inet_twsk_kill() to avoid UAF while iterating the per-netns ehash in inet_twsk_purge(). Instead, we do it in tcp_sk_exit_batch() after calling tcp_twsk_purge() to keep it protocol-family-independent. In the future, we could optimise ehash lookup/iteration further by removing netns comparison for the per-netns ehash. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:50 -07:00
Kuniyuki Iwashima	edc12f032a	tcp: Save unnecessary inet_twsk_purge() calls. While destroying netns, we call inet_twsk_purge() in tcp_sk_exit_batch() and tcpv6_net_exit_batch() for AF_INET and AF_INET6. These commands trigger the kernel to walk through the potentially big ehash twice even though the netns has no TIME_WAIT sockets. # ip netns add test # ip netns del test or # unshare -n /bin/true >/dev/null When tw_refcount is 1, we need not call inet_twsk_purge() at least for the net. We can save such unneeded iterations if all netns in net_exit_list have no TIME_WAIT sockets. This change eliminates the tax by the additional unshare() described in the next patch to guarantee the per-netns ehash size. Tested: # mount -t debugfs none /sys/kernel/debug/ # echo cleanup_net > /sys/kernel/debug/tracing/set_ftrace_filter # echo inet_twsk_purge >> /sys/kernel/debug/tracing/set_ftrace_filter # echo function > /sys/kernel/debug/tracing/current_tracer # cat ./add_del_unshare.sh for i in `seq 1 40` do (for j in `seq 1 100` ; do unshare -n /bin/true >/dev/null ; done) & done wait; # ./add_del_unshare.sh Before the patch: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [031] ...1. 174.162765: cleanup_net <-process_one_work kworker/u128:0-8 [031] ...1. 174.240796: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [032] ...1. 174.244759: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [034] ...1. 174.290861: cleanup_net <-process_one_work kworker/u128:0-8 [039] ...1. 175.245027: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [046] ...1. 175.290541: inet_twsk_purge <-tcp_sk_exit_batch kworker/u128:0-8 [037] ...1. 175.321046: cleanup_net <-process_one_work kworker/u128:0-8 [024] ...1. 175.941633: inet_twsk_purge <-cleanup_net kworker/u128:0-8 [025] ...1. 176.242539: inet_twsk_purge <-tcp_sk_exit_batch After: # cat /sys/kernel/debug/tracing/trace_pipe kworker/u128:0-8 [038] ...1. 428.116174: cleanup_net <-process_one_work kworker/u128:0-8 [038] ...1. 428.262532: cleanup_net <-process_one_work kworker/u128:0-8 [030] ...1. 429.292645: cleanup_net <-process_one_work Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:50 -07:00
Kuniyuki Iwashima	4461568aa4	tcp: Access &tcp_hashinfo via net. We will soon introduce an optional per-netns ehash. This means we cannot use tcp_hashinfo directly in most places. Instead, access it via net->ipv4.tcp_death_row.hashinfo. The access will be valid only while initialising tcp_hashinfo itself and creating/destroying each netns. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:49 -07:00
Kuniyuki Iwashima	429e42c1c5	tcp: Set NULL to sk->sk_prot->h.hashinfo. We will soon introduce an optional per-netns ehash. This means we cannot use the global sk->sk_prot->h.hashinfo to fetch a TCP hashinfo. Instead, set NULL to sk->sk_prot->h.hashinfo for TCP and get a proper hashinfo from net->ipv4.tcp_death_row.hashinfo. Note that we need not use sk->sk_prot->h.hashinfo if DCCP is disabled. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:49 -07:00
Kuniyuki Iwashima	e9bd0cca09	tcp: Don't allocate tcp_death_row outside of struct netns_ipv4. We will soon introduce an optional per-netns ehash and access hash tables via net->ipv4.tcp_death_row->hashinfo instead of &tcp_hashinfo in most places. It could harm the fast path because dereferences of two fields in net and tcp_death_row might incur two extra cache line misses. To save one dereference, let's place tcp_death_row back in netns_ipv4 and fetch hashinfo via net->ipv4.tcp_death_row"."hashinfo. Note tcp_death_row was initially placed in netns_ipv4, and commit `fbb8295248` ("tcp: allocate tcp_death_row outside of struct netns_ipv4") changed it to a pointer so that we can fire TIME_WAIT timers after freeing net. However, we don't do so after commit `04c494e68a` ("Revert "tcp/dccp: get rid of inet_twsk_purge()""), so we need not define tcp_death_row as a pointer. Also, we move refcount_dec_and_test(&tw_refcount) from tcp_sk_exit() to tcp_sk_exit_batch() as a debug check. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:49 -07:00
Kuniyuki Iwashima	08eaef9040	tcp: Clean up some functions. This patch adds no functional change and cleans up some functions that the following patches touch around so that we make them tidy and easy to review/revert. The changes are - Keep reverse christmas tree order - Remove unnecessary init of port in inet_csk_find_open_port() - Use req_to_sk() once in reqsk_queue_unlink() - Use sock_net(sk) once in tcp_time_wait() and tcp_v[46]_connect() Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 10:21:49 -07:00
Christophe JAILLET	17df341d35	headers: Remove some left-over license text Remove a left-over from commit `2874c5fd28` ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152") There is no need for an empty "License:". Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:59:18 -07:00
Hangbin Liu	152e8ec776	selftests/bonding: add a test for bonding lladdr target This is a regression test for commit `592335a416` ("bonding: accept unsolicited NA message") and commit `b7f14132bf` ("bonding: use unspecified address if no available link local address"). When the bond interface up and no available link local address, unspecified address(::) is used to send the NS message. The unsolicited NA message should also be accepted for validation. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jonathan Toppins <jtoppins@redhat.com> Link: https://lore.kernel.org/r/20220920033047.173244-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:57:23 -07:00
Yang Yingliang	4633b39183	net: mdio: mux-multiplexer: Switch to use dev_err_probe() helper dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-3-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:41:20 -07:00
Yang Yingliang	770aac8dc0	net: mdio: mux-mmioreg: Switch to use dev_err_probe() helper dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-2-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:41:19 -07:00
Yang Yingliang	de0665c871	net: mdio: mux-meson-g12a: Switch to use dev_err_probe() helper dev_err() can be replace with dev_err_probe() which will check if error code is -EPROBE_DEFER. Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220915065043.665138-1-yangyingliang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:41:19 -07:00
Biju Das	1089877ada	ravb: Add RZ/G2L MII interface support EMAC IP found on RZ/G2L Gb ethernet supports MII interface. This patch adds support for selecting MII interface mode. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/20220914192604.265859-1-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:39:46 -07:00
Phil Sutter	a4abfa627c	net: rtnetlink: Enslave device before bringing it up Unlike with bridges, one can't add an interface to a bond and set it up at the same time: \| # ip link set dummy0 down \| # ip link set dummy0 master bond0 up \| Error: Device can not be enslaved while up. Of all drivers with ndo_add_slave callback, bond and team decline if IFF_UP flag is set, vrf cycles the interface (i.e., sets it down and immediately up again) and the others just don't care. Support the common notion of setting the interface up after enslaving it by sorting the operations accordingly. Signed-off-by: Phil Sutter <phil@nwl.cc> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220914150623.24152-1-phil@nwl.cc Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:37:44 -07:00
Jakub Kicinski	5f4e25641c	Merge branch 'macb-add-zynqmp-sgmii-dynamic-configuration-support' Radhey Shyam Pandey says: ==================== macb: add zynqmp SGMII dynamic configuration support This patchset add firmware and driver support to do SD/GEM dynamic configuration. In traditional flow GEM secure space configuration is done by FSBL. However in specific usescases like dynamic designs where GEM is not enabled in base vivado design, FSBL skips GEM initialization and we need a mechanism to configure GEM secure space in linux space at runtime. ==================== Link: https://lore.kernel.org/r/1663158796-14869-1-git-send-email-radhey.shyam.pandey@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:33:08 -07:00
Radhey Shyam Pandey	32cee78181	net: macb: Add zynqmp SGMII dynamic configuration support Add support for the dynamic configuration which takes care of configuring the GEM secure space configuration registers using EEMI APIs. High level sequence is to: - Check for the PM dynamic configuration support, if no error proceed with GEM dynamic configurations(next steps) otherwise skip the dynamic configuration. - Configure GEM Fixed configurations. - Configure GEM_CLK_CTRL (gemX_sgmii_mode). - Trigger GEM reset. Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Conor Dooley <conor.dooley@microchip.com> (for MPFS) Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:33:05 -07:00
Ronak Jain	256dea9134	firmware: xilinx: add support for sd/gem config Add new APIs in firmware to configure SD/GEM registers. Internally it calls PM IOCTL for below SD/GEM register configuration: - SD/EMMC select - SD slot type - SD base clock - SD 8 bit support - SD fixed config - GEM SGMII Mode - GEM fixed config Signed-off-by: Ronak Jain <ronak.jain@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:33:04 -07:00
ruanjinjie	53ff251709	xen-netfront: make bounce_skb static The symbol is not used outside of the file, so mark it static. Fixes the following warning: ./drivers/net/xen-netfront.c:676:16: warning: symbol 'bounce_skb' was not declared. Should it be static? Signed-off-by: ruanjinjie <ruanjinjie@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220914064339.49841-1-ruanjinjie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:19:40 -07:00
Horatiu Vultur	b324c6e5e0	net: phy: micrel: Add interrupts support for LAN8804 PHY Add support for interrupts for LAN8804 PHY. Tested-by: Michael Walle <michael@walle.cc> # on kontron-kswitch-d10 Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20220913142926.816746-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 08:00:26 -07:00
Jakub Kicinski	c3188dbafa	Merge branch 'sfp-add-support-for-halny-gpon-module' Russell King says: ==================== sfp: add support for HALNy GPON module This series adds support for the HALNy GPON SFP module. In order to do this sensibly, we need a more flexible quirk system, since we need to change the behaviour of the SFP cage driver to ignore the LOS and TX_FAULT signals after module detection. Since we move the SFP quirks into the SFP cage driver, we can use it for the MA5671A and 3FE46541AA modules as well. ==================== Link: https://lore.kernel.org/r/YyDUnvM1b0dZPmmd@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:17 -07:00
Russell King (Oracle)	73472c830e	net: sfp: add support for HALNy GPON SFP Add a quirk for the HALNy HL-GSFP module, which appears to have an inverted RX_LOS signal, and maybe uses TX_FAULT as a serial port transmit pin. Rather than use these hardware signals, switch to using software polling for these status signals. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:13 -07:00
Russell King (Oracle)	5029be7611	net: sfp: move Huawei MA5671A fixup Move this module over to the new fixup mechanism. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:13 -07:00
Russell King (Oracle)	275416754e	net: sfp: move Alcatel Lucent 3FE46541AA fixup Add a new fixup mechanism to the SFP quirks, and use it for this module. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:13 -07:00
Russell King (Oracle)	23571c7b96	net: sfp: move quirk handling into sfp.c We need to handle more quirks than just those which affect the link modes of the module. Move the quirk lookup into sfp.c, and pass the quirk to sfp-bus.c Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:13 -07:00
Russell King (Oracle)	8475c4b70b	net: sfp: re-implement soft state polling setup Re-implement the decision making for soft state polling. Instead of generating the soft state mask in sfp_soft_start_poll() by looking at which GPIOs are available, record their availability in sfp_sm_mod_probe() in sfp->state_hw_mask. This will then allow us to clear bits in sfp->state_hw_mask in module specific quirks when the hardware signals should not be used, thereby allowing us to switch to using the software state polling. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:54:13 -07:00
Vladimir Oltean	7f32974bdc	dt-bindings: net: dsa: convert ocelot.txt to dt-schema Replace the free-form description of device tree bindings for VSC9959 and VSC9953 with a YAML formatted dt-schema description. This contains more or less the same information, but reworded to be a bit more succint. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Maxim Kochetkov <fido_max@inbox.ru> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220913125806.524314-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:46:09 -07:00
Jakub Kicinski	93ece9a6de	Merge branch 'net-ipa-a-mix-of-cleanups' Alex Elder says: ==================== net: ipa: a mix of cleanups This series contains a set of cleanups done in preparation for a more substantitive upcoming series that reworks how IPA registers and their fields are defined. The first eliminates about half of the possible GSI register constant symbols by removing offset definitions that are not currently required. The next two mainly rearrange code for some common enumerated types. The next one fixes two spots that reuse local variable names in inner scopes when defining offsets. The next adds some additional restrictions on the value held in a register. And the last one just fixes two field mask symbol names so they adhere to the common naming convention. ==================== Link: https://lore.kernel.org/r/20220910011131.1431934-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:52 -07:00
Alex Elder	dae4af6bf2	net: ipa: fix two symbol names All field mask symbols are defined with a "_FMASK" suffix, but EOT_COAL_GRANULARITY and DRBIP_ACL_ENABLE are defined without one. Fix that. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:47 -07:00
Alex Elder	a14d593724	net: ipa: update sequencer definition constraints Starting with IPA v4.5, replication is done differently from before, and as a result the "replication" portion of the how the sequencer is specified must be zero. Add a check for the configuration data failing that requirement, and only update the sesquencer type value when it's supported. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:47 -07:00
Alex Elder	9eefd2fb96	net: ipa: don't reuse variable names In ipa_endpoint_init_hdr(), as well as ipa_endpoint_init_hdr_ext(), a top-level automatic variable named "offset" is used to represent the offset of a register. However, deeper within each of those functions is another definition of a local variable with the same name, representing something else. Scoping rules ensure the result is what was intended, but this variable name reuse is bad practice and makes the code confusing. Fix this by naming the inner variable "off". Use "off" instead of "checksum_offset" in ipa_endpoint_init_cfg() for consistency. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:47 -07:00
Alex Elder	8b3cb084b2	net: ipa: move and redefine ipa_version_valid() Move the definition of ipa_version_valid(), making it a static inline function defined together with the enumerated type in "ipa_version.h". Define a new count value in the type. Rename the function to be ipa_version_supported(), and have it return true only if the IPA version supplied is explicitly supported by the driver. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:47 -07:00
Alex Elder	bb788de30a	net: ipa: move the definition of gsi_ee_id Move the definition of the gsi_ee_id enumerated type out of "gsi.h" and into "ipa_version.h". That latter header file isolates the definition of the ipa_version enumerated type, allowing it to be included in both IPA and GSI code. We have the same requirement for gsi_ee_id, and moving it here makes it easier to get only that definition without everything else defined in "gsi.h". Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:46 -07:00
Alex Elder	5ea4285829	net: ipa: don't define unneeded GSI register offsets Each GSI execution environment (EE) is able to access many of the GSI registers associated with the other EEs. A block of GSI registers is contained within a region of memory, and an EE's register offset can be determined by adding the register's base offset to the product of the EE ID and a fixed constant. Despite this possibility, the AP IPA code never accesses any GSI registers other than its own. So there's no need to define the macros that compute register offsets for other EEs. Redefine the AP access macros to compute the offset the way the more general "any EE" macro would, and get rid of the unneeded macros. Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-09-20 07:45:46 -07:00
Paolo Abeni	01544a272b	Merge branch 'net-ethernet-adi-add-adin1110-support' Alexandru Tachici says: ==================== net: ethernet: adi: Add ADIN1110 support The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY designed for industrial Ethernet applications. It integrates an Ethernet PHY core with a MAC and all the associated analog circuitry, input and output clock buffering. ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers can be accessed through the MDIO MAC registers. We are registering an MDIO bus with custom read/write in order to let the PHY to be discovered by the PAL. This will let the ADIN1100 Linux driver to probe and take control of the PHY. The ADIN2111 is a low power, low complexity, two-Ethernet ports switch with integrated 10BASE-T1L PHYs and one serial peripheral interface (SPI) port. The device is designed for industrial Ethernet applications using low power constrained nodes and is compliant with the IEEE 802.3cg-2019 Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE). The switch supports various routing configurations between the two Ethernet ports and the SPI host port providing a flexible solution for line, daisy-chain, or ring network topologies. The ADIN2111 supports cable reach of up to 1700 meters with ultra low power consumption of 77 mW. The two PHY cores support the 1.0 V p-p operating mode and the 2.4 V p-p operating mode defined in the IEEE 802.3cg standard. The device integrates the switch, two Ethernet physical layer (PHY) cores with a media access control (MAC) interface and all the associated analog circuitry, and input and output clock buffering. The device also includes internal buffer queues, the SPI and subsystem registers, as well as the control logic to manage the reset and clock control and hardware pin configuration. Access to the PHYs is exposed via an internal MDIO bus. Writes/reads can be performed by reading/writing to the ADIN2111 MDIO registers via SPI. On probe, for each port, a struct net_device is allocated and registered. When both ports are added to the same bridge, the driver will enable offloading of frame forwarding at the hardware level. Driver offers STP support. Normal operation on forwarding state. Allows only frames with the 802.1d DA to be passed to the host when in any of the other states. When both ports of ADIN2111 belong to the same SW bridge a maximum of 12 FDB entries will offloaded by the hardware and are marked as such. ==================== Link: https://lore.kernel.org/r/20220913122629.124546-1-andrei.tachici@stud.acs.upb.ro Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 15:00:33 +02:00
Alexandru Tachici	9fd12e869e	dt-bindings: net: adin1110: Add docs Add bindings for the ADIN1110/2111 MAC-PHY/SWITCH. Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 15:00:31 +02:00
Alexandru Tachici	bc93e19d08	net: ethernet: adi: Add ADIN1110 support The ADIN1110 is a low power single port 10BASE-T1L MAC-PHY designed for industrial Ethernet applications. It integrates an Ethernet PHY core with a MAC and all the associated analog circuitry, input and output clock buffering. ADIN1110 MAC-PHY encapsulates the ADIN1100 PHY. The PHY registers can be accessed through the MDIO MAC registers. We are registering an MDIO bus with custom read/write in order to let the PHY to be discovered by the PAL. This will let the ADIN1100 Linux driver to probe and take control of the PHY. The ADIN2111 is a low power, low complexity, two-Ethernet ports switch with integrated 10BASE-T1L PHYs and one serial peripheral interface (SPI) port. The device is designed for industrial Ethernet applications using low power constrained nodes and is compliant with the IEEE 802.3cg-2019 Ethernet standard for long reach 10 Mbps single pair Ethernet (SPE). The switch supports various routing configurations between the two Ethernet ports and the SPI host port providing a flexible solution for line, daisy-chain, or ring network topologies. The ADIN2111 supports cable reach of up to 1700 meters with ultra low power consumption of 77 mW. The two PHY cores support the 1.0 V p-p operating mode and the 2.4 V p-p operating mode defined in the IEEE 802.3cg standard. The device integrates the switch, two Ethernet physical layer (PHY) cores with a media access control (MAC) interface and all the associated analog circuitry, and input and output clock buffering. The device also includes internal buffer queues, the SPI and subsystem registers, as well as the control logic to manage the reset and clock control and hardware pin configuration. Access to the PHYs is exposed via an internal MDIO bus. Writes/reads can be performed by reading/writing to the ADIN2111 MDIO registers via SPI. On probe, for each port, a struct net_device is allocated and registered. When both ports are added to the same bridge, the driver will enable offloading of frame forwarding at the hardware level. Driver offers STP support. Normal operation on forwarding state. Allows only frames with the 802.1d DA to be passed to the host when in any of the other states. When both ports of ADIN2111 belong to the same SW bridge a maximum of 12 FDB entries will offloaded by the hardware and are marked as such. Co-developed-by: Lennart Franzen <lennart@lfdomain.com> Signed-off-by: Lennart Franzen <lennart@lfdomain.com> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 15:00:30 +02:00
Alexandru Tachici	875b718ac3	net: phy: adin1100: add PHY IDs of adin1110/adin2111 Add additional PHY IDs for the internal PHYs of adin1110 and adin2111. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 15:00:30 +02:00
Paolo Abeni	cec9d59e89	Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-behavior' Andrea Mayer says: ==================== seg6: add NEXT-C-SID support for SRv6 End behavior The Segment Routing (SR) architecture is based on loose source routing. A list of instructions, called segments, can be added to the packet headers to influence the forwarding and processing of the packets in an SR enabled network. In SRv6 (Segment Routing over IPv6 data plane) [1], the segment identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried in the Segment Routing Header (SRH). A segment may correspond to a "behavior" that is executed by a node when the packet is received. The Linux kernel currently supports a large subset of the behaviors described in [2] (e.g., End, End.X, End.T and so on). Some SRv6 scenarios (i.e.: traffic-engineering, fast-rerouting, VPN, mobile network backhaul, etc.) may require a large number of segments (i.e. up to 15). Therefore, reducing the size of the SID List is useful to minimize the impact on MTU (Maximum Transfer Unit) and to enable SRv6 on legacy hardware devices with limited processing power that can suffer from long IPv6 headers. Draft-ietf-spring-srv6-srh-compression [3] extends the SRv6 architecture by providing different mechanisms for the efficient representation (i.e. compression) of the SID List. The NEXT-C-SID mechanism described in [3] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID Container. In this way, the length of the SID List can be drastically reduced. In some cases, the SRH can be omitted, as the IPv6 Destination Address can carry the whole Segment List, using its compressed representation. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. In this patchset we extend the SRv6 Subsystem in order to support the NEXT-C-SID mechanism. In details the patchset is made of: - patch 1/3: add netlink_ext_ack support in parsing SRv6 behavior attributes; - patch 2/3: add NEXT-C-SID support for SRv6 End behavior; - patch 3/3: add selftest for NEXT-C-SID in SRv6 End behavior. The corresponding iproute2 patch for supporting the NEXT-C-SID in SRv6 End behavior is provided in a separated patchset. Comments, improvements and suggestions are always appreciated. [1] - https://datatracker.ietf.org/doc/html/rfc8754 [2] - https://datatracker.ietf.org/doc/html/rfc8986 [3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression ==================== Link: https://lore.kernel.org/r/20220912171619.16943-1-andrea.mayer@uniroma2.it Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 12:33:24 +02:00
Andrea Mayer	19d6356ab3	selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End behavior This selftest is designed for testing the support of NEXT-C-SID flavor for SRv6 End behavior. It instantiates a virtual network composed of several nodes: hosts and SRv6 routers. Each node is realized using a network namespace that is properly interconnected to others through veth pairs. The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged by hosts for communicating with each other. Such routers i) apply different SRv6 Policies to the traffic received from connected hosts, considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID compression mechanism for encoding several SRv6 segments within a single 128-bit SID address, referred to as a Compressed SID (C-SID) container. The NEXT-C-SID is provided as a "flavor" of the SRv6 End behavior, enabling it to properly process the C-SID containers. The correct execution of the enabled NEXT-C-SID SRv6 End behavior is verified through reachability tests carried out between hosts belonging to the same VPN. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Acked-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 12:33:22 +02:00
Andrea Mayer	848f3c0d47	seg6: add NEXT-C-SID support for SRv6 End behavior The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ \| Locator-Block \|Loc-Node\| Argument \| \| \|Function\| \| +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. The NEXT-C-SID mechanism relies on the "flavors" framework defined in [2]. The flavors represent additional operations that can modify or extend a subset of the existing behaviors. This patch introduces the support for flavors in SRv6 End behavior implementing the NEXT-C-SID one. An SRv6 End behavior with NEXT-C-SID flavor works as an End behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://datatracker.ietf.org/doc/html/rfc8986 Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 12:33:22 +02:00
Andrea Mayer	e2a8ecc451	seg6: add netlink_ext_ack support in parsing SRv6 behavior attributes An SRv6 behavior instance can be set up using mandatory and/or optional attributes. In the setup phase, each supplied attribute is parsed and processed. If the parsing operation fails, the creation of the behavior instance stops and an error number/code is reported to the user. In many cases, it is challenging for the user to figure out exactly what happened by relying only on the error code. For this reason, we add the support for netlink_ext_ack in parsing SRv6 behavior attributes. In this way, when an SRv6 behavior attribute is parsed and an error occurs, the kernel can send a message to the userspace describing the error through a meaningful text message in addition to the classic error code. Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 12:33:22 +02:00
Richard Gobert	cb628a9a7e	net-next: gro: Fix use of skb_gro_header_slow In the cited commit, the function ipv6_gro_receive was accidentally changed to use skb_gro_header_slow, without attempting the fast path. Fix it. Fixes: `35ffb66547` ("net: gro: skb_gro_header helper function") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> Link: https://lore.kernel.org/r/20220911184835.GA105063@debian Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 11:47:25 +02:00
Nathan Chancellor	2e50e9bf32	net/mlx5e: Ensure macsec_rule is always initiailized in macsec_fs_{r,t}x_add_rule() Clang warns: drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:598:9: note: uninitialized use occurs here return macsec_rule; ^~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:539:2: note: remove the 'if' if its condition is always false if (err) ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:523:38: note: initialize the variable 'macsec_rule' to silence this warning union mlx5e_macsec_rule macsec_rule; ^ = NULL drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:6: error: variable 'macsec_rule' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized] if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1215:9: note: uninitialized use occurs here return macsec_rule; ^~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1131:2: note: remove the 'if' if its condition is always false if (err) ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec_fs.c:1118:38: note: initialize the variable 'macsec_rule' to silence this warning union mlx5e_macsec_rule macsec_rule; ^ = NULL 2 errors generated. If macsec_fs_{r,t}x_ft_get() fail, macsec_rule will be uninitialized. Initialize it to NULL at the top of each function so that it cannot be used uninitialized. Fixes: `e467b283ff` ("net/mlx5e: Add MACsec TX steering rules") Fixes: `3b20949cb2` ("net/mlx5e: Add MACsec RX steering rules") Link: https://github.com/ClangBuiltLinux/linux/issues/1706 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Raed Salem <raeds@nvidia.com> Link: https://lore.kernel.org/r/20220911085748.461033-1-nathan@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 11:32:49 +02:00
Paolo Abeni	e8b9f0da92	Merge branch 'dsa-changes-for-multiple-cpu-ports-part-4' Vladimir Oltean says: ==================== DSA changes for multiple CPU ports (part 4) Those who have been following part 1: https://patchwork.kernel.org/project/netdevbpf/cover/20220511095020.562461-1-vladimir.oltean@nxp.com/ part 2: https://patchwork.kernel.org/project/netdevbpf/cover/20220521213743.2735445-1-vladimir.oltean@nxp.com/ and part 3: https://patchwork.kernel.org/project/netdevbpf/cover/20220819174820.3585002-1-vladimir.oltean@nxp.com/ will know that I am trying to enable the second internal port pair from the NXP LS1028A Felix switch for DSA-tagged traffic via "ocelot-8021q". This series represents the final part of that effort. We have: - the introduction of new UAPI in the form of IFLA_DSA_MASTER, the iproute2 patch for which is here: https://patchwork.kernel.org/project/netdevbpf/patch/20220904190025.813574-1-vladimir.oltean@nxp.com/ - preparation for LAG DSA masters in terms of suppressing some operations for masters in the DSA core that simply don't make sense when those masters are a bonding/team interface - handling all the net device events that occur between DSA and a LAG DSA master, including migration to a different DSA master when the current master joins a LAG, or the LAG gets destroyed - updating documentation - adding an implementation for NXP LS1028A, where things are insanely complicated due to hardware limitations. We have 2 tagging protocols: * the native "ocelot" protocol (NPI port mode). This does not support CPU ports in a LAG, and supports a single DSA master. The DSA master can be changed between eno2 (2.5G) and eno3 (1G), but all ports must be down during the changing process, and user ports assigned to the old DSA master will refuse to come up if the user requests that during a "transient" state. * the "ocelot-8021q" software-defined protocol, where the Ethernet ports connected to the CPU are not actually "god mode" ports as far as the hardware is concerned. So here, static assignment between user and CPU ports is possible by editing the PGID_SRC masks for the port-based forwarding matrix, and "CPU ports in a LAG" simply means "a LAG like any other". The series was regression-tested on LS1028A using the local_termination.sh kselftest, in most of the possible operating modes and tagging protocols. I have not done a detailed performance evaluation yet, but using LAG, is possible to exceed the termination bandwidth of a single CPU port in an iperf3 test with multiple senders and multiple receivers. v1 at: https://patchwork.kernel.org/project/netdevbpf/cover/20220830195932.683432-1-vladimir.oltean@nxp.com/ Previous (older) RFC at: https://lore.kernel.org/netdev/20220523104256.3556016-1-olteanv@gmail.com/ ==================== Link: https://lore.kernel.org/r/20220911010706.2137967-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:39 +02:00
Vladimir Oltean	eca70102cf	net: dsa: felix: add support for changing DSA master Changing the DSA master means different things depending on the tagging protocol in use. For NPI mode ("ocelot" and "seville"), there is a single port which can be configured as NPI, but DSA only permits changing the CPU port affinity of user ports one by one. So changing a user port to a different NPI port globally changes what the NPI port is, and breaks the user ports still using the old one. To address this while still permitting the change of the NPI port, require that the user ports which are still affine to the old NPI port are down, and cannot be brought up until they are all affine to the same NPI port. The tag_8021q mode ("ocelot-8021q") is more flexible, in that each user port can be freely assigned to one CPU port or to the other. This works by filtering host addresses towards both tag_8021q CPU ports, and then restricting the forwarding from a certain user port only to one of the two tag_8021q CPU ports. Additionally, the 2 tag_8021q CPU ports can be placed in a LAG. This works by enabling forwarding via PGID_SRC from a certain user port towards the logical port ID containing both tag_8021q CPU ports, but then restricting forwarding per packet, via the LAG hash codes in PGID_AGGR, to either one or the other. When we change the DSA master to a LAG device, DSA guarantees us that the LAG has at least one lower interface as a physical DSA master. But DSA masters can come and go as lowers of that LAG, and ds->ops->port_change_master() will not get called, because the DSA master is still the same (the LAG). So we need to hook into the ds->ops->port_lag_{join,leave} calls on the CPU ports and update the logical port ID of the LAG that user ports are assigned to. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	0773e3a851	docs: net: dsa: update information about multiple CPU ports DSA now supports multiple CPU ports, explain the use cases that are covered, the new UAPI, the permitted degrees of freedom, the driver API, and remove some old "hanging fruits". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	acc43b7bf5	net: dsa: allow masters to join a LAG There are 2 ways in which a DSA user port may become handled by 2 CPU ports in a LAG: (1) its current DSA master joins a LAG ip link del bond0 && ip link add bond0 type bond mode 802.3ad ip link set eno2 master bond0 When this happens, all user ports with "eno2" as DSA master get automatically migrated to "bond0" as DSA master. (2) it is explicitly configured as such by the user # Before, the DSA master was eno3 ip link set swp0 type dsa master bond0 The design of this configuration is that the LAG device dynamically becomes a DSA master through dsa_master_setup() when the first physical DSA master becomes a LAG slave, and stops being so through dsa_master_teardown() when the last physical DSA master leaves. A LAG interface is considered as a valid DSA master only if it contains existing DSA masters, and no other lower interfaces. Therefore, we mainly rely on method (1) to enter this configuration. Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when it becomes a standalone DSA master again. But the LAG master also has a dev->dsa_ptr, and this is actually duplicated from one of the physical LAG slaves, and therefore needs to be balanced when LAG slaves come and go. To the switch driver, putting DSA masters in a LAG is seen as putting their associated CPU ports in a LAG. We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG, by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	2e359b00a1	net: dsa: propagate extack to port_lag_join Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	13eccc1bbb	net: dsa: suppress device links to LAG DSA masters These don't work (print a harmless error about the operation failing) and make little sense to have anyway, because when a LAG DSA master goes away, we will introduce logic to move our CPU port back to the first physical DSA master. So suppress these device links in preparation for adding support for LAG DSA masters. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	cfeb84a52f	net: dsa: suppress appending ethtool stats to LAG DSA masters Similar to the discussion about tracking the admin/oper state of LAG DSA masters, we have the problem here that struct dsa_port *cpu_dp caches a single pair of orig_ethtool_ops and netdev_ops pointers. So if we call dsa_master_setup(bond0, cpu_dp) where cpu_dp is also the dev->dsa_ptr of one of the physical DSA masters, we'd effectively overwrite what we cached from that physical netdev with what replaced from the bonding interface. We don't need DSA ethtool stats on the bonding interface when used as DSA master, it's good enough to have them just on the physical DSA masters, so suppress this logic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:36 +02:00
Vladimir Oltean	6e61b55c6d	net: dsa: don't keep track of admin/oper state on LAG DSA masters We store information about the DSA master's state in cpu_dp->master_admin_up and cpu_dp->master_oper_up, and this assumes a bijective association between a CPU port and a DSA master. However, when we have CPU ports in a LAG (and DSA masters in a LAG too), the way in which we set up things is that the physical DSA masters still have dev->dsa_ptr pointing to our cpu_dp, but the bonding/team device itself also has its dev->dsa_ptr pointing towards one of the CPU port structures (the first one). So logically speaking, that first cpu_dp can't keep track of both the physical master's admin/oper state, and of the bonding master's state. This isn't even needed; the reason why we keep track of the DSA master's state is to know when it is available for Ethernet-based register access. For that use case, we don't even need LAG; we just need to decide upon one of the physical DSA masters (if there is more than 1 available) and use that. This change suppresses dsa_tree_master_{admin,oper}_state_change() calls on LAG DSA masters (which will be supported in a future change), to allow the tracking of just physical DSA masters. Link: https://lore.kernel.org/netdev/628cc94d.1c69fb81.15b0d.422d@mx.google.com/ Suggested-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-09-20 10:32:35 +02:00

1 2 3 4 5 ...

1123748 Commits