android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Nikolay Aleksandrov	bbc6f2cca7	net: bridge: fix br_multicast_is_router stub when igmp is disabled br_multicast_is_router takes two arguments when bridge IGMP is enabled and just one when it's disabled, fix the stub to take two as well. Fixes: `1a3065a268` ("net: bridge: mcast: prepare is-router function for mcast router split") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-14 10:35:20 -07:00
Linus Lüssing	3b85f9ba34	net: bridge: mcast: export multicast router presence adjacent to a port To properly support routable multicast addresses in batman-adv in a group-aware way, a batman-adv node needs to know if it serves multicast routers. This adds a function to the bridge to export this so that batman-adv can then make full use of the Multicast Router Discovery capability of the bridge. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	b7fb091654	net: bridge: mcast: add ip4+ip6 mcast router timers to mdb netlink Now that we have split the multicast router state into two, one for IPv4 and one for IPv6, also add individual timers to the mdb netlink router port dump. Leaving the old timer attribute for backwards compatibility. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	a3c02e769e	net: bridge: mcast: split multicast router state for IPv4 and IPv6 A multicast router for IPv4 does not imply that the same host also is a multicast router for IPv6 and vice versa. To reduce multicast traffic when a host is only a multicast router for one of these two protocol families, keep router state for IPv4 and IPv6 separately. Similar to how querier state is kept separately. For backwards compatibility for netlink and switchdev notifications these two will still only notify if a port switched from either no IPv4/IPv6 multicast router to any IPv4/IPv6 multicast router or the other way round. However a full netlink MDB router dump will now also include a multicast router timeout for both IPv4 and IPv6. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	ed2d35971a	net: bridge: mcast: split router port del+notify for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants split router port deletion and notification into two functions. When we disable a port for instance later we want to only send one notification to switchdev and netlink for compatibility and want to avoid sending one for IPv4 and one for IPv6. For that the split is needed. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	d9b8c4d8d9	net: bridge: mcast: prepare add-router function for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants move the protocol specific router list and timer access to ip4 wrapper functions. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	ee5fb2223e	net: bridge: mcast: prepare expiry functions for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants move the protocol specific timer access to an ip4 wrapper function. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	1a3065a268	net: bridge: mcast: prepare is-router function for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants make br_multicast_is_router() protocol family aware. Note that for now br_ip6_multicast_is_router() uses the currently still common ip4_mc_router_timer for now. It will be renamed to ip6_mc_router_timer later when the split is performed. While at it also renames the "1" and "2" constants in br_multicast_is_router() to the MDB_RTR_TYPE_TEMP_QUERY and MDB_RTR_TYPE_PERM enums. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	b19232effd	net: bridge: mcast: prepare query reception for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants and as the br_multicast_mark_router() will be split for that remove the select querier wrapper and instead add ip4 and ip6 variants for br_multicast_query_received(). Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	ff391c5d98	net: bridge: mcast: prepare mdb netlink for mcast router split In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add some inline functions for the protocol specific parts in the mdb router netlink code. Also the we need iterate over the port instead of router list to be able put one router port entry with both the IPv4 and IPv6 multicast router info later. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	44ebb081dc	net: bridge: mcast: add wrappers for router node retrieval In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants and to avoid IPv6 #ifdef clutter later add two wrapper functions for router node retrieval in the payload forwarding code. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Linus Lüssing	ce6f709775	net: bridge: mcast: rename multicast router lists and timers In preparation for the upcoming split of multicast router state into their IPv4 and IPv6 variants, rename the affected variable to the IPv4 version first to avoid some renames in later commits. Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-05-13 14:04:31 -07:00
Zhang Zhengming	59259ff7a8	bridge: Fix possible races between assigning rx_handler_data and setting IFF_BRIDGE_PORT bit There is a crash in the function br_get_link_af_size_filtered, as the port_exists(dev) is true and the rx_handler_data of dev is NULL. But the rx_handler_data of dev is correct saved in vmcore. The oops looks something like: ... pc : br_get_link_af_size_filtered+0x28/0x1c8 [bridge] ... Call trace: br_get_link_af_size_filtered+0x28/0x1c8 [bridge] if_nlmsg_size+0x180/0x1b0 rtnl_calcit.isra.12+0xf8/0x148 rtnetlink_rcv_msg+0x334/0x370 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x28/0x38 netlink_unicast+0x1f0/0x250 netlink_sendmsg+0x310/0x378 sock_sendmsg+0x4c/0x70 __sys_sendto+0x120/0x150 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x78/0x130 el0_svc_handler+0x38/0x78 el0_svc+0x8/0xc In br_add_if(), we found there is no guarantee that assigning rx_handler_data to dev->rx_handler_data will before setting the IFF_BRIDGE_PORT bit of priv_flags. So there is a possible data competition: CPU 0: CPU 1: (RCU read lock) (RTNL lock) rtnl_calcit() br_add_slave() if_nlmsg_size() br_add_if() br_get_link_af_size_filtered() -> netdev_rx_handler_register ... // The order is not guaranteed ... -> dev->priv_flags \|= IFF_BRIDGE_PORT; // The IFF_BRIDGE_PORT bit of priv_flags has been set -> if (br_port_exists(dev)) { // The dev->rx_handler_data has NOT been assigned -> p = br_port_get_rcu(dev); .... -> rcu_assign_pointer(dev->rx_handler_data, rx_handler_data); ... Fix it in br_get_link_af_size_filtered, using br_port_get_check_rcu() and checking the return value. Signed-off-by: Zhang Zhengming <zhangzhengming@huawei.com> Reviewed-by: Zhao Lei <zhaolei69@huawei.com> Reviewed-by: Wang Xiaogang <wangxiaogang3@huawei.com> Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-29 15:33:17 -07:00
Linus Lüssing	9901408815	net: bridge: mcast: fix broken length + header check for MRDv6 Adv. The IPv6 Multicast Router Advertisements parsing has the following two issues: For one thing, ICMPv6 MRD Advertisements are smaller than ICMPv6 MLD messages (ICMPv6 MRD Adv.: 8 bytes vs. ICMPv6 MLDv1/2: >= 24 bytes, assuming MLDv2 Reports with at least one multicast address entry). When ipv6_mc_check_mld_msg() tries to parse an Multicast Router Advertisement its MLD length check will fail - and it will wrongly return -EINVAL, even if we have a valid MRD Advertisement. With the returned -EINVAL the bridge code will assume a broken packet and will wrongly discard it, potentially leading to multicast packet loss towards multicast routers. The second issue is the MRD header parsing in br_ip6_multicast_mrd_rcv(): It wrongly checks for an ICMPv6 header immediately after the IPv6 header (IPv6 next header type). However according to RFC4286, section 2 all MRD messages contain a Router Alert option (just like MLD). So instead there is an IPv6 Hop-by-Hop option for the Router Alert between the IPv6 and ICMPv6 header, again leading to the bridge wrongly discarding Multicast Router Advertisements. To fix these two issues, introduce a new return value -ENODATA to ipv6_mc_check_mld() to indicate a valid ICMPv6 packet with a hop-by-hop option which is not an MLD but potentially an MRD packet. This also simplifies further parsing in the bridge code, as ipv6_mc_check_mld() already fully checks the ICMPv6 header and hop-by-hop option. These issues were found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc). Fixes: `4b3087c7e3` ("bridge: Snoop Multicast Router Advertisements") Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-27 14:02:06 -07:00
Florian Westphal	47a6959fa3	netfilter: allow to turn off xtables compat layer The compat layer needs to parse untrusted input (the ruleset) to translate it to a 64bit compatible format. We had a number of bugs in this department in the past, so allow users to turn this feature off. Add CONFIG_NETFILTER_XTABLES_COMPAT kconfig knob and make it default to y to keep existing behaviour. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-26 18:16:56 +02:00
Florian Westphal	4c95e0728e	netfilter: ebtables: remove the 3 ebtables pointers from struct net ebtables stores the table internal data (what gets passed to the ebt_do_table() interpreter) in struct net. nftables keeps the internal interpreter format in pernet lists and passes it via the netfilter core infrastructure (priv pointer). Do the same for ebtables: the nf_hook_ops are duplicated via kmemdup, then the ops->priv pointer is set to the table that is being registered. After that, the netfilter core passes this table info to the hookfn. This allows to remove the pointers from struct net. Same pattern can be applied to ip/ip6/arptables. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-26 03:20:07 +02:00
Vladimir Oltean	68f5c12abb	net: bridge: fix error in br_multicast_add_port when CONFIG_NET_SWITCHDEV=n When CONFIG_NET_SWITCHDEV is disabled, the shim for switchdev_port_attr_set inside br_mc_disabled_update returns -EOPNOTSUPP. This is not caught, and propagated to the caller of br_multicast_add_port, preventing ports from joining the bridge. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: `ae1ea84b33` ("net: bridge: propagate error code and extack from br_mc_disabled_update") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-21 13:13:33 -07:00
Jakub Kicinski	8203c7ce4e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-04-17 11:08:07 -07:00
Vladimir Oltean	2c4eca3ef7	net: bridge: switchdev: include local flag in FDB notifications As explained in bugfix commit `6ab4c3117a` ("net: bridge: don't notify switchdev for local FDB addresses") as well as in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug, which was that drivers would not know what to do with local FDB entries, because they were not told that they are local. The bug fix was to simply not notify them of those addresses. Let us now add the 'is_local' bit to bridge FDB entries, and make all drivers ignore these entries by their own choice. Co-developed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 15:15:45 -07:00
Tobias Waldekranz	e5b4b8988b	net: bridge: switchdev: refactor br_switchdev_fdb_notify Instead of having to add more and more arguments to br_switchdev_fdb_call_notifiers, get rid of it and build the info struct directly in br_switchdev_fdb_notify. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-16 15:15:45 -07:00
Florian Fainelli	ae1ea84b33	net: bridge: propagate error code and extack from br_mc_disabled_update Some Ethernet switches might only be able to support disabling multicast snooping globally, which is an issue for example when several bridges span the same physical device and request contradictory settings. Propagate the return value of br_mc_disabled_update() such that this limitation is transmitted correctly to user-space. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-04-14 14:32:05 -07:00
Florian Westphal	7ee3c61dcd	netfilter: bridge: add pre_exit hooks for ebtable unregistration Just like ip/ip6/arptables, the hooks have to be removed, then synchronize_rcu() has to be called to make sure no more packets are being processed before the ruleset data is released. Place the hook unregistration in the pre_exit hook, then call the new ebtables pre_exit function from there. Years ago, when first netns support got added for netfilter+ebtables, this used an older (now removed) netfilter hook unregister API, that did a unconditional synchronize_rcu(). Now that all is done with call_rcu, ebtable_{filter,nat,broute} pernet exit handlers may free the ebtable ruleset while packets are still in flight. This can only happens on module removal, not during netns exit. The new function expects the table name, not the table struct. This is because upcoming patch set (targeting -next) will remove all net->xt.{nat,filter,broute}_table instances, this makes it necessary to avoid external references to those member variables. The existing APIs will be converted, so follow the upcoming scheme of passing name + hook type instead. Fixes: `aee12a0a37` ("ebtables: remove nf_hook_register usage") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-10 21:16:54 +02:00
Florian Westphal	5b53951cfc	netfilter: ebtables: use net_generic infra ebtables currently uses net->xt.tables[BRIDGE], but upcoming patch will move net->xt.tables away from struct net. To avoid exposing x_tables internals to ebtables, use a private list instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-04-06 00:34:52 +02:00
Florian Westphal	77ccee96a6	netfilter: nf_log_bridge: merge with nf_log_syslog Provide bridge log support from nf_log_syslog. After the merge there is no need to load the "real packet loggers", all of them now reside in the same module. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-03-31 22:34:05 +02:00
David S. Miller	efd13b71a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-25 15:31:22 -07:00
Felix Fietkau	26267bf9bb	netfilter: flowtable: bridge vlan hardware offload and switchdev The switch might have already added the VLAN tag through PVID hardware offload. Keep this extra VLAN in the flowtable but skip it on egress. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:39 -07:00
Felix Fietkau	bcf2766b13	net: bridge: resolve forwarding path for VLAN tag actions in bridge devices Depending on the VLAN settings of the bridge and the port, the bridge can either add or remove a tag. When vlan filtering is enabled, the fdb lookup also needs to know the VLAN tag/proto for the destination address To provide this, keep track of the stack of VLAN tags for the path in the lookup context Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Pablo Neira Ayuso	ec9d16bab6	net: bridge: resolve forwarding path for bridge devices Add .ndo_fill_forward_path for bridge devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:48:38 -07:00
Colin Ian King	ad248f7761	net: bridge: Fix missing return assignment from br_vlan_replay_one call The call to br_vlan_replay_one is returning an error return value but this is not being assigned to err and the following check on err is currently always false because err was initialized to zero. Fix this by assigning err. Addresses-Coverity: ("'Constant' variable guards dead code") Fixes: `22f67cdfae` ("net: bridge: add helper to replay VLANs installed on port") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:45:48 -07:00
Horatiu Vultur	b3cb91b97c	bridge: mrp: Disable roles before deleting the MRP instance When an MRP instance was created, the driver was notified that the instance is created and then in a different callback about role of the instance. But when the instance was deleted the driver was notified only that the MRP instance is deleted and not also that the role is disabled. This patch make sure that the driver is notified that the role is changed to disabled before the MRP instance is deleted to have similar callbacks with the creating of the instance. In this way it would simplify the logic in the drivers. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-24 12:14:08 -07:00
Vladimir Oltean	22f67cdfae	net: bridge: add helper to replay VLANs installed on port Currently this simple setup with DSA: ip link add br0 type bridge vlan_filtering 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 will not work because the bridge has created the PVID in br_add_if -> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1, but that was too early, since swp0 was not yet a lower of bond0, so it had no reason to act upon that notification. We need a helper in the bridge to replay the switchdev VLAN objects that were notified since the bridge port creation, because some of them may have been missed. As opposed to the br_mdb_replay function, the vg->vlan_list write side protection is offered by the rtnl_mutex which is sleepable, so we don't need to queue up the objects in atomic context, we can replay them right away. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	04846f903b	net: bridge: add helper to replay port and local fdb entries When a switchdev port starts offloading a LAG that is already in a bridge and has an FDB entry pointing to it: ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp0 master bond0 the switchdev driver will have no idea that this FDB entry is there, because it missed the switchdev event emitted at its creation. Ido Schimmel pointed this out during a discussion about challenges with switchdev offloading of stacked interfaces between the physical port and the bridge, and recommended to just catch that condition and deny the CHANGEUPPER event: https://lore.kernel.org/netdev/20210210105949.GB287766@shredder.lan/ But in fact, we might need to deal with the hard thing anyway, which is to replay all FDB addresses relevant to this port, because it isn't just static FDB entries, but also local addresses (ones that are not forwarded but terminated by the bridge). There, we can't just say 'oh yeah, there was an upper already so I'm not joining that'. So, similar to the logic for replaying MDB entries, add a function that must be called by individual switchdev drivers and replays local FDB entries as well as ones pointing towards a bridge port. This time, we use the atomic switchdev notifier block, since that's what FDB entries expect for some reason. Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	4f2673b3a2	net: bridge: add helper to replay port and host-joined mdb entries I have a system with DSA ports, and udhcpcd is configured to bring interfaces up as soon as they are created. I create a bridge as follows: ip link add br0 type bridge As soon as I create the bridge and udhcpcd brings it up, I also have avahi which automatically starts sending IPv6 packets to advertise some local services, and because of that, the br0 bridge joins the following IPv6 groups due to the code path detailed below: 33:33:ff:6d:c1:9c vid 0 33:33:00:00:00:6a vid 0 33:33:00:00:00:fb vid 0 br_dev_xmit -> br_multicast_rcv -> br_ip6_multicast_add_group -> __br_multicast_add_group -> br_multicast_host_join -> br_mdb_notify This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host hooked up, and switchdev will attempt to offload the host joined groups to an empty list of ports. Of course nobody offloads them. Then when we add a port to br0: ip link set swp0 master br0 the bridge doesn't replay the host-joined MDB entries from br_add_if, and eventually the host joined addresses expire, and a switchdev notification for deleting it is emitted, but surprise, the original addition was already completely missed. The strategy to address this problem is to replay the MDB entries (both the port ones and the host joined ones) when the new port joins the bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can be populated and only then attached to a bridge that you offload). However there are 2 possibilities: the addresses can be 'pushed' by the bridge into the port, or the port can 'pull' them from the bridge. Considering that in the general case, the new port can be really late to the party, and there may have been many other switchdev ports that already received the initial notification, we would like to avoid delivering duplicate events to them, since they might misbehave. And currently, the bridge calls the entire switchdev notifier chain, whereas for replaying it should just call the notifier block of the new guy. But the bridge doesn't know what is the new guy's notifier block, it just knows where the switchdev notifier chain is. So for simplification, we make this a driver-initiated pull for now, and the notifier block is passed as an argument. To emulate the calling context for mdb objects (deferred and put on the blocking notifier chain), we must iterate under RCU protection through the bridge's mdb entries, queue them, and only call them once we're out of the RCU read-side critical section. There was some opportunity for reuse between br_mdb_switchdev_host_port, br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev mdb object is created, so a helper was created. Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	f1d42ea100	net: bridge: add helper to retrieve the current ageing time The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: sysfs/ioctl/netlink -> br_set_ageing_time -> __set_ageing_time therefore not at bridge port creation time, so: (a) switchdev drivers have to hardcode the initial value for the address ageing time, because they didn't get any notification (b) that hardcoded value can be out of sync, if the user changes the ageing time before enslaving the port to the bridge We need a helper in the bridge, such that switchdev drivers can query the current value of the bridge ageing time when they start offloading it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	c0e715bbd5	net: bridge: add helper for retrieving the current bridge port STP state It may happen that we have the following topology with DSA or any other switchdev driver with LAG offload: ip link add br0 type bridge stp_state 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 ip link set swp1 master bond0 STP decides that it should put bond0 into the BLOCKING state, and that's that. The ports that are actively listening for the switchdev port attributes emitted for the bond0 bridge port (because they are offloading it) and have the honor of seeing that switchdev port attribute can react to it, so we can program swp0 and swp1 into the BLOCKING state. But if then we do: ip link set swp2 master bond0 then as far as the bridge is concerned, nothing has changed: it still has one bridge port. But this new bridge port will not see any STP state change notification and will remain FORWARDING, which is how the standalone code leaves it in. We need a function in the bridge driver which retrieves the current STP state, such that drivers can synchronize to it when they may have missed switchdev events. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:49:05 -07:00
Vladimir Oltean	6ab4c3117a	net: bridge: don't notify switchdev for local FDB addresses As explained in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug. The bridge would not say that this entry is local: ip link add br0 type bridge ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master local and the switchdev driver would be more than happy to offload it as a normal static FDB entry. This is despite the fact that 'local' and non-'local' entries have completely opposite directions: a local entry is locally terminated and not forwarded, whereas a static entry is forwarded and not locally terminated. So, for example, DSA would install this entry on swp0 instead of installing it on the CPU port as it should. There is an even sadder part, which is that the 'local' flag is implicit if 'static' is not specified, meaning that this command produces the same result of adding a 'local' entry: bridge fdb add dev swp0 00:01:02:03:04:05 master I've updated the man pages for 'bridge', and after reading it now, it should be pretty clear to any user that the commands above were broken and should have never resulted in the 00:01:02:03:04:05 address being forwarded (this behavior is coherent with non-switchdev interfaces): https://patchwork.kernel.org/project/netdevbpf/cover/20210211104502.2081443-1-olteanv@gmail.com/ If you're a user reading this and this is what you want, just use: bridge fdb add dev swp0 00:01:02:03:04:05 master static Because switchdev should have given drivers the means from day one to classify FDB entries as local/non-local, but didn't, it means that all drivers are currently broken. So we can just as well omit the switchdev notifications for local FDB entries, which is exactly what this patch does to close the bug in stable trees. For further development work where drivers might want to trap the local FDB entries to the host, we can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and selectively make drivers act upon that bit, while all the others ignore those entries if the 'is_local' bit is set. Fixes: `6b26b51b1d` ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-23 14:39:41 -07:00
Nikolay Aleksandrov	0353b4a96b	net: bridge: when suppression is enabled exclude RARP packets Recently we had an interop issue where RARP packets got suppressed with bridge neigh suppression enabled, but the check in the code was meant to suppress GARP. Exclude RARP packets from it which would allow some VMWare setups to work, to quote the report: "Those RARP packets usually get generated by vMware to notify physical switches when vMotion occurs. vMware may use random sip/tip or just use sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep and other times get dropped by the logic" Reported-by: Amer Abdalamer <amer@nvidia.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:30:24 -07:00
Vladimir Oltean	f5fcca89f5	net: bridge: declare br_vlan_tunnel_lookup argument tunnel_id as __be64 The only caller of br_vlan_tunnel_lookup, br_handle_ingress_vlan_tunnel, extracts the tunnel_id from struct ip_tunnel_info::struct ip_tunnel_key:: tun_id which is a __be64 value. The exact endianness does not seem to matter, because the tunnel id is just used as a lookup key for the VLAN group's tunnel hash table, and the value is not interpreted directly per se. Moreover, rhashtable_lookup_fast treats the key argument as a const void *. Therefore, there is no functional change associated with this patch, just one to silence "make W=1" builds. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-22 13:11:59 -07:00
Nikolay Aleksandrov	e09cf58205	net: bridge: mcast: factor out common allow/block EHT handling We hande EHT state change for ALLOW messages in INCLUDE mode and for BLOCK messages in EXCLUDE mode similarly - create the new set entries with the proper filter mode. We also handle EHT state change for ALLOW messages in EXCLUDE mode and for BLOCK messages in INCLUDE mode in a similar way - delete the common entries (current set and new set). Factor out all the common code as follows: - ALLOW/INCLUDE, BLOCK/EXCLUDE: call __eht_create_set_entries() - ALLOW/EXCLUDE, BLOCK/INCLUDE: call __eht_del_common_set_entries() The set entries creation can be reused in __eht_inc_exc() as well. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 11:57:57 -07:00
Nikolay Aleksandrov	6aa2c371c7	net: bridge: mcast: remove unreachable EHT code In the initial EHT versions there were common functions which handled allow/block messages for both INCLUDE and EXCLUDE modes, but later they were separated. It seems I've left some common code which cannot be reached because the filter mode is checked before calling the respective functions, i.e. the host filter is always in EXCLUDE mode when using __eht_allow_excl() and __eht_block_excl() thus we can drop the host_excl checks inside and simplify the code a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-16 11:57:57 -07:00
Gustavo A. R. Silva	ecd1c6a51f	net: bridge: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-03-10 12:45:15 -08:00
Horatiu Vultur	cd605d455a	bridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev Check the return values of the br_mrp_switchdev function. In case of: - BR_MRP_NONE, return the error to userspace, - BR_MRP_SW, continue with SW implementation, - BR_MRP_HW, continue without SW implementation, Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-16 14:47:46 -08:00
Horatiu Vultur	1a3ddb0b75	bridge: mrp: Extend br_mrp_switchdev to detect better the errors This patch extends the br_mrp_switchdev functions to be able to have a better understanding what cause the issue and if the SW needs to be used as a backup. There are the following cases: - when the code is compiled without CONFIG_NET_SWITCHDEV. In this case return success so the SW can continue with the protocol. Depending on the function, it returns 0 or BR_MRP_SW. - when code is compiled with CONFIG_NET_SWITCHDEV and the driver doesn't implement any MRP callbacks. In this case the HW can't run MRP so it just returns -EOPNOTSUPP. So the SW will stop further to configure the node. - when code is compiled with CONFIG_NET_SWITCHDEV and the driver fully supports any MRP functionality. In this case the SW doesn't need to do anything. The functions will return 0 or BR_MRP_HW. - when code is compiled with CONFIG_NET_SWITCHDEV and the HW can't run completely the protocol but it can help the SW to run it. For example, the HW can't support completely MRM role(can't detect when it stops receiving MRP Test frames) but it can redirect these frames to CPU. In this case it is possible to have a SW fallback. The SW will try initially to call the driver with sw_backup set to false, meaning that the HW should implement completely the role. If the driver returns -EOPNOTSUPP, the SW will try again with sw_backup set to false, meaning that the SW will detect when it stops receiving the frames but it needs HW support to redirect the frames to CPU. In case the driver returns 0 then the SW will continue to configure the node accordingly. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-16 14:47:46 -08:00
Horatiu Vultur	e1bd99d07e	bridge: mrp: Add 'enum br_mrp_hw_support' Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev functions to allow the SW to detect the cases where HW can't implement the functionality or when SW is used as a backup. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-16 14:47:46 -08:00
Vladimir Oltean	c97f47e3c1	net: bridge: fix br_vlan_filter_toggle stub when CONFIG_BRIDGE_VLAN_FILTERING=n The prototype of br_vlan_filter_toggle was updated to include a netlink extack, but the stub definition wasn't, which results in a build error when CONFIG_BRIDGE_VLAN_FILTERING=n. Fixes: `9e781401cb` ("net: bridge: propagate extack through store_bridge_parm") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-15 13:15:10 -08:00
Vladimir Oltean	dcbdf1350e	net: bridge: propagate extack through switchdev_port_attr_set The benefit is the ability to propagate errors from switchdev drivers for the SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING and SWITCHDEV_ATTR_ID_BRIDGE_VLAN_PROTOCOL attributes. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-14 17:38:11 -08:00
Vladimir Oltean	9e781401cb	net: bridge: propagate extack through store_bridge_parm The bridge sysfs interface stores parameters for the STP, VLAN, multicast etc subsystems using a predefined function prototype. Sometimes the underlying function being called supports a netlink extended ack message, and we ignore it. Let's expand the store_bridge_parm function prototype to include the extack, and just print it to console, but at least propagate it where applicable. Where not applicable, create a shim function in the br_sysfs_br.c file that discards the extra function argument. This patch allows us to propagate the extack argument to br_vlan_set_default_pvid, br_vlan_set_proto and br_vlan_filter_toggle, and from there, further up in br_changelink from br_netlink.c. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-14 17:38:11 -08:00
Vladimir Oltean	7a572964e0	net: bridge: remove __br_vlan_filter_toggle This function is identical with br_vlan_filter_toggle. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-14 17:38:11 -08:00
Vladimir Oltean	e18f4c18ab	net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes This switchdev attribute offers a counterproductive API for a driver writer, because although br_switchdev_set_port_flag gets passed a "flags" and a "mask", those are passed piecemeal to the driver, so while the PRE_BRIDGE_FLAGS listener knows what changed because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final value. But certain drivers can offload only certain combinations of settings, like for example they cannot change unicast flooding independently of multicast flooding - they must be both on or both off. The way the information is passed to switchdev makes drivers not expressive enough, and unable to reject this request ahead of time, in the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during the deferred BRIDGE_FLAGS attribute, where the rejection is currently ignored. This patch also changes drivers to make use of the "mask" field for edge detection when possible. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:08:04 -08:00
Vladimir Oltean	078bbb851e	net: bridge: don't print in br_switchdev_set_port_flag For the netlink interface, propagate errors through extack rather than simply printing them to the console. For the sysfs interface, we still print to the console, but at least that's one layer higher than in switchdev, which also allows us to silently ignore the offloading of flags if that is ever needed in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:08:04 -08:00
Vladimir Oltean	304ae3bf1c	net: bridge: offload all port flags at once in br_setport If for example this command: ip link set swp0 type bridge_slave flood off mcast_flood off learning off succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at BR_LEARNING, there would be no attempt to revert the partial state in any way. Arguably, if the user changes more than one flag through the same netlink command, this one _should_ be all or nothing, which means it should be passed through switchdev as all or nothing. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-12 17:08:04 -08:00
David S. Miller	dc9d87581d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	2021-02-10 13:30:12 -08:00
Horatiu Vultur	b2bdba1cbc	bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state The function br_mrp_port_switchdev_set_state was called both with MRP port state and STP port state, which is an issue because they don't match exactly. Therefore, update the function to be used only with STP port state and use the id SWITCHDEV_ATTR_ID_PORT_STP_STATE. The choice of using STP over MRP is that the drivers already implement SWITCHDEV_ATTR_ID_PORT_STP_STATE and already in SW we update the port STP state. Fixes: `9a9f26e8f7` ("bridge: mrp: Connect MRP API with the switchdev API") Fixes: `fadd409136` ("bridge: switchdev: mrp: Implement MRP API for switchdev") Fixes: `2f1a11ae11` ("bridge: mrp: Add MRP interface.") Reported-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 16:20:57 -08:00
Vladimir Oltean	8043c845b6	net: bridge: use switchdev for port flags set through sysfs too Looking through patchwork I don't see that there was any consensus to use switchdev notifiers only in case of netlink provided port flags but not sysfs (as a sort of deprecation, punishment or anything like that), so we should probably keep the user interface consistent in terms of functionality. http://patchwork.ozlabs.org/project/netdev/patch/20170605092043.3523-3-jiri@resnulli.us/ http://patchwork.ozlabs.org/project/netdev/patch/20170608064428.4785-3-jiri@resnulli.us/ Fixes: `3922285d96` ("net: bridge: Add support for offloading port attributes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-02-08 15:43:19 -08:00
Jakub Kicinski	c273a20c30	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Remove indirection and use nf_ct_get() instead from nfnetlink_log and nfnetlink_queue, from Florian Westphal. 2) Add weighted random twos choice least-connection scheduling for IPVS, from Darby Payne. 3) Add a __hash placeholder in the flow tuple structure to identify the field to be included in the rhashtable key hash calculation. 4) Add a new nft_parse_register_load() and nft_parse_register_store() to consolidate register load and store in the core. 5) Statify nft_parse_register() since it has no more module clients. 6) Remove redundant assignment in nft_cmp, from Colin Ian King. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nftables: remove redundant assignment of variable err netfilter: nftables: statify nft_parse_register() netfilter: nftables: add nft_parse_register_store() and use it netfilter: nftables: add nft_parse_register_load() and use it netfilter: flowtable: add hash offset field to tuple ipvs: add weighted random twos choice algorithm netfilter: ctnetlink: remove get_ct indirection ==================== Link: https://lore.kernel.org/r/20210206015005.23037-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-06 15:34:23 -08:00
Xu Wang	1697291dae	net: bridge: mcast: Use ERR_CAST instead of ERR_PTR(PTR_ERR()) Use ERR_CAST inlined function instead of ERR_PTR(PTR_ERR(...)). net/bridge/br_multicast.c:1246:9-16: WARNING: ERR_CAST can be used with mp Generated by: scripts/coccinelle/api/err_cast.cocci Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Link: https://lore.kernel.org/r/20210204070549.83636-1-vulab@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-02-06 10:51:01 -08:00
Nikolay Aleksandrov	1e16f382ae	net: bridge: add warning comments to avoid extending sysfs We're moving to netlink-only options, so add comments in the bridge's sysfs files to warn against adding any new sysfs entries. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:33:02 -08:00
Nikolay Aleksandrov	7d0888d52f	net: bridge: mcast: drop hosts limit sysfs support We decided to stop adding new sysfs bridge options and continue with netlink only, so remove hosts limit sysfs support. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-29 21:33:02 -08:00
Jakub Kicinski	c358f95205	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/can/dev.c `b552766c87` ("can: dev: prevent potential information leak in can_fill_info()") `3e77f70e73` ("can: dev: move driver related infrastructure into separate subdir") `0a042c6ec9` ("can: dev: move netlink related code into seperate file") Code move. drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c `57ac4a31c4` ("net/mlx5e: Correctly handle changing the number of queues when the interface is down") `214baf2287` ("net/mlx5e: Support HTB offload") Adjacent code changes net/switchdev/switchdev.c `20776b465c` ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP") `ffb68fc58e` ("net: switchdev: remove the transaction structure from port object notifiers") `bae33f2b5a` ("net: switchdev: remove the transaction structure from port attributes") Transaction parameter gets dropped otherwise keep the fix. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-28 17:09:31 -08:00
Nikolay Aleksandrov	2dba407f99	net: bridge: multicast: make tracked EHT hosts limit configurable Add two new port attributes which make EHT hosts limit configurable and export the current number of tracked EHT hosts: - IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT: configure/retrieve current limit - IFLA_BRPORT_MCAST_EHT_HOSTS_CNT: current number of tracked hosts Setting IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT to 0 is currently not allowed. Note that we have to increase RTNL_SLAVE_MAX_TYPE to 38 minimum, I've increased it to 40 to have space for two more future entries. v2: move br_multicast_eht_set_hosts_limit() to br_multicast_eht.c, no functional change Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Nikolay Aleksandrov	89268b056e	net: bridge: multicast: add per-port EHT hosts limit Add a default limit of 512 for number of tracked EHT hosts per-port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-27 17:40:35 -08:00
Pablo Neira Ayuso	345023b0db	netfilter: nftables: add nft_parse_register_store() and use it This new function combines the netlink register attribute parser and the store validation function. This update requires to replace: enum nft_registers dreg:8; in many of the expression private areas otherwise compiler complains with: error: cannot take address of bit-field ‘dreg’ when passing the register field as reference. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2021-01-27 23:16:02 +01:00
Nikolay Aleksandrov	3e841bacf7	net: bridge: multicast: fix br_multicast_eht_set_entry_lookup indentation Fix the messed up indentation in br_multicast_eht_set_entry_lookup(). Fixes: `baa74d39ca` ("net: bridge: multicast: add EHT source set handling functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210125082040.13022-1-razor@blackwall.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-26 16:26:50 -08:00
Jiapeng Zhong	8d21c882ab	bridge: Use PTR_ERR_OR_ZERO instead if(IS_ERR(...)) + PTR_ERR coccicheck suggested using PTR_ERR_OR_ZERO() and looking at the code. Fix the following coccicheck warnings: ./net/bridge/br_multicast.c:1295:7-13: WARNING: PTR_ERR_OR_ZERO can be used. Reported-by: Abaci <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1611542381-91178-1-git-send-email-abaci-bugfix@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-25 18:23:07 -08:00
Rasmus Villemoes	6781939054	net: mrp: move struct definitions out of uapi None of these are actually used in the kernel/userspace interface - there's a userspace component of implementing MRP, and userspace will need to construct certain frames to put on the wire, but there's no reason the kernel should provide the relevant definitions in a UAPI header. In fact, some of those definitions were broken until previous commit, so only keep the few that are actually referenced in the kernel code, and move them to the br_private_mrp.h header. Signed-off-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-23 12:38:42 -08:00
Nikolay Aleksandrov	d5a1022283	net: bridge: multicast: mark IGMPv3/MLDv2 fast-leave deletes Mark groups which were deleted due to fast leave/EHT. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	e87e4b5caa	net: bridge: multicast: handle block pg delete for all cases A block report can result in empty source and host sets for both include and exclude groups so if there are no hosts left we can safely remove the group. Pull the block group handling so it can cover both cases and add a check if EHT requires the delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	c9739016a0	net: bridge: multicast: add EHT host filter_mode handling We should be able to handle host filter mode changing. For exclude mode we must create a zero-src entry so the group will be kept even without any S,G entries (non-zero source sets). That entry doesn't count to the entry limit and can always be created, its timer is refreshed on new exclude reports and if we change the host filter mode to include then it gets removed and we rely only on the non-zero source sets. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	b66bf55bbc	net: bridge: multicast: optimize TO_INCLUDE EHT timeouts This is an optimization specifically for TO_INCLUDE which sends queries for the older entries and thus lowers the S,G timers to LMQT. If we have the following situation for a group in either include or exclude mode: - host A was interested in srcs X and Y, but is timing out - host B sends TO_INCLUDE src Z, the bridge lowers X and Y's timeouts to LMQT - host B sends BLOCK src Z after LMQT time has passed => since host B is the last host we can delete the group, but if we still have host A's EHT entries for X and Y (i.e. if they weren't lowered to LMQT previously) then we'll have to wait another LMQT time before deleting the group, with this optimization we can directly remove it regardless of the group mode as there are no more interested hosts Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	ddc255d993	net: bridge: multicast: add EHT include and exclude handling Add support for IGMPv3/MLDv2 include and exclude EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - is_include: create missing entries - to_include: flush existing entries and create a new set from the report, obviously if the src set is empty then we delete the group - group exclude - is_exclude: create missing entries - to_exclude: flush existing entries and create a new set from the report, any empty source set entries are removed If the group is in a different mode then we just flush all entries reported by the host and we create a new set with the new mode entries created from the report. If the report is include type, the source list is empty and the group has empty sources' set then we remove it. Any source set entries which are empty are removed as well. If the group is in exclude mode it can exist without any S,G entries (allowing for all traffic to pass). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:57 -08:00
Nikolay Aleksandrov	474ddb37fa	net: bridge: multicast: add EHT allow/block handling Add support for IGMPv3/MLDv2 allow/block EHT handling. Similar to how the reports are processed we have 2 cases when the group is in include or exclude mode, these are processed as follows: - group include - allow: create missing entries - block: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - group exclude - allow - host include: create missing entries - host exclude: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries - block - host include: remove existing matching entries and remove the corresponding S,G entries if there are no more set host entries, then possibly delete the whole group if there are no more S,G entries - host exclude: create missing entries Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	dba6b0a5ca	net: bridge: multicast: add EHT host delete function Now that we can delete set entries, we can use that to remove EHT hosts. Since the group's host set entries exist only when there are related source set entries we just have to flush all source set entries joined by the host set entry and it will be automatically removed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	baa74d39ca	net: bridge: multicast: add EHT source set handling functions Add EHT source set and set-entry create, delete and lookup functions. These allow to manipulate source sets which contain their own host sets with entries which joined that S,G. We're limiting the maximum number of tracked S,G entries per host to PG_SRC_ENT_LIMIT (currently 32) which is the current maximum of S,G entries for a group. There's a per-set timer which will be used to destroy the whole set later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	5b16328879	net: bridge: multicast: add EHT host handling functions Add functions to create, destroy and lookup an EHT host. These are per-host entries contained in the eht_host_tree in net_bridge_port_group which are used to store a list of all sources (S,G) entries joined for that group by each host, the host's current filter mode and total number of joined entries. No functional changes yet, these would be used in later patches. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	8f07b83119	net: bridge: multicast: add EHT structures and definitions Add EHT structures for tracking hosts and sources per group. We keep one set for each host which has all of the host's S,G entries, and one set for each multicast source which has all hosts that have joined that S,G. For each host, source entry we record the filter_mode and we keep an expiry timer. There is also one global expiry timer per source set, it is updated with each set entry update, it will be later used to lower the set's timer instead of lowering each entry's timer separately. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	e7cfcf2c18	net: bridge: multicast: calculate idx position without changing ptr We need to preserve the srcs pointer since we'll be passing it for EHT handling later. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	0ad57c99e8	net: bridge: multicast: __grp_src_block_incl can modify pg Prepare __grp_src_block_incl() for being able to cause a notification due to changes. Currently it cannot happen, but EHT would change that since we'll be deleting sources immediately. Make sure that if the pg is deleted we don't return true as that would cause the caller to access freed pg. This patch shouldn't cause any functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	54bea72196	net: bridge: multicast: pass host src address to IGMPv3/MLDv2 functions We need to pass the host address so later it can be used for explicit host tracking. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Nikolay Aleksandrov	9e10b9e656	net: bridge: multicast: rename src_size to addr_size Rename src_size argument to addr_size in preparation for passing host address as an argument to IGMPv3/MLDv2 functions. No functional change. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-22 19:39:56 -08:00
Menglong Dong	a98c0c4742	net: bridge: check vlan with eth_type_vlan() method Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20210117080950.122761-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-18 14:27:33 -08:00
Vladimir Oltean	b7a9e0da2d	net: switchdev: remove vid_begin -> vid_end range from VLAN objects The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init \| __br_vlan_set_default_pvid \| \| \| \| \| br_afspec \| \| \| \| \| \| \| v \| \| \| br_process_vlan_info \| \| \| \| \| \| \| v \| \| \| br_vlan_info \| \| \| / \ / \| \| / \ / \| \| / \ / \| \| / \ / v v v v v nbp_vlan_add br_vlan_add ------+ \| ^ ^ \| \| \| / \| \| \| \| / / / \| \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ \| / / \| \ \| / / / \ \| / / / \ \| / / / \ \| / / / v \| \| v / __vlan_add / / \| / / \| / v \| / __vlan_vid_add \| / \ \| / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit `47f8328bb1` ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit `41c498b935` ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as `0944d6b5a2` ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit `29ab586c3d` ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit `9c86ce2c1a` ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit `29ab586c3d` by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit `6623c60dc2` ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-11 16:00:56 -08:00
Menglong Dong	efb5b338da	net: bridge: fix misspellings using codespell tool Some typos are found out by codespell tool: $ codespell ./net/bridge/ ./net/bridge/br_stp.c:604: permanant ==> permanent ./net/bridge/br_stp.c:605: persistance ==> persistence ./net/bridge/br.c:125: underlaying ==> underlying ./net/bridge/br_input.c:43: modue ==> mode ./net/bridge/br_mrp.c:828: Determin ==> Determine ./net/bridge/br_mrp.c:848: Determin ==> Determine ./net/bridge/br_mrp.c:897: Determin ==> Determine Fix typos found by codespell. Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210108025332.52480-1-dong.menglong@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-09 13:54:47 -08:00
Vladimir Oltean	90dc8fd360	net: bridge: notify switchdev of disappearance of old FDB entry upon migration Currently the bridge emits atomic switchdev notifications for dynamically learnt FDB entries. Monitoring these notifications works wonders for switchdev drivers that want to keep their hardware FDB in sync with the bridge's FDB. For example station A wants to talk to station B in the diagram below, and we are concerned with the behavior of the bridge on the DUT device: DUT +-------------------------------------+ \| br0 \| \| +------+ +------+ +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| \| \| swp0 \| \| swp1 \| \| swp2 \| \| eth0 \| \| +-------------------------------------+ \| \| \| Station A \| \| \| \| +--+------+--+ +--+------+--+ \| \| \| \| \| \| \| \| \| \| swp0 \| \| \| \| swp0 \| \| Another \| +------+ \| \| +------+ \| Another switch \| br0 \| \| br0 \| switch \| +------+ \| \| +------+ \| \| \| \| \| \| \| \| \| \| \| swp1 \| \| \| \| swp1 \| \| +--+------+--+ +--+------+--+ \| Station B Interfaces swp0, swp1, swp2 are handled by a switchdev driver that has the following property: frames injected from its control interface bypass the internal address analyzer logic, and therefore, this hardware does not learn from the source address of packets transmitted by the network stack through it. So, since bridging between eth0 (where Station B is attached) and swp0 (where Station A is attached) is done in software, the switchdev hardware will never learn the source address of Station B. So the traffic towards that destination will be treated as unknown, i.e. flooded. This is where the bridge notifications come in handy. When br0 on the DUT sees frames with Station B's MAC address on eth0, the switchdev driver gets these notifications and can install a rule to send frames towards Station B's address that are incoming from swp0, swp1, swp2, only towards the control interface. This is all switchdev driver private business, which the notification makes possible. All is fine until someone unplugs Station B's cable and moves it to the other switch: DUT +-------------------------------------+ \| br0 \| \| +------+ +------+ +------+ +------+ \| \| \| \| \| \| \| \| \| \| \| \| \| swp0 \| \| swp1 \| \| swp2 \| \| eth0 \| \| +-------------------------------------+ \| \| \| Station A \| \| \| \| +--+------+--+ +--+------+--+ \| \| \| \| \| \| \| \| \| \| swp0 \| \| \| \| swp0 \| \| Another \| +------+ \| \| +------+ \| Another switch \| br0 \| \| br0 \| switch \| +------+ \| \| +------+ \| \| \| \| \| \| \| \| \| \| \| swp1 \| \| \| \| swp1 \| \| +--+------+--+ +--+------+--+ \| Station B Luckily for the use cases we care about, Station B is noisy enough that the DUT hears it (on swp1 this time). swp1 receives the frames and delivers them to the bridge, who enters the unlikely path in br_fdb_update of updating an existing entry. It moves the entry in the software bridge to swp1 and emits an addition notification towards that. As far as the switchdev driver is concerned, all that it needs to ensure is that traffic between Station A and Station B is not forever broken. If it does nothing, then the stale rule to send frames for Station B towards the control interface remains in place. But Station B is no longer reachable via the control interface, but via a port that can offload the bridge port learning attribute. It's just that the port is prevented from learning this address, since the rule overrides FDB updates. So the rule needs to go. The question is via what mechanism. It sure would be possible for this switchdev driver to keep track of all addresses which are sent to the control interface, and then also listen for bridge notifier events on its own ports, searching for the ones that have a MAC address which was previously sent to the control interface. But this is cumbersome and inefficient. Instead, with one small change, the bridge could notify of the address deletion from the old port, in a symmetrical manner with how it did for the insertion. Then the switchdev driver would not be required to monitor learn/forget events for its own ports. It could just delete the rule towards the control interface upon bridge entry migration. This would make hardware address learning be possible again. Then it would take a few more packets until the hardware and software FDB would be in sync again. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2021-01-07 15:34:45 -08:00
Wang Hai	989a1db06e	net: bridge: Fix a warning when del bridge sysfs I got a warining report: br_sysfs_addbr: can't create group bridge4/bridge ------------[ cut here ]------------ sysfs group 'bridge' not found for kobject 'bridge4' WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group fs/sysfs/group.c:279 [inline] WARNING: CPU: 2 PID: 9004 at fs/sysfs/group.c:279 sysfs_remove_group+0x153/0x1b0 fs/sysfs/group.c:270 Modules linked in: iptable_nat ... Call Trace: br_dev_delete+0x112/0x190 net/bridge/br_if.c:384 br_dev_newlink net/bridge/br_netlink.c:1381 [inline] br_dev_newlink+0xdb/0x100 net/bridge/br_netlink.c:1362 __rtnl_newlink+0xe11/0x13f0 net/core/rtnetlink.c:3441 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3500 rtnetlink_rcv_msg+0x385/0x980 net/core/rtnetlink.c:5562 netlink_rcv_skb+0x134/0x3d0 net/netlink/af_netlink.c:2494 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x4a0/0x6a0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x793/0xc80 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:651 [inline] sock_sendmsg+0x139/0x170 net/socket.c:671 ____sys_sendmsg+0x658/0x7d0 net/socket.c:2353 ___sys_sendmsg+0xf8/0x170 net/socket.c:2407 __sys_sendmsg+0xd3/0x190 net/socket.c:2440 do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 In br_device_event(), if the bridge sysfs fails to be added, br_device_event() should return error. This can prevent warining when removing bridge sysfs that do not exist. Fixes: `bb900b27a2` ("bridge: allow creating bridge devices with netlink") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201211122921.40386-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 18:27:49 -08:00
Jakub Kicinski	7bca5021a4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next 1) Missing dependencies in NFT_BRIDGE_REJECT, from Randy Dunlap. 2) Use atomic_inc_return() instead of atomic_add_return() in IPVS, from Yejune Deng. 3) Simplify check for overquota in xt_nfacct, from Kaixu Xia. 4) Move nfnl_acct_list away from struct net, from Miao Wang. 5) Pass actual sk in reject actions, from Jan Engelhardt. 6) Add timeout and protoinfo to ctnetlink destroy events, from Florian Westphal. 7) Four patches to generalize set infrastructure to support for multiple expressions per set element. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nftables: netlink support for several set element expressions netfilter: nftables: generalize set extension to support for several expressions netfilter: nftables: move nft_expr before nft_set netfilter: nftables: generalize set expressions support netfilter: ctnetlink: add timeout and protoinfo to destroy events netfilter: use actual socket sk for REJECT action netfilter: nfnl_acct: remove data from struct net netfilter: Remove unnecessary conversion to bool ipvs: replace atomic_add_return() netfilter: nft_reject_bridge: fix build errors due to code movement ==================== Link: https://lore.kernel.org/r/20201212230513.3465-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-14 15:43:21 -08:00
Jakub Kicinski	46d5e62dd3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-11 22:29:38 -08:00
Joseph Huang	851d0a73c9	bridge: Fix a deadlock when enabling multicast snooping When enabling multicast snooping, bridge module deadlocks on multicast_lock if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2 network. The deadlock was caused by the following sequence: While holding the lock, br_multicast_open calls br_multicast_join_snoopers, which eventually causes IP stack to (attempt to) send out a Listener Report (in igmp6_join_group). Since the destination Ethernet address is a multicast address, br_dev_xmit feeds the packet back to the bridge via br_multicast_rcv, which in turn calls br_multicast_add_group, which then deadlocks on multicast_lock. The fix is to move the call br_multicast_join_snoopers outside of the critical section. This works since br_multicast_join_snoopers only deals with IP and does not modify any multicast data structures of the bridge, so there's no need to hold the lock. Steps to reproduce: 1. sysctl net.ipv6.conf.all.force_mld_version=1 2. have another querier 3. ip link set dev bridge type bridge mcast_snooping 0 && \ ip link set dev bridge type bridge mcast_snooping 1 < deadlock > A typical call trace looks like the following: [ 936.251495] _raw_spin_lock+0x5c/0x68 [ 936.255221] br_multicast_add_group+0x40/0x170 [bridge] [ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge] [ 936.265322] br_dev_xmit+0x140/0x368 [bridge] [ 936.269689] dev_hard_start_xmit+0x94/0x158 [ 936.273876] __dev_queue_xmit+0x5ac/0x7f8 [ 936.277890] dev_queue_xmit+0x10/0x18 [ 936.281563] neigh_resolve_output+0xec/0x198 [ 936.285845] ip6_finish_output2+0x240/0x710 [ 936.290039] __ip6_finish_output+0x130/0x170 [ 936.294318] ip6_output+0x6c/0x1c8 [ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8 [ 936.301834] igmp6_send+0x358/0x558 [ 936.305326] igmp6_join_group.part.0+0x30/0xf0 [ 936.309774] igmp6_group_added+0xfc/0x110 [ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290 [ 936.317885] ipv6_dev_mc_inc+0x10/0x18 [ 936.321677] br_multicast_open+0xbc/0x110 [bridge] [ 936.326506] br_multicast_toggle+0xec/0x140 [bridge] Fixes: `4effd28c12` ("bridge: join all-snoopers multicast address") Signed-off-by: Joseph Huang <Joseph.Huang@garmin.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/20201204235628.50653-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-07 17:14:43 -08:00
Zhang Changzhong	ee4f52a8de	net: bridge: vlan: fix error return code in __vlan_add() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `f8ed289fab` ("bridge: vlan: use br_vlan_(get\|put)_master to deal with refcounts") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Link: https://lore.kernel.org/r/1607071737-33875-1-git-send-email-zhangchangzhong@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-04 15:41:06 -08:00
Jakub Kicinski	55fd59b003	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: drivers/net/ethernet/ibm/ibmvnic.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-03 15:44:09 -08:00
Danielle Ratson	22ec19f3ae	bridge: switchdev: Notify about VLAN protocol changes Drivers that support bridge offload need to be notified about changes to the bridge's VLAN protocol so that they could react accordingly and potentially veto the change. Add a new switchdev attribute to communicate the change to drivers. Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-12-01 15:21:13 -08:00
Antoine Tenart	44f64f23ba	netfilter: bridge: reset skb->pkt_type after NF_INET_POST_ROUTING traversal Netfilter changes PACKET_OTHERHOST to PACKET_HOST before invoking the hooks as, while it's an expected value for a bridge, routing expects PACKET_HOST. The change is undone later on after hook traversal. This can be seen with pairs of functions updating skb>pkt_type and then reverting it to its original value: For hook NF_INET_PRE_ROUTING: setup_pre_routing / br_nf_pre_routing_finish For hook NF_INET_FORWARD: br_nf_forward_ip / br_nf_forward_finish But the third case where netfilter does this, for hook NF_INET_POST_ROUTING, the packet type is changed in br_nf_post_routing but never reverted. A comment says: /* We assume any code from br_dev_queue_push_xmit onwards doesn't care * about the value of skb->pkt_type. */ But when having a tunnel (say vxlan) attached to a bridge we have the following call trace: br_nf_pre_routing br_nf_pre_routing_ipv6 br_nf_pre_routing_finish br_nf_forward_ip br_nf_forward_finish br_nf_post_routing <- pkt_type is updated to PACKET_HOST br_nf_dev_queue_xmit <- but not reverted to its original value vxlan_xmit vxlan_xmit_one skb_tunnel_check_pmtu <- a check on pkt_type is performed In this specific case, this creates issues such as when an ICMPv6 PTB should be sent back. When CONFIG_BRIDGE_NETFILTER is enabled, the PTB isn't sent (as skb_tunnel_check_pmtu checks if pkt_type is PACKET_HOST and returns early). If the comment is right and no one cares about the value of skb->pkt_type after br_dev_queue_push_xmit (which isn't true), resetting it to its original value should be safe. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Antoine Tenart <atenart@kernel.org> Reviewed-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20201123174902.622102-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-28 11:46:51 -08:00
Horatiu Vultur	bfd042321a	bridge: mrp: Implement LC mode for MRP Extend MRP to support LC mode(link check) for the interconnect port. This applies only to the interconnect ring. Opposite to RC mode(ring check) the LC mode is using CFM frames to detect when the link goes up or down and based on that the userspace will need to react. One advantage of the LC mode over RC mode is that there will be fewer frames in the normal rings. Because RC mode generates InTest on all ports while LC mode sends CFM frame only on the interconnect port. All 4 nodes part of the interconnect ring needs to have the same mode. And it is not possible to have running LC and RC mode at the same time on a node. Whenever the MIM starts it needs to detect the status of the other 3 nodes in the interconnect ring so it would send a frame called InLinkStatus, on which the clients needs to reply with their link status. This patch adds InLinkStatus frame type and extends existing rules on how to forward this frame. Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-25 13:33:35 -08:00
Randy Dunlap	fd2d6bc4c2	netfilter: nft_reject_bridge: fix build errors due to code movement Fix build errors in net/bridge/netfilter/nft_reject_bridge.ko by selecting NF_REJECT_IPV4, which provides the missing symbols. ERROR: modpost: "nf_reject_skb_v4_tcp_reset" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! ERROR: modpost: "nf_reject_skb_v4_unreach" [net/bridge/netfilter/nft_reject_bridge.ko] undefined! Fixes: `fa538f7cf0` ("netfilter: nf_reject: add reject skbuff creation helpers") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-11-22 13:44:51 +01:00
Heiner Kallweit	7609ecb2ed	net: bridge: switch to net core statistics counters handling Use netdev->tstats instead of a member of net_bridge for storing a pointer to the per-cpu counters. This allows us to use core functionality for statistics handling. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/9bad2be2-fd84-7c6e-912f-cee433787018@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-21 14:40:50 -08:00
Jakub Kicinski	56495a2442	Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-19 19:08:46 -08:00
Heiner Kallweit	281cc2843b	net: bridge: replace struct br_vlan_stats with pcpu_sw_netstats Struct br_vlan_stats duplicates pcpu_sw_netstats (apart from br_vlan_stats not defining an alignment requirement), therefore switch to using the latter one. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/04d25c3d-c5f6-3611-6d37-c2f40243dae2@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-18 17:23:16 -08:00
Heiner Kallweit	7a30ecc923	net: bridge: add missing counters to ndo_get_stats64 callback In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: `28172739f0` ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-16 15:47:50 -08:00
Horatiu Vultur	0169b82054	bridge: mrp: Use hlist_head instead of list_head for mrp Replace list_head with hlist_head for MRP list under the bridge. There is no need for a circular list when a linear list will work. This will also decrease the size of 'struct net_bridge'. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Link: https://lore.kernel.org/r/20201106215049.1448185-1-horatiu.vultur@microchip.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-09 16:42:12 -08:00
Jakub Kicinski	b65ca4c388	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next 1) Move existing bridge packet reject infra to nf_reject_{ipv4,ipv6}.c from Jose M. Guisado. 2) Consolidate nft_reject_inet initialization and dump, also from Jose. 3) Add the netdev reject action, from Jose. 4) Allow to combine the exist flag and the destroy command in ipset, from Joszef Kadlecsik. 5) Expose bucket size parameter for hashtables, also from Jozsef. 6) Expose the init value for reproducible ipset listings, from Jozsef. 7) Use __printf attribute in nft_request_module, from Andrew Lunn. 8) Allow to use reject from the inet ingress chain. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next: netfilter: nft_reject_inet: allow to use reject from inet ingress netfilter: nftables: Add __printf() attribute netfilter: ipset: Expose the initval hash parameter to userspace netfilter: ipset: Add bucketsize parameter to all hash types netfilter: ipset: Support the -exist flag with the destroy command netfilter: nft_reject: add reject verdict support for netdev netfilter: nft_reject: unify reject init and dump into nft_reject netfilter: nf_reject: add reject skbuff creation helpers ==================== Link: https://lore.kernel.org/r/20201104141149.30082-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-11-04 18:05:56 -08:00
Vladimir Oltean	c43fd36f7f	net: bridge: mcast: fix stub definition of br_multicast_querier_exists The commit cited below has changed only the functional prototype of br_multicast_querier_exists, but forgot to do that for the stub prototype (the one where CONFIG_BRIDGE_IGMP_SNOOPING is disabled). Fixes: `955062b03f` ("net: bridge: mcast: add support for raw L2 multicast groups") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201101000845.190009-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-31 17:23:19 -07:00
Jose M. Guisado Gomez	312ca575a5	netfilter: nft_reject: unify reject init and dump into nft_reject Bridge family is using the same static init and dump function as inet. This patch removes duplicate code unifying these functions body into nft_reject.c so they can be reused in the rest of families supporting reject verdict. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-31 10:40:42 +01:00
Jose M. Guisado Gomez	fa538f7cf0	netfilter: nf_reject: add reject skbuff creation helpers Adds reject skbuff creation helper functions to ipv4/6 nf_reject infrastructure. Use these functions for reject verdict in bridge family. Can be reused by all different families that support reject and will not inject the reject packet through ip local out. Signed-off-by: Jose M. Guisado Gomez <guigom@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-31 10:40:22 +01:00
Vladimir Oltean	0e761ac08f	net: bridge: explicitly convert between mdb entry state and port group flags When creating a new multicast port group, there is implicit conversion between the __u8 state member of struct br_mdb_entry and the unsigned char flags member of struct net_bridge_port_group. This implicit conversion relies on the fact that MDB_PERMANENT is equal to MDB_PG_FLAGS_PERMANENT. Let's be more explicit and convert the state to flags manually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028234815.613226-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:58:16 -07:00
Nikolay Aleksandrov	955062b03f	net: bridge: mcast: add support for raw L2 multicast groups Extend the bridge multicast control and data path to configure routes for L2 (non-IP) multicast groups. The uapi struct br_mdb_entry union u is extended with another variant, mac_addr, which does not change the structure size, and which is valid when the proto field is zero. To be compatible with the forwarding code that is already in place, which acts as an IGMP/MLD snooping bridge with querier capabilities, we need to declare that for L2 MDB entries (for which there exists no such thing as IGMP/MLD snooping/querying), that there is always a querier. Otherwise, these entries would be flooded to all bridge ports and not just to those that are members of the L2 multicast group. Needless to say, only permanent L2 multicast groups can be installed on a bridge port. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20201028233831.610076-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-30 17:49:19 -07:00
Henrik Bjoernlund	b6d0425b81	bridge: cfm: Netlink Notifications. This is the implementation of Netlink notifications out of CFM. Notifications are initiated whenever a state change happens in CFM. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is NLA_U32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is NLA_U32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is NLA_U32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is NLA_U8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is NLA_U32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is NLA_U32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:44 -07:00
Henrik Bjoernlund	e77824d81d	bridge: cfm: Netlink GET status Interface. This is the implementation of CFM netlink status get information interface. Add new nested netlink attributes. These attributes are used by the user space to get status information. GETLINK: Request filter RTEXT_FILTER_CFM_STATUS: Indicating that CFM status information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_STATUS_INFO: This indicate that the MEP instance status are following. IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO: This indicate that the peer MEP status are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_STATUS: IFLA_BRIDGE_CFM_MEP_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_MEP_STATUS_OPCODE_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected Opcode. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_VERSION_UNEXP_SEEN: The MEP instance received CFM PDU with unexpected version. The type is u32 (bool). IFLA_BRIDGE_CFM_MEP_STATUS_RX_LEVEL_LOW_SEEN: The MEP instance received CCM PDU with MD level lower than configured level. This frame is discarded. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_INSTANCE: The MEP instance number of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_PEER_MEPID: The added Peer MEP ID of the delivered status. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_STATUS_CCM_DEFECT: The CCM defect status. The type is u32 (bool). True means no CCM frame is received for 3.25 intervals. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL. IFLA_BRIDGE_CFM_CC_PEER_STATUS_RDI: The last received CCM PDU RDI. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_PEER_STATUS_PORT_TLV_VALUE: The last received CCM PDU Port Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_IF_TLV_VALUE: The last received CCM PDU Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEEN: A CCM frame has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_TLV_SEEN: A CCM frame with TLV has been received from Peer MEP. The type is u32 (bool). This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. IFLA_BRIDGE_CFM_CC_PEER_STATUS_SEQ_UNEXP_SEEN: A CCM frame with unexpected sequence number has been received from Peer MEP. The type is u32 (bool). When a sequence number is not one higher than previously received then it is unexpected. This is cleared after GETLINK IFLA_BRIDGE_CFM_CC_PEER_STATUS_INFO. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:44 -07:00
Henrik Bjoernlund	5e312fc0e7	bridge: cfm: Netlink GET configuration Interface. This is the implementation of CFM netlink configuration get information interface. Add new nested netlink attributes. These attributes are used by the user space to get configuration information. GETLINK: Request filter RTEXT_FILTER_CFM_CONFIG: Indicating that CFM configuration information must be delivered. IFLA_BRIDGE_CFM: Points to the CFM information. IFLA_BRIDGE_CFM_MEP_CREATE_INFO: This indicate that MEP instance create parameters are following. IFLA_BRIDGE_CFM_MEP_CONFIG_INFO: This indicate that MEP instance config parameters are following. IFLA_BRIDGE_CFM_CC_CONFIG_INFO: This indicate that MEP instance CC functionality parameters are following. IFLA_BRIDGE_CFM_CC_RDI_INFO: This indicate that CC transmitted CCM PDU RDI parameters are following. IFLA_BRIDGE_CFM_CC_CCM_TX_INFO: This indicate that CC transmitted CCM PDU parameters are following. IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO: This indicate that the added peer MEP IDs are following. CFM nested attribute has the following attributes in next level. GETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTHu8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	2be665c394	bridge: cfm: Netlink SET configuration Interface. This is the implementation of CFM netlink configuration set information interface. Add new nested netlink attributes. These attributes are used by the user space to create/delete/configure CFM instances. SETLINK: IFLA_BRIDGE_CFM: Indicate that the following attributes are CFM. IFLA_BRIDGE_CFM_MEP_CREATE: This indicate that a MEP instance must be created. IFLA_BRIDGE_CFM_MEP_DELETE: This indicate that a MEP instance must be deleted. IFLA_BRIDGE_CFM_MEP_CONFIG: This indicate that a MEP instance must be configured. IFLA_BRIDGE_CFM_CC_CONFIG: This indicate that a MEP instance Continuity Check (CC) functionality must be configured. IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD: This indicate that a CC Peer MEP must be added. IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE: This indicate that a CC Peer MEP must be removed. IFLA_BRIDGE_CFM_CC_CCM_TX: This indicate that the CC transmitted CCM PDU must be configured. IFLA_BRIDGE_CFM_CC_RDI: This indicate that the CC transmitted CCM PDU RDI must be configured. CFM nested attribute has the following attributes in next level. SETLINK RTEXT_FILTER_CFM_CONFIG: IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE: The created MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN: The created MEP domain. The type is u32 (br_cfm_domain). It must be BR_CFM_PORT. This means that CFM frames are transmitted and received directly on the port - untagged. Not in a VLAN. IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION: The created MEP direction. The type is u32 (br_cfm_mep_direction). It must be BR_CFM_MEP_DIRECTION_DOWN. This means that CFM frames are transmitted and received on the port. Not in the bridge. IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX: The created MEP residence port ifindex. The type is u32 (ifindex). IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE: The deleted MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC: The configured MEP unicast MAC address. The type is 6u8 (array). This is used as SMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL: The configured MEP unicast MD level. The type is u32. It must be in the range 1-7. No CFM frames are passing through this MEP on lower levels. IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID: The configured MEP ID. The type is u32. It must be in the range 0-0x1FFF. This MEP ID is inserted in any transmitted CCM frame. IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE: The Continuity Check (CC) functionality is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL: The CC expected receive interval of CCM frames. The type is u32 (br_cfm_ccm_interval). This is also the transmission interval of CCM frames when enabled. IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID: The CC expected receive MAID in CCM frames. The type is CFM_MAID_LENGTHu8. This is MAID is also inserted in transmitted CCM frames. IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_PEER_MEPID: The CC Peer MEP ID added. The type is u32. When a Peer MEP ID is added and CC is enabled it is expected to receive CCM frames from that Peer MEP. IFLA_BRIDGE_CFM_CC_RDI_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_RDI_RDI: The RDI that is inserted in transmitted CCM PDU. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE: The configured MEP instance number. The type is u32. IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC: The transmitted CCM frame destination MAC address. The type is 6*u8 (array). This is used as DMAC in all transmitted CFM frames. IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE: The transmitted CCM frame update (increment) of sequence number is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD: The period of time where CCM frame are transmitted. The type is u32. The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX must be done before timeout to keep transmission alive. When period is zero any ongoing CCM frame transmission will be stopped. IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV: The transmitted CCM frame update with Interface Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE: The transmitted Interface Status TLV value field. The type is u8. IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV: The transmitted CCM frame update with Port Status TLV is enabled or disabled. The type is u32 (bool). IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE: The transmitted Port Status TLV value field. The type is u8. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	dc32cbb3db	bridge: cfm: Kernel space implementation of CFM. CCM frame RX added. This is the third commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame reception. The MEP instance now contains CCM based status information. Most important is the CCM defect status indicating if correct CCM frames are received with the expected interval. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	a806ad8ee2	bridge: cfm: Kernel space implementation of CFM. CCM frame TX added. This is the second commit of the implementation of the CFM protocol according to 802.1Q section 12.14. Functionality is extended with CCM frame transmission. Interface is extended with these functions: br_cfm_cc_rdi_set() br_cfm_cc_ccm_tx() br_cfm_cc_config_set() A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check parameters can be configured to be used when transmitting CCM. A MEP can be configured to start or stop transmission of CCM frames by br_cfm_cc_ccm_tx() The CCM will be transmitted for a selected period in seconds. Must call this function before timeout to keep transmission alive. A MEP transmitting CCM can be configured with inserted RDI in PDU by br_cfm_cc_rdi_set() Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	86a14b79e1	bridge: cfm: Kernel space implementation of CFM. MEP create/delete. This is the first commit of the implementation of the CFM protocol according to 802.1Q section 12.14. It contains MEP instance create, delete and configuration. Connectivity Fault Management (CFM) comprises capabilities for detecting, verifying, and isolating connectivity failures in Virtual Bridged Networks. These capabilities can be used in networks operated by multiple independent organizations, each with restricted management access to each others equipment. CFM functions are partitioned as follows: - Path discovery - Fault detection - Fault verification and isolation - Fault notification - Fault recovery Interface consists of these functions: br_cfm_mep_create() br_cfm_mep_delete() br_cfm_mep_config_set() br_cfm_cc_config_set() br_cfm_cc_peer_mep_add() br_cfm_cc_peer_mep_remove() A MEP instance is created by br_cfm_mep_create() -It is the Maintenance association End Point described in 802.1Q section 19.2. -It is created on a specific level (1-7) and is assuring that no CFM frames are passing through this MEP on lower levels. -It initiates and validates CFM frames on its level. -It can only exist on a port that is related to a bridge. -Attributes given cannot be changed until the instance is deleted. A MEP instance can be deleted by br_cfm_mep_delete(). A created MEP instance has attributes that can be configured by br_cfm_mep_config_set(). A MEP Continuity Check feature can be configured by br_cfm_cc_config_set() The Continuity Check Receiver state machine can be enabled and disabled. According to 802.1Q section 19.2.8 A MEP can have Peer MEPs added and removed by br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove() The Continuity Check feature can maintain connectivity status on each added Peer MEP. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	f323aa54be	bridge: cfm: Add BRIDGE_CFM to Kconfig. This makes it possible to include or exclude the CFM protocol according to 802.1Q section 12.14. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Henrik Bjoernlund	90c628dd47	net: bridge: extend the process of special frames This patch extends the processing of frames in the bridge. Currently MRP frames needs special processing and the current implementation doesn't allow a nice way to process different frame types. Therefore try to improve this by adding a list that contains frame types that need special processing. This list is iterated for each input frame and if there is a match based on frame type then these functions will be called and decide what to do with the frame. It can process the frame then the bridge doesn't need to do anything or don't process so then the bridge will do normal forwarding. Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-29 18:39:43 -07:00
Timothée COCAULT	63137bc588	netfilter: ebtables: Fixes dropping of small packets in bridge nat Fixes an error causing small packets to get dropped. skb_ensure_writable expects the second parameter to be a length in the ethernet payload.=20 If we want to write the ethernet header (src, dst), we should pass 0. Otherwise, packets with small payloads (< ETH_ALEN) will get dropped. Fixes: `c1a8311679` ("netfilter: bridge: convert skb_make_writable to skb_ensure_writable") Signed-off-by: Timothée COCAULT <timothee.cocault@orange.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-10-20 13:54:53 +02:00
Heiner Kallweit	f3f04f0f3a	net: bridge: use new function dev_fetch_sw_netstats Simplify the code by using new function dev_fetch_sw_netstats(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/d1c3ff29-5691-9d54-d164-16421905fa59@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-13 17:33:49 -07:00
Jakub Kicinski	9d49aea13f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Small conflict around locking in rxrpc_process_event() - channel_lock moved to bundle in next, while state lock needs _bh() from net. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-08 15:44:50 -07:00
Henrik Bjoernlund	b6c02ef549	bridge: Netlink interface fix. This commit is correcting NETLINK br_fill_ifinfo() to be able to handle 'filter_mask' with multiple flags asserted. Fixes: `36a8e8e265` ("bridge: Extend br_fill_ifinfo to return MPR status") Signed-off-by: Henrik Bjoernlund <henrik.bjoernlund@microchip.com> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Suggested-by: Nikolay Aleksandrov <nikolay@nvidia.com> Tested-by: Horatiu Vultur <horatiu.vultur@microchip.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-10-08 12:05:07 -07:00
David S. Miller	8b0308fe31	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Rejecting non-native endian BTF overlapped with the addition of support for it. The rest were more simple overlapping changes, except the renesas ravb binding update, which had to follow a file move as well as a YAML conversion. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-10-05 18:40:01 -07:00
Taehee Yoo	eff7423365	net: core: introduce struct netdev_nested_priv for nested interface infrastructure Functions related to nested interface infrastructure such as netdev_walk_all_{ upper \| lower }_dev() pass both private functions and "data" pointer to handle their own things. At this point, the data pointer type is void *. In order to make it easier to expand common variables and functions, this new netdev_nested_priv structure is added. In the following patch, a new member variable will be added into this struct to fix the lockdep issue. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 15:00:15 -07:00
Nikolay Aleksandrov	f2f3729fb6	net: bridge: fdb: don't flush ext_learn entries When a user-space software manages fdb entries externally it should set the ext_learn flag which marks the fdb entry as externally managed and avoids expiring it (they're treated as static fdbs). Unfortunately on events where fdb entries are flushed (STP down, netlink fdb flush etc) these fdbs are also deleted automatically by the bridge. That in turn causes trouble for the managing user-space software (e.g. in MLAG setups we lose remote fdb entries on port flaps). These entries are completely externally managed so we should avoid automatically deleting them, the only exception are offloaded entries (i.e. BR_FDB_ADDED_BY_EXT_LEARN + BR_FDB_OFFLOADED). They are flushed as before. Fixes: `eb100e0e24` ("net: bridge: allow to add externally learned entries from user-space") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-28 12:47:43 -07:00
Nikolay Aleksandrov	7470558240	net: bridge: mcast: remove only S,G port groups from sg_port hash We should remove a group from the sg_port hash only if it's an S,G entry. This makes it correct and more symmetric with group add. Also since *,G groups are not added to that hash we can hide a bug. Fixes: `085b53c8be` ("net: bridge: mcast: add sg_port rhashtable") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-25 16:50:19 -07:00
Nikolay Aleksandrov	36cfec7359	net: bridge: mcast: when forwarding handle filter mode and blocked flag We need to avoid forwarding to ports in MCAST_INCLUDE filter mode when the mdst entry is a *,G or when the port has the blocked flag. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:35 -07:00
Nikolay Aleksandrov	094b82fd53	net: bridge: mcast: handle host state Since host joins are considered as EXCLUDE {} joins we need to reflect that in all of *,G ports' S,G entries. Since the S,Gs can have host_joined == true only set automatically we can safely set it to false when removing all automatically added entries upon S,G delete. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9116ffbf1d	net: bridge: mcast: add support for blocked port groups When excluding S,G entries we need a way to block a particular S,G,port. The new port group flag is managed based on the source's timer as per RFCs 3376 and 3810. When a source expires and its port group is in EXCLUDE mode, it will be blocked. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8266a0491e	net: bridge: mcast: handle port group filter modes We need to handle group filter mode transitions and initial state. To change a port group's INCLUDE -> EXCLUDE mode (or when we have added a new port group in EXCLUDE mode) we need to add that port to all of ,G ports' S,G entries for proper replication. When the EXCLUDE state is changed from IGMPv3 report, br_multicast_fwd_filter_exclude() must be called after the source list processing because the assumption is that all of the group's S,G entries will be created before transitioning to EXCLUDE mode, i.e. most importantly its blocked entries will already be added so it will not get automatically added to them. The transition EXCLUDE -> INCLUDE happens only when a port group timer expires, it requires us to remove that port from all of ,G ports' S,G entries where it was automatically added previously. Finally when we are adding a new S,G entry we must add all of ,G's EXCLUDE ports to it. In order to distinguish automatically added ,G EXCLUDE ports we have a new port group flag - MDB_PG_FLAGS_STAR_EXCL. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	b08123684b	net: bridge: mcast: install S,G entries automatically based on reports This patch adds support for automatic install of S,G mdb entries based on the port group's source list and the source entry's timer. Once installed the S,G will be used when forwarding packets if the approprate multicast/mld versions are set. A new source flag called BR_SGRP_F_INSTALLED denotes if the source has a forwarding mdb entry installed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	085b53c8be	net: bridge: mcast: add sg_port rhashtable To speedup S,G forward handling we need to be able to quickly find out if a port is a member of an S,G group. To do that add a global S,G port rhashtable with key: source addr, group addr, protocol, vid (all br_ip fields) and port pointer. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	8f8cb77e0b	net: bridge: mcast: add rt_protocol field to the port group struct We need to be able to differentiate between pg entries created by user-space and the kernel when we start generating S,G entries for IGMPv3/MLDv2's fast path. User-space entries are created by default as RTPROT_STATIC and the kernel entries are RTPROT_KERNEL. Later we can allow user-space to provide the entry rt_protocol so we can differentiate between who added the entries specifically (e.g. clag, admin, frr etc). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7d07a68c25	net: bridge: mcast: when igmpv3/mldv2 are enabled lookup (S,G) first, then (,G) If (S,G) entries are enabled (igmpv3/mldv2) then look them up first. If there isn't a present (S,G) entry then try to find (,G). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	88d4bd1804	net: bridge: mdb: add support for add/del/dump of entries with source Add new mdb attributes (MDBE_ATTR_SOURCE for setting, MDBA_MDB_EATTR_SOURCE for dumping) to allow add/del and dump of mdb entries with a source address (S,G). New S,G entries are created with filter mode of MCAST_INCLUDE. The same attributes are used for IPv4 and IPv6, they're validated and parsed based on their protocol. S,G host joined entries which are added by user are not allowed yet. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	9c4258c78a	net: bridge: mdb: add support to extend add/del commands Since the MDB add/del code expects an exact struct br_mdb_entry we can't really add any extensions, thus add a new nested attribute at the level of MDBA_SET_ENTRY called MDBA_SET_ENTRY_ATTRS which will be used to pass all new options via netlink attributes. This patch doesn't change anything functionally since the new attribute is not used yet, only parsed. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	eab3227b12	net: bridge: mcast: rename br_ip's u member to dst Since now we have src in br_ip, u no longer makes sense so rename it to dst. No functional changes. v2: fix build with CONFIG_BATMAN_ADV_MCAST CC: Marek Lindner <mareklindner@neomailbox.ch> CC: Simon Wunderlich <sw@simonwunderlich.de> CC: Antonio Quartulli <a@unstable.cc> CC: Sven Eckelmann <sven@narfation.org> CC: b.a.t.m.a.n@lists.open-mesh.org Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	deb965662d	net: bridge: mcast: use br_ip's src for src groups and querier address Now that we have src and dst in br_ip it is logical to use the src field for the cases where we need to work with a source address such as querier source address and group source address. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	83f7398ea5	net: bridge: mdb: use extack in br_mdb_add() and br_mdb_add_group() Pass and use extack all the way down to br_mdb_add_group(). Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	7eea629d07	net: bridge: mdb: move all port and bridge checks to br_mdb_add To avoid doing duplicate device checks and searches (the same were done in br_mdb_add and __br_mdb_add) pass the already found port to __br_mdb_add and pull the bridge's netif_running and enabled multicast checks to br_mdb_add. This would also simplify the future extack errors. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
Nikolay Aleksandrov	2ac95dfe25	net: bridge: mdb: use extack in br_mdb_parse() We can drop the pr_info() calls and just use extack to return a meaningful error to user-space when br_mdb_parse() fails. Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-23 13:24:34 -07:00
David S. Miller	3ab0a7a0c3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Two minor conflicts: 1) net/ipv4/route.c, adding a new local variable while moving another local variable and removing it's initial assignment. 2) drivers/net/dsa/microchip/ksz9477.c, overlapping changes. One pretty prints the port mode differently, whilst another changes the driver to try and obtain the port mode from the port node rather than the switch node. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-22 16:45:34 -07:00
Vladimir Oltean	99f62a7460	net: bridge: br_vlan_get_pvid_rcu() should dereference the VLAN group under RCU When calling the RCU brother of br_vlan_get_pvid(), lockdep warns: ============================= WARNING: suspicious RCU usage 5.9.0-rc3-01631-g13c17acb8e38-dirty #814 Not tainted ----------------------------- net/bridge/br_private.h:1054 suspicious rcu_dereference_protected() usage! Call trace: lockdep_rcu_suspicious+0xd4/0xf8 __br_vlan_get_pvid+0xc0/0x100 br_vlan_get_pvid_rcu+0x78/0x108 The warning is because br_vlan_get_pvid_rcu() calls nbp_vlan_group() which calls rtnl_dereference() instead of rcu_dereference(). In turn, rtnl_dereference() calls rcu_dereference_protected() which assumes operation under an RCU write-side critical section, which obviously is not the case here. So, when the incorrect primitive is used to access the RCU-protected VLAN group pointer, READ_ONCE() is not used, which may cause various unexpected problems. I'm sad to say that br_vlan_get_pvid() and br_vlan_get_pvid_rcu() cannot share the same implementation. So fix the bug by splitting the 2 functions, and making br_vlan_get_pvid_rcu() retrieve the VLAN groups under proper locking annotations. Fixes: `7582f5b70f` ("bridge: add br_vlan_get_pvid_rcu()") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-21 17:37:44 -07:00
Randy Dunlap	4bbd026cb9	net: bridge: delete duplicated words Drop repeated words in net/bridge/. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: Nikolay Aleksandrov <nikolay@nvidia.com> Cc: bridge@lists.linux-foundation.org Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-18 14:12:43 -07:00
Nikolay Aleksandrov	d5bf31ddd8	net: bridge: mcast: don't ignore return value of __grp_src_toex_excl When we're handling TO_EXCLUDE report in EXCLUDE filter mode we should not ignore the return value of __grp_src_toex_excl() as we'll miss sending notifications about group changes. Fixes: `5bf1e00b68` ("net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-16 17:13:25 -07:00
Alexandra Winter	d05e8e68b0	bridge: Add SWITCHDEV_FDB_FLUSH_TO_BRIDGE notifier so the switchdev can notifiy the bridge to flush non-permanent fdb entries for this port. This is useful whenever the hardware fdb of the switchdev is reset, but the netdev and the bridgeport are not deleted. Note that this has the same effect as the IFLA_BRPORT_FLUSH attribute. CC: Jiri Pirko <jiri@resnulli.us> CC: Ivan Vecera <ivecera@redhat.com> CC: Roopa Prabhu <roopa@nvidia.com> CC: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-15 13:21:47 -07:00
Ido Schimmel	12913f7459	bridge: mcast: Fix incomplete MDB dump Each MDB entry is encoded in a nested netlink attribute called 'MDBA_MDB_ENTRY'. In turn, this attribute contains another nested attributed called 'MDBA_MDB_ENTRY_INFO', which encodes a single port group entry within the MDB entry. The cited commit added the ability to restart a dump from a specific port group entry. However, on failure to add a port group entry to the dump the entire MDB entry (stored in 'nest2') is removed, resulting in missing port group entries. Fix this by finalizing the MDB entry with the partial list of already encoded port group entries. Fixes: `5205e919c9` ("net: bridge: mcast: add support for src list and filter mode dumping") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-11 14:49:47 -07:00
David S. Miller	d85427e3c8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Rewrite inner header IPv6 in ICMPv6 messages in ip6t_NPT, from Michael Zhou. 2) do_ip_vs_set_ctl() dereferences uninitialized value, from Peilin Ye. 3) Support for userdata in tables, from Jose M. Guisado. 4) Do not increment ct error and invalid stats at the same time, from Florian Westphal. 5) Remove ct ignore stats, also from Florian. 6) Add ct stats for clash resolution, from Florian Westphal. 7) Bump reference counter bump on ct clash resolution only, this is safe because bucket lock is held, again from Florian. 8) Use ip_is_fragment() in xt_HMARK, from YueHaibing. 9) Add wildcard support for nft_socket, from Balazs Scheidler. 10) Remove superfluous IPVS dependency on iptables, from Yaroslav Bolyukin. 11) Remove unused definition in ebt_stp, from Wang Hai. 12) Replace CONFIG_NFT_CHAIN_NAT_{IPV4,IPV6} by CONFIG_NFT_NAT in selftests/net, from Fabian Frederick. 13) Add userdata support for nft_object, from Jose M. Guisado. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-09 11:21:19 -07:00
Nikolay Aleksandrov	071445c605	net: bridge: mcast: fix unused br var when lockdep isn't defined Stephen reported the following warning: net/bridge/br_multicast.c: In function 'br_multicast_find_port': net/bridge/br_multicast.c:1818:21: warning: unused variable 'br' [-Wunused-variable] 1818 \| struct net_bridge *br = mp->br; \| ^~ It happens due to bridge's mlock_dereference() when lockdep isn't defined. Silence the warning by annotating the variable as __maybe_unused. Fixes: `0436862e41` ("net: bridge: mcast: support for IGMPv3/MLDv2 ALLOW_NEW_SOURCES report") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-09-08 20:11:57 -07:00
Wang Hai	36c3be8a2c	netfilter: ebt_stp: Remove unused macro BPDU_TYPE_TCN BPDU_TYPE_TCN is never used after it was introduced. So better to remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2020-09-08 12:56:38 +02:00
Nikolay Aleksandrov	e12cec65b5	net: bridge: mcast: destroy all entries via gc Since each entry type has timers that can be running simultaneously we need to make sure that entries are not freed before their timers have finished. In order to do that generalize the src gc work to mcast gc work and use a callback to free the entries (mdb, port group or src). v3: add IPv6 support v2: force mcast gc on port del to make sure all port group timers have finished before freeing the bridge port Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	23550b8313	net: bridge: mcast: improve IGMPv3/MLDv2 query processing When an IGMPv3/MLDv2 query is received and we're operating in such mode then we need to avoid updating group timers if the suppress flag is set. Also we should update only timers for groups in exclude mode. v3: add IPv6/MLDv2 support Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	109865fe12	net: bridge: mcast: support for IGMPV3/MLDv2 BLOCK_OLD_SOURCES report We already have all necessary helpers, so process IGMPV3/MLDv2 BLOCK_OLD_SOURCES as per the RFCs. v3: add IPv6/MLDv2 support v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:36 -07:00
Nikolay Aleksandrov	5bf1e00b68	net: bridge: mcast: support for IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2 CHANGE_TO_INCLUDE/EXCLUDE report types we need new helpers which allow us to mark entries based on their timer state and to query only marked entries. v3: add IPv6/MLDv2 support, fix other_query checks v2: directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00
Nikolay Aleksandrov	e6231bca6a	net: bridge: mcast: support for IGMPV3/MLDv2 MODE_IS_INCLUDE/EXCLUDE report In order to process IGMPV3/MLDv2_MODE_IS_INCLUDE/EXCLUDE report types we need some new helpers which allow us to set/clear flags for all current entries and later delete marked entries after the report sources have been processed. v3: add IPv6/MLDv2 support v2: drop flag helpers and directly do flag bit operations Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2020-09-07 13:16:35 -07:00

1 2 3 4 5 ...

2288 Commits