android11-5.4
164 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Srinivasarao Pathipati
|
f01f08906a |
Merge android11-5.4.180+ (598165f ) into msm-5.4
* refs/heads/tmp-598165f:
Revert "arm: extend pfn_valid to take into account freed memory map alignment"
UPSTREAM: usb: gadget: clear related members when goto fail
UPSTREAM: usb: gadget: don't release an existing dev->buf
UPSTREAM: usb: gadget: Fix use-after-free bug by not setting udc->dev.driver
UPSTREAM: usb: gadget: rndis: prevent integer overflow in rndis_set_response()
UPSTREAM: fixup for "arm64 entry: Add macro for reading symbol address from the trampoline"
UPSTREAM: arm64: Use the clearbhb instruction in mitigations
UPSTREAM: KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
UPSTREAM: arm64: Mitigate spectre style branch history side channels
UPSTREAM: KVM: arm64: Add templates for BHB mitigation sequences
UPSTREAM: arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
UPSTREAM: arm64: Add percpu vectors for EL1
UPSTREAM: arm64: entry: Add macro for reading symbol addresses from the trampoline
UPSTREAM: arm64: entry: Add vectors that have the bhb mitigation sequences
UPSTREAM: arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
UPSTREAM: arm64: entry: Allow the trampoline text to occupy multiple pages
UPSTREAM: arm64: entry: Make the kpti trampoline's kpti sequence optional
UPSTREAM: arm64: entry: Move trampoline macros out of ifdef'd section
UPSTREAM: arm64: entry: Don't assume tramp_vectors is the start of the vectors
UPSTREAM: arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
UPSTREAM: arm64: entry: Move the trampoline data page before the text page
UPSTREAM: arm64: entry: Free up another register on kpti's tramp_exit path
UPSTREAM: arm64: entry: Make the trampoline cleanup optional
UPSTREAM: arm64: entry.S: Add ventry overflow sanity checks
UPSTREAM: arm64: Add Cortex-X2 CPU part definition
UPSTREAM: arm64: add ID_AA64ISAR2_EL1 sys register
UPSTREAM: arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
UPSTREAM: arm64: Add part number for Arm Cortex-A77
UPSTREAM: sctp: fix the processing for INIT chunk
ANDROID: dm-bow: Protect Ranges fetched and erased from the RB tree
UPSTREAM: ARM: fix Thumb2 regression with Spectre BHB
UPSTREAM: ARM: Spectre-BHB: provide empty stub for non-config
UPSTREAM: ARM: fix build warning in proc-v7-bugs.c
UPSTREAM: ARM: Do not use NOCROSSREFS directive with ld.lld
UPSTREAM: ARM: fix co-processor register typo
UPSTREAM: ARM: fix build error when BPF_SYSCALL is disabled
UPSTREAM: ARM: include unprivileged BPF status in Spectre V2 reporting
UPSTREAM: ARM: Spectre-BHB workaround
UPSTREAM: ARM: use LOADADDR() to get load address of sections
UPSTREAM: ARM: early traps initialisation
UPSTREAM: ARM: report Spectre v2 status through sysfs
UPSTREAM: arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
UPSTREAM: arm/arm64: Provide a wrapper for SMCCC 1.1 calls
UPSTREAM: x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
UPSTREAM: x86/speculation: Warn about Spectre v2 LFENCE mitigation
UPSTREAM: x86/speculation: Update link to AMD speculation whitepaper
UPSTREAM: x86/speculation: Use generic retpoline by default on AMD
UPSTREAM: x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
UPSTREAM: Documentation/hw-vuln: Update spectre doc
UPSTREAM: x86/speculation: Add eIBRS + Retpoline options
UPSTREAM: x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
UPSTREAM: x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
UPSTREAM: x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
UPSTREAM: bpf: Add kconfig knob for disabling unpriv bpf by default
UPSTREAM: mmc: block: fix read single on recovery logic
Linux 5.4.180
ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
perf: Fix list corruption in perf_cgroup_switch()
scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabled
hwmon: (dell-smm) Speed up setting of fan speed
seccomp: Invalidate seccomp mode to catch death failures
USB: serial: cp210x: add CPI Bulk Coin Recycler id
USB: serial: cp210x: add NCR Retail IO box id
USB: serial: ch341: add support for GW Instek USB2.0-Serial devices
USB: serial: option: add ZTE MF286D modem
USB: serial: ftdi_sio: add support for Brainboxes US-159/235/320
usb: gadget: f_uac2: Define specific wTerminalType
usb: gadget: rndis: check size of RNDIS_MSG_SET command
USB: gadget: validate interface OS descriptor requests
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
usb: dwc3: gadget: Prevent core from processing stale TRBs
usb: ulpi: Call of_node_put correctly
usb: ulpi: Move of_node_put to ulpi_dev_release
net: usb: ax88179_178a: Fix out-of-bounds accesses in RX fixup
eeprom: ee1004: limit i2c reads to I2C_SMBUS_BLOCK_MAX
n_tty: wake up poll(POLLRDNORM) on receiving data
vt_ioctl: add array_index_nospec to VT_ACTIVATE
vt_ioctl: fix array_index_nospec in vt_setactivate
net: amd-xgbe: disable interrupts during pci removal
tipc: rate limit warning for received illegal binding update
net: mdio: aspeed: Add missing MODULE_DEVICE_TABLE
veth: fix races around rq->rx_notify_masked
net: fix a memleak when uncloning an skb dst and its metadata
net: do not keep the dst cache when uncloning an skb dst and its metadata
nfp: flower: fix ida_idx not being released
ipmr,ip6mr: acquire RTNL before calling ip[6]mr_free_table() on failure path
bonding: pair enable_port with slave_arr_updates
ixgbevf: Require large buffers for build_skb on 82599VF
misc: fastrpc: avoid double fput() on failed usercopy
usb: f_fs: Fix use-after-free for epfile
ARM: dts: imx6qdl-udoo: Properly describe the SD card detect
staging: fbtft: Fix error path in fbtft_driver_module_init()
ARM: dts: meson: Fix the UART compatible strings
perf probe: Fix ppc64 'perf probe add events failed' case
net: bridge: fix stale eth hdr pointer in br_dev_xmit
PM: s2idle: ACPI: Fix wakeup interrupts handling
ACPI/IORT: Check node revision for PMCG resources
nvme-tcp: fix bogus request completion when failing to send AER
ARM: socfpga: fix missing RESET_CONTROLLER
ARM: dts: imx23-evk: Remove MX23_PAD_SSP1_DETECT from hog group
riscv: fix build with binutils 2.38
bpf: Add kconfig knob for disabling unpriv bpf by default
KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
net: stmmac: dwmac-sun8i: use return val of readl_poll_timeout()
usb: dwc2: gadget: don't try to disable ep0 in dwc2_hsotg_suspend
PM: hibernate: Remove register_nosave_region_late()
scsi: myrs: Fix crash in error case
scsi: qedf: Fix refcount issue when LOGO is received during TMF
scsi: target: iscsi: Make sure the np under each tpg is unique
net: sched: Clarify error message when qdisc kind is unknown
drm: panel-orientation-quirks: Add quirk for the 1Netbook OneXPlayer
NFSv4 expose nfs_parse_server_name function
NFSv4 remove zero number of fs_locations entries error check
NFSv4.1: Fix uninitialised variable in devicenotify
nfs: nfs4clinet: check the return value of kstrdup()
NFSv4 only print the label when its queried
nvme: Fix parsing of ANA log page
NFSD: Fix offset type in I/O trace points
NFSD: Clamp WRITE offsets
NFS: Fix initialisation of nfs_client cl_flags field
net: phy: marvell: Fix MDI-x polarity setting in 88e1118-compatible PHYs
net: phy: marvell: Fix RGMII Tx/Rx delays setting in 88e1121-compatible PHYs
mmc: sdhci-of-esdhc: Check for error num after setting mask
ima: Do not print policy rule with inactive LSM labels
ima: Allow template selection with ima_template[_fmt]= after ima_hash=
ima: Remove ima_policy file before directory
integrity: check the return value of audit_log_start()
Linux 5.4.179
tipc: improve size validations for received domain records
moxart: fix potential use-after-free on remove path
Linux 5.4.178
cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
ext4: fix error handling in ext4_restore_inline_data()
EDAC/xgene: Fix deferred probing
EDAC/altera: Fix deferred probing
rtc: cmos: Evaluate century appropriate
selftests: futex: Use variable MAKE instead of make
nfsd: nfsd4_setclientid_confirm mistakenly expires confirmed client.
scsi: bnx2fc: Make bnx2fc_recv_frame() mp safe
pinctrl: bcm2835: Fix a few error paths
ASoC: max9759: fix underflow in speaker_gain_control_put()
ASoC: cpcap: Check for NULL pointer after calling of_get_child_by_name
ASoC: xilinx: xlnx_formatter_pcm: Make buffer bytes multiple of period bytes
ASoC: fsl: Add missing error handling in pcm030_fabric_probe
drm/i915/overlay: Prevent divide by zero bugs in scaling
net: stmmac: ensure PTP time register reads are consistent
net: stmmac: dump gmac4 DMA registers correctly
net: macsec: Verify that send_sci is on when setting Tx sci explicitly
net: ieee802154: Return meaningful error codes from the netlink helpers
net: ieee802154: ca8210: Stop leaking skb's
net: ieee802154: mcr20a: Fix lifs/sifs periods
net: ieee802154: hwsim: Ensure proper channel selection at probe time
spi: meson-spicc: add IRQ check in meson_spicc_probe
spi: mediatek: Avoid NULL pointer crash in interrupt
spi: bcm-qspi: check for valid cs before applying chip select
iommu/amd: Fix loop timeout issue in iommu_ga_log_enable()
iommu/vt-d: Fix potential memory leak in intel_setup_irq_remapping()
RDMA/mlx4: Don't continue event handler after memory allocation failure
RDMA/siw: Fix broken RDMA Read Fence/Resume logic.
IB/rdmavt: Validate remote_addr during loopback atomic tests
memcg: charge fs_context and legacy_fs_context
Revert "ASoC: mediatek: Check for error clk pointer"
block: bio-integrity: Advance seed correctly for larger interval sizes
mm/kmemleak: avoid scanning potential huge holes
drm/nouveau: fix off by one in BIOS boundary checking
btrfs: fix deadlock between quota disable and qgroup rescan worker
ALSA: hda/realtek: Fix silent output on Gigabyte X570 Aorus Xtreme after reboot from Windows
ALSA: hda/realtek: Fix silent output on Gigabyte X570S Aorus Master (newer chipset)
ALSA: hda/realtek: Add missing fixup-model entry for Gigabyte X570 ALC1220 quirks
ALSA: hda/realtek: Add quirk for ASUS GU603
ALSA: usb-audio: Simplify quirk entries with a macro
ASoC: ops: Reject out of bounds values in snd_soc_put_xr_sx()
ASoC: ops: Reject out of bounds values in snd_soc_put_volsw_sx()
ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()
audit: improve audit queue handling when "audit=1" on cmdline
Revert "net: fix information leakage in /proc/net/ptype"
Linux 5.4.177
af_packet: fix data-race in packet_setsockopt / packet_setsockopt
cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink()
net: sched: fix use-after-free in tc_new_tfilter()
net: amd-xgbe: Fix skb data length underflow
net: amd-xgbe: ensure to reset the tx_timer_active flag
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
cgroup-v1: Require capabilities to set release_agent
psi: Fix uaf issue when psi trigger is destroyed while being polled
PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
Linux 5.4.176
mtd: rawnand: mpc5121: Remove unused variable in ads5121_select_chip()
block: Fix wrong offset in bio_truncate()
fsnotify: invalidate dcache before IN_DELETE event
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
ipv4: remove sparse error in ip_neigh_gw4()
ipv4: tcp: send zero IPID in SYNACK messages
ipv4: raw: lock the socket in raw_bind()
net: hns3: handle empty unknown interrupt for VF
yam: fix a memory leak in yam_siocdevprivate()
drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
ibmvnic: don't spin in tasklet
ibmvnic: init ->running_cap_crqs early
hwmon: (lm90) Mark alert as broken for MAX6654
rxrpc: Adjust retransmission backoff
phylib: fix potential use-after-free
net: phy: broadcom: hook up soft_reset for BCM54616S
netfilter: conntrack: don't increment invalid counter on NF_REPEAT
NFS: Ensure the server has an up to date ctime before renaming
NFS: Ensure the server has an up to date ctime before hardlinking
ipv6: annotate accesses to fn->fn_sernum
drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
drm/msm: Fix wrong size calculation
net-procfs: show net devices bound packet types
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
NFSv4: Handle case where the lookup of a directory fails
hwmon: (lm90) Reduce maximum conversion rate for G781
ipv4: avoid using shared IP generator for connected sockets
ping: fix the sk_bound_dev_if match in ping_lookup
hwmon: (lm90) Mark alert as broken for MAX6680
hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
net: fix information leakage in /proc/net/ptype
ipv6_tunnel: Rate limit warning messages
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
i40e: fix unsigned stat widths
i40e: Fix queues reservation for XDP
i40e: Fix issue when maximum queues is exceeded
i40e: Increase delay to 1 s after global EMP reset
powerpc/32: Fix boot failure with GCC latent entropy plugin
net: sfp: ignore disabled SFP node
ucsi_ccg: Check DEV_INT bit only when starting CCG4
usb: typec: tcpm: Do not disconnect while receiving VBUS off
USB: core: Fix hang in usb_kill_urb by adding memory barriers
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
usb: common: ulpi: Fix crash in ulpi_match()
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
tty: Add support for Brainboxes UC cards.
tty: n_gsm: fix SW flow control encoding/handling
serial: stm32: fix software flow control transfer
serial: 8250: of: Fix mapped region size when using reg-offset property
netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
arm64: errata: Fix exec handling in erratum
|
||
Nikolay Aleksandrov
|
b4a59eafcb |
net: bridge: fix stale eth hdr pointer in br_dev_xmit
commit 823d81b0fa2cd83a640734e74caee338b5d3c093 upstream.
In br_dev_xmit() we perform vlan filtering in br_allowed_ingress() but
if the packet has the vlan header inside (e.g. bridge with disabled
tx-vlan-offload) then the vlan filtering code will use skb_vlan_untag()
to extract the vid before filtering which in turn calls pskb_may_pull()
and we may end up with a stale eth pointer. Moreover the cached eth header
pointer will generally be wrong after that operation. Remove the eth header
caching and just use eth_hdr() directly, the compiler does the right thing
and calculates it only once so we don't lose anything.
Fixes:
|
||
Pooventhiran G
|
af60234af1 |
bridge: Port Hy-Fi bridging hooks
Changes to port Hy-Fi bridging hooks and to export br_fdb_find_rcu symbol. Change-Id: Ife2a9ac76e02fff56b8c4acaa97794df7e98863d Signed-off-by: Muthuchamy Kumar <quic_muthucha@quicinc.com> Signed-off-by: Pooventhiran G <quic_pooventh@quicinc.com> |
||
Joseph Huang
|
d8d39e1366 |
bridge: Fix a deadlock when enabling multicast snooping
[ Upstream commit 851d0a73c90e6c8c63fef106c6c1e73df7e05d9d ]
When enabling multicast snooping, bridge module deadlocks on multicast_lock
if 1) IPv6 is enabled, and 2) there is an existing querier on the same L2
network.
The deadlock was caused by the following sequence: While holding the lock,
br_multicast_open calls br_multicast_join_snoopers, which eventually causes
IP stack to (attempt to) send out a Listener Report (in igmp6_join_group).
Since the destination Ethernet address is a multicast address, br_dev_xmit
feeds the packet back to the bridge via br_multicast_rcv, which in turn
calls br_multicast_add_group, which then deadlocks on multicast_lock.
The fix is to move the call br_multicast_join_snoopers outside of the
critical section. This works since br_multicast_join_snoopers only deals
with IP and does not modify any multicast data structures of the bridge,
so there's no need to hold the lock.
Steps to reproduce:
1. sysctl net.ipv6.conf.all.force_mld_version=1
2. have another querier
3. ip link set dev bridge type bridge mcast_snooping 0 && \
ip link set dev bridge type bridge mcast_snooping 1 < deadlock >
A typical call trace looks like the following:
[ 936.251495] _raw_spin_lock+0x5c/0x68
[ 936.255221] br_multicast_add_group+0x40/0x170 [bridge]
[ 936.260491] br_multicast_rcv+0x7ac/0xe30 [bridge]
[ 936.265322] br_dev_xmit+0x140/0x368 [bridge]
[ 936.269689] dev_hard_start_xmit+0x94/0x158
[ 936.273876] __dev_queue_xmit+0x5ac/0x7f8
[ 936.277890] dev_queue_xmit+0x10/0x18
[ 936.281563] neigh_resolve_output+0xec/0x198
[ 936.285845] ip6_finish_output2+0x240/0x710
[ 936.290039] __ip6_finish_output+0x130/0x170
[ 936.294318] ip6_output+0x6c/0x1c8
[ 936.297731] NF_HOOK.constprop.0+0xd8/0xe8
[ 936.301834] igmp6_send+0x358/0x558
[ 936.305326] igmp6_join_group.part.0+0x30/0xf0
[ 936.309774] igmp6_group_added+0xfc/0x110
[ 936.313787] __ipv6_dev_mc_inc+0x1a4/0x290
[ 936.317885] ipv6_dev_mc_inc+0x10/0x18
[ 936.321677] br_multicast_open+0xbc/0x110 [bridge]
[ 936.326506] br_multicast_toggle+0xec/0x140 [bridge]
Fixes:
|
||
Heiner Kallweit
|
805dfdb26e |
net: bridge: add missing counters to ndo_get_stats64 callback
[ Upstream commit 7a30ecc9237681bb125cbd30eee92bef7e86293d ]
In br_forward.c and br_input.c fields dev->stats.tx_dropped and
dev->stats.multicast are populated, but they are ignored in
ndo_get_stats64.
Fixes:
|
||
Nikolay Aleksandrov
|
c1780f088f |
net: bridge: deny dev_set_mac_address() when unregistering
[ Upstream commit c4b4c421857dc7b1cf0dccbd738472360ff2cd70 ]
We have an interesting memory leak in the bridge when it is being
unregistered and is a slave to a master device which would change the
mac of its slaves on unregister (e.g. bond, team). This is a very
unusual setup but we do end up leaking 1 fdb entry because
dev_set_mac_address() would cause the bridge to insert the new mac address
into its table after all fdbs are flushed, i.e. after dellink() on the
bridge has finished and we call NETDEV_UNREGISTER the bond/team would
release it and will call dev_set_mac_address() to restore its original
address and that in turn will add an fdb in the bridge.
One fix is to check for the bridge dev's reg_state in its
ndo_set_mac_address callback and return an error if the bridge is not in
NETREG_REGISTERED.
Easy steps to reproduce:
1. add bond in mode != A/B
2. add any slave to the bond
3. add bridge dev as a slave to the bond
4. destroy the bridge device
Trace:
unreferenced object 0xffff888035c4d080 (size 128):
comm "ip", pid 4068, jiffies 4296209429 (age 1413.753s)
hex dump (first 32 bytes):
41 1d c9 36 80 88 ff ff 00 00 00 00 00 00 00 00 A..6............
d2 19 c9 5e 3f d7 00 00 00 00 00 00 00 00 00 00 ...^?...........
backtrace:
[<00000000ddb525dc>] kmem_cache_alloc+0x155/0x26f
[<00000000633ff1e0>] fdb_create+0x21/0x486 [bridge]
[<0000000092b17e9c>] fdb_insert+0x91/0xdc [bridge]
[<00000000f2a0f0ff>] br_fdb_change_mac_address+0xb3/0x175 [bridge]
[<000000001de02dbd>] br_stp_change_bridge_id+0xf/0xff [bridge]
[<00000000ac0e32b1>] br_set_mac_address+0x76/0x99 [bridge]
[<000000006846a77f>] dev_set_mac_address+0x63/0x9b
[<00000000d30738fc>] __bond_release_one+0x3f6/0x455 [bonding]
[<00000000fc7ec01d>] bond_netdev_event+0x2f2/0x400 [bonding]
[<00000000305d7795>] notifier_call_chain+0x38/0x56
[<0000000028885d4a>] call_netdevice_notifiers+0x1e/0x23
[<000000008279477b>] rollback_registered_many+0x353/0x6a4
[<0000000018ef753a>] unregister_netdevice_many+0x17/0x6f
[<00000000ba854b7a>] rtnl_delete_link+0x3c/0x43
[<00000000adf8618d>] rtnl_dellink+0x1dc/0x20a
[<000000009b6395fd>] rtnetlink_rcv_msg+0x23d/0x268
Fixes:
|
||
Taehee Yoo
|
ab92d68fc2 |
net: core: add generic lockdep keys
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
David S. Miller
|
a6cdeeb16b |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Some ISDN files that got removed in net-next had some changes done in mainline, take the removals. Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Pablo Neira Ayuso
|
3c171f496e |
netfilter: bridge: add connection tracking system
This patch adds basic connection tracking support for the bridge, including initial IPv4 support. This patch register two hooks to deal with the bridge forwarding path, one from the bridge prerouting hook to call nf_conntrack_in(); and another from the bridge postrouting hook to confirm the entry. The conntrack bridge prerouting hook defragments packets before passing them to nf_conntrack_in() to look up for an existing entry, otherwise a new entry is allocated and it is attached to the skbuff. The conntrack bridge postrouting hook confirms new conntrack entries, ie. if this is the first packet seen, then it adds the entry to the hashtable and (if needed) it refragments the skbuff into the original fragments, leaving the geometry as is if possible. Exceptions are linearized skbuffs, eg. skbuffs that are passed up to nfqueue and conntrack helpers, as well as cloned skbuff for the local delivery (eg. tcpdump), also in case of bridge port flooding (cloned skbuff too). The packet defragmentation is done through the ip_defrag() call. This forces us to save the bridge control buffer, reset the IP control buffer area and then restore it after call. This function also bumps the IP fragmentation statistics, it would be probably desiderable to have independent statistics for the bridge defragmentation/refragmentation. The maximum fragment length is stored in the control buffer and it is used to refragment the skbuff from the postrouting path. The new fraglist splitter and fragment transformer APIs are used to implement the bridge refragmentation code. The br_ip_fragment() function drops the packet in case the maximum fragment size seen is larger than the output port MTU. This patchset follows the principle that conntrack should not drop packets, so users can do it through policy via invalid state matching. Like br_netfilter, there is no refragmentation for packets that are passed up for local delivery, ie. prerouting -> input path. There are calls to nf_reset() already in several spots in the stack since time ago already, eg. af_packet, that show that skbuff fraglist handling from the netif_rx path is supported already. The helpers are called from the postrouting hook, before confirmation, from there we may see packet floods to bridge ports. Then, although unlikely, this may result in exercising the helpers many times for each clone. It would be good to explore how to pass all the packets in a list to the conntrack hook to do this handle only once for this case. Thanks to Florian Westphal for handing me over an initial patchset version to add support for conntrack bridge. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Thomas Gleixner
|
2874c5fd28 |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Roopa Prabhu
|
4767456212 |
bridge: support for ndo_fdb_get
This patch implements ndo_fdb_get for the bridge fdb. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
19e3a9c90c |
net: bridge: convert multicast to generic rhashtable
The bridge multicast code currently uses a custom resizable hashtable which predates the generic rhashtable interface. It has many shortcomings compared and duplicates functionality that is presently available via the generic rhashtable, so this patch removes the custom rhashtable implementation in favor of the kernel's generic rhashtable. The hash maximum is kept and the rhashtable's size is used to do a loose check if it's reached in which case we revert to the old behaviour and disable further bridge multicast processing. Also now we can support any hash maximum, doesn't need to be a power of 2. v3: add non-rcu br_mdb_get variant and use it where multicast_lock is held to avoid RCU splat, drop hash_max function and just set it directly v2: handle when IGMP snooping is undefined, add br_mdb_init/uninit placeholders Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Debabrata Banerjee
|
c9fbd71f73 |
netpoll: allow cleanup to be synchronous
This fixes a problem introduced by:
commit
|
||
Nikolay Aleksandrov
|
3341d91702 |
net: bridge: convert mtu_set_by_user to a bit
Convert the last remaining bool option to a bit thus reducing the overall net_bridge size further by 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
c69c2cd444 |
net: bridge: convert neigh_suppress_enabled option to a bit
Convert the neigh_suppress_enabled option to a bit. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
804b854d37 |
net: bridge: disable bridge MTU auto tuning if it was set manually
As Roopa noted today the biggest source of problems when configuring bridge and ports is that the bridge MTU keeps changing automatically on port events (add/del/changemtu). That leads to inconsistent behaviour and network config software needs to chase the MTU and fix it on each such event. Let's improve on that situation and allow for the user to set any MTU within ETH_MIN/MAX limits, but once manually configured it is the user's responsibility to keep it correct afterwards. In case the MTU isn't manually set - the behaviour reverts to the previous and the bridge follows the minimum MTU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
f40aa23339 |
net: bridge: set min MTU on port events and allow user to set max
Recently the bridge was changed to automatically set maximum MTU on port events (add/del/changemtu) when vlan filtering is enabled, but that actually changes behaviour in a way which breaks some setups and can lead to packet drops. In order to still allow that maximum to be set while being compatible, we add the ability for the user to tune the bridge MTU up to the maximum when vlan filtering is enabled, but that has to be done explicitly and all port events (add/del/changemtu) lead to resetting that MTU to the minimum as before. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Chas Williams
|
419d14af9e |
bridge: Allow max MTU when multiple VLANs present
If the bridge is allowing multiple VLANs, some VLANs may have different MTUs. Instead of choosing the minimum MTU for the bridge interface, choose the maximum MTU of the bridge members. With this the user only needs to set a larger MTU on the member ports that are participating in the large MTU VLANS. Signed-off-by: Chas Williams <3chas3@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
eb7935830d |
net: bridge: use rhashtable for fdbs
Before this patch the bridge used a fixed 256 element hash table which was fine for small use cases (in my tests it starts to degrade above 1000 entries), but it wasn't enough for medium or large scale deployments. Modern setups have thousands of participants in a single bridge, even only enabling vlans and adding a few thousand vlan entries will cause a few thousand fdbs to be automatically inserted per participating port. So we need to scale the fdb table considerably to cope with modern workloads, and this patch converts it to use a rhashtable for its operations thus improving the bridge scalability. Tests show the following results (10 runs each), at up to 1000 entries rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it is 2 times faster and at 30000 it is 50 times faster. Obviously this happens because of the properties of the two constructs and is expected, rhashtable keeps pretty much a constant time even with 10000000 entries (tested), while the fixed hash table struggles considerably even above 10000. As a side effect this also reduces the net_bridge struct size from 3248 bytes to 1344 bytes. Also note that the key struct is 8 bytes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Egil Hjelmeland
|
054287295b |
net: Define eth_stp_addr in linux/etherdevice.h
The lan9303 driver defines eth_stp_addr as a synonym to eth_reserved_addr_base to get the STP ethernet address 01:80:c2:00:00:00. eth_reserved_addr_base is also used to define the start of Bridge Reserved ethernet address range, which happen to be the STP address. br_dev_setup refer to eth_reserved_addr_base as a definition of STP address. Clean up by: - Move the eth_stp_addr definition to linux/etherdevice.h - Use eth_stp_addr instead of eth_reserved_addr_base in br_dev_setup. Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Roopa Prabhu
|
ed842faeb2 |
bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports
This patch avoids flooding and proxies ndisc packets for BR_NEIGH_SUPPRESS ports. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Roopa Prabhu
|
057658cb33 |
bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports
This patch avoids flooding and proxies arp packets for BR_NEIGH_SUPPRESS ports. Moves existing br_do_proxy_arp to br_do_proxy_suppress_arp to support both proxy arp and neigh suppress. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
David Ahern
|
ca752be006 |
net: bridge: Pass extack to down to netdev_master_upper_dev_link
Pass extack arg to br_add_if. Add messages for a couple of failures and pass arg to netdev_master_upper_dev_link. Signed-off-by: David Ahern <dsahern@gmail.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
David Ahern
|
33eaf2a6eb |
net: Add extack to ndo_add_slave
Pass extack to do_set_master and down to ndo_add_slave Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Ido Schimmel
|
f1c2eddf4c |
bridge: switchdev: Use an helper to clear forward mark
Instead of using ifdef in the C file. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Yotam Gigi <yotamg@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Ido Schimmel
|
79e99bdd60 |
bridge: switchdev: Clear forward mark when transmitting packet
Commit |
||
Nikolay Aleksandrov
|
31a4562d74 |
net: bridge: fix dest lookup when vlan proto doesn't match
With 802.1ad support the vlan_ingress code started checking for vlan
protocol mismatch which causes the current tag to be inserted and the
bridge vlan protocol & pvid to be set. The vlan tag insertion changes
the skb mac_header and thus the lookup mac dest pointer which was loaded
prior to calling br_allowed_ingress in br_handle_frame_finish is VLAN_HLEN
bytes off now, pointing to the last two bytes of the destination mac and
the first four of the source mac causing lookups to always fail and
broadcasting all such packets to all ports. Same thing happens for locally
originated packets when passing via br_dev_xmit. So load the dest pointer
after the vlan checks and possible skb change.
Fixes:
|
||
David S. Miller
|
cf124db566 |
net: Fix inconsistent teardown and release of private netdev state.
Network devices can allocate reasources and private memory using netdev_ops->ndo_init(). However, the release of these resources can occur in one of two different places. Either netdev_ops->ndo_uninit() or netdev->destructor(). The decision of which operation frees the resources depends upon whether it is necessary for all netdev refs to be released before it is safe to perform the freeing. netdev_ops->ndo_uninit() presumably can occur right after the NETDEV_UNREGISTER notifier completes and the unicast and multicast address lists are flushed. netdev->destructor(), on the other hand, does not run until the netdev references all go away. Further complicating the situation is that netdev->destructor() almost universally does also a free_netdev(). This creates a problem for the logic in register_netdevice(). Because all callers of register_netdevice() manage the freeing of the netdev, and invoke free_netdev(dev) if register_netdevice() fails. If netdev_ops->ndo_init() succeeds, but something else fails inside of register_netdevice(), it does call ndo_ops->ndo_uninit(). But it is not able to invoke netdev->destructor(). This is because netdev->destructor() will do a free_netdev() and then the caller of register_netdevice() will do the same. However, this means that the resources that would normally be released by netdev->destructor() will not be. Over the years drivers have added local hacks to deal with this, by invoking their destructor parts by hand when register_netdevice() fails. Many drivers do not try to deal with this, and instead we have leaks. Let's close this hole by formalizing the distinction between what private things need to be freed up by netdev->destructor() and whether the driver needs unregister_netdevice() to perform the free_netdev(). netdev->priv_destructor() performs all actions to free up the private resources that used to be freed by netdev->destructor(), except for free_netdev(). netdev->needs_free_netdev is a boolean that indicates whether free_netdev() should be done at the end of unregister_netdevice(). Now, register_netdevice() can sanely release all resources after ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit() and netdev->priv_destructor(). And at the end of unregister_netdevice(), we invoke netdev->priv_destructor() and optionally call free_netdev(). Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Xin Long
|
b1b9d36602 |
bridge: move bridge multicast cleanup to ndo_uninit
During removing a bridge device, if the bridge is still up, a new mdb entry still can be added in br_multicast_add_group() after all mdb entries are removed in br_multicast_dev_del(). Like the path: mld_ifc_timer_expire -> mld_sendpack -> ... br_multicast_rcv -> br_multicast_add_group The new mp's timer will be set up. If the timer expires after the bridge is freed, it may cause use-after-free panic in br_multicast_group_expired. BUG: unable to handle kernel NULL pointer dereference at 0000000000000048 IP: [<ffffffffa07ed2c8>] br_multicast_group_expired+0x28/0xb0 [bridge] Call Trace: <IRQ> [<ffffffff81094536>] call_timer_fn+0x36/0x110 [<ffffffffa07ed2a0>] ? br_mdb_free+0x30/0x30 [bridge] [<ffffffff81096967>] run_timer_softirq+0x237/0x340 [<ffffffff8108dcbf>] __do_softirq+0xef/0x280 [<ffffffff8169889c>] call_softirq+0x1c/0x30 [<ffffffff8102c275>] do_softirq+0x65/0xa0 [<ffffffff8108e055>] irq_exit+0x115/0x120 [<ffffffff81699515>] smp_apic_timer_interrupt+0x45/0x60 [<ffffffff81697a5d>] apic_timer_interrupt+0x6d/0x80 Nikolay also found it would cause a memory leak - the mdb hash is reallocated and not freed due to the mdb rehash. unreferenced object 0xffff8800540ba800 (size 2048): backtrace: [<ffffffff816e2287>] kmemleak_alloc+0x67/0xc0 [<ffffffff81260bea>] __kmalloc+0x1ba/0x3e0 [<ffffffffa05c60ee>] br_mdb_rehash+0x5e/0x340 [bridge] [<ffffffffa05c74af>] br_multicast_new_group+0x43f/0x6e0 [bridge] [<ffffffffa05c7aa3>] br_multicast_add_group+0x203/0x260 [bridge] [<ffffffffa05ca4b5>] br_multicast_rcv+0x945/0x11d0 [bridge] [<ffffffffa05b6b10>] br_dev_xmit+0x180/0x470 [bridge] [<ffffffff815c781b>] dev_hard_start_xmit+0xbb/0x3d0 [<ffffffff815c8743>] __dev_queue_xmit+0xb13/0xc10 [<ffffffff815c8850>] dev_queue_xmit+0x10/0x20 [<ffffffffa02f8d7a>] ip6_finish_output2+0x5ca/0xac0 [ipv6] [<ffffffffa02fbfc6>] ip6_finish_output+0x126/0x2c0 [ipv6] [<ffffffffa02fc245>] ip6_output+0xe5/0x390 [ipv6] [<ffffffffa032b92c>] NF_HOOK.constprop.44+0x6c/0x240 [ipv6] [<ffffffffa032bd16>] mld_sendpack+0x216/0x3e0 [ipv6] [<ffffffffa032d5eb>] mld_ifc_timer_expire+0x18b/0x2b0 [ipv6] This could happen when ip link remove a bridge or destroy a netns with a bridge device inside. With Nikolay's suggestion, this patch is to clean up bridge multicast in ndo_uninit after bridge dev is shutdown, instead of br_dev_delete, so that netif_running check in br_multicast_add_group can avoid this issue. v1->v2: - fix this issue by moving br_multicast_dev_del to ndo_uninit, instead of calling dev_close in br_dev_delete. (NOTE: Depends upon |
||
Ido Schimmel
|
b6fe0440c6 |
bridge: implement missing ndo_uninit()
While the bridge driver implements an ndo_init(), it was missing a
symmetric ndo_uninit(), causing the different de-initialization
operations to be scattered around its dellink() and destructor().
Implement a symmetric ndo_uninit() and remove the overlapping operations
from its dellink() and destructor().
This is a prerequisite for the next patch, as it allows us to have a
proper cleanup upon changelink() failure during the bridge's newlink().
Fixes:
|
||
Nikolay Aleksandrov
|
bfd0aeac52 |
bridge: fdb: converge fdb searching functions into one
Before this patch we had 3 different fdb searching functions which was confusing. This patch reduces all of them to one - fdb_find_rcu(), and two flavors: br_fdb_find() which requires hash_lock and br_fdb_find_rcu which requires RCU. This makes it clear what needs to be used, we also remove two abusers of __br_fdb_get which called it under hash_lock and replace them with br_fdb_find(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
f7cdee8a79 |
bridge: move to workqueue gc
Move the fdb garbage collector to a workqueue which fires at least 10 milliseconds apart and cleans chain by chain allowing for other tasks to run in the meantime. When having thousands of fdbs the system is much more responsive. Most importantly remove the need to check if the matched entry has expired in __br_fdb_get that causes false-sharing and is completely unnecessary if we cleanup entries, at worst we'll get 10ms of traffic for that entry before it gets deleted. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Ido Schimmel
|
a8eca32615 |
net: remove ndo_neigh_{construct, destroy} from stacked devices
In commit
|
||
stephen hemminger
|
bc1f44709c |
net: make ndo_get_stats64 a void function
The network device operation for reading statistics is only called in one place, and it ignores the return value. Having a structure return value is potentially confusing because some future driver could incorrectly assume that the return value was used. Fix all drivers with ndo_get_stats64 to have a void function. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Linus Torvalds
|
7c0f6ba682 |
Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vivien Didelot
|
34d8acd8aa |
net: bridge: shorten ageing time on topology change
802.1D [1] specifies that the bridges must use a short value to age out dynamic entries in the Filtering Database for a period, once a topology change has been communicated by the root bridge. Add a bridge_ageing_time member in the net_bridge structure to store the bridge ageing time value configured by the user (ioctl/netlink/sysfs). If we are using in-kernel STP, shorten the ageing time value to twice the forward delay used by the topology when the topology change flag is set. When the flag is cleared, restore the configured ageing time. [1] "8.3.5 Notifying topology changes ", http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Jarod Wilson
|
91572088e3 |
net: use core MTU range checking in core net infra
geneve: - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu - This one isn't quite as straight-forward as others, could use some closer inspection and testing macvlan: - set min/max_mtu tun: - set min/max_mtu, remove tun_net_change_mtu vxlan: - Merge __vxlan_change_mtu back into vxlan_change_mtu - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function - This one is also not as straight-forward and could use closer inspection and testing from vxlan folks bridge: - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in change_mtu function openvswitch: - set min/max_mtu, remove internal_dev_change_mtu - note: max_mtu wasn't checked previously, it's been set to 65535, which is the largest possible size supported sch_teql: - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535) macsec: - min_mtu = 0, max_mtu = 65535 macvlan: - min_mtu = 0, max_mtu = 65535 ntb_netdev: - min_mtu = 0, max_mtu = 65535 veth: - min_mtu = 68, max_mtu = 65535 8021q: - min_mtu = 0, max_mtu = 65535 CC: netdev@vger.kernel.org CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Tom Herbert <tom@herbertland.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Paolo Abeni <pabeni@redhat.com> CC: Jiri Benc <jbenc@redhat.com> CC: WANG Cong <xiyou.wangcong@gmail.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Pravin B Shelar <pshelar@ovn.org> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Pravin Shelar <pshelar@nicira.com> CC: Maxim Krasnyansky <maxk@qti.qualcomm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
8addd5e7d3 |
net: bridge: change unicast boolean to exact pkt_type
Remove the unicast flag and introduce an exact pkt_type. That would help us for the upcoming per-port multicast flood flag and also slightly reduce the tests in the input fast path. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
37b090e6be |
net: bridge: remove _deliver functions and consolidate forward code
Before this patch we had two flavors of most forwarding functions - _forward and _deliver, the difference being that the latter are used when the packets are locally originated. Instead of all this function pointer passing and code duplication, we can just pass a boolean noting that the packet was locally originated and use that to perform the necessary checks in __br_forward. This gives a minor performance improvement but more importantly consolidates the forwarding paths. Also add a kernel doc comment to explain the exported br_forward()'s arguments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Jiri Pirko
|
18bfb924f0 |
net: introduce default neigh_construct/destroy ndo calls for L2 upper devices
L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
1080ab95e3 |
net: bridge: add support for IGMP/MLD stats and export them via netlink
This patch adds stats support for the currently used IGMP/MLD types by the bridge. The stats are per-port (plus one stat per-bridge) and per-direction (RX/TX). The stats are exported via netlink via the new linkxstats API (RTM_GETSTATS). In order to minimize the performance impact, a new option is used to enable/disable the stats - multicast_stats_enabled, similar to the recent vlan stats. Also in order to avoid multiple IGMP/MLD type lookups and checks, we make use of the current "igmp" member of the bridge private skb->cb region to record the type on Rx (both host-generated and external packets pass by multicast_rcv()). We can do that since the igmp member was used as a boolean and all the valid IGMP/MLD types are positive values. The normal bridge fast-path is not affected at all, the only affected paths are the flooding ones and since we make use of the IGMP/MLD type, we can quickly determine if the packet should be counted using cache-hot data (cb's igmp member). We add counters for: * IGMP Queries * IGMP Leaves * IGMP v1/v2/v3 reports * MLD Queries * MLD Leaves * MLD v1/v2 reports These are invaluable when monitoring or debugging complex multicast setups with bridges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
c6894dec8e |
bridge: fix lockdep addr_list_lock false positive splat
After promisc mode management was introduced a bridge device could do
dev_set_promiscuity from its ndo_change_rx_flags() callback which in
turn can be called after the bridge's addr_list_lock has been taken
(e.g. by dev_uc_add). This causes a false positive lockdep splat because
the port interfaces' addr_list_lock is taken when br_manage_promisc()
runs after the bridge's addr list lock was already taken.
To remove the false positive introduce a custom bridge addr_list_lock
class and set it on bridge init.
A simple way to reproduce this is with the following:
$ brctl addbr br0
$ ip l add l br0 br0.100 type vlan id 100
$ ip l set br0 up
$ ip l set br0.100 up
$ echo 1 > /sys/class/net/br0/bridge/vlan_filtering
$ brctl addif br0 eth0
Splat:
[ 43.684325] =============================================
[ 43.684485] [ INFO: possible recursive locking detected ]
[ 43.684636] 4.4.0-rc8+ #54 Not tainted
[ 43.684755] ---------------------------------------------
[ 43.684906] brctl/1187 is trying to acquire lock:
[ 43.685047] (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[ 43.685460] but task is already holding lock:
[ 43.685618] (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[ 43.686015] other info that might help us debug this:
[ 43.686316] Possible unsafe locking scenario:
[ 43.686743] CPU0
[ 43.686967] ----
[ 43.687197] lock(_xmit_ETHER);
[ 43.687544] lock(_xmit_ETHER);
[ 43.687886] *** DEADLOCK ***
[ 43.688438] May be due to missing lock nesting notation
[ 43.688882] 2 locks held by brctl/1187:
[ 43.689134] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20
[ 43.689852] #1: (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80
[ 43.690575] stack backtrace:
[ 43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54
[ 43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
[ 43.691770] ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0
[ 43.692425] ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139
[ 43.693071] ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020
[ 43.693709] Call Trace:
[ 43.693931] [<ffffffff81360ceb>] dump_stack+0x4b/0x70
[ 43.694199] [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90
[ 43.694483] [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0
[ 43.694789] [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0
[ 43.695064] [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0
[ 43.695340] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[ 43.695623] [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80
[ 43.695901] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40
[ 43.696180] [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40
[ 43.696460] [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50
[ 43.696750] [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge]
[ 43.697052] [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge]
[ 43.697348] [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge]
[ 43.697655] [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0
[ 43.697943] [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90
[ 43.698223] [<ffffffff815072de>] dev_uc_add+0x5e/0x80
[ 43.698498] [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q]
[ 43.698798] [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80
[ 43.699083] [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20
[ 43.699374] [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80
[ 43.699678] [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20
[ 43.699973] [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge]
[ 43.700259] [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge]
[ 43.700548] [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge]
[ 43.700836] [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0
[ 43.701106] [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0
[ 43.701379] [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450
[ 43.701665] [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450
[ 43.701947] [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50
[ 43.702219] [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290
[ 43.702500] [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0
[ 43.702771] [<ffffffff81243079>] SyS_ioctl+0x79/0x90
[ 43.703033] [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Stephen Hemminger <stephen@networkplumber.org>
CC: Bridge list <bridge@lists.linux-foundation.org>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes:
|
||
Nikolay Aleksandrov
|
907b1e6e83 |
bridge: vlan: use proper rcu for the vlgrp member
The bridge and port's vlgrp member is already used in RCU way, currently we rely on the fact that it cannot disappear while the port exists but that is error-prone and we might miss places with improper locking (either RCU or RTNL must be held to walk the vlan_list). So make it official and use RCU for vlgrp to catch offenders. Introduce proper vlgrp accessors and use them consistently throughout the code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
77751ee8ae |
bridge: vlan: move pvid inside net_bridge_vlan_group
One obvious way to converge more code (which was also used by the previous vlan code) is to move pvid inside net_bridge_vlan_group. This allows us to simplify some and remove other port-specific functions. Also gives us the ability to simply pass the vlan group and use all of the contained information. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Nikolay Aleksandrov
|
2594e9064a |
bridge: vlan: add per-vlan struct and move to rhashtables
This patch changes the bridge vlan implementation to use rhashtables instead of bitmaps. The main motivation behind this change is that we need extensible per-vlan structures (both per-port and global) so more advanced features can be introduced and the vlan support can be extended. I've tried to break this up but the moment net_port_vlans is changed and the whole API goes away, thus this is a larger patch. A few short goals of this patch are: - Extensible per-vlan structs stored in rhashtables and a sorted list - Keep user-visible behaviour (compressed vlans etc) - Keep fastpath ingress/egress logic the same (optimizations to come later) Here's a brief list of some of the new features we'd like to introduce: - per-vlan counters - vlan ingress/egress mapping - per-vlan igmp configuration - vlan priorities - avoid fdb entries replication (e.g. local fdb scaling issues) The structure is kept single for both global and per-port entries so to avoid code duplication where possible and also because we'll soon introduce "port0 / aka bridge as port" which should simplify things further (thanks to Vlad for the suggestion!). Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port rhashtable, if an entry is added to a port it'll get a pointer to its global context so it can be quickly accessed later. There's also a sorted vlan list which is used for stable walks and some user-visible behaviour such as the vlan ranges, also for error paths. VLANs are stored in a "vlan group" which currently contains the rhashtable, sorted vlan list and the number of "real" vlan entries. A good side-effect of this change is that it resembles how hw keeps per-vlan data. One important note after this change is that if a VLAN is being looked up in the bridge's rhashtable for filtering purposes (or to check if it's an existing usable entry, not just a global context) then the new helper br_vlan_should_use() needs to be used if the vlan is found. In case the lookup is done only with a port's vlan group, then this check can be skipped. Things tested so far: - basic vlan ingress/egress - pvids - untagged vlans - undef CONFIG_BRIDGE_VLAN_FILTERING - adding/deleting vlans in different scenarios (with/without global ctx, while transmitting traffic, in ranges etc) - loading/removing the module while having/adding/deleting vlans - extracting bridge vlan information (user ABI), compressed requests - adding/deleting fdbs on vlans - bridge mac change, promisc mode - default pvid change - kmemleak ON during the whole time Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Scott Feldman
|
a79e88d9fb |
bridge: define some min/max/default ageing time constants
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Phil Sutter
|
ccecb2a47c |
net: bridge: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Toshiaki Makita
|
6678053092 |
bridge: Don't segment multiple tagged packets on bridge device
Bridge devices don't need to segment multiple tagged packets since thier ports can segment them. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Pablo Neira Ayuso
|
1a4ba64d16 |
netfilter: bridge: use rcu hook to resolve br_netfilter dependency
|
||
Pablo Neira Ayuso
|
e5de75bf88 |
netfilter: bridge: move DNAT helper to br_netfilter
Only one caller, there is no need to keep this in a header. Move it to br_netfilter.c where this belongs to. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> |