Commit Graph

57484 Commits

Author SHA1 Message Date
Vivien Didelot
c5335d737f net: dsa: check bridge VLAN in slave operations
The bridge VLANs are not offloaded by dsa_port_vlan_* if the port is
not bridged or if its bridge is not VLAN aware.

This is a good thing but other corners of DSA, such as the tag_8021q
driver, may need to program VLANs regardless the bridge state.

And also because bridge_dev is specific to user ports anyway, move
these checks were it belongs, one layer up in the slave code.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:17:27 -07:00
Vivien Didelot
bdcff080f7 net: dsa: add slave VLAN helpers
Add dsa_slave_vlan_add and dsa_slave_vlan_del helpers to handle
SWITCHDEV_OBJ_ID_PORT_VLAN switchdev objects. Also copy the
switchdev_obj_port_vlan structure on add since we will modify it in
future patches.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:17:27 -07:00
Vivien Didelot
cf360866b1 net: dsa: do not skip -EOPNOTSUPP in dsa_port_vid_add
Currently dsa_port_vid_add returns 0 if the switch returns -EOPNOTSUPP.

This function is used in the tag_8021q.c code to offload the PVID of
ports, which would simply not work if .port_vlan_add is not supported
by the underlying switch.

Do not skip -EOPNOTSUPP in dsa_port_vid_add but only when necessary,
that is to say in dsa_slave_vlan_rx_add_vid.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:17:27 -07:00
Vivien Didelot
e65d45cc35 net: dsa: remove bitmap operations
The bitmap operations were introduced to simplify the switch drivers
in the future, since most of them could implement the common VLAN and
MDB operations (add, del, dump) with simple functions taking all target
ports at once, and thus limiting the number of hardware accesses.

Programming an MDB or VLAN this way in a single operation would clearly
simplify the drivers a lot but would require a new get-set interface
in DSA. The usage of such bitmap from the stack also raised concerned
in the past, leading to the dynamic allocation of a new ds->_bitmap
member in the dsa_switch structure. So let's get rid of them for now.

This commit nicely wraps the ds->ops->port_{mdb,vlan}_{prepare,add}
switch operations into new dsa_switch_{mdb,vlan}_{prepare,add}
variants not using any bitmap argument anymore.

New dsa_switch_{mdb,vlan}_match helpers have been introduced to make
clear which local port of a switch must be programmed with the target
object. While the targeted user port is an obvious candidate, the
DSA links must also be programmed, as well as the CPU port for VLANs.

While at it, also remove local variables that are only used once.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 20:17:27 -07:00
Cong Wang
981471bd3a net_sched: fix a NULL pointer deref in ipt action
The net pointer in struct xt_tgdtor_param is not explicitly
initialized therefore is still NULL when dereferencing it.
So we have to find a way to pass the correct net pointer to
ipt_destroy_target().

The best way I find is just saving the net pointer inside the per
netns struct tcf_idrinfo, which could make this patch smaller.

Fixes: 0c66dc1ea3 ("netfilter: conntrack: register hooks in netns when needed by ruleset")
Reported-and-tested-by: itugrok@yahoo.com
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 15:05:58 -07:00
David S. Miller
68aaf44595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Minor conflict in r8169, bug fix had two versions in net
and net-next, take the net-next hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-27 14:23:31 -07:00
Linus Torvalds
9e8312f5e1 NFS client bugfixes for Linux 5.3
Highlights include:
 
 Stable fixes:
 - Fix a page lock leak in nfs_pageio_resend()
 - Ensure O_DIRECT reports an error if the bytes read/written is 0
 - Don't handle errors if the bind/connect succeeded
 - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidat
 ed"
 
 Bugfixes:
 - Don't refresh attributes with mounted-on-file information
 - Fix return values for nfs4_file_open() and nfs_finish_open()
 - Fix pnfs layoutstats reporting of I/O errors
 - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort for
   soft I/O errors when the user specifies a hard mount.
 - Various fixes to the error handling in sunrpc
 - Don't report writepage()/writepages() errors twice.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl1lgz4ACgkQZwvnipYK
 APIsHhAApqaVaGzwfeR87zq+QaaVOzYzejyvFgs3wh/Lc5xPH+SlQ6NxLbs8ppdT
 srrOHV9E2MA4JgqoHaIBMTqWacQ0UfQQ/6qLEFCrps9/0QHs7fg0CAHS5emmgk2v
 rD6Mezr5Nx8h5/QJCBEZXfas5lxsICz1EYJ4Pk8QT6IoyeC+fvarGZKvzIQJ3KDN
 8yrdv5kCVtN7noREf1KDIqIlYvFbIEoOoglNA40G49e1ffT9Oz6qzTcg19HFO50x
 eAIxc9u4KCUY/ASCvcv9biQ5200l7QSCqmR7/Xlj/+4aClKp6Ay058j0awxtHHDy
 NlZt6V3XGlm1/SVpvtU/XXWcyJmQwX7kOVIEYOFmt+lEqC7ZBzWEpAaJ8h4DMLLc
 PIxIWBSmXNxp6LPNI0dZFf7O6UZ3ZMRacav+HHu7mjWolEB22f4jQJs+RxNhnfLU
 fg180YWBMX4V/98S7iigxZkRd+qqQhddYtku+o+bp3h4m6mVrrYNm11J0o0GWQWf
 Lio9nlkLq9hkYpdBwkH4PtIv3b+O5f9yhfEYn15eF27Ru0Bob0+DiBkzlflcrJve
 W2VfNAj+jxP3Wg0QAI40BSqUB3b+zVtZW5FenAUEK7NxhhPi6jrIsVhhVgGFZIAd
 i1xwYUg6fDjielhGOxMTF66ilvduA9uBCFAnTD3iSBoZmF63vew=
 =YHhU
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Fix a page lock leak in nfs_pageio_resend()

   - Ensure O_DIRECT reports an error if the bytes read/written is 0

   - Don't handle errors if the bind/connect succeeded

   - Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was
     invalidat ed"

  Bugfixes:

   - Don't refresh attributes with mounted-on-file information

   - Fix return values for nfs4_file_open() and nfs_finish_open()

   - Fix pnfs layoutstats reporting of I/O errors

   - Don't use soft RPC calls for pNFS/flexfiles I/O, and don't abort
     for soft I/O errors when the user specifies a hard mount.

   - Various fixes to the error handling in sunrpc

   - Don't report writepage()/writepages() errors twice"

* tag 'nfs-for-5.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: remove set but not used variable 'mapping'
  NFSv2: Fix write regression
  NFSv2: Fix eof handling
  NFS: Fix writepage(s) error handling to not report errors twice
  NFS: Fix spurious EIO read errors
  pNFS/flexfiles: Don't time out requests on hard mounts
  SUNRPC: Handle connection breakages correctly in call_status()
  Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
  SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
  pNFS/flexfiles: Turn off soft RPC calls
  SUNRPC: Don't handle errors if the bind/connect succeeded
  NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()
  NFS: Fix initialisation of I/O result struct in nfs_pgio_rpcsetup
  NFS: Ensure O_DIRECT reports an error if the bytes read/written is 0
  NFSv4/pnfs: Fix a page lock leak in nfs_pageio_resend()
  NFSv4: Fix return value in nfs_finish_open()
  NFSv4: Fix return values for nfs4_file_open()
  NFS: Don't refresh attributes with mounted-on-file information
2019-08-27 13:22:57 -07:00
Linus Torvalds
452a04441b Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Use 32-bit index for tails calls in s390 bpf JIT, from Ilya
    Leoshkevich.

 2) Fix missed EPOLLOUT events in TCP, from Eric Dumazet. Same fix for
    SMC from Jason Baron.

 3) ipv6_mc_may_pull() should return 0 for malformed packets, not
    -EINVAL. From Stefano Brivio.

 4) Don't forget to unpin umem xdp pages in error path of
    xdp_umem_reg(). From Ivan Khoronzhuk.

 5) Fix sta object leak in mac80211, from Johannes Berg.

 6) Fix regression by not configuring PHYLINK on CPU port of bcm_sf2
    switches. From Florian Fainelli.

 7) Revert DMA sync removal from r8169 which was causing regressions on
    some MIPS Loongson platforms. From Heiner Kallweit.

 8) Use after free in flow dissector, from Jakub Sitnicki.

 9) Fix NULL derefs of net devices during ICMP processing across
    collect_md tunnels, from Hangbin Liu.

10) proto_register() memory leaks, from Zhang Lin.

11) Set NLM_F_MULTI flag in multipart netlink messages consistently,
    from John Fastabend.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  r8152: Set memory to all 0xFFs on failed reg reads
  openvswitch: Fix conntrack cache with timeout
  ipv4: mpls: fix mpls_xmit for iptunnel
  nexthop: Fix nexthop_num_path for blackhole nexthops
  net: rds: add service level support in rds-info
  net: route dump netlink NLM_F_MULTI flag missing
  s390/qeth: reject oversized SNMP requests
  sock: fix potential memory leak in proto_register()
  MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT
  xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
  ipv4/icmp: fix rt dst dev null pointer dereference
  openvswitch: Fix log message in ovs conntrack
  bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
  bpf: fix use after free in prog symbol exposure
  bpf: fix precision tracking in presence of bpf2bpf calls
  flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
  Revert "r8169: remove not needed call to dma_sync_single_for_device"
  ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
  net/ncsi: Fix the payload copying for the request coming from Netlink
  qed: Add cleanup in qed_slowpath_start()
  ...
2019-08-27 10:12:48 -07:00
Florian Westphal
478553fd1b netfilter: conntrack: make sysctls per-namespace again
When I merged the extension sysctl tables with the main one I forgot to
reset them on netns creation.  They currently read/write init_net settings.

Fixes: d912dec124 ("netfilter: conntrack: merge acct and helper sysctl table with main one")
Fixes: cb2833ed00 ("netfilter: conntrack: merge ecache and timestamp sysctl tables with main one")
Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27 17:46:13 +02:00
Thomas Jarosch
3a069024d3 netfilter: nf_conntrack_ftp: Fix debug output
The find_pattern() debug output was printing the 'skip' character.
This can be a NULL-byte and messes up further pr_debug() output.

Output without the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to `<7>nf_conntrack_ftp: find_pattern `PORT': dlen = 8
kernel: nf_conntrack_ftp: find_pattern `EPRT': dlen = 8

Output with the fix:
kernel: nf_conntrack_ftp: Pattern matches!
kernel: nf_conntrack_ftp: Skipped up to 0x0 delimiter!
kernel: nf_conntrack_ftp: Match succeeded!
kernel: nf_conntrack_ftp: conntrack_ftp: match `172,17,0,100,200,207' (20 bytes at 4150681645)
kernel: nf_conntrack_ftp: find_pattern `PORT': dlen = 8

Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27 13:01:05 +02:00
Todd Seidelmann
3cf2f450ff netfilter: xt_physdev: Fix spurious error message in physdev_mt_check
Simplify the check in physdev_mt_check() to emit an error message
only when passed an invalid chain (ie, NF_INET_LOCAL_OUT).
This avoids cluttering up the log with errors against valid rules.

For large/heavily modified rulesets, current behavior can quickly
overwhelm the ring buffer, because this function gets called on
every change, regardless of the rule that was changed.

Signed-off-by: Todd Seidelmann <tseidelmann@linode.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-27 12:58:28 +02:00
David Howells
d0d5c0cd1e rxrpc: Use skb_unshare() rather than skb_cow_data()
The in-place decryption routines in AF_RXRPC's rxkad security module
currently call skb_cow_data() to make sure the data isn't shared and that
the skb can be written over.  This has a problem, however, as the softirq
handler may be still holding a ref or the Rx ring may be holding multiple
refs when skb_cow_data() is called in rxkad_verify_packet() - and so
skb_shared() returns true and __pskb_pull_tail() dislikes that.  If this
occurs, something like the following report will be generated.

	kernel BUG at net/core/skbuff.c:1463!
	...
	RIP: 0010:pskb_expand_head+0x253/0x2b0
	...
	Call Trace:
	 __pskb_pull_tail+0x49/0x460
	 skb_cow_data+0x6f/0x300
	 rxkad_verify_packet+0x18b/0xb10 [rxrpc]
	 rxrpc_recvmsg_data.isra.11+0x4a8/0xa10 [rxrpc]
	 rxrpc_kernel_recv_data+0x126/0x240 [rxrpc]
	 afs_extract_data+0x51/0x2d0 [kafs]
	 afs_deliver_fs_fetch_data+0x188/0x400 [kafs]
	 afs_deliver_to_call+0xac/0x430 [kafs]
	 afs_wait_for_call_to_complete+0x22f/0x3d0 [kafs]
	 afs_make_call+0x282/0x3f0 [kafs]
	 afs_fs_fetch_data+0x164/0x300 [kafs]
	 afs_fetch_data+0x54/0x130 [kafs]
	 afs_readpages+0x20d/0x340 [kafs]
	 read_pages+0x66/0x180
	 __do_page_cache_readahead+0x188/0x1a0
	 ondemand_readahead+0x17d/0x2e0
	 generic_file_read_iter+0x740/0xc10
	 __vfs_read+0x145/0x1a0
	 vfs_read+0x8c/0x140
	 ksys_read+0x4a/0xb0
	 do_syscall_64+0x43/0xf0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix this by using skb_unshare() instead in the input path for DATA packets
that have a security index != 0.  Non-DATA packets don't need in-place
encryption and neither do unencrypted DATA packets.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Reported-by: Julian Wollrath <jwollrath@web.de>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 10:13:46 +01:00
David Howells
987db9f7cd rxrpc: Use the tx-phase skb flag to simplify tracing
Use the previously-added transmit-phase skbuff private flag to simplify the
socket buffer tracing a bit.  Which phase the skbuff comes from can now be
divined from the skb rather than having to be guessed from the call state.

We can also reduce the number of rxrpc_skb_trace values by eliminating the
difference between Tx and Rx in the symbols.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 10:04:18 +01:00
David Howells
b311e68420 rxrpc: Add a private skb flag to indicate transmission-phase skbs
Add a flag in the private data on an skbuff to indicate that this is a
transmission-phase buffer rather than a receive-phase buffer.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 10:03:26 +01:00
David Howells
a641fd00d0 rxrpc: Abstract out rxtx ring cleanup
Abstract out rxtx ring cleanup into its own function from its two callers.
This makes it easier to apply the same changes to both.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 10:03:26 +01:00
David Howells
4858e40303 rxrpc: Pass the input handler's data skb reference to the Rx ring
Pass the reference held on a DATA skb in the rxrpc input handler into the
Rx ring rather than getting an additional ref for this and then dropping
the original ref at the end.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 09:59:19 +01:00
David Howells
e2de6c4048 rxrpc: Use info in skbuff instead of reparsing a jumbo packet
Use the information now cached in the skbuff private data to avoid the need
to reparse a jumbo packet.  We can find all the subpackets by dead
reckoning, so it's only necessary to note how many there are, whether the
last one is flagged as LAST_PACKET and whether any have the REQUEST_ACK
flag set.

This is necessary as once recvmsg() can see the packet, it can start
modifying it, such as doing in-place decryption.

Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-08-27 09:51:30 +01:00
David Howells
c3c9e3df49 rxrpc: Improve jumbo packet counting
Improve the information stored about jumbo packets so that we don't need to
reparse them so much later.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-08-27 09:48:37 +01:00
Vlad Buslov
918190f50e net: sched: flower: don't take rtnl lock for cls hw offloads API
Don't manually take rtnl lock in flower classifier before calling cls
hardware offloads API. Instead, pass rtnl lock status via 'rtnl_held'
parameter.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
1444c175a3 net: sched: copy tunnel info when setting flow_action entry->tunnel
In order to remove dependency on rtnl lock, modify tc_setup_flow_action()
to copy tunnel info, instead of just saving pointer to tunnel_key action
tunnel info. This is necessary to prevent concurrent action overwrite from
releasing tunnel info while it is being used by rtnl-unlocked driver.

Implement helper tcf_tunnel_info_copy() that is used to copy tunnel info
with all its options to dynamically allocated memory block. Modify
tc_cleanup_flow_action() to free dynamically allocated tunnel info.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
5a6ff4b13d net: sched: take reference to action dev before calling offloads
In order to remove dependency on rtnl lock when calling hardware offload
API, take reference to action mirred dev when initializing flow_action
structure in tc_setup_flow_action(). Implement function
tc_cleanup_flow_action(), use it to release the device after hardware
offload API is done using it.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
9838b20a7f net: sched: take rtnl lock in tc_setup_flow_action()
In order to allow using new flow_action infrastructure from unlocked
classifiers, modify tc_setup_flow_action() to accept new 'rtnl_held'
argument. Take rtnl lock before accessing tc_action data. This is necessary
to protect from concurrent action replace.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
11bd634da2 net: sched: conditionally obtain rtnl lock in cls hw offloads API
In order to remove dependency on rtnl lock from offloads code of
classifiers, take rtnl lock conditionally before executing driver
callbacks. Only obtain rtnl lock if block is bound to devices that require
it.

Block bind/unbind code is rtnl-locked and obtains block->cb_lock while
holding rtnl lock. Obtain locks in same order in tc_setup_cb_*() functions
to prevent deadlock.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
c9f14470d0 net: sched: add API for registering unlocked offload block callbacks
Extend struct flow_block_offload with "unlocked_driver_cb" flag to allow
registering and unregistering block hardware offload callbacks that do not
require caller to hold rtnl lock. Extend tcf_block with additional
lockeddevcnt counter that is incremented for each non-unlocked driver
callback attached to device. This counter is necessary to conditionally
obtain rtnl lock before calling hardware callbacks in following patches.

Register mlx5 tc block offload callbacks as "unlocked".

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
a449a3e77a net: sched: notify classifier on successful offload add/delete
To remove dependency on rtnl lock, extend classifier ops with new
ops->hw_add() and ops->hw_del() callbacks. Call them from cls API while
holding cb_lock every time filter if successfully added to or deleted from
hardware.

Implement the new API in flower classifier. Use it to manage hw_filters
list under cb_lock protection, instead of relying on rtnl lock to
synchronize with concurrent fl_reoffload() call.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
4011921137 net: sched: refactor block offloads counter usage
Without rtnl lock protection filters can no longer safely manage block
offloads counter themselves. Refactor cls API to protect block offloadcnt
with tcf_block->cb_lock that is already used to protect driver callback
list and nooffloaddevcnt counter. The counter can be modified by concurrent
tasks by new functions that execute block callbacks (which is safe with
previous patch that changed its type to atomic_t), however, block
bind/unbind code that checks the counter value takes cb_lock in write mode
to exclude any concurrent modifications. This approach prevents race
conditions between bind/unbind and callback execution code but allows for
concurrency for tc rule update path.

Move block offload counter, filter in hardware counter and filter flags
management from classifiers into cls hardware offloads API. Make functions
tcf_block_offload_{inc|dec}() and tc_cls_offload_cnt_update() to be cls API
private. Implement following new cls API to be used instead:

  tc_setup_cb_add() - non-destructive filter add. If filter that wasn't
  already in hardware is successfully offloaded, increment block offloads
  counter, set filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be intact and offloads counter is not
  decremented.

  tc_setup_cb_replace() - destructive filter replace. Release existing
  filter block offload counter and reset its in hardware counter and flag.
  Set new filter in hardware counter and flag. On failure, previously
  offloaded filter is considered to be destroyed and offload counter is
  decremented.

  tc_setup_cb_destroy() - filter destroy. Unconditionally decrement block
  offloads counter.

  tc_setup_cb_reoffload() - reoffload filter to single cb. Execute cb() and
  call tc_cls_offload_cnt_update() if cb() didn't return an error.

Refactor all offload-capable classifiers to atomically offload filters to
hardware, change block offload counter, and set filter in hardware counter
and flag by means of the new cls API functions.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
97394bef56 net: sched: change tcf block offload counter type to atomic_t
As a preparation for running proto ops functions without rtnl lock, change
offload counter type to atomic. This is necessary to allow updating the
counter by multiple concurrent users when offloading filters to hardware
from unlocked classifiers.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Vlad Buslov
4f8116c850 net: sched: protect block offload-related fields with rw_semaphore
In order to remove dependency on rtnl lock, extend tcf_block with 'cb_lock'
rwsem and use it to protect flow_block->cb_list and related counters from
concurrent modification. The lock is taken in read mode for read-only
traversal of cb_list in tc_setup_cb_call() and write mode in all other
cases. This approach ensures that:

- cb_list is not changed concurrently while filters is being offloaded on
  block.

- block->nooffloaddevcnt is checked while holding the lock in read mode,
  but is only changed by bind/unbind code when holding the cb_lock in write
  mode to prevent concurrent modification.

Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-26 14:17:43 -07:00
Trond Myklebust
c82e5472c9 SUNRPC: Handle connection breakages correctly in call_status()
If the connection breaks while we're waiting for a reply from the
server, then we want to immediately try to reconnect.

Fixes: ec6017d903 ("SUNRPC fix regression in umount of a secure mount")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust
d5711920ec Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated"
This reverts commit a79f194aa4.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1
2019-08-26 15:31:29 -04:00
Trond Myklebust
80f455da6c SUNRPC: Handle EADDRINUSE and ENOBUFS correctly
If a connect or bind attempt returns EADDRINUSE, that means we want to
retry with a different port. It is not a fatal connection error.
Similarly, ENOBUFS is not fatal, but just indicates a memory allocation
issue. Retry after a short delay.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-08-26 15:31:29 -04:00
Trond Myklebust
bd736ed3e2 SUNRPC: Don't handle errors if the bind/connect succeeded
Don't handle errors in call_bind_status()/call_connect_status()
if it turns out that a previous call caused it to succeed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1+
2019-08-26 15:31:29 -04:00
Yi-Hung Wei
7177895154 openvswitch: Fix conntrack cache with timeout
This patch addresses a conntrack cache issue with timeout policy.
Currently, we do not check if the timeout extension is set properly in the
cached conntrack entry.  Thus, after packet recirculate from conntrack
action, the timeout policy is not applied properly.  This patch fixes the
aforementioned issue.

Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 14:48:43 -07:00
Alexey Kodanev
803f3e22ae ipv4: mpls: fix mpls_xmit for iptunnel
When using mpls over gre/gre6 setup, rt->rt_gw4 address is not set, the
same for rt->rt_gw_family.  Therefore, when rt->rt_gw_family is checked
in mpls_xmit(), neigh_xmit() call is skipped. As a result, such setup
doesn't work anymore.

This issue was found with LTP mpls03 tests.

Fixes: 1550c17193 ("ipv4: Prepare rtable for IPv6 gateway")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-25 14:34:08 -07:00
Zhu Yanjun
e0e6d06282 net: rds: add service level support in rds-info
>From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.

Before this commit, run "rds-info -I". The outputs are as
below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
 LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
"

The commit fe3475af3b ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
        __u32           rdma_mr_max;
        __u32           rdma_mr_size;
        __u8            tos;
        __u32           cache_allocs;
 };
The peer struct in rds-tools of struct rds_info_rdma_connection is as
below:
struct rds_info_rdma_connection {
...
        uint32_t        rdma_mr_max;
        uint32_t        rdma_mr_size;
        uint8_t         tos;
        uint8_t         sl;
        uint32_t        cache_allocs;
};
The difference between userspace and kernel is the member variable sl.
In the kernel struct, the member variable sl is missing. This will
introduce risks. So it is necessary to use this commit to avoid this risk.

Fixes: fe3475af3b ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:55:25 -07:00
John Fastabend
e93fb3e952 net: route dump netlink NLM_F_MULTI flag missing
An excerpt from netlink(7) man page,

  In multipart messages (multiple nlmsghdr headers with associated payload
  in one byte stream) the first and all following headers have the
  NLM_F_MULTI flag set, except for the last  header  which  has the type
  NLMSG_DONE.

but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a
FIB dump. The result is user space applications following above man page
excerpt may get confused and may stop parsing msg believing something went
wrong.

In the golang netlink lib [0] the library logic stops parsing believing the
message is not a multipart message. Found this running Cilium[1] against
net-next while adding a feature to auto-detect routes. I noticed with
multiple route tables we no longer could detect the default routes on net
tree kernels because the library logic was not returning them.

Fix this by handling the fib_dump_info_fnhe() case the same way the
fib_dump_info() handles it by passing the flags argument through the
call chain and adding a flags argument to rt_fill_info().

Tested with Cilium stack and auto-detection of routes works again. Also
annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
NLMSG_DONE flags look correct after this.

Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
as is done for fib_dump_info() so this looks correct to me.

[0] https://github.com/vishvananda/netlink/
[1] https://github.com/cilium/

Fixes: ee28906fd7 ("ipv4: Dump route exceptions if requested")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:49:48 -07:00
zhanglin
b45ce32135 sock: fix potential memory leak in proto_register()
If protocols registered exceeded PROTO_INUSE_NR, prot will be
added to proto_list, but no available bit left for prot in
proto_inuse_idx.

Changes since v2:
* Propagate the error code properly

Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:33:14 -07:00
Markus Elfring
dd016aca28 net/core/skmsg: Delete an unnecessary check before the function call “consume_skb”
The consume_skb() function performs also input parameter validation.
Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 16:24:53 -07:00
Hangbin Liu
c3b4c3a47e xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode
In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv6
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmpv6_send
      - icmpv6_route_lookup
        - xfrm_decode_session_reverse
          - decode_session4
            - oif = skb_dst(skb)->dev->ifindex; <-- here
          - decode_session6
            - oif = skb_dst(skb)->dev->ifindex; <-- here

The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the skb_dst(skb)->dev is actually not needed as we
called decode_session{4,6} via xfrm_decode_session_reverse(), so oif is not
used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;

So make a dst dev check here should be clean and safe.

v4: No changes.

v3: No changes.

v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.

Fixes: 8d79266bc4 ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:49:35 -07:00
Hangbin Liu
e2c6939341 ipv4/icmp: fix rt dst dev null pointer dereference
In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
e,g, with tunnel collect_md mode, which will cause kernel crash.
Here is what the code path looks like, for GRE:

- ip6gre_tunnel_xmit
  - ip6gre_xmit_ipv4
    - __gre6_xmit
      - ip6_tnl_xmit
        - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
    - icmp_send
      - net = dev_net(rt->dst.dev); <-- here

The reason is __metadata_dst_init() init dst->dev to NULL by default.
We could not fix it in __metadata_dst_init() as there is no dev supplied.
On the other hand, the reason we need rt->dst.dev is to get the net.
So we can just try get it from skb->dev when rt->dst.dev is NULL.

v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
still use dst.dev and do a check to avoid crash.

v3: No changes.

v2: fix the issue in __icmp_send() instead of updating shared dst dev
in {ip_md, ip6}_tunnel_xmit.

Fixes: c8b34e680a ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:49:35 -07:00
Yi-Hung Wei
12c6bc38f9 openvswitch: Fix log message in ovs conntrack
Fixes: 06bd2bdf19 ("openvswitch: Add timeout support to ct action")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 14:18:59 -07:00
David S. Miller
12e2e15d83 Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:

====================
pull-request: ieee802154 for net 2019-08-24

An update from ieee802154 for your *net* tree.

Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-24 13:46:57 -07:00
David S. Miller
211c462452 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-08-24

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.

2) Fix a use-after-free in prog symbol exposure, from Daniel.

3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.

4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.

5) Fix a potential use-after-free on flow dissector detach, from Jakub.

6) Fix bpftool to close prog fd after showing metadata, from Quentin.

7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 17:34:11 -07:00
Ilya Leoshkevich
2c238177bd bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
test_select_reuseport fails on s390 due to verifier rejecting
test_select_reuseport_kern.o with the following message:

	; data_check.eth_protocol = reuse_md->eth_protocol;
	18: (69) r1 = *(u16 *)(r6 +22)
	invalid bpf_context access off=22 size=2

This is because on big-endian machines casts from __u32 to __u16 are
generated by referencing the respective variable as __u16 with an offset
of 2 (as opposed to 0 on little-endian machines).

The verifier already has all the infrastructure in place to allow such
accesses, it's just that they are not explicitly enabled for
eth_protocol field. Enable them for eth_protocol field by using
bpf_ctx_range instead of offsetof.

Ditto for ip_protocol, bind_inany and len, since they already allow
narrowing, and the same problem can arise when working with them.

Fixes: 2dbb9b9e6d ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-24 01:25:41 +02:00
Jakub Sitnicki
db38de3968 flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
free the program once a grace period has elapsed. The callback can run
together with new RCU readers that started after the last grace period.
New RCU readers can potentially see the "old" to-be-freed or already-freed
pointer to the program object before the RCU update-side NULLs it.

Reorder the operations so that the RCU update-side resets the protected
pointer before the end of the grace period after which the program will be
freed.

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-24 01:15:34 +02:00
Ido Schimmel
bd1200b795 drop_monitor: Make timestamps y2038 safe
Timestamps are currently communicated to user space as 'struct
timespec', which is not considered y2038 safe since it uses a 32-bit
signed value for seconds.

Fix this while the API is still not part of any official kernel release
by using 64-bit nanoseconds timestamps instead.

Fixes: ca30707dee ("drop_monitor: Add packet alert mode")
Fixes: 5e58109b1e ("drop_monitor: Add support for packet alert mode for hardware drops")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 14:58:07 -07:00
Dag Moxnes
bf1867db9b net/rds: Whitelist rdma_cookie and rx_tstamp for usercopy
Add the RDMA cookie and RX timestamp to the usercopy whitelist.

After the introduction of hardened usercopy whitelisting
(https://lwn.net/Articles/727322/), a warning is displayed when the
RDMA cookie or RX timestamp is copied to userspace:

kernel: WARNING: CPU: 3 PID: 5750 at
mm/usercopy.c:81 usercopy_warn+0x8e/0xa6
[...]
kernel: Call Trace:
kernel: __check_heap_object+0xb8/0x11b
kernel: __check_object_size+0xe3/0x1bc
kernel: put_cmsg+0x95/0x115
kernel: rds_recvmsg+0x43d/0x620 [rds]
kernel: sock_recvmsg+0x43/0x4a
kernel: ___sys_recvmsg+0xda/0x1e6
kernel: ? __handle_mm_fault+0xcae/0xf79
kernel: __sys_recvmsg+0x51/0x8a
kernel: SyS_recvmsg+0x12/0x1c
kernel: do_syscall_64+0x79/0x1ae

When the whitelisting feature was introduced, the memory for the RDMA
cookie and RX timestamp in RDS was not added to the whitelist, causing
the warning above.

Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Tested-by: Jenny <jenny.x.xu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 14:55:52 -07:00
Sabrina Dubroca
db0b99f59a ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev
Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
ignoring the specific error value. This results in addrconf_add_dev
returning ENOBUFS in all cases, which is unfortunate in cases such as:

    # ip link add dummyX type dummy
    # ip link set dummyX mtu 1200 up
    # ip addr add 2000::/64 dev dummyX
    RTNETLINK answers: No buffer space available

Commit a317a2f19d ("ipv6: fail early when creating netdev named all
or default") introduced error returns in ipv6_add_dev. Before that,
that function would simply return NULL for all failures.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 14:53:06 -07:00
Xin Long
c7a42eb492 net: ipv6: fix listify ip6_rcv_finish in case of forwarding
We need a similar fix for ipv6 as Commit 0761680d52 ("net: ipv4: fix
listify ip_rcv_finish in case of forwarding") does for ipv4.

This issue can be reprocuded by syzbot since Commit 323ebb61e3 ("net:
use listified RX for handling GRO_NORMAL skbs") on net-next. The call
trace was:

  kernel BUG at include/linux/skbuff.h:2225!
  RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
  RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
  Call Trace:
    sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
    sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
    sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
    sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
    sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
    ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
    ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
    NF_HOOK include/linux/netfilter.h:305 [inline]
    NF_HOOK include/linux/netfilter.h:299 [inline]
    ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
    dst_input include/net/dst.h:442 [inline]
    ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
    ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
    ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
    ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
    __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
    __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
    __netif_receive_skb_list net/core/dev.c:5149 [inline]
    netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
    gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
    gro_normal_list net/core/dev.c:5755 [inline]
    gro_normal_one net/core/dev.c:5769 [inline]
    napi_frags_finish net/core/dev.c:5782 [inline]
    napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
    tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
    tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020

Fixes: d8269e2cbf ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
Fixes: 323ebb61e3 ("net: use listified RX for handling GRO_NORMAL skbs")
Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-23 14:42:14 -07:00
Sven Eckelmann
0ff0f15a32 batman-adv: Only read OGM2 tvlv_len after buffer len check
Multiple batadv_ogm2_packet can be stored in an skbuff. The functions
batadv_v_ogm_send_to_if() uses batadv_v_ogm_aggr_packet() to check if there
is another additional batadv_ogm2_packet in the skb or not before they
continue processing the packet.

The length for such an OGM2 is BATADV_OGM2_HLEN +
batadv_ogm2_packet->tvlv_len. The check must first check that at least
BATADV_OGM2_HLEN bytes are available before it accesses tvlv_len (which is
part of the header. Otherwise it might try read outside of the currently
available skbuff to get the content of tvlv_len.

Fixes: 9323158ef9 ("batman-adv: OGMv2 - implement originators logic")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-08-23 18:20:31 +02:00
Sven Eckelmann
a15d56a607 batman-adv: Only read OGM tvlv_len after buffer len check
Multiple batadv_ogm_packet can be stored in an skbuff. The functions
batadv_iv_ogm_send_to_if()/batadv_iv_ogm_receive() use
batadv_iv_ogm_aggr_packet() to check if there is another additional
batadv_ogm_packet in the skb or not before they continue processing the
packet.

The length for such an OGM is BATADV_OGM_HLEN +
batadv_ogm_packet->tvlv_len. The check must first check that at least
BATADV_OGM_HLEN bytes are available before it accesses tvlv_len (which is
part of the header. Otherwise it might try read outside of the currently
available skbuff to get the content of tvlv_len.

Fixes: ef26157747 ("batman-adv: tvlv - basic infrastructure")
Reported-by: syzbot+355cab184197dbbfa384@syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-08-23 18:20:17 +02:00
Linus Torvalds
4e56394490 Three important fixes tagged for stable (an indefinite hang, a crash on
an assert and a NULL pointer dereference) plus a small series from Luis
 fixing instances of vfree() under spinlock.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl1f2fITHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi83fB/0a+TnNY8Q2aEeB9Y/0sckSpRCsMGMV
 syt2krwKC0EYM1f2dkJdgCjlSjMzMcHPseP3g5odRXgyPKJt5O9oE7l3vGDC4Oyt
 chqhEh86UzG6Kcptx6tIzsAGYS9S4NzxR5sfXF6oRu8m1bwk1n5IhKxYjQDTvAMd
 RxwvpdguNA9xvHeUvLMTpy2R3qE3uQ2dxierutW67GeyeCPkvyBmazzi72Q36hlL
 y1w8DWaPBemBk5QEM9vmz5i2xQeLO4h4ejhP4LcXyVjJtfvAPl0JWOsHMK4uWRJf
 6XjbGDaGYvID0hTQLlEw/k73976HmRxSbaXRtCZN+IG3yWGTL8ID6GqI
 =kaFB
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Three important fixes tagged for stable (an indefinite hang, a crash
  on an assert and a NULL pointer dereference) plus a small series from
  Luis fixing instances of vfree() under spinlock"

* tag 'ceph-for-5.3-rc6' of git://github.com/ceph/ceph-client:
  libceph: fix PG split vs OSD (re)connect race
  ceph: don't try fill file_lock on unsuccessful GETFILELOCK reply
  ceph: clear page dirty before invalidate page
  ceph: fix buffer free while holding i_ceph_lock in fill_inode()
  ceph: fix buffer free while holding i_ceph_lock in __ceph_build_xattrs_blob()
  ceph: fix buffer free while holding i_ceph_lock in __ceph_setxattr()
  libceph: allow ceph_buffer_put() to receive a NULL ceph_buffer
2019-08-23 09:19:38 -07:00
Ben Wei
6d24e14140 net/ncsi: update response packet length for GCPS/GNS/GNPTS commands
Update response packet length for the following commands per NC-SI spec
- Get Controller Packet Statistics
- Get NC-SI Statistics
- Get NC-SI Pass-through Statistics command

Signed-off-by: Ben Wei <benwei@fb.com>
Reviewed-by: Justin Lee <justin.lee1@dell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22 19:28:04 -07:00
Justin.Lee1@Dell.com
f6edbf2d61 net/ncsi: Fix the payload copying for the request coming from Netlink
The request coming from Netlink should use the OEM generic handler.

The standard command handler expects payload in bytes/words/dwords
but the actual payload is stored in data if the request is coming from Netlink.

Signed-off-by: Justin Lee <justin.lee1@dell.com>
Reviewed-by: Vijay Khemka <vijaykhemka@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22 19:27:02 -07:00
Colin Ian King
c76c992525 nexthops: remove redundant assignment to variable err
Variable err is initialized to a value that is never read and it is
re-assigned later. The initialization is redundant and can be removed.

Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-22 12:14:05 -07:00
Ilya Dryomov
a561372405 libceph: fix PG split vs OSD (re)connect race
We can't rely on ->peer_features in calc_target() because it may be
called both when the OSD session is established and open and when it's
not.  ->peer_features is not valid unless the OSD session is open.  If
this happens on a PG split (pg_num increase), that could mean we don't
resend a request that should have been resent, hanging the client
indefinitely.

In userspace this was fixed by looking at require_osd_release and
get_xinfo[osd].features fields of the osdmap.  However these fields
belong to the OSD section of the osdmap, which the kernel doesn't
decode (only the client section is decoded).

Instead, let's drop this feature check.  It effectively checks for
luminous, so only pre-luminous OSDs would be affected in that on a PG
split the kernel might resend a request that should not have been
resent.  Duplicates can occur in other scenarios, so both sides should
already be prepared for them: see dup/replay logic on the OSD side and
retry_attempt check on the client side.

Cc: stable@vger.kernel.org
Fixes: 7de030d6b1 ("libceph: resend on PG splits if OSD has RESEND_ON_SPLIT")
Link: https://tracker.ceph.com/issues/41162
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Jerry Lee <leisurelysw24@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2019-08-22 10:47:41 +02:00
Li RongQing
0f404bbdaf net: fix icmp_socket_deliver argument 2 input
it expects a unsigned int, but got a __be32

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21 20:43:22 -07:00
Hangbin Liu
f17f7648a4 ipv6/addrconf: allow adding multicast addr if IFA_F_MCAUTOJOIN is set
In commit 93a714d6b5 ("multicast: Extend ip address command to enable
multicast group join/leave on") we added a new flag IFA_F_MCAUTOJOIN
to make user able to add multicast address on ethernet interface.

This works for IPv4, but not for IPv6. See the inet6_addr_add code.

static int inet6_addr_add()
{
	...
	if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
		ipv6_mc_config(net->ipv6.mc_autojoin_sk, true...)
	}

	ifp = ipv6_add_addr(idev, cfg, true, extack); <- always fail with maddr
	if (!IS_ERR(ifp)) {
		...
	} else if (cfg->ifa_flags & IFA_F_MCAUTOJOIN) {
		ipv6_mc_config(net->ipv6.mc_autojoin_sk, false...)
	}
}

But in ipv6_add_addr() it will check the address type and reject multicast
address directly. So this feature is never worked for IPv6.

We should not remove the multicast address check totally in ipv6_add_addr(),
but could accept multicast address only when IFA_F_MCAUTOJOIN flag supplied.

v2: update commit description

Fixes: 93a714d6b5 ("multicast: Extend ip address command to enable multicast group join/leave on")
Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21 20:39:29 -07:00
David S. Miller
6e2866a9df Here is a batman-adv bugfix:
- fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAl1dRtEWHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoeN6D/9IExhEX0OZQFf6fkHJ71yKAN0j
 FjBBmumtu33brjFCtieix+jBRGny9qAO80I2q3X2yRQlx/SpHVuXUyHPrN+azNMJ
 w2mb7Mw1l3+Y4KbdhM1dz+CTafmdJzgZoFsKIAamD+IP5uIUg1sSL4mLcgka35ol
 GFZa82LE7hjTY1sR0azKCbbeAIlwMkoLWy5YsUmIDzJZ6eWJSEt/kEuNIWuhIhl4
 VZeT6VH96xWSY1rqwgIiv0fXBuiE/n3//nMY5WquM0yFA1XJBG9jRFhmNcca8oPi
 ehodBtZBlHyBXnAnDkSEJY7nux3+jsj4N4X+LDjmv6yWH5EmqMvM3NysXalM6sBj
 SZeZyoxJZxl+WE658ZPoVawg3NuABOdOuY6tw/FLvFtBH3rKQzvlz0nXp19m1kef
 TrGjCsJsVO/pnrXlQMB0+JnQooRId/UGUTHfgqqP2hySn6a2KcGrxCd7tWxBQ5oW
 a3DSmVo7QEwxvEPnIPpxPFv698Rkheb+gQZVrOBjmE4YmdmuaCFUDt04zDRwFjsz
 dH+tX3kgQvHZEwVqWylikNqkFDWuako7+N1fZ9DxWZh8tnDe6FXXgQ+DL80nuATS
 DGI7ojpAhe8YUyCqOHsCD3lW1hHUmtmzj/N2OrN+U2ROO745RCqs9Co+Kn2ugRhq
 YOdgbYJL6aRzu/D52w==
 =9t+O
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20190821' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
Here is a batman-adv bugfix:

 - fix uninit-value in batadv_netlink_get_ifindex(), by Eric Dumazet
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-21 13:49:20 -07:00
Felix Fietkau
48cb39522a mac80211: minstrel_ht: improve rate probing for devices with static fallback
On some devices that only support static rate fallback tables sending rate
control probing packets can be really expensive.
Probing lower rates can already hurt throughput quite a bit. What hurts even
more is the fact that on mt76x0/mt76x2, single probing packets can only be
forced by directing packets at a different internal hardware queue, which
causes some heavy reordering and extra latency.
The reordering issue is mainly problematic while pushing lots of packets to
a particular station. If there is little activity, the overhead of probing is
neglegible.

The static fallback behavior is designed to pretty much only handle rate
control algorithms that use only a very limited set of rates on which the
algorithm switches up/down based on packet error rate.

In order to better support that kind of hardware, this patch implements a
different approach to rate probing where it switches to a slightly higher rate,
waits for tx status feedback, then updates the stats and switches back to
the new max throughput rate. This only triggers above a packet rate of 100
per stats interval (~50ms).
For that kind of probing, the code has to reduce the set of probing rates
a lot more compared to single packet probing, so it uses only one packet
per MCS group which is either slightly faster, or as close as possible to
the max throughput rate.
This allows switching between similar rates with different numbers of
streams. The algorithm assumes that the hardware will work its way lower
within an MCS group in case of retransmissions, so that lower rates don't
have to be probed by the high packets per second rate probing code.

To further reduce the search space, it also does not probe rates with lower
channel bandwidth than the max throughput rate.

At the moment, these changes will only affect mt76x0/mt76x2.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20190820095449.45255-4-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 11:10:13 +02:00
Felix Fietkau
21f7981b4b mac80211: minstrel_ht: fix default max throughput rate indexes
Use the first supported rate instead of 0 (which can be invalid)

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20190820095449.45255-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 11:09:37 +02:00
Felix Fietkau
f793c7eedd mac80211: minstrel_ht: reduce unnecessary rate probing attempts
On hardware with static fallback tables (e.g. mt76x2), rate probing attempts
can be very expensive.
On such devices, avoid sampling rates slower than the per-group max throughput
rate, based on the assumption that the fallback table will take care of probing
lower rates within that group if the higher rates fail.

To further reduce unnecessary probing attempts, skip duplicate attempts on
rates slower than the max throughput rate.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20190820095449.45255-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 11:09:20 +02:00
Felix Fietkau
56dd918ff0 mac80211: minstrel_ht: fix per-group max throughput rate initialization
The group number needs to be multiplied by the number of rates per group
to get the full rate index

Fixes: 5935839ad7 ("mac80211: improve minstrel_ht rate sorting by throughput & probability")
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20190820095449.45255-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 11:08:21 +02:00
Alexei Avshalom Lazar
2a38075cd0 nl80211: Add support for EDMG channels
802.11ay specification defines Enhanced Directional Multi-Gigabit
(EDMG) STA and AP which allow channel bonding of 2 channels and more.

Introduce new NL attributes that are needed for enabling and
configuring EDMG support.

Two new attributes are used by kernel to publish driver's EDMG
capabilities to the userspace:
NL80211_BAND_ATTR_EDMG_CHANNELS - bitmap field that indicates the 2.16
GHz channel(s) that are supported by the driver.
When this attribute is not set it means driver does not support EDMG.
NL80211_BAND_ATTR_EDMG_BW_CONFIG - represent the channel bandwidth
configurations supported by the driver.

Additional two new attributes are used by the userspace for connect
command and for AP configuration:
NL80211_ATTR_WIPHY_EDMG_CHANNELS
NL80211_ATTR_WIPHY_EDMG_BW_CONFIG

New rate info flag - RATE_INFO_FLAGS_EDMG, can be reported from driver
and used for bitrate calculation that will take into account EDMG
according to the 802.11ay specification.

Signed-off-by: Alexei Avshalom Lazar <ailizaro@codeaurora.org>
Link: https://lore.kernel.org/r/1566138918-3823-2-git-send-email-ailizaro@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 11:07:35 +02:00
John Crispin
5db16ba82f mac80211: fix possible NULL pointerderef in obss pd code
he_spr_ie_elem is dereferenced before the NULL check. fix this by moving
the assignment after the check.

fixes commit 697f6c507c ("mac80211: propagate HE operation info into
bss_conf")

This was reported by the static code checker.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190813070712.25509-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:58:32 +02:00
Ben Greear
9cf0233888 mac80211: add assoc-at support
Report timestamp for when sta becomes associated.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20190809180001.26393-2-greearb@candelatech.com
[fix ktime_get_boot_ns() to ktime_get_boottime_ns(), assoc_at type to u64]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:58:10 +02:00
Ben Greear
6c7a00339e cfg80211: Support assoc-at timer in sta-info
Report timestamp of when sta became associated.

This is the boottime clock, units are nano-seconds.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20190809180001.26393-1-greearb@candelatech.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:56:42 +02:00
Arend van Spriel
62524a5857 cfg80211: apply same mandatory rate flags for 5GHz and 6GHz
For the new 6GHz band the same rules apply for mandatory rates so
add it to set_mandatory_flags_band() function.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-9-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:55:09 +02:00
Arend van Spriel
5ea4e7802c cfg80211: ibss: use 11a mandatory rates for 6GHz band operation
The default mandatory rates, ie. when not specified by user-space, is
determined by the band. Select 11a rateset for 6GHz band.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-8-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:54:57 +02:00
Arend van Spriel
0816e6b117 cfg80211: use same IR permissive rules for 6GHz band
The function cfg80211_ir_permissive_chan() is applicable for
6GHz band as well so make sure it is handled.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-7-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:54:35 +02:00
Arend van Spriel
e548a1c36b cfg80211: add 6GHz in code handling array with NUM_NL80211_BANDS entries
In nl80211.c there is a policy for all bands in NUM_NL80211_BANDS and
in trace.h there is a callback trace for multicast rates which is per
band in NUM_NL80211_BANDS. Both need to be extended for the new
NL80211_BAND_6GHZ.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-6-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:54:12 +02:00
Arend van Spriel
852f04620e cfg80211: extend ieee80211_operating_class_to_band() for 6GHz
Add 6GHz operating class range as defined in 802.11ax D4.1 Annex E.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-5-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:53:50 +02:00
Arend van Spriel
fa1f1085bc cfg80211: util: add 6GHz channel to freq conversion and vice versa
Extend the functions ieee80211_channel_to_frequency() and
ieee80211_frequency_to_channel() to support 6GHz band according
specification in 802.11ax D4.1 27.3.22.2.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-4-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:53:31 +02:00
Arend van Spriel
f89769cfdd cfg80211: add 6GHz UNII band definitions
For the new 6GHz there are new UNII band definitions as listed
in the FCC notice [1].

[1] https://docs.fcc.gov/public/attachments/FCC-18-147A1_Rcd.pdf

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-3-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:53:11 +02:00
Arend van Spriel
c5b9a7f826 nl80211: add 6GHz band definition to enum nl80211_band
In the 802.11ax specification a new band is introduced, which
is also proposed by FCC for unlicensed use. This band is referred
to as 6GHz spanning frequency range from 5925 to 7125 MHz.

Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Leon Zegers <leon.zegers@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://lore.kernel.org/r/1564745465-21234-2-git-send-email-arend.vanspriel@broadcom.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:50:20 +02:00
Hodaszi, Robert
0d31d4dbf3 Revert "cfg80211: fix processing world regdomain when non modular"
This reverts commit 96cce12ff6 ("cfg80211: fix processing world
regdomain when non modular").

Re-triggering a reg_process_hint with the last request on all events,
can make the regulatory domain fail in case of multiple WiFi modules. On
slower boards (espacially with mdev), enumeration of the WiFi modules
can end up in an intersected regulatory domain, and user cannot set it
with 'iw reg set' anymore.

This is happening, because:
- 1st module enumerates, queues up a regulatory request
- request gets processed by __reg_process_hint_driver():
  - checks if previous was set by CORE -> yes
    - checks if regulator domain changed -> yes, from '00' to e.g. 'US'
      -> sends request to the 'crda'
- 2nd module enumerates, queues up a regulator request (which triggers
  the reg_todo() work)
- reg_todo() -> reg_process_pending_hints() sees, that the last request
  is not processed yet, so it tries to process it again.
  __reg_process_hint driver() will run again, and:
  - checks if the last request's initiator was the core -> no, it was
    the driver (1st WiFi module)
  - checks, if the previous initiator was the driver -> yes
    - checks if the regulator domain changed -> yes, it was '00' (set by
      core, and crda call did not return yet), and should be changed to 'US'

------> __reg_process_hint_driver calls an intersect

Besides, the reg_process_hint call with the last request is meaningless
since the crda call has a timeout work. If that timeout expires, the
first module's request will lost.

Cc: stable@vger.kernel.org
Fixes: 96cce12ff6 ("cfg80211: fix processing world regdomain when non modular")
Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:43:03 +02:00
John Crispin
3a00f08140 mac80211: add missing length field increment when generating Radiotap header
The code generating the Tx Radiotap header when using tx_status_ext was
missing a field increment after setting the VHT bandwidth.

Fixes: 3d07ffcaf3 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header")
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190807075949.32414-4-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:33:45 +02:00
John Crispin
a027c3041f mac80211: 80Mhz was not reported properly when using tx_status_ext
When reporting 80MHz, we need to set 4 and not 2 inside the corresponding
field inside the Tx Radiotap header.

Fixes: 3d07ffcaf3 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header")
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190807075949.32414-3-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:33:26 +02:00
John Crispin
8db6e7367d mac80211: fix bad guard when reporting legacy rates
When reporting legacy rates inside the TX Radiotap header we need to split
the check between "uses tx_statua_ext" and "is legacy rate". Not doing so
would make the code drop into the !tx_status_ext path.

Fixes: 3d07ffcaf3 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header")
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190807075949.32414-2-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:32:57 +02:00
John Crispin
dc649d649a mac80211: fix TX legacy rate reporting when tx_status_ext is used
The RX Radiotap header length was not calculated properly when reporting
legacy rates using tx_status_ext.

Fixes: 3d07ffcaf3 ("mac80211: add struct ieee80211_tx_status support to ieee80211_add_tx_radiotap_header")
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20190807075949.32414-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:32:06 +02:00
Alexander Wetzel
b67fd72e84 cfg80211: Fix Extended Key ID key install checks
Fix two shortcomings in the Extended Key ID API:

 1) Allow the userspace to install pairwise keys using keyid 1 without
    NL80211_KEY_NO_TX set. This allows the userspace to install and
    activate pairwise keys with keyid 1 in the same way as for keyid 0,
    simplifying the API usage for e.g. FILS and FT key installs.

 2) IEEE 802.11 - 2016 restricts Extended Key ID usage to CCMP/GCMP
    ciphers in IEEE 802.11 - 2016 "9.4.2.25.4 RSN capabilities".
    Enforce that when installing a key.

Cc: stable@vger.kernel.org # 5.2
Fixes: 6cdd3979a2 ("nl80211/cfg80211: Extended Key ID support")
Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:19:10 +02:00
Johannes Berg
5fd2f91ad4 mac80211: fix possible sta leak
If TDLS station addition is rejected, the sta memory is leaked.
Avoid this by moving the check before the allocation.

Cc: stable@vger.kernel.org
Fixes: 7ed5285396 ("mac80211: don't initiate TDLS connection if station is not associated to AP")
Link: https://lore.kernel.org/r/20190801073033.7892-1-johannes@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2019-08-21 10:15:33 +02:00
Li RongQing
a1c4cd6784 net: fix __ip_mc_inc_group usage
in ip_mc_inc_group, memory allocation flag, not mcast mode, is expected
by __ip_mc_inc_group

similar issue in __ip_mc_join_group, both mcase mode and gfp_t are needed
here, so use ____ip_mc_inc_group(...)

Fixes: 9fb20801da ("net: Fix ip_mc_{dec,inc}_group allocation context")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20 12:48:06 -07:00
Terry S. Duncan
96a1b033ac net/ncsi: Ensure 32-bit boundary for data cksum
The NCSI spec indicates that if the data does not end on a 32 bit
boundary, one to three padding bytes equal to 0x00 shall be present to
align the checksum field to a 32-bit boundary.

Signed-off-by: Terry S. Duncan <terry.s.duncan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20 12:42:40 -07:00
Vivien Didelot
0394a63acf net: dsa: enable and disable all ports
Call the .port_enable and .port_disable functions for all ports,
not only the user ports, so that drivers may optimize the power
consumption of all ports after a successful setup.

Unused ports are now disabled on setup. CPU and DSA ports are now
enabled on setup and disabled on teardown. User ports were already
enabled at slave creation and disabled at slave destruction.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20 12:33:49 -07:00
Vivien Didelot
955222ca52 net: dsa: use a single switch statement for port setup
It is currently difficult to read the different steps involved in the
setup and teardown of ports in the DSA code. Keep it simple with a
single switch statement for each port type: UNUSED, CPU, DSA, or USER.

Also no need to call devlink_port_unregister from within dsa_port_setup
as this step is inconditionally handled by dsa_port_teardown on error.

Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20 12:33:49 -07:00
Jason Baron
4651d1802f net/smc: make sure EPOLLOUT is raised
Currently, we are only explicitly setting SOCK_NOSPACE on a write timeout
for non-blocking sockets. Epoll() edge-trigger mode relies on SOCK_NOSPACE
being set when -EAGAIN is returned to ensure that EPOLLOUT is raised.
Expand the setting of SOCK_NOSPACE to non-blocking sockets as well that can
use SO_SNDTIMEO to adjust their write timeout. This mirrors the behavior
that Eric Dumazet introduced for tcp sockets.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Cc: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-20 12:25:14 -07:00
Ivan Khoronzhuk
fb89c39455 xdp: unpin xdp umem pages in error path
Fix mem leak caused by missed unpin routine for umem pages.

Fixes: 8aef7340ae ("xsk: introduce xdp_umem_page")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-08-20 16:39:48 +02:00
Xin Long
2f7576347c sctp: remove net sctp.x_enable working as a global switch
The netns sctp feature flags shouldn't work as a global switch,
which is mostly like a firewall/netfilter's job. Also, it will
break asoc as it discard or accept chunks incorrectly when net
sctp.x_enable is changed after the asoc is created.

Since each type of chunk's processing function will check the
corresp asoc's feature flag, this 'global switch' should be
removed, and net sctp.x_enable will only work as the default
feature flags for the future sctp sockets/endpoints.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:29 -07:00
Xin Long
56dd525abd sctp: add SCTP_AUTH_SUPPORTED sockopt
SCTP_AUTH_SUPPORTED sockopt is used to set enpoint's auth
flag. With this feature, each endpoint will have its own
flag for its future asoc's auth_capable, instead of netns
auth flag.

Note that when both ep's auth_enable is enabled, endpoint
auth related data should be initialized. If asconf_enable
is also set, SCTP_CID_ASCONF/SCTP_CID_ASCONF_ACK should
be added into auth_chunk_list.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:29 -07:00
Xin Long
03f961270f sctp: add sctp_auth_init and sctp_auth_free
This patch is to factor out sctp_auth_init and sctp_auth_free
functions, and sctp_auth_init will also be used in the next
patch for SCTP_AUTH_SUPPORTED sockopt.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:29 -07:00
Xin Long
219f9ea4d3 sctp: use ep and asoc auth_enable properly
sctp has per endpoint auth flag and per asoc auth flag, and
the asoc one should be checked when coming to asoc and the
endpoint one should be checked when coming to endpoint.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:29 -07:00
Xin Long
df2c71ffdf sctp: add SCTP_ASCONF_SUPPORTED sockopt
SCTP_ASCONF_SUPPORTED sockopt is used to set enpoint's asconf
flag. With this feature, each endpoint will have its own flag
for its future asoc's asconf_capable, instead of netns asconf
flag.

Note that when both ep's asconf_enable and auth_enable are
enabled, SCTP_CID_ASCONF and SCTP_CID_ASCONF_ACK should be
added into auth_chunk_list.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:28 -07:00
Xin Long
a2eeacc830 sctp: check asoc peer.asconf_capable before processing asconf
asconf chunks should be dropped when the asoc doesn't support
asconf feature.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:28 -07:00
Xin Long
bb2ded2602 sctp: not set peer.asconf_capable in sctp_association_init
asoc->peer.asconf_capable is to be set during handshake, and its
value should be initialized to 0. net->sctp.addip_noauth will be
checked in sctp_process_init when processing INIT_ACK on client
and COOKIE_ECHO on server.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:28 -07:00
Xin Long
4e27428fb5 sctp: add asconf_enable in struct sctp_endpoint
This patch is to make addip/asconf flag per endpoint,
and its value is initialized by the per netns flag,
net->sctp.addip_enable.

It also replaces the checks of net->sctp.addip_enable
with ep->asconf_enable in some places.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:27:28 -07:00
Li RongQing
af809709e9 net: remove empty inet_exit_net
Pointer members of an object with static storage duration, if not
explicitly initialized, will be initialized to a NULL pointer. The
net namespace API checks if this pointer is not NULL before using it,
it are safe to remove the function.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 18:22:54 -07:00
David S. Miller
e15dbcdeb9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Remove IP MASQUERADING record in MAINTAINERS file,
   from Denis Efremov.

2) Counter arguments are swapped in ebtables, from
   Todd Seidelmann.

3) Missing netlink attribute validation in flow_offload
   extension.

4) Incorrect alignment in xt_nfacct that breaks 32-bits
   userspace / 64-bits kernels, from Juliana Rodrigueiro.

5) Missing include guard in nf_conntrack_h323_types.h,
   from Masahiro Yamada.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 13:16:07 -07:00
Eric Dumazet
ef8d8ccdc2 tcp: make sure EPOLLOUT wont be missed
As Jason Baron explained in commit 790ba4566c ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by MSG_DONTWAIT flag passed at sendmsg() time :

    long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.

Fixes: 790ba4566c ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky  <rutsky@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 13:07:43 -07:00
Vlad Buslov
607f625b86 net: flow_offload: convert block_ing_cb_list to regular list type
RCU list block_ing_cb_list is protected by rcu read lock in
flow_block_ing_cmd() and with flow_indr_block_ing_cb_lock mutex in all
functions that use it. However, flow_block_ing_cmd() needs to call blocking
functions while iterating block_ing_cb_list which leads to following
suspicious RCU usage warning:

[  401.510948] =============================
[  401.510952] WARNING: suspicious RCU usage
[  401.510993] 5.3.0-rc3+ #589 Not tainted
[  401.510996] -----------------------------
[  401.511001] include/linux/rcupdate.h:265 Illegal context switch in RCU read-side critical section!
[  401.511004]
               other info that might help us debug this:

[  401.511008]
               rcu_scheduler_active = 2, debug_locks = 1
[  401.511012] 7 locks held by test-ecmp-add-v/7576:
[  401.511015]  #0: 00000000081d71a5 (sb_writers#4){.+.+}, at: vfs_write+0x166/0x1d0
[  401.511037]  #1: 000000002bd338c3 (&of->mutex){+.+.}, at: kernfs_fop_write+0xef/0x1b0
[  401.511051]  #2: 00000000c921c634 (kn->count#317){.+.+}, at: kernfs_fop_write+0xf7/0x1b0
[  401.511062]  #3: 00000000a19cdd56 (&dev->mutex){....}, at: sriov_numvfs_store+0x6b/0x130
[  401.511079]  #4: 000000005425fa52 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x30/0x140
[  401.511092]  #5: 00000000c5822793 (rtnl_mutex){+.+.}, at: unregister_netdevice_notifier+0x35/0x140
[  401.511101]  #6: 00000000c2f3507e (rcu_read_lock){....}, at: flow_block_ing_cmd+0x5/0x130
[  401.511115]
               stack backtrace:
[  401.511121] CPU: 21 PID: 7576 Comm: test-ecmp-add-v Not tainted 5.3.0-rc3+ #589
[  401.511124] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[  401.511127] Call Trace:
[  401.511138]  dump_stack+0x85/0xc0
[  401.511146]  ___might_sleep+0x100/0x180
[  401.511154]  __mutex_lock+0x5b/0x960
[  401.511162]  ? find_held_lock+0x2b/0x80
[  401.511173]  ? __tcf_get_next_chain+0x1d/0xb0
[  401.511179]  ? mark_held_locks+0x49/0x70
[  401.511194]  ? __tcf_get_next_chain+0x1d/0xb0
[  401.511198]  __tcf_get_next_chain+0x1d/0xb0
[  401.511251]  ? uplink_rep_async_event+0x70/0x70 [mlx5_core]
[  401.511261]  tcf_block_playback_offloads+0x39/0x160
[  401.511276]  tcf_block_setup+0x1b0/0x240
[  401.511312]  ? mlx5e_rep_indr_setup_tc_cb+0xca/0x290 [mlx5_core]
[  401.511347]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
[  401.511359]  tc_indr_block_get_and_ing_cmd+0x11b/0x1e0
[  401.511404]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
[  401.511414]  flow_block_ing_cmd+0x7e/0x130
[  401.511453]  ? mlx5e_rep_indr_tc_block_unbind+0x50/0x50 [mlx5_core]
[  401.511462]  __flow_indr_block_cb_unregister+0x7f/0xf0
[  401.511502]  mlx5e_nic_rep_netdevice_event+0x75/0xb0 [mlx5_core]
[  401.511513]  unregister_netdevice_notifier+0xe9/0x140
[  401.511554]  mlx5e_cleanup_rep_tx+0x6f/0xe0 [mlx5_core]
[  401.511597]  mlx5e_detach_netdev+0x4b/0x60 [mlx5_core]
[  401.511637]  mlx5e_vport_rep_unload+0x71/0xc0 [mlx5_core]
[  401.511679]  esw_offloads_disable+0x5b/0x90 [mlx5_core]
[  401.511724]  mlx5_eswitch_disable.cold+0xdf/0x176 [mlx5_core]
[  401.511759]  mlx5_device_disable_sriov+0xab/0xb0 [mlx5_core]
[  401.511794]  mlx5_core_sriov_configure+0xaf/0xd0 [mlx5_core]
[  401.511805]  sriov_numvfs_store+0xf8/0x130
[  401.511817]  kernfs_fop_write+0x122/0x1b0
[  401.511826]  vfs_write+0xdb/0x1d0
[  401.511835]  ksys_write+0x65/0xe0
[  401.511847]  do_syscall_64+0x5c/0xb0
[  401.511857]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  401.511862] RIP: 0033:0x7fad892d30f8
[  401.511868] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 25 96 0d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00 48 83
 ec 28 48 89
[  401.511871] RSP: 002b:00007ffca2a9fad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  401.511875] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fad892d30f8
[  401.511878] RDX: 0000000000000002 RSI: 000055afeb072a90 RDI: 0000000000000001
[  401.511881] RBP: 000055afeb072a90 R08: 00000000ffffffff R09: 000000000000000a
[  401.511884] R10: 000055afeb058710 R11: 0000000000000246 R12: 0000000000000002
[  401.511887] R13: 00007fad893a8780 R14: 0000000000000002 R15: 00007fad893a3740

To fix the described incorrect RCU usage, convert block_ing_cb_list from
RCU list to regular list and protect it with flow_indr_block_ing_cb_lock
mutex in flow_block_ing_cmd().

Fixes: 1150ab0f1b ("flow_offload: support get multi-subsystem block")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-19 13:02:38 -07:00