723741 Commits

Author SHA1 Message Date
c156618e15 nfs: fix a deadlock in nfs client initialization
The following deadlock can occur between a process waiting for a client
to initialize while walking the client list during nfsv4 server trunking
detection and another process waiting for the nfs_clid_init_mutex so it
can initialize that client:

Process 1                               Process 2
---------                               ---------
spin_lock(&nn->nfs_client_lock);
list_add_tail(&CLIENTA->cl_share_link,
        &nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
                                        spin_lock(&nn->nfs_client_lock);
                                        list_add_tail(&CLIENTB->cl_share_link,
                                                &nn->nfs_client_list);
                                        spin_unlock(&nn->nfs_client_lock);
                                        mutex_lock(&nfs_clid_init_mutex);
                                        nfs41_walk_client_list(clp, result, cred);
                                        nfs_wait_client_init_complete(CLIENTA);
(waiting for nfs_clid_init_mutex)

Make sure nfs_match_client() only evaluates clients that have completed
initialization in order to prevent that deadlock.

This patch also fixes v4.0 trunking behavior by not marking the client
NFS_CS_READY until the clientid has been confirmed.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-12-15 14:31:49 -05:00
c05fad5713 ip_gre: fix wrong return value of erspan_rcv
If pskb_may_pull() fails, return PACKET_REJECT instead of -ENOMEM.

Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN")
Cc: William Tu <u9012063@gmail.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 14:10:39 -05:00
b3b8707739 Merge branch 'sctp-stream-interleave'
Xin Long says:

====================
sctp: Implement Stream Interleave: Interaction with Other SCTP Extensions

Stream Interleave would be implemented in two Parts:

   1. The I-DATA Chunk Supporting User Message Interleaving
   2. Interaction with Other SCTP Extensions

Overview in section 2.3 of RFC8260 for Part 2:

   The usage of the I-DATA chunk might interfere with other SCTP
   extensions.  Future SCTP extensions MUST describe if and how they
   interfere with the usage of I-DATA chunks.  For the SCTP extensions
   already defined when this document was published, the details are
   given in the following subsections.

As the 2nd part of Stream Interleave Implementation, this patchset mostly
adds the support for SCTP Partial Reliability Extension with I-FORWARD-TSN
chunk. Then adjusts stream scheduler and stream reconfig to make them work
properly with I-DATA chunks.

In the last patch, all stream interleave code is enabled by adding a
sysctl that allows users to turn this feature on.

v1 -> v2:
  - removed the intl_enable check from sctp_chunk_event_lookup, per Marcelo's
    suggestion.
  - fixed a typo in changelog.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:46 -05:00
463118c34a sctp: support sysctl to allow users to use stream interleave
This is the last patch for stream interleave support. After this patch,
users can enable stream interleave with sysctl -w net.sctp.intl_enable=1.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
107e242569 sctp: update mid instead of ssn when doing stream and asoc reset
When using idata and doing a stream or asoc reset, setting ssn to
0 only clears the first 16 bits of mid.

So to make this work for both data and idata, set mid to 0 instead
of ssn. mid_uo for unordered idata also needs to be cleared, as
described in section 2.3.2 of RFC8260.
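
The 16-bit versus 32-bit point can be illustrated with a small user-space
sketch. The union below is a hypothetical stand-in (the names stream_seq,
reset_ssn and reset_mid are made up for illustration and are not the kernel's
definitions); it only assumes that ssn and mid share the same storage:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-stream sequence field: a 16-bit SSN
 * (DATA chunks) sharing storage with a 32-bit MID (I-DATA chunks). */
union stream_seq {
	uint32_t mid;
	uint16_t ssn;
};

/* Old behaviour: only the 16 bits overlaid by ssn are cleared. */
static void reset_ssn(union stream_seq *s)
{
	assert(s != 0);
	s->ssn = 0;
}

/* New behaviour: clearing mid zeroes all 32 bits, which covers ssn too. */
static void reset_mid(union stream_seq *s)
{
	assert(s != 0);
	s->mid = 0;
}
```

Starting from mid = 0x12345678, reset_ssn() leaves a non-zero mid on either
byte order, while reset_mid() leaves both mid and ssn at zero.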

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
ef4775e340 sctp: add stream interleave support in stream scheduler
As Marcelo said in the stream scheduler patch:

  Support for I-DATA chunks, also described in RFC8260, with user message
  interleaving is straightforward as it just requires the schedulers to
  probe for the feature and ignore datamsg boundaries when dequeueing.

All that needs to be done is to ignore datamsg boundaries when dequeueing.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
de60fe9105 sctp: implement handle_ftsn for sctp_stream_interleave
handle_ftsn is added as a member of sctp_stream_interleave, used to skip
ssn for data or mid for idata, called for SCTP_CMD_PROCESS_FWDTSN cmd.

sctp_handle_iftsn works for ifwdtsn, and sctp_handle_fwdtsn works for
fwdtsn. Note that, unlike sctp_handle_fwdtsn, sctp_handle_iftsn
can do stream abort pd.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
47b20a8856 sctp: implement report_ftsn for sctp_stream_interleave
report_ftsn is added as a member of sctp_stream_interleave, used to
skip tsn from tsnmap, remove old events from reasm or lobby queue,
and abort pd for data or idata, called for SCTP_CMD_REPORT_FWDTSN
cmd and asoc reset.

sctp_report_iftsn works for ifwdtsn, and sctp_report_fwdtsn works
for fwdtsn. Note that sctp_report_iftsn doesn't do asoc abort_pd,
as stream abort_pd will be done when handling ifwdtsn. But when
ftsn is equal with ftsn, which means asoc reset, asoc abort_pd has
to be done.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
0fc2ea922c sctp: implement validate_ftsn for sctp_stream_interleave
validate_ftsn is added as a member of sctp_stream_interleave, used to
validate ssn/chunk type for fwdtsn or mid (message id)/chunk type for
ifwdtsn, called in sctp_sf_eat_fwd_tsn, just as validate_data.

If this check fails, an abort packet will be sent, as said in section
2.3.1 of RFC8260.

As ifwdtsn and fwdtsn chunks have different length, it also defines
ftsn_chunk_len for sctp_stream_interleave to describe the chunk size.
Then it replaces all sizeof(struct sctp_fwdtsn_chunk) with
sctp_ftsnchk_len.

It also adds the process for ifwdtsn in rx path. As Marcelo pointed
out, there's no need to add event table for ifwdtsn, but just share
prsctp_chunk_event_table with fwdtsn's. It would drop fwdtsn chunk
for ifwdtsn and drop ifwdtsn chunk for fwdtsn by calling validate_ftsn
in sctp_sf_eat_fwd_tsn.

After this patch, the ifwdtsn can be accepted.

Note that this patch also removes the sctp.intl_enable check for
idata chunks in sctp_chunk_event_lookup, as it will do this check
in validate_data later.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:22 -05:00
8e0c3b73ce sctp: implement generate_ftsn for sctp_stream_interleave
generate_ftsn is added as a member of sctp_stream_interleave, used to
create fwdtsn or ifwdtsn chunk according to abandoned chunks, called
in sctp_retransmit and sctp_outq_sack.

sctp_generate_iftsn works for ifwdtsn, and sctp_generate_fwdtsn is
still used for making fwdtsn.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:21 -05:00
2d07a49ade sctp: add basic structures and make chunk function for ifwdtsn
sctp_ifwdtsn_skip, sctp_ifwdtsn_hdr and sctp_ifwdtsn_chunk are used to
define and parse I-FWD TSN chunk format, and sctp_make_ifwdtsn is a
function to build the chunk.

The I-FORWARD-TSN Chunk Format is defined in section 2.3.1 of RFC8260.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo R. Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:52:21 -05:00
024778095a net: phy: phylink: Handle NULL fwnode_handle
Unlike the various of_* routines to fetch properties, fwnode_* routines can
have an early check against a NULL fwnode_handle reference which makes them
return -EINVAL (see fwnode_call_int_op), thus making it virtually impossible to
differentiate what type of error is going on.

Have an early check in phylink_register_sfp() so we can keep proceeding with
the initialization. There is not much we can do without a valid fwnode_handle
except return early and treat this similarly to -ENOENT.

Fixes: 8fa7b9b6af25 ("phylink: convert to fwnode")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:50:49 -05:00
245d21190a qmi_wwan: set FLAG_SEND_ZLP to avoid network initiated disconnect
It has been reported that the dummy byte we add to avoid
ZLPs can be forwarded by the modem to the PGW/GGSN, and that
some operators will drop the connection if this happens.

In theory, QMI devices are based on CDC ECM and should as such
both support ZLPs and silently ignore the dummy byte.  The latter
assumption failed.  Let's test out the first.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:50:13 -05:00
c647c0d62c net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
This patch adds support for Telit ME910 PID 0x1101.

Signed-off-by: Daniele Palmas <dnlplm@gmail.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:46:10 -05:00
d1fca67fee Merge branch 'net-sched-Make-qdisc-offload-uapi-uniform'
Yuval Mintz says:

====================
net: sched: Make qdisc offload uapi uniform

Several qdiscs can already be offloaded to hardware, but there's an
inconsistency in regard to the uapi through which they indicate such
an offload is taking place - indication is passed to the user via
TCA_OPTIONS where each qdisc retains private logic for setting it.

The recent addition of offloading to RED in
602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") caused
the addition of yet another uapi field for this purpose -
TC_RED_OFFLOADED.

For clarity and prevention of bloat in the uapi we want to eliminate
said added uapi, replacing it with a common mechanism that can be used
to reflect offload status of the various qdiscs.

The first patch introduces TCA_HW_OFFLOAD as the generic message meant
for this purpose. The second changes the current RED implementation into
setting the internal bits necessary for passing it, and the third removes
TC_RED_OFFLOADED as it's no longer needed.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:37 -05:00
4a98795bc8 pkt_sched: Remove TC_RED_OFFLOADED from uapi
Following the previous patch, RED is now using the new uniform uapi
for indicating it's offloaded. As a result, TC_RED_OFFLOADED is no
longer utilized by kernel and can be removed [as it's still not
part of any stable release].

Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:37 -05:00
428a68af3a net: sched: Move to new offload indication in RED
Let RED utilize the new internal flag, TCQ_F_OFFLOADED,
to mark a given qdisc as offloaded instead of using a dedicated
indication.

Also, change internal logic into looking at said flag when possible.

Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:36 -05:00
7a4fa29106 net: sched: Add TCA_HW_OFFLOAD
Qdiscs can be offloaded to HW, but current implementation isn't uniform.
Instead, qdiscs either pass information about offload status via their
TCA_OPTIONS or omit it altogether.

Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform
uAPI for the offloading status of qdiscs.

Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:35:36 -05:00
82f67bc6be net: alteon: acenic: clean up indentation issue
There is a hunk of code that is incorrectly indented with spaces
rather than a tab.  Clean this up.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:28:30 -05:00
f3ac015346 Merge branch 'sfp-SFF-module-support'
Russell King says:

====================
Add SFF module support

Add support for SFF modules.  SFF modules are similar to SFP modules,
but they have fewer control signals, and are soldered down rather than
pluggable.

They also have different IDs in the EEPROM to identify as soldered down
SFF modules.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:23:22 -05:00
259c8618b0 sfp: add sff module support
Add support for SFF modules, which are soldered down SFP modules.
These have a different phys_id value, and also have the present and
rate select signals omitted compared with their socketed counter-parts.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:23:22 -05:00
512dc8fed9 dt-bindings: add sff,sff binding for SFP support
Add "sff,sff" for SFF module support with SFP.  These have a different
phys_id value, and also have the present and rate select signals omitted
compared with their socketed counter-parts.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 13:23:22 -05:00
b4cf6a0a23 Merge branch 'nfp-fix-rtsym-and-XPB-register-handling-in-debug-dump'
Simon Horman says:

====================
nfp: fix rtsym and XPB register handling in debug dump

This series resolves two problems in the recently added debug dump facility.

* Correctly handle reading absolute rtsyms
* Correctly handle special-case XPB register reads

These fixes are for code only present in net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:48:46 -05:00
28b2d7d04b nfp: fix XPB register reads in debug dump
For XPB registers reads, some island IDs require special handling (e.g.
ARM island), which is already taken care of in nfp_xpb_readl(), so use
that instead of a straight CPP read.

Without this fix all "xpbm:ArmIsldXpbmMap.*" registers are reported as
0xffffffff. It has also been observed to cause a system reboot.

With this fix correct values are reported, none of which are 0xffffffff.

The values may be read using ethtool debug level 2.
 # ethtool -W <netdev> 2
 # ethtool -w <netdev> data dump.dat

Fixes: 0e6c4955e149 ("nfp: dump CPP, XPB and direct ME CSRs")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:48:45 -05:00
da762863ed nfp: fix absolute rtsym handling in debug dump
In TLV-based ethtool debug dumps, don't do a CPP read for absolute
rtsyms, use the addr field in the symbol table directly as the value.

Without this fix rtsym gro_release_ring_0 is 4 bytes of zeros.
With this fix the correct value, 0x0000004a 0x00000000 is reported.

The values may be read using ethtool debug level 2.
 # ethtool -W <netdev> 2
 # ethtool -w <netdev> data dump.dat

Fixes: e1e798e3fd93 ("nfp: dump rtsyms")
Signed-off-by: Carl Heymann <carl.heymann@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:48:45 -05:00
0a0606970f Merge branch 'aquantia-fixes'
Igor Russkikh says:

====================
net: aquantia: Atlantic driver 12/2017 updates

The patchset contains an important hardware fix for machines with large MRRS
and a couple of improvements in stats and capabilities reporting

patch v3:
 - Fixed patch #7 after Andrew's finding. NIC level stats actually
   have to be cleaned only on hw struct creation (and this is done
   in kzalloc). On each hwinit we only have to reset link state
   to make sure hw stats update will not increment nic stats during init.

patch v2:
 - split into more detailed commits

Comment from David on wrong defines case will be submitted separately later
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:43 -05:00
d4c242d4ba net: aquantia: Increment driver version
Add a suffix to distinguish kernel mainline version and aquantia releases

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
98bc036de4 net: aquantia: Fix typo in ethtool statistics names
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
f3e2778429 net: aquantia: Update hw counters on hw init
On the very first start we should read out the current HW counter values
to make diff-based calculations later.
This should also be done each time the NIC goes down/up or wakes up
after a sleep state. We reset link state explicitly to prevent diffs
from being summed this first time.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
fdb4a0830e net: aquantia: Improve link state and statistics check interval callback
Reduce timeout from 2 secs to 1 sec. If link is down,
reduce it to 500msec. This speeds up link detection.
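
As a rough illustration of the interval selection described above, here is a
minimal sketch. The function and constant names are hypothetical and not taken
from the driver; only the 1 sec and 500 msec values come from this message:

```c
#include <assert.h>
#include <stdbool.h>

/* Intervals from the commit message; the names are illustrative only. */
enum {
	POLL_LINK_UP_MS   = 1000, /* link is up: check once per second */
	POLL_LINK_DOWN_MS = 500   /* link is down: check twice as often */
};

/* Pick the delay before the next link state / statistics check. */
static unsigned int poll_interval_ms(bool link_up)
{
	unsigned int ms = link_up ? POLL_LINK_UP_MS : POLL_LINK_DOWN_MS;

	assert(ms > 0);
	return ms;
}
```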

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
45cc1c7ad4 net: aquantia: Fill in multicast counter in ndev stats from hardware
This metric comes from HW and is also diff-calculated, like other counters

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
9f8a2203a5 net: aquantia: Fill ndev stat counters from hardware
Originally they were filled from ring sw counters.
These sometimes incorrectly calculate byte and packet amounts
when using LRO/LSO and jumboframes. Filling ndev counters from
hardware makes them precise.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:42 -05:00
be08d839d9 net: aquantia: Extend stat counters to 64bit values
Device hardware provides only 32bit counters. Using these directly
causes byte counters to overflow soon. A separate nic level structure
with 64 bit counters is now used to collect incrementally all the stats
and report these counters to ethtool stats and ndev stats.
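
The general technique of widening a wrapping 32-bit hardware counter can be
sketched as follows. This is a user-space illustration under assumed names
(stat_counter, stat_counter_prime and stat_counter_update are hypothetical),
not the driver's code:

```c
#include <assert.h>
#include <stdint.h>

/* One software counter that widens a wrapping 32-bit hardware counter. */
struct stat_counter {
	uint32_t last_hw; /* previous raw 32-bit hardware reading */
	uint64_t total;   /* accumulated 64-bit value reported to users */
};

/* Record a baseline reading without adding anything to the total. */
static void stat_counter_prime(struct stat_counter *c, uint32_t hw_now)
{
	assert(c != 0);
	c->last_hw = hw_now;
}

/* Add the difference since the previous reading to the 64-bit total.
 * Unsigned 32-bit subtraction is modulo 2^32, so a single wrap of the
 * hardware counter between two readings still gives the right delta. */
static void stat_counter_update(struct stat_counter *c, uint32_t hw_now)
{
	uint32_t delta;

	assert(c != 0);
	delta = hw_now - c->last_hw;
	c->total += delta;
	c->last_hw = hw_now;
}
```

The sketch is only correct if the hardware counter wraps at most once between
two readings, so the polling interval has to be short enough for the expected
traffic rate.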

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:41 -05:00
1e36616151 net: aquantia: Fix hardware DMA stream overload on large MRRS
On systems with a large device MRRS (2K, 4K), high data rates and/or a
large MTU, atlantic observes DMA packet buffer overflow. On some systems
that causes PCIe transaction errors, hardware NMIs or a datapath freeze.
This patch
1) Limits MRRS from the device side to 2K (that's the maximum our hardware supports)
2) Limits the maximum size of outstanding TX DMA data read requests. This keeps
the hardware buffers running fine.

Signed-off-by: Pavel Belous <pavel.belous@aquantia.com>
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:41 -05:00
e4d02ca04c net: aquantia: Fix actual speed capabilities reporting
Different hardware device Ids correspond to different maximum speed
available. Extra checks were added for devices D108 and D109 to
remove unsupported speeds from these device capabilities list.

Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:46:41 -05:00
9463b2f72e Merge branch 'erspan-version-2'
William Tu says:

====================
ERSPAN version 2 (type III) support

ERSPAN has two versions, v1 (type II) and v2 (type III).  This patch
series adds support for erspan v2 based on existing erspan v1
implementation.  The first patch refactors the existing erspan v1's
header structure, making it extensible to put additional v2's header.
The second and third patches introduce erspan v2's implementation to
ipv4 and ipv6 erspan, for both native mode and collect metadata mode.
Finally, test cases are added under the samples/bpf.

Note:
ERSPAN version 2 has many features and this patch does not implement
all.  One major use case of version 2 over version 1 is its timestamp
and direction.  So the traffic collector is able to distinguish the
mirrored traffic better.  Other features such as SGT (security group
tag), FT (frame type) for carrying non-ethernet packet, and optional
subheader are not implemented yet.

Example commandline for ERSPAN version 2:
ip link add dev ip6erspan11 type ip6erspan seq key 102 \
	local fc00:100::2 remote fc00:100::1 \
	erspan_ver 2 erspan_dir 1 erspan_hwid 17

The corresponding iproute2 patch:
https://marc.info/?l=linux-netdev&m=151321141525106&w=2

William Tu (4):
  net: erspan: refactor existing erspan code
  net: erspan: introduce erspan v2 for ip_gre
  ip6_gre: add erspan v2 support
  samples/bpf: add erspan v2 sample code

 include/net/erspan.h           | 152 ++++++++++++++++++++++++++++++++++++++---
 include/net/ip6_tunnel.h       |   3 +
 include/net/ip_tunnels.h       |   5 +-
 include/uapi/linux/if_ether.h  |   1 +
 include/uapi/linux/if_tunnel.h |   3 +
 net/ipv4/ip_gre.c              | 124 +++++++++++++++++++++++++++------
 net/ipv6/ip6_gre.c             | 139 +++++++++++++++++++++++++++++++------
 net/openvswitch/flow_netlink.c |   8 +--
 samples/bpf/tcbpf2_kern.c      |  77 ++++++++++++++++++---
 samples/bpf/test_tunnel_bpf.sh |  38 ++++++++---
 10 files changed, 472 insertions(+), 78 deletions(-)

--
A simple script to test it:

set -ex
function cleanup() {
	set +ex
	ip netns del ns0
	ip link del ip6erspan11
	ip link del veth1
}

function main() {
	trap cleanup 0 2 3 9

	ip netns add ns0
	ip link add veth0 type veth peer name veth1
	ip link set veth0 netns ns0

	# non-namespace
	ip addr add dev veth1 fc00:100::2/96

	if [ "$1" == "v1" ]; then
		echo "create IP6 ERSPAN v1 tunnel"
		ip link add dev ip6erspan11 type ip6erspan seq key 102 \
			local fc00:100::2 remote fc00:100::1 \
			erspan 123 erspan_ver 1
	else
		echo "create IP6 ERSPAN v2 tunnel"
		ip link add dev ip6erspan11 type ip6erspan seq key 102 \
			local fc00:100::2 remote fc00:100::1 \
			erspan_ver 2 erspan_dir 1 erspan_hwid 17
	fi
	ip addr add dev ip6erspan11 fc00:200::2/96
	ip addr add dev ip6erspan11 10.10.200.2/24

	# namespace: ns0
	ip netns exec ns0 ip addr add fc00:100::1/96 dev veth0

	if [ "$1" == "v1" ]; then
		ip netns exec ns0 \
		ip link add dev ip6erspan00 type ip6erspan seq key 102 \
			local fc00:100::1 remote fc00:100::2 \
			erspan 123 erspan_ver 1
	else
		ip netns exec ns0 \
		ip link add dev ip6erspan00 type ip6erspan seq key 102 \
			local fc00:100::1 remote fc00:100::2 \
			erspan_ver 2 erspan_dir 1 erspan_hwid 7
	fi

	ip netns exec ns0 ip addr add dev ip6erspan00 fc00:200::1/96
	ip netns exec ns0 ip addr add dev ip6erspan00 10.10.200.1/24

	ip link set dev veth1 up
	ip link set dev ip6erspan11 up
	ip netns exec ns0 ip link set dev ip6erspan00 up
	ip netns exec ns0 ip link set dev veth0 up
}

main $1

ping6 -c 1 fc00:100::1 || true

ping -c 3 10.10.200.1
exit 0
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:01 -05:00
ac80c2a165 samples/bpf: add erspan v2 sample code
Extend the existing tests for ipv4 and ipv6 erspan version 2.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
94d7d8f292 ip6_gre: add erspan v2 support
Similar to support for ipv4 erspan, this patch adds
erspan v2 to ip6erspan tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
f551c91de2 net: erspan: introduce erspan v2 for ip_gre
The patch adds support for erspan version 2.  Not all features are
supported in this patch.  The SGT (security group tag), GRA (timestamp
granularity), FT (frame type) are set to fixed value.  Only hardware
ID and direction are configurable.  Optional subheader is also not
supported.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
1d7e2ed22f net: erspan: refactor existing erspan code
The patch refactors the existing erspan implementation in order
to support erspan version 2, which has additional metadata.  So,
instead of having one 'struct erspanhdr' holding erspan version 1,
break it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:33:59 -05:00
4650b7514c Merge branch 'nfp-ethtool-flash-updates'
Jakub Kicinski says:

====================
nfp: ethtool flash updates

Dirk says:

This series adds the ability to update the control FW with ethtool.

It should be noted that the locking scheme here is to release the RTNL
lock before the flashing operation and to take it again afterwards to
ensure consistent state from the core code point of view. In this time,
we take a reference to the device to prevent the device being freed
while it's being flashed.

This provides protection for the device being flashed while at the same
time not holding up any networking related functions which would
otherwise be locked out due to RTNL being held.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:26:13 -05:00
7a74156591 nfp: implement firmware flashing
Firmware flashing takes around 60s (specified to not take more than
70s). Prevent hogging the RTNL lock in this time and make use of the
longer timeout for the NSP command. The timeout is set to 2.5 * 70
seconds.

We only allow flashing the firmware from reprs or PF netdevs. VFs do not
have an app reference.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:26:12 -05:00
87a23801e5 nfp: extend NSP infrastructure for configurable timeouts
The firmware flashing NSP operation takes longer to execute than the
current default timeout. We need a mechanism to set a longer timeout for
some commands. This patch adds the infrastructure to this.

The default timeout is still 30 seconds.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:26:12 -05:00
d31d38a0a9 Merge branch 'ipvlan-packet-scrub'
Mahesh Bandewar says:

====================
ipvlan: packet scrub

While crossing namespace boundary IPvlan aggressively scrubs packets.
This is creating problems. First thing is that scrubbing changes the
packet type in skb meta-data to PACKET_HOST. This causes erroneous
packet delivery when dev_forward_skb() has already marked the packet
type as OTHER_HOST.

On the egress side scrubbing just before calling dev_queue_xmit()
creates another set of problems. Scrubbing removes skb->sk so the
prio update gets missed and more seriously, socket back-pressure
fails making TSQ not function correctly.

The first patch in the series just reverts the earlier change which
was adding a mac-check, but that is unnecessary if packet_type that
dev_forward_skb() has set is honored. The second patch removes two of
the scrubs which are causing problems described above.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:36:54 -05:00
c0d451c86c ipvlan: remove excessive packet scrubbing
IPvlan currently scrubs packets at every location where packets may be
crossing namespace boundary. Though this is desirable, currently IPvlan
does it more than necessary. e.g. packets that are going to take
dev_forward_skb() path will get scrubbed so no point in scrubbing them
before forwarding. Another side-effect of scrubbing is that pkt-type gets
set to PACKET_HOST, which overrides what had already been set by the
earlier path and causes erroneous delivery of the packets.

Also scrubbing packets just before calling dev_queue_xmit() has detrimental
effects since packets lose skb->sk and because of that miss prio updates,
get incorrect socket back-pressure and can even break TSQ.

Fixes: b93dd49c1a35 ('ipvlan: Scrub skb before crossing the namespace boundary')
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:36:53 -05:00
918150cbd6 Revert "ipvlan: add L2 check for packets arriving via virtual devices"
This reverts commit 92ff42645028fa6f9b8aa767718457b9264316b4.

Even though the check added is not that taxing, it's not really needed.
First of all this will be per packet cost and second thing is that the
eth_type_trans() already does this correctly. The excessive scrubbing
in IPvlan was changing the pkt-type skb metadata of the packet which
made it necessary to re-check the mac. The subsequent patch in this
series removes the faulty packet-scrub.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:36:53 -05:00
35b99dffc3 sock: free skb in skb_complete_tx_timestamp on error
skb_complete_tx_timestamp must ingest the skb it is passed. Call
kfree_skb if the skb cannot be enqueued.
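
The ownership rule above (the callee always consumes the buffer, including on
the error path) can be sketched in user space. All names below (struct buf,
struct queue, complete_timestamp and the bufs_freed counter) are hypothetical
stand-ins, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct buf { int len; };
struct queue { struct buf *slot; }; /* toy one-slot queue */

static int bufs_freed; /* counts frees so the behaviour can be checked */

static void buf_free(struct buf *b)
{
	free(b);
	bufs_freed++;
}

/* Fails when the single slot is already occupied. */
static bool queue_try_enqueue(struct queue *q, struct buf *b)
{
	if (q->slot)
		return false;
	q->slot = b;
	return true;
}

/* Always takes ownership of b: it is either queued or freed here, so
 * the caller must not touch b again after this call returns. */
static int complete_timestamp(struct queue *q, struct buf *b)
{
	assert(q != NULL && b != NULL);
	if (queue_try_enqueue(q, b))
		return 0;
	buf_free(b); /* error path: without this the buffer would leak */
	return -1;
}
```

With this contract the caller needs no cleanup branch of its own, which is
what makes a missing free on the callee's error path a leak.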

Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl")
Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()")
Reported-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:30:36 -05:00
d9356edc44 Merge branch 's390-fixes'
Julian Wiedmann says:

====================
s390/qeth: fixes 2017-12-13

Some more patches for 4.15 that fix multiple issues with IP Takeover
configuration in qeth.
Please queue them up for stable kernels as well (4.9 and newer).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:29:44 -05:00
02f510f326 s390/qeth: update takeover IPs after configuration change
Any modification to the takeover IP-ranges requires that we re-evaluate
which IP addresses are takeover-eligible. Otherwise we might do takeover
for some addresses when we no longer should, or vice-versa.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:29:43 -05:00
8a03a3692b s390/qeth: lock IP table while applying takeover changes
Modifying the flags of an IP addr object needs to be protected against
e.g. concurrent removal of the same object from the IP table.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 11:29:43 -05:00