40013 Commits

Author SHA1 Message Date
6ac311ae8b Adding switchdev ageing notification on port bridged
Configure ageing time to the HW for newly bridged device

CC: Scott Feldman <sfeldma@gmail.com>
CC: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:50:57 -07:00
50010c2059 irda: precedence bug in irlmp_seq_hb_idx()
This is decrementing the pointer, instead of the value stored in the
pointer.  KASan detects it as an out of bounds reference.

Reported-by: "Berry Cheng 程君(成淼)" <chengmiao.cj@alibaba-inc.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:48:26 -07:00
f6a835bb04 vsock: fix missing cleanup when misc_register failed
reset transport and unlock if misc_register failed.

Signed-off-by: Gao feng <omarapazanadi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:41:06 -07:00
146aa8b145 KEYS: Merge the type-specific data with the payload data
Merge the type-specific data with the payload data into one four-word chunk
as it seems pointless to keep them separate.

Use user_key_payload() for accessing the payloads of overloaded
user-defined keys.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cifs@vger.kernel.org
cc: ecryptfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-f2fs-devel@lists.sourceforge.net
cc: linux-nfs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-ima-devel@lists.sourceforge.net
2015-10-21 15:18:36 +01:00
4f41b1c58a tcp: use RACK to detect losses
This patch implements the second half of RACK that uses the the most
recent transmit time among all delivered packets to detect losses.

tcp_rack_mark_lost() is called upon receiving a dubious ACK.
It then checks if an not-yet-sacked packet was sent at least
"reo_wnd" prior to the sent time of the most recently delivered.
If so the packet is deemed lost.

The "reo_wnd" reordering window starts with 1msec for fast loss
detection and changes to min-RTT/4 when reordering is observed.
We found 1msec accommodates well on tiny degree of reordering
(<3 pkts) on faster links. We use min-RTT instead of SRTT because
reordering is more of a path property but SRTT can be inflated by
self-inflicated congestion. The factor of 4 is borrowed from the
delayed early retransmit and seems to work reasonably well.

Since RACK is still experimental, it is now used as a supplemental
loss detection on top of existing algorithms. It is only effective
after the fast recovery starts or after the timeout occurs. The
fast recovery is still triggered by FACK and/or dupack threshold
instead of RACK.

We introduce a new sysctl net.ipv4.tcp_recovery for future
experiments of loss recoveries. For now RACK can be disabled by
setting it to 0.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:53 -07:00
659a8ad56f tcp: track the packet timings in RACK
This patch is the first half of the RACK loss recovery.

RACK loss recovery uses the notion of time instead
of packet sequence (FACK) or counts (dupthresh). It's inspired by the
previous FACK heuristic in tcp_mark_lost_retrans(): when a limited
transmit (new data packet) is sacked, then current retransmitted
sequence below the newly sacked sequence must been lost,
since at least one round trip time has elapsed.

But it has several limitations:
1) can't detect tail drops since it depends on limited transmit
2) is disabled upon reordering (assumes no reordering)
3) only enabled in fast recovery ut not timeout recovery

RACK (Recently ACK) addresses these limitations with the notion
of time instead: a packet P1 is lost if a later packet P2 is s/acked,
as at least one round trip has passed.

Since RACK cares about the time sequence instead of the data sequence
of packets, it can detect tail drops when later retransmission is
s/acked while FACK or dupthresh can't. For reordering RACK uses a
dynamically adjusted reordering window ("reo_wnd") to reduce false
positives on ever (small) degree of reordering.

This patch implements tcp_advanced_rack() which tracks the
most recent transmission time among the packets that have been
delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp
is the key to determine which packet has been lost.

Consider an example that the sender sends six packets:
T1: P1 (lost)
T2: P2
T3: P3
T4: P4
T100: sack of P2. rack.mstamp = T2
T101: retransmit P1
T102: sack of P2,P3,P4. rack.mstamp = T4
T205: ACK of P4 since the hole is repaired. rack.mstamp = T101

We need to be careful about spurious retransmission because it may
falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK
to falsely mark all packets lost, just like a spurious timeout.

We identify spurious retransmission by the ACK's TS echo value.
If TS option is not applicable but the retransmission is acknowledged
less than min-RTT ago, it is likely to be spurious. We refrain from
using the transmission time of these spurious retransmissions.

The second half is implemented in the next patch that marks packet
lost using RACK timestamp.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:48 -07:00
77c631273d tcp: add tcp_tsopt_ecr_before helper
a helper to prepare the main RACK patch

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:45 -07:00
af82f4e848 tcp: remove tcp_mark_lost_retrans()
Remove the existing lost retransmit detection because RACK subsumes
it completely. This also stops the overloading the ack_seq field of
the skb control block.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:44 -07:00
f672258391 tcp: track min RTT using windowed min-filter
Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.

The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.

Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.

Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code

The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.

The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:43 -07:00
9e45a3e36b tcp: apply Kern's check on RTTs used for congestion control
Currently ca_seq_rtt_us does not use Kern's check. Fix that by
checking if any packet acked is a retransmit, for both RTT used
for RTT estimation and congestion control.

Fixes: 5b08e47ca ("tcp: prefer packet timing to TS-ECR for RTT")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 07:00:41 -07:00
8ce783dc5e Bluetooth: Fix missing hdev locking for LE scan cleanup
The hci_conn objects don't have a dedicated lock themselves but rely
on the caller to hold the hci_dev lock for most types of access. The
hci_conn_timeout() function has so far sent certain HCI commands based
on the hci_conn state which has been possible without holding the
hci_dev lock.

The recent changes to do LE scanning before connect attempts added
even more operations to hci_conn and hci_dev from hci_conn_timeout,
thereby exposing potential race conditions with the hci_dev and
hci_conn states.

As an example of such a race, here there's a timeout but an
l2cap_sock_connect() call manages to race with the cleanup routine:

[Oct21 08:14] l2cap_chan_timeout: chan ee4b12c0 state BT_CONNECT
[  +0.000004] l2cap_chan_close: chan ee4b12c0 state BT_CONNECT
[  +0.000002] l2cap_chan_del: chan ee4b12c0, conn f3141580, err 111, state BT_CONNECT
[  +0.000002] l2cap_sock_teardown_cb: chan ee4b12c0 state BT_CONNECT
[  +0.000005] l2cap_chan_put: chan ee4b12c0 orig refcnt 4
[  +0.000010] hci_conn_drop: hcon f53d56e0 orig refcnt 1
[  +0.000013] l2cap_chan_put: chan ee4b12c0 orig refcnt 3
[  +0.000063] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT
[  +0.000049] hci_conn_params_del: addr ee:0d:30:09:53:1f (type 1)
[  +0.000002] hci_chan_list_flush: hcon f53d56e0
[  +0.000001] hci_chan_del: hci0 hcon f53d56e0 chan f4e7ccc0
[  +0.004528] l2cap_sock_create: sock e708fc00
[  +0.000023] l2cap_chan_create: chan ee4b1770
[  +0.000001] l2cap_chan_hold: chan ee4b1770 orig refcnt 1
[  +0.000002] l2cap_sock_init: sk ee4b3390
[  +0.000029] l2cap_sock_bind: sk ee4b3390
[  +0.000010] l2cap_sock_setsockopt: sk ee4b3390
[  +0.000037] l2cap_sock_connect: sk ee4b3390
[  +0.000002] l2cap_chan_connect: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f (type 2) psm 0x00
[  +0.000002] hci_get_route: 00:02:72:d9:e5:8b -> ee:0d:30:09:53:1f
[  +0.000001] hci_dev_hold: hci0 orig refcnt 8
[  +0.000003] hci_conn_hold: hcon f53d56e0 orig refcnt 0

Above the l2cap_chan_connect() shouldn't have been able to reach the
hci_conn f53d56e0 anymore but since hci_conn_timeout didn't do proper
locking that's not the case. The end result is a reference to hci_conn
that's not in the conn_hash list, resulting in list corruption when
trying to remove it later:

[Oct21 08:15] l2cap_chan_timeout: chan ee4b1770 state BT_CONNECT
[  +0.000004] l2cap_chan_close: chan ee4b1770 state BT_CONNECT
[  +0.000003] l2cap_chan_del: chan ee4b1770, conn f3141580, err 111, state BT_CONNECT
[  +0.000001] l2cap_sock_teardown_cb: chan ee4b1770 state BT_CONNECT
[  +0.000005] l2cap_chan_put: chan ee4b1770 orig refcnt 4
[  +0.000002] hci_conn_drop: hcon f53d56e0 orig refcnt 1
[  +0.000015] l2cap_chan_put: chan ee4b1770 orig refcnt 3
[  +0.000038] hci_conn_timeout: hcon f53d56e0 state BT_CONNECT
[  +0.000003] hci_chan_list_flush: hcon f53d56e0
[  +0.000002] hci_conn_hash_del: hci0 hcon f53d56e0
[  +0.000001] ------------[ cut here ]------------
[  +0.000461] WARNING: CPU: 0 PID: 1782 at lib/list_debug.c:56 __list_del_entry+0x3f/0x71()
[  +0.000839] list_del corruption, f53d56e0->prev is LIST_POISON2 (00000200)

The necessary fix is unfortunately more complicated than just adding
hci_dev_lock/unlock calls to the hci_conn_timeout() call path.
Particularly, the hci_conn_del() API, which expects the hci_dev lock to
be held, performs a cancel_delayed_work_sync(&hcon->disc_work) which
would lead to a deadlock if the hci_conn_timeout() call path tries to
acquire the same lock.

This patch solves the problem by deferring the cleanup work to a
separate work callback. To protect against the hci_dev or hci_conn
going away meanwhile temporary references are taken with the help of
hci_dev_hold() and hci_conn_get().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 4.3
2015-10-21 14:25:34 +02:00
e5a9f8d046 mac80211: move station statistics into sub-structs
Group station statistics by where they're (mostly) updated
(TX, RX and TX-status) and group them into sub-structs of
the struct sta_info.

Also rename the variables since the grouping now makes it
obvious where they belong.

This makes it easier to identify where the statistics are
updated in the code, and thus easier to think about them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-21 10:08:22 +02:00
976bd9efda mac80211: move beacon_loss_count into ifmgd
There's little point in keeping (and even sending to userspace)
the beacon_loss_count value per station, since it can only apply
to the AP on a managed-mode connection. Move the value to ifmgd,
advertise it only in managed mode, and remove it from ethtool as
it's available through better interfaces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-21 10:08:21 +02:00
763aa27a29 mac80211: remove sta->last_ack_signal
This file only feeds a debugfs file that isn't very useful, so remove
it. If necessary, we can add other ways to get this information, for
example in the NL80211_CMD_PROBE_CLIENT response.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-21 10:08:21 +02:00
98a63aaf24 Bluetooth: Introduce driver specific post init callback
Some drivers might have to restore certain settings after the init
procedure has been completed. This driver callback allows them to hook
into that stage. This callback is run just before the controller is
declared as powered up.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 07:30:53 +03:00
9f7378a9d6 Bluetooth: l2cap_disconnection_req priority over shutdown
There is a L2CAP protocol race between the local peer and
the remote peer demanding disconnection of the L2CAP link.

When L2CAP ERTM is used, l2cap_sock_shutdown() can be called
from userland to disconnect L2CAP. However, there can be a
delay introduced by waiting for ACKs. During this waiting
period, the remote peer may have sent a Disconnection Request.
Therefore, recheck the shutdown status of the socket
after waiting for ACKs because there is no need to do
further processing if the connection has gone.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:26 +02:00
04ba72e6b2 Bluetooth: Reorganize mutex lock in l2cap_sock_shutdown()
This commit reorganizes the mutex lock and is now
only protecting l2cap_chan_close(). This is now consistent
with other places where l2cap_chan_close() is called.

If a conn connection exists, call
mutex_lock(&conn->chan_lock) before calling l2cap_chan_close()
to ensure other L2CAP protocol operations do not interfere.

Note that the conn structure has to be protected from being
freed as it is possible for the connection to be disconnected
whilst the locks are not held. This solution allows the mutex
lock to be used even when the connection has just been
disconnected.

This commit also reduces the scope of chan locking.

The only place where chan locking is needed is the call to
l2cap_chan_close(chan, 0) which if necessary closes the channel.
Therefore, move the l2cap_chan_lock(chan) and
l2cap_chan_lock(chan) locking calls to around
l2cap_chan_close(chan, 0).

This allows __l2cap_wait_ack(sk, chan) to be called with no
chan locks being held so L2CAP messaging over the ACL link
can be done unimpaired.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:26 +02:00
e7456437c1 Bluetooth: Unwind l2cap_sock_shutdown()
l2cap_sock_shutdown() is designed to only action shutdown
of the channel when shutdown is not already in progress.
Therefore, reorganise the code flow by adding a goto
to jump to the end of function handling when shutdown is
already being actioned. This removes one level of code
indentation and make the code more readable.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Harish Jenny K N <harish_kandiga@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:26 +02:00
09bf420f10 6lowpan: put mcast compression in an own function
This patch moves the mcast compression algorithmn to an own function
like all other compression/decompression methods in iphc.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
b5af9bdbfe 6lowpan: rework tc and flow label handling
This patch reworks the handling of compression/decompression of traffic
class and flow label handling. The current method is hard to understand,
also doesn't checks if we can read the buffer from skb length.

I tried to put the shifting operations into static inline functions and
comment each steps which I did there to make it hopefully somewhat more
readable. The big mess to deal with that is the that the ipv6 header
bring the order "DSCP + ECN" but iphc uses "ECN + DSCP". Additional the
DCSP + ECN bits are splitted in ipv6_hdr inside the priority and
flow_lbl[0] fields.

I tested these compressions by using fakelb 802.15.4 driver and
manipulate the tc and flow label fields manually in function
"__ip6_local_out" before the skb will be send to lower layers. Then I
looked up the tc and flow label fields in wireshark on a wpan and lowpan
interface.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
c8a3e7eb98 6lowpan: iphc: change define values
This patch has the main goal to delete shift operations. Instead we
doing masks and equals afterwards. E.g. for the SAM evaluation we
masking only the SAM value which fits in iphc1 byte, then comparing with
all possible SAM values over a switch case statement. We will not
shifting the SAM value to somewhat readable anymore.
Additional this patch slighty change the naming style like RFC 6282,
e.g. TTL to HLIM and we will drop an errno now if CID flag is set,
because we don't support it.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
028b2a8c16 6lowpan: remove lowpan_is_addr_broadcast
This macro is used at 802.15.4 6LoWPAN only and can be replaced by
memcmp with the interface broadcast address.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
6350047eb8 6lowpan: move IPHC functionality defines
This patch removes the IPHC related defines for doing bit manipulation
from global 6lowpan header to the iphc file which should the only one
implementation which use these defines.

Also move next header compression defines to their nhc implementation.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
607b0bd3f2 6lowpan: nhc: move iphc manipulation out of nhc
This patch moves the iphc setting of next header commpression bit inside
iphc functionality. Setting of IPHC bits should be happen at iphc.c file
only.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
478208e3b9 6lowpan: remove lowpan_fetch_skb_u8
This patch removes the lowpan_fetch_skb_u8 function for getting the iphc
bytes. Instead we using the generic which has a len parameter to tell
the amount of bytes to fetch.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:25 +02:00
8911d7748c 6lowpan: cleanup lowpan_header_decompress
This patch changes the lowpan_header_decompress function by removing
inklayer related information from parameters. This is currently for
supporting short and extended address for iphc handling in 802154.
We don't support short address handling anyway right now, but there
exists already code for handling short addresses in
lowpan_header_decompress.

The address parameters are also changed to a void pointer, so 6LoWPAN
linklayer specific code can put complex structures as these parameters
and cast it again inside the generic code by evaluating linklayer type
before. The order is also changed by destination address at first and
then source address, which is the same like all others functions where
destination is always the first, memcpy, dev_hard_header,
lowpan_header_compress, etc.

This patch also moves the fetching of iphc values from 6LoWPAN linklayer
specific code into the generic branch.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:24 +02:00
a6f773891a 6lowpan: cleanup lowpan_header_compress
This patch changes the lowpan_header_compress function by removing
unused parameters like "len" and drop static value parameters of
protocol type. Instead we really check the protocol type inside inside
the skb structure. Also we drop the use of IEEE802154_ADDR_LEN which is
link-layer specific. Instead we using EUI64_ADDR_LEN which should always
the default case for now.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:24 +02:00
bf513fd6fc 6lowpan: introduce LOWPAN_IPHC_MAX_HC_BUF_LEN
This patch introduces the LOWPAN_IPHC_MAX_HC_BUF_LEN define which
represent the worst-case supported IPHC buffer length. It's used to
allocate the stack buffer space for creating the IPHC header.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:24 +02:00
cefdb801c8 bluetooth: 6lowpan: use lowpan dispatch helpers
This patch adds a check if the dataroom of skb contains a dispatch value
by checking if skb->len != 0. This patch also change the dispatch
evaluation by the recently introduced helpers for checking the common
6LoWPAN dispatch values for IPv6 and IPHC header.

There was also a forgotten else branch which should drop the packet if
no matching dispatch is available.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:24 +02:00
71cd2aa53d mac802154: llsec: use kzfree
This patch will use kzfree instead kfree for security related
information which can be offered by acccident.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:24 +02:00
a6ad2a6b9c Bluetooth: Fix removing connection parameters when unpairing
The commit 89cbb0638e9b7 introduced support for deferred connection
parameter removal when unpairing by removing them only once an
existing connection gets disconnected. However, it failed to address
the scenario when we're *not* connected and do an unpair operation.

What makes things worse is that most user space BlueZ versions will
first issue a disconnect request and only then unpair, meaning the
buggy code will be triggered every time. This effectively causes the
kernel to resume scanning and reconnect to a device for which we've
removed all keys and GATT database information.

This patch fixes the issue by adding the missing call to the
hci_conn_params_del() function to a branch which handles the case of
no existing connection.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.19+
2015-10-21 00:49:24 +02:00
e131d74a3a Bluetooth: Add support setup stage internal notification event
Before the vendor specific setup stage is triggered call back into the
core to trigger an internal notification event. That event is used to
send an index update to the monitor interface. With that specific event
it is possible to update userspace with manufacturer information before
any HCI command has been executed. This is useful for early stage
debugging of vendor specific initialization sequences.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 00:49:23 +02:00
660f0fc07d Bluetooth: hidp: fix device disconnect on idle timeout
The HIDP specs define an idle-timeout which automatically disconnects a
device. This has always been implemented in the HIDP layer and forced a
synchronous shutdown of the hidp-scheduler. This works just fine, but
lacks a forced disconnect on the underlying l2cap channels. This has been
broken since:

    commit 5205185d461d5902325e457ca80bd421127b7308
    Author: David Herrmann <dh.herrmann@gmail.com>
    Date:   Sat Apr 6 20:28:47 2013 +0200

        Bluetooth: hidp: remove old session-management

The old session-management always forced an l2cap error on the ctrl/intr
channels when shutting down. The new session-management skips this, as we
don't want to enforce channel policy on the caller. In other words, if
user-space removes an HIDP device, the underlying channels (which are
*owned* and *referenced* by user-space) are still left active. User-space
needs to call shutdown(2) or close(2) to release them.

Unfortunately, this does not work with idle-timeouts. There is no way to
signal user-space that the HIDP layer has been stopped. The API simply
does not support any event-passing except for poll(2). Hence, we restore
old behavior and force EUNATCH on the sockets if the HIDP layer is
disconnected due to idle-timeouts (behavior of explicit disconnects
remains unmodified). User-space can still call

    getsockopt(..., SO_ERROR, ...)

..to retrieve the EUNATCH error and clear sk_err. Hence, the channels can
still be re-used (which nobody does so far, though). Therefore, the API
still supports the new behavior, but with this patch it's also compatible
to the old implicit channel shutdown.

Cc: <stable@vger.kernel.org> # 3.10+
Reported-by: Mark Haun <haunma@keteu.org>
Reported-by: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:23 +02:00
7e995b9ead Bluetooth: Add new quirk for non-persistent diagnostic settings
If the diagnostic settings are not persistent over HCI Reset, then this
quirk can be used to tell the Bluetoth core about it. This will ensure
that the settings are programmed correctly when the controller is
powered up.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 00:49:22 +02:00
cad20c2780 Bluetooth: Don't use remote address type to decide IRK persistency
There are LE devices on the market that start off by announcing their
public address and then once paired switch to using private address.
To be interoperable with such devices we should simply trust the fact
that we're receiving an IRK from them to indicate that they may use
private addresses in the future. Instead, simply tie the persistency
to the bonding/no-bonding information the same way as for LTKs and
CSRKs.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 00:49:21 +02:00
581d6fd60f Bluetooth: Queue diagnostic messages together with HCI packets
Sending diagnostic messages directly to the monitor socket might cause
issues for devices processing their messages in interrupt context. So
instead of trying to directly forward them, queue them up with the other
HCI packets and lets them be processed by the sockets at the same time.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 00:49:21 +02:00
bb77543ebd Bluetooth: Restrict valid packet types via HCI_CHANNEL_RAW
When using the HCI_CHANNEL_RAW, restrict the packet types to valid ones
from the Bluetooth specification.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 00:49:21 +02:00
8cd4f58142 Bluetooth: Remove quirk for HCI_VENDOR_PKT filter handling
The HCI_VENDOR_PKT quirk was needed for BPA-100/105 devices that send
these messages. Now that there is support for proper diagnostic channel
this quirk is no longer needed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-21 00:49:21 +02:00
26440c835f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/asix_common.c
	net/ipv4/inet_connection_sock.c
	net/switchdev/switchdev.c

In the inet_connection_sock.c case the request socket hashing scheme
is completely different in net-next.

The other two conflicts were overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-20 06:08:27 -07:00
9abebb8a10 NFC: delete null dereference
The exit label performs device_unlock(&dev->dev);, which will fail when dev
is NULL, and nfc_put_device(dev);, which is not useful when dev is NULL, so
just exit the function immediately.

Problem found using scripts/coccinelle/null/deref_null.cocci

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-19 20:10:37 +02:00
1099f86044 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Account for extra headroom in ath9k driver, from Felix Fietkau.

 2) Fix OOPS in pppoe driver due to incorrect socket state transition,
    from Guillaume Nault.

 3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang.

 4) Power management fixes for iwlwifi, from Johannes Berg.

 5) Fix races in reqsk_queue_unlink(), from Eric Dumazet.

 6) Fix dst_entry usage in ARP replies, from Jiri Benc.

 7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann.

 8) Missing allocation failure check in amd-xgbe, from Tom Lendacky.

 9) Various resource allocation/freeing cures in DSA< from Neil
    Armstrong.

10) A series of bug fixes in the openvswitch conntrack support, from
    Joe Stringer.

11) Fix two cases (BPF and act_mirred) where we have to clean the sender
    cpu stored in the SKB before transmitting.  From WANG Cong and
    Alexei Starovoitov.

12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from
    Achiad Shochat.

13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this
    configuration via ethtool.  From Yuval Mintz.

14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when
    'dev' is NULL, from Eric Biederman.

15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy.

16) kcalloc() gstrings ethtool buffer before having driver fill it in,
    in order to prevent kernel memory leaking.  From Joe Perches.

17) Fix mixxing rt6_info initialization for blackhole routes, from
    Martin KaFai Lau.

18) Kill VLAN regression in via-rhine, from Andrej Ota.

19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet.

20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad.

21) Scrube SKBs when pushing them between namespaces in openvswitch,
    from Joe Stringer.

22) bcmgenet enables link interrupts too early, fix from Florian
    Fainelli.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
  net: bcmgenet: Fix early link interrupt enabling
  tunnels: Don't require remote endpoint or ID during creation.
  openvswitch: Scrub skb between namespaces
  xen-netback: correctly check failed allocation
  net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter
  netlink: Trim skb to alloc size to avoid MSG_TRUNC
  net: add pfmemalloc check in sk_add_backlog()
  via-rhine: fix VLAN receive handling regression.
  ipv6: Initialize rt6_info properly in ip6_blackhole_route()
  ipv6: Move common init code for rt6_info to a new function rt6_info_init()
  Bluetooth: Fix initializing conn_params in scan phase
  Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup
  Bluetooth: Fix remove_device behavior for explicit connects
  Bluetooth: Fix LE reconnection logic
  Bluetooth: Fix reference counting for LE-scan based connections
  Bluetooth: Fix double scan updates
  mlxsw: core: Fix race condition in __mlxsw_emad_transmit
  tipc: move fragment importance field to new header position
  ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
  tipc: eliminate risk of stalled link synchronization
  ...
2015-10-19 09:55:40 -07:00
ca064bd893 xfrm: Fix pmtu discovery for local generated packets.
Commit 044a832a777 ("xfrm: Fix local error reporting crash
with interfamily tunnels") moved the setting of skb->protocol
behind the last access of the inner mode family to fix an
interfamily crash. Unfortunately now skb->protocol might not
be set at all, so we fail dispatch to the inner address family.
As a reault, the local error handler is not called and the
mtu value is not reported back to userspace.

We fix this by setting skb->protocol on message size errors
before we call xfrm_local_error.

Fixes: 044a832a7779c ("xfrm: Fix local error reporting crash with interfamily tunnels")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-10-19 10:30:05 +02:00
371f1c7e0d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. Most relevantly, updates for the nfnetlink_log to integrate with
conntrack, fixes for cttimeout and improvements for nf_queue core, they are:

1) Remove useless ifdef around static inline function in IPVS, from
   Eric W. Biederman.

2) Simplify the conntrack support for nfnetlink_queue: Merge
   nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back
   to nfnetlink_queue.c

3) Use y2038 safe timestamp from nfnetlink_queue.

4) Get rid of dead function definition in nf_conntrack, from Flavio
   Leitner.

5) Attach conntrack support for nfnetlink_log.c, from Ken-ichirou MATSUZAWA.
   This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that
   controls enabling both nfqueue and nflog integration with conntrack.
   The userspace application can request this via NFULNL_CFG_F_CONNTRACK
   configuration flag.

6) Remove unused netns variables in IPVS, from Eric W. Biederman and
   Simon Horman.

7) Don't put back the refcount on the cttimeout object from xt_CT on success.

8) Fix crash on cttimeout policy object removal. We have to flush out
   the cttimeout extension area of the conntrack not to refer to an unexisting
   object that was just removed.

9) Make sure rcu_callback completion before removing nfnetlink_cttimeout
   module removal.

10) Fix compilation warning in br_netfilter when no nf_defrag_ipv4 and
    nf_defrag_ipv6 are enabled. Patch from Arnd Bergmann.

11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is
    requested. Again from Ken-ichirou MATSUZAWA.

12) Don't use pointer to previous hook when reinjecting traffic via
    nf_queue with NF_REPEAT verdict since it may be already gone. This
    also avoids a deadloop if the userspace application keeps returning
    NF_REPEAT.

13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris.

14) Consolidate logger instance existence check in nfulnl_recv_config().

15) Fix broken atomicity when applying configuration updates to logger
    instances in nfnetlink_log.

16) Get rid of the .owner attribute in our hook object. We don't need
    this anymore since we're dropping pending packets that have escaped
    from the kernel when unremoving the hook. Patch from Florian Westphal.

17) Remove unnecessary rcu_read_lock() from nf_reinject code, we always
    assume RCU read side lock from .call_rcu in nfnetlink. Also from Florian.

18) Use static inline function instead of macros to define NF_HOOK() and
    NF_HOOK_COND() when no netfilter support in on, from Arnd Bergmann.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:48:34 -07:00
7b4b000951 RDS: fix rds-ping deadlock over TCP transport
Sowmini found hang with rds-ping while testing RDS over TCP. Its
a corner case and doesn't happen always. The issue is not reproducible
with IB transport. Its clear from below dump why we see it with RDS TCP.

 [<ffffffff8153b7e5>] do_tcp_setsockopt+0xb5/0x740
 [<ffffffff8153bec4>] tcp_setsockopt+0x24/0x30
 [<ffffffff814d57d4>] sock_common_setsockopt+0x14/0x20
 [<ffffffffa096071d>] rds_tcp_xmit_prepare+0x5d/0x70 [rds_tcp]
 [<ffffffffa093b5f7>] rds_send_xmit+0xd7/0x740 [rds]
 [<ffffffffa093bda2>] rds_send_pong+0x142/0x180 [rds]
 [<ffffffffa0939d34>] rds_recv_incoming+0x274/0x330 [rds]
 [<ffffffff810815ae>] ? ttwu_queue+0x11e/0x130
 [<ffffffff814dcacd>] ? skb_copy_bits+0x6d/0x2c0
 [<ffffffffa0960350>] rds_tcp_data_recv+0x2f0/0x3d0 [rds_tcp]
 [<ffffffff8153d836>] tcp_read_sock+0x96/0x1c0
 [<ffffffffa0960060>] ? rds_tcp_recv_init+0x40/0x40 [rds_tcp]
 [<ffffffff814d6a90>] ? sock_def_write_space+0xa0/0xa0
 [<ffffffffa09604d1>] rds_tcp_data_ready+0xa1/0xf0 [rds_tcp]
 [<ffffffff81545249>] tcp_data_queue+0x379/0x5b0
 [<ffffffffa0960cdb>] ? rds_tcp_write_space+0xbb/0x110 [rds_tcp]
 [<ffffffff81547fd2>] tcp_rcv_established+0x2e2/0x6e0
 [<ffffffff81552602>] tcp_v4_do_rcv+0x122/0x220
 [<ffffffff81553627>] tcp_v4_rcv+0x867/0x880
 [<ffffffff8152e0b3>] ip_local_deliver_finish+0xa3/0x220

This happens because rds_send_xmit() chain wants to take
sock_lock which is already taken by tcp_v4_rcv() on its
way to rds_tcp_data_ready(). Commit db6526dcb51b ("RDS: use
rds_send_xmit() state instead of RDS_LL_SEND_FULL") which
was trying to opportunistically finish the send request
in same thread context.

But because of above recursive lock hang with RDS TCP,
the send work from rds_send_pong() needs to deferred to
worker to avoid lock up. Given RDS ping is more of connectivity
test than performance critical path, its should be ok even
for transport like IB.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by:  Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:45:55 -07:00
dc6ef6be52 tcp: do not set queue_mapping on SYNACK
At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into
SYNACK packets") we had little ways to cope with SYN floods.

We no longer need to reflect incoming skb queue mappings, and instead
can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
affinities.

Note that all SYNACK retransmits were picking TX queue 0, this no longer
is a win given that SYNACK rtx are now distributed on all cpus.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:26:02 -07:00
740dbc2891 openvswitch: Scrub skb between namespaces
If OVS receives a packet from another namespace, then the packet should
be scrubbed. However, people have already begun to rely on the behaviour
that skb->mark is preserved across namespaces, so retain this one field.

This is mainly to address information leakage between namespaces when
using OVS internal ports, but by placing it in ovs_vport_receive() it is
more generally applicable, meaning it should not be overlooked if other
port types are allowed to be moved into namespaces in future.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:24:50 -07:00
a5d6f7dd30 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-10-16

First of all, sorry for the late set of patches for the 4.3 cycle. We
just finished an intensive week of testing at the Bluetooth UnPlugFest
and discovered (and fixed) issues there. Unfortunately a few issues
affect 4.3-rc5 in a way that they break existing Bluetooth LE mouse and
keyboard support.

The regressions result from supporting LE privacy in conjunction with
scanning for Resolvable Private Addresses before connecting. A feature
that has been tested heavily (including automated unit tests), but sadly
some regressions slipped in. The UnPlugFest with its multitude of test
platforms is a good battle testing ground for uncovering every corner
case.

The patches in this pull request focus only on fixing the regressions in
4.3-rc5. The patches look a bit larger since we also added comments in
the critical sections of the fixes to improve clarity.

I would appreciate if we can get these regression fixes to Linus
quickly. Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 22:23:33 -07:00
db65a3aaf2 netlink: Trim skb to alloc size to avoid MSG_TRUNC
netlink_dump() allocates skb based on the calculated min_dump_alloc or
a per socket max_recvmsg_len.
min_alloc_size is maximum space required for any single netdev
attributes as calculated by rtnl_calcit().
max_recvmsg_len tracks the user provided buffer to netlink_recvmsg.
It is capped at 16KiB.
The intention is to avoid small allocations and to minimize the number
of calls required to obtain dump information for all net devices.

netlink_dump packs as many small messages as could fit within an skb
that was sized for the largest single netdev information. The actual
space available within an skb is larger than what is requested. It could
be much larger and up to near 2x with align to next power of 2 approach.

Allowing netlink_dump to use all the space available within the
allocated skb increases the buffer size a user has to provide to avoid
truncaion (i.e. MSG_TRUNG flag set).

It was observed that with many VLANs configured on at least one netdev,
a larger buffer of near 64KiB was necessary to avoid "Message truncated"
error in "ip link" or "bridge [-c[ompressvlans]] vlan show" when
min_alloc_size was only little over 32KiB.

This patch trims skb to allocated size in order to allow the user to
avoid truncation with more reasonable buffer size.

Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 19:34:12 -07:00
26fb342c73 ipconfig: send Client-identifier in DHCP requests
A dhcp server may provide parameters to a client from a pool of IP
addresses and using a shared rootfs, or provide a specific set of
parameters for a specific client, usually using the MAC address to
identify each client individually. The dhcp protocol also specifies
a client-id field which can be used to determine the correct
parameters to supply when no MAC address is available. There is
currently no way to tell the kernel to supply a specific client-id,
only the userspace dhcp clients support this feature, but this can
not be used when the network is needed before userspace is available
such as when the root filesystem is on NFS.

This patch is to be able to do something like "ip=dhcp,client_id_type,
client_id_value", as a kernel parameter to enable the kernel to
identify itself to the server.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-18 19:23:52 -07:00
fef062cbf2 tty: Remove ASYNC_CLOSING checks in open()/hangup() methods
Since at least before 2.6.30, tty drivers that do not drop the tty lock
while closing cannot observe ASYNC_CLOSING set while holding the
tty lock; this includes the tty driver's open() and hangup() methods,
since the tty core calls these methods holding the tty lock.

For these drivers, waiting for ASYNC_CLOSING to clear while opening
is not required, since this condition cannot occur. Similarly, even
when the open() method drops and reacquires the tty lock after
blocking, ASYNC_CLOSING cannot be set (again, for drivers that
do not drop the tty lock while closing).

Now that tty port drivers no longer drop the tty lock while closing
(since 'tty: Remove tty_wait_until_sent_from_close()'), the same
conditions apply: waiting for ASYNC_CLOSING to clear while opening
is not required, nor is re-checking ASYNC_CLOSING after dropping and
reacquiring the tty lock while blocking (eg., in *_block_til_ready()).

Note: The ASYNC_CLOSING flag state is still maintained since several
bitrotting drivers use it for (dubious) other purposes.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-17 21:11:29 -07:00