Currently netem is not in the ability to emulate channel bandwidth. Only static
delay (and optional random jitter) can be configured.
To emulate the channel rate the token bucket filter (sch_tbf) can be used. But
TBF has some major emulation flaws. The buffer (token bucket depth/rate) cannot
be 0. Also the idea behind TBF is that the credit (token in buckets) fills if
no packet is transmitted. So that there is always a "positive" credit for new
packets. In real life this behavior contradicts the law of nature where
nothing can travel faster as speed of light. E.g.: on an emulated 1000 byte/s
link a small IPv4/TCP SYN packet with ~50 byte require ~0.05 seconds - not 0
seconds.
Netem is an excellent place to implement a rate limiting feature: static
delay is already implemented, tfifo already has time information and the
user can skip TBF configuration completely.
This patch implement rate feature which can be configured via tc. e.g:
tc qdisc add dev eth0 root netem rate 10kbit
To emulate a link of 5000byte/s and add an additional static delay of 10ms:
tc qdisc add dev eth0 root netem delay 10ms rate 5KBps
Note: similar to TBF the rate extension is bounded to the kernel timing
system. Depending on the architecture timer granularity, higher rates (e.g.
10mbit/s and higher) tend to transmission bursts. Also note: further queues
living in network adaptors; see ethtool(8).
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
Need not to used 'delta' flag when add single-source to interface
filter source list.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
Need not to used 'delta' flag when add single-source to interface
filter source list.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Signed-off-by: David S. Miller <davem@drr.davemloft.net>
Instead of instantiating an entire new neigh_table instance
just for ATM handling, use the neigh device private facility.
Signed-off-by: David S. Miller <davem@davemloft.net>
netdev->neigh_priv_len records the private area length.
This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.
Signed-off-by: David S. Miller <davem@davemloft.net>
We are going to alloc for device specific private areas for
neighbour entries, and in order to do that we have to move
away from the fixed allocation size enforced by using
neigh_table->kmem_cachep
As a nice side effect we can now use kfree_rcu().
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit f2c31e32b378 (fix NULL dereferences in check_peer_redir()),
dst_get_neighbour() should be guarded by rcu_read_lock() /
rcu_read_unlock() section.
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need rcu_read_lock() protection before using dst_get_neighbour(), and
we must cache its value (pass it to __teql_resolve())
teql_master_xmit() is called under rcu_read_lock_bh() protection, its
not enough.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones reported that TCP_CONGESTION sockopt performed on a listener
was ignored for its children sockets : right after accept() the
congestion control for new socket is the system default one.
This seems an oversight of the initial design (quoted from Stephen)
Based on prior investigation and patch from Rick.
Reported-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Yuchung Cheng <ycheng@google.com>
Tested-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Logging messages that mimic function tracer enter/exit
aren't necessary. Just remove them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
All uses have been removed, so killing what's not necessary.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using the standard debugging mechanisms is better than
subsystem specific ones when the subsystem doesn't use
a specific struct.
Coalesce long formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Using the normal logging styles is preferred over
subsystem specific styles when the subsystem does
not take a specific struct.
Convert nfc_<level> specific messages to pr_<level>
Add newlines to uses.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The on-channel work optimisations have caused a
number of issues, and the code is unfortunately
very complex and almost impossible to follow.
Instead of attempting to put in more workarounds
let's just remove those optimisations, we can
work on them again later, after we change the
whole auth/assoc design.
This should fix rate_control_send_low() warnings,
see RH bug 731365.
Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch converts the drivers in net/rfkill/* to use the
module_platform_driver() macro which makes the code smaller and a bit
simpler.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Antonio Ospite <ospite@studenti.unina.it>
Cc: Rhyland Klein <rklein@nvidia.com>
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Rhyland Klein <rklein@nvidia.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The rates bitmap for internal scan requests shoud be filled,
otherwise there will be probe requests with zero rates supported.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Cc: stable@vger.kernel.org
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes' patch for "cfg80211: fix regulatory NULL dereference"
broke user regulaotry hints and it did not address the fact that
last_request was left populated even if the previous regulatory
hint was stale due to the wiphy disappearing.
Fix user reguluatory hints by only bailing out if for those
regulatory hints where a request_wiphy is expected. The stale last_request
considerations are addressed through the previous fixes on last_request
where we reset the last_request to a static world regdom request upon
reset_regdomains(). In this case though we further enhance the effect
by simply restoring reguluatory settings completely.
Cc: stable@vger.kernel.org
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
There is a theoretical race that if hit will trigger
a crash. The race is between when we issue the first
regulatory hint, regulatory_hint_core(), gets processed
by the workqueue and between when the first device
gets registered to the wireless core. This is not easy
to reproduce but it was easy to do so through the
regulatory simulator I have been working on. This
is a port of the fix I implemented there [1].
[1] a246ccf81f
Cc: stable@vger.kernel.org
Cc: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Commit 7dc00c82 added a 'notify' parameter for vif_delete() to
distinguish whether to unregister the device.
When notify=1 means we does not need to unregister the device,
so calling unregister_netdevice_many is useless.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change function rcu_dereference to rcu_dereference_bh to avoid warning
[ INFO: suspicious RCU usage. ]
-------------------------------
net/core/dev.c:2459 suspicious rcu_dereference_check() usage!
because we are locking with
rcu_read_lock_bh();
in function dev_queue_xmit(struct sk_buff *skb)
Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
A recent fix to the the NetLabel code caused build problem with
configurations that did not have IPv6 enabled; see below:
netlabel_kapi.c: In function 'netlbl_cfg_unlbl_map_add':
netlabel_kapi.c:165:4:
error: implicit declaration of function 'netlbl_af6list_add'
This patch fixes this problem by making the IPv6 specific code conditional
on the IPv6 configuration flags as we done in the rest of NetLabel and the
network stack as a whole. We have to move some variable declarations
around as a result so things may not be quite as pretty, but at least it
builds cleanly now.
Some additional IPv6 conditionals were added to the NetLabel code as well
for the sake of consistency.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check from commit 30c2235c is incomplete and cannot prevent
cases like key_len = 0x80000000 (INT_MAX + 1). In that case, the
left-hand side of the check (INT_MAX - key_len), which is unsigned,
becomes 0xffffffff (UINT_MAX) and bypasses the check.
However this shouldn't be a security issue. The function is called
from the following two code paths:
1) setsockopt()
2) sctp_auth_asoc_set_secret()
In case (1), sca_keylength is never going to exceed 65535 since it's
bounded by a u16 from the user API. As such, the key length will
never overflow.
In case (2), sca_keylength is computed based on the user key (1 short)
and 2 * key_vector (3 shorts) for a total of 7 * USHRT_MAX, which still
will not overflow.
In other words, this overflow check is not really necessary. Just
make it more correct.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a custom flow dissector, use skb_flow_dissect() and
benefit from tunnelling support.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a custom flow dissector, use skb_flow_dissect() and
benefit from tunnelling support.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_sendmsg() uses select_size() helper to choose skb head size when a
new skb must be allocated.
If GSO is enabled for the socket, current strategy is to force all
payload data to be outside of headroom, in PAGE fragments.
This strategy is not welcome for small packets, wasting memory.
Experiments show that best results are obtained when using 2048 bytes
for skb head (This includes the skb overhead and various headers)
This patch provides better len/truesize ratios for packets sent to
loopback device, and reduce memory needs for in-flight loopback packets,
particularly on arches with big pages.
If a sender sends many 1-byte packets to an unresponsive application,
receiver rmem_alloc will grow faster and will stop queuing these packets
sooner, or will collapse its receive queue to free excess memory.
netperf -t TCP_RR results are improved by ~4 %, and many workloads are
improved as well (tbench, mysql...)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Le lundi 28 novembre 2011 à 19:06 -0500, David Miller a écrit :
> From: Dimitris Michailidis <dm@chelsio.com>
> Date: Mon, 28 Nov 2011 08:25:39 -0800
>
> >> +bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys
> >> *flow)
> >> +{
> >> + int poff, nhoff = skb_network_offset(skb);
> >> + u8 ip_proto;
> >> + u16 proto = skb->protocol;
> >
> > __be16 instead of u16 for proto?
>
> I'll take care of this when I apply these patches.
( CC trimmed )
Thanks David !
Here is a small patch to use one 64bit load/store on x86_64 instead of
two 32bit load/stores.
[PATCH net-next] flow_dissector: use a 64bit load/store
gcc compiler is smart enough to use a single load/store if we
memcpy(dptr, sptr, 8) on x86_64, regardless of
CONFIG_CC_OPTIMIZE_FOR_SIZE
In IP header, daddr immediately follows saddr, this wont change in the
future. We only need to make sure our flow_keys (src,dst) fields wont
break the rule.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Networking stack support for byte queue limits, uses dynamic queue
limits library. Byte queue limits are maintained per transmit queue,
and a dql structure has been added to netdev_queue structure for this
purpose.
Configuration of bql is in the tx-<n> sysfs directory for the queue
under the byte_queue_limits directory. Configuration includes:
limit_min, bql minimum limit
limit_max, bql maximum limit
hold_time, bql slack hold time
Also under the directory are:
limit, current byte limit
inflight, current number of bytes on the queue
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the xps specific parts in netdev_queue_release into
its own function which netdev_queue_release can call. This allows
netdev_queue_release to be more generic (for adding new attributes
to tx queues).
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create separate queue state flags so that either the stack or drivers
can turn on XOFF. Added a set of functions used in the stack to determine
if a queue is really stopped (either by stack or driver)
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since 2005 (c1b4a7e69576d65efc31a8cea0714173c2841244)
tcp_tso_should_defer has been using tcp_max_burst() as a target limit
for deciding how large to make outgoing TSO packets when not using
sysctl_tcp_tso_win_divisor. But since 2008
(dd9e0dda66ba38a2ddd1405ac279894260dc5c36) tcp_max_burst() returns the
reordering degree. We should not have tcp_tso_should_defer attempt to
build larger segments just because there is more reordering. This
commit splits the notion of deferral size used in TSO from the notion
of burst size used in cwnd moderation, and returns the TSO deferral
limit to its original value.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memcmp() instead of cast to u16 when checking the PAD field.
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
Routed payload requires less headroom than bridged payload.
So do not reallocate headroom if not needed.
Also, add worst case AAL5 overhead to netdev->hard_header_len.
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can test/set multiple bits from sk_flags at once, to shorten a bit
socket setup/dismantle phase.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Maravic reported an error caused by jump_label_dec() being called
from IRQ context :
BUG: sleeping function called from invalid context at kernel/mutex.c:271
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper
1 lock held by swapper/0:
#0: (&n->timer){+.-...}, at: [<ffffffff8107ce90>] call_timer_fn+0x0/0x340
Pid: 0, comm: swapper Not tainted 3.2.0-rc2-net-next-mpls+ #1
Call Trace:
<IRQ> [<ffffffff8104f417>] __might_sleep+0x137/0x1f0
[<ffffffff816b9a2f>] mutex_lock_nested+0x2f/0x370
[<ffffffff810a89fd>] ? trace_hardirqs_off+0xd/0x10
[<ffffffff8109a37f>] ? local_clock+0x6f/0x80
[<ffffffff810a90a5>] ? lock_release_holdtime.part.22+0x15/0x1a0
[<ffffffff81557929>] ? sock_def_write_space+0x59/0x160
[<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
[<ffffffff810969cd>] atomic_dec_and_mutex_lock+0x5d/0x80
[<ffffffff8112fc1d>] jump_label_dec+0x1d/0x50
[<ffffffff81566525>] net_disable_timestamp+0x15/0x20
[<ffffffff81557a75>] sock_disable_timestamp+0x45/0x50
[<ffffffff81557b00>] __sk_free+0x80/0x200
[<ffffffff815578d0>] ? sk_send_sigurg+0x70/0x70
[<ffffffff815e936e>] ? arp_error_report+0x3e/0x90
[<ffffffff81557cba>] sock_wfree+0x3a/0x70
[<ffffffff8155c2b0>] skb_release_head_state+0x70/0x120
[<ffffffff8155c0b6>] __kfree_skb+0x16/0x30
[<ffffffff8155c119>] kfree_skb+0x49/0x170
[<ffffffff815e936e>] arp_error_report+0x3e/0x90
[<ffffffff81575bd9>] neigh_invalidate+0x89/0xc0
[<ffffffff81578dbe>] neigh_timer_handler+0x9e/0x2a0
[<ffffffff81578d20>] ? neigh_update+0x640/0x640
[<ffffffff81073558>] __do_softirq+0xc8/0x3a0
Since jump_label_{inc|dec} must be called from process context only,
we must defer jump_label_dec() if net_disable_timestamp() is called
from interrupt context.
Reported-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Linux coding style wants the return statement on its own line.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
nr_route.ndigis is unsigned int so the nr_route.ndigis < 0 expression is
never true and can be dropped. Doing the nr_ax25_dev_get call later
allows the nr_route.ndigis test to bail out without having to dev_put.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Osterried <thomas@osterried.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct nr_route_struct's mnemonic permits a string of up to 7 bytes to be
used. If userland passes a not zero terminated string to the kernel adding
a node to the routing table might result in the kernel attempting to read
copy a too long string.
Mnemonic is part of the NET/ROM routing protocol; NET/ROM routing table
updates only broadcast 6 bytes. The 7th byte in the mnemonic array exists
only as a \0 termination character for the kernel code's convenience.
Fixed by rejecting mnemonic strings that have no terminating \0 in the first
7 characters. Do this test only NETROM_NODE to avoid breaking NETROM_NEIGH
where userland might passing an uninitialized mnemonic field.
Initial patch by Dan Carpenter <dan.carpenter@oracle.com>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Walter Harms <wharms@bfs.de>
Cc: Thomas Osterried <thomas@osterried.de>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Very large, nonsenical arguments or use in very extreme conditions could
result in integer overflows. Check ioctls arguments to avoid such
overflows and return -EINVAL for too large arguments.
To allow the use of AX.25 for even the most extreme setup (think packet
radio to the Phase 5E mars probe) we make no further attempt to clamp the
argument range.
Originally reported by Fan Long <longfancn@gmail.com> and a first patch
was sent by Xi Wang <xi.wang@gmail.com>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Xi Wang <xi.wang@gmail.com>
Cc: Joerg Reuter <jreuter@yaina.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Thomas Osterried <thomas@osterried.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for specific hardware belongs under drivers/net/ not net/.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any headers included by drivers should be under include/, and
any definitions they use are not really private to the core as
the name "dsa_priv.h" suggests.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
I mistakenly exported functions from slave.c that are only called from
dsa.c, part of the same module.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current SFB double hashing is not fulfilling SFB theory, if two flows
share same rxhash value.
Using skb_flow_dissect() permits to really have better hash dispersion,
and get tunnelling support as well.
Double hashing point was mentioned by Florian Westphal
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of using a custom flow dissector, use skb_flow_dissect() and
benefit from tunnelling support.
This lack of tunnelling support was mentioned by Dan Siemon.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>