android_kernel_samsung_sm8650/net/core
David Ahern 58956317c8 neighbor: Improve garbage collection
The existing garbage collection algorithm has a number of problems:

1. The gc algorithm will not evict PERMANENT entries as those entries
   are managed by userspace, yet the existing algorithm walks the entire
   hash table which means it always considers PERMANENT entries when
   looking for entries to evict. In some use cases (e.g., EVPN) there
   can be tens of thousands of PERMANENT entries leading to wasted
   CPU cycles when gc kicks in. As an example, with 32k permanent
   entries, neigh_alloc has been observed taking more than 4 msec per
   invocation.

2. Currently, when the number of neighbor entries hits gc_thresh2 and
   the last flush for the table was more than 5 seconds ago, gc kicks in
   and walks the entire hash table, evicting *all* entries that are not
   in PERMANENT or REACHABLE state and not marked as externally learned.
   There is no discriminator on when the neigh entry was created or if
   it just moved from REACHABLE to another NUD_VALID state (e.g.,
   NUD_STALE).

   It is possible for entries to be created or for established neighbor
   entries to be moved to STALE (e.g., an external node sends an ARP
   request) right before the 5 second window lapses:

        -----|---------x|----------|-----
            t-5         t         t+5

   If that happens those entries are evicted during gc causing unnecessary
   thrashing on neighbor entries and userspace caches trying to track them.

   Further, this contradicts the description of gc_thresh2 which says
   "Entries older than 5 seconds will be cleared".

   One workaround is to make gc_thresh2 == gc_thresh3 but that negates the
   whole point of having separate thresholds.

3. Clearing *all* neigh entries that are not PERMANENT, REACHABLE, or
   externally learned when gc_thresh2 is exceeded is overkill and
   contributes to thrashing, especially during startup.
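
The eviction rule described in problems 2 and 3 can be sketched as a
single predicate. This is a simplified illustration, not the kernel
code: the enum, the struct, and the function name are hypothetical
stand-ins, and details such as reference counts are left out. The point
it shows is that the entry's age is never consulted.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the NUD_* neighbor states. */
enum sketch_state { SKETCH_PERMANENT, SKETCH_REACHABLE, SKETCH_STALE };

/* Hypothetical stand-in for a neighbor entry. */
struct sketch_neigh {
    enum sketch_state state;
    bool ext_learned;   /* marked as externally learned */
    long updated;       /* time of last update, in seconds */
};

/*
 * Old rule as described in the text above: evict everything that is
 * not PERMANENT, not REACHABLE and not externally learned. The
 * 'updated' field plays no part in the decision.
 */
static bool old_gc_would_evict(const struct sketch_neigh *n)
{
    return n->state != SKETCH_PERMANENT &&
           n->state != SKETCH_REACHABLE &&
           !n->ext_learned;
}
```

Under this rule an entry that moved to STALE a moment ago is evicted
just like one that has been idle for minutes.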

This patch addresses these problems as follows:

1. Use of a separate list_head to track entries that can be garbage
   collected along with a separate counter. PERMANENT entries are not
   added to this list.

   The gc_thresh parameters are only compared to the new counter, not the
   total entries in the table. The forced_gc function is updated to only
   walk this new gc_list looking for entries to evict.

2. Entries are added to the tail of the list and removed from the
   front.

3. Entries are only evicted if they were last updated more than 5 seconds
   ago, adhering to the original intent of gc_thresh2.

4. Forced gc is stopped once the number of gc_entries drops below
   gc_thresh2.

5. Since gc checks do not apply to PERMANENT entries, gc levels are skipped
   when allocating a new neighbor for a PERMANENT entry. By extension this
   means there are no explicit limits on the number of PERMANENT entries
   that can be created, but this is no different than FIB entries or FDB
   entries.
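
The five points above can be sketched in plain userspace C. This is a
minimal illustration under stated assumptions, not the code in
neighbour.c: the struct layouts, the hand-rolled list, and the helper
names (table_init, gc_list_add, gc_list_del, forced_gc_sketch) are all
hypothetical, time is counted in whole seconds, and locking and
reference counting are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GC_MIN_AGE_SECS 5   /* the 5 second window from gc_thresh2 */

/* Hypothetical stand-in for a neighbor entry. */
struct entry {
    struct entry *prev, *next;  /* links in the gc list */
    long updated;               /* time of last update, in seconds */
    bool permanent;             /* PERMANENT entries never join the list */
    bool on_gc_list;
};

/* Hypothetical stand-in for the neighbor table. */
struct table {
    struct entry head;  /* sentinel: head.next is the front, head.prev the tail */
    int gc_entries;     /* separate counter, excludes PERMANENT entries */
    int gc_thresh2;
};

static void table_init(struct table *t, int gc_thresh2)
{
    t->head.prev = t->head.next = &t->head;
    t->gc_entries = 0;
    t->gc_thresh2 = gc_thresh2;
}

/* Points 1 and 2: only non-PERMANENT entries are tracked, added at the tail. */
static void gc_list_add(struct table *t, struct entry *e)
{
    if (e->permanent)
        return;
    e->prev = t->head.prev;
    e->next = &t->head;
    t->head.prev->next = e;
    t->head.prev = e;
    e->on_gc_list = true;
    t->gc_entries++;
}

static void gc_list_del(struct table *t, struct entry *e)
{
    assert(e->on_gc_list);
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
    e->on_gc_list = false;
    t->gc_entries--;
}

/*
 * Points 3 and 4: walk only the gc list from the front, evict entries
 * last updated more than 5 seconds ago, and stop as soon as the
 * counter drops below gc_thresh2. Returns the number evicted.
 */
static int forced_gc_sketch(struct table *t, long now)
{
    int evicted = 0;
    struct entry *e = t->head.next;

    while (e != &t->head && t->gc_entries >= t->gc_thresh2) {
        struct entry *next = e->next;   /* saved because e may be unlinked */

        if (now - e->updated > GC_MIN_AGE_SECS) {
            gc_list_del(t, e);
            evicted++;
        }
        e = next;
    }
    return evicted;
}
```

With gc_thresh2 set to 2 and three tracked entries, two of them older
than 5 seconds, a call to forced_gc_sketch() evicts the two old entries
and leaves the recently updated one in place. A PERMANENT entry passed
to gc_list_add() is never counted and never visited.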

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-07 16:03:10 -08:00
datagram.c net: dump more useful information in netdev_rx_csum_fault() 2018-11-15 11:37:04 -08:00
dev_addr_lists.c net: core: dev_addr_lists: add auxiliary func to handle reference address updates 2018-11-08 20:30:57 -08:00
dev_ioctl.c net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
dev.c net: core: dev: Attach extack to NETDEV_PRE_UP 2018-12-06 13:26:07 -08:00
devlink.c devlink: Add 'fw_load_policy' generic parameter 2018-12-03 13:55:43 -08:00
drop_monitor.c treewide: setup_timer() -> timer_setup() 2017-11-21 15:57:07 -08:00
dst_cache.c net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency 2018-03-05 12:52:45 -05:00
dst.c netfilter: nf_tables: add tunnel support 2018-08-03 21:12:12 +02:00
ethtool.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00
failover.c net: Introduce generic failover module 2018-05-28 22:59:54 -04:00
fib_notifier.c net: Fix fib notifer to return errno 2018-03-29 14:10:30 -04:00
fib_rules.c net/fib_rules: Update fib_nl_dumprule for strict data checking 2018-10-08 10:39:05 -07:00
filter.c bpf: helper to pop data from messages 2018-11-28 22:07:57 +01:00
flow_dissector.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-11-11 17:57:54 -08:00
gen_estimator.c net: core: protect rate estimator statistics pointer with lock 2018-08-11 12:37:10 -07:00
gen_stats.c net/core: make function ___gnet_stats_copy_basic() static 2018-09-28 10:25:11 -07:00
gro_cells.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
hwbm.c net: hwbm: Fix unbalanced spinlock in error case 2016-05-25 12:35:09 -07:00
link_watch.c net: linkwatch: add check for netdevice being present to linkwatch_do_dev 2018-09-19 21:06:46 -07:00
lwt_bpf.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2018-08-07 11:02:05 -07:00
lwtunnel.c ipv6: sr: define core operations for seg6local lightweight tunnel 2017-08-07 14:16:22 -07:00
Makefile bpf, sockmap: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
neighbour.c neighbor: Improve garbage collection 2018-12-07 16:03:10 -08:00
net_namespace.c netns: enable to dump full nsid translation table 2018-11-27 16:20:20 -08:00
net-procfs.c proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
net-sysfs.c net: core: dev: Add extack argument to dev_change_flags() 2018-12-06 13:26:07 -08:00
net-sysfs.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
net-traces.c net/ipv6: Udate fib6_table_lookup tracepoint 2018-05-24 23:01:15 -04:00
netclassid_cgroup.c cgroup, netclassid: add a preemption point to write_classid 2018-10-23 12:58:17 -07:00
netevent.c netevent: remove automatic variable in register_netevent_notifier() 2015-05-31 00:03:21 -07:00
netpoll.c net: core: dev: Add extack argument to dev_open() 2018-12-06 13:26:06 -08:00
netprio_cgroup.c net: remove duplicate includes 2017-12-13 13:18:46 -05:00
page_pool.c net/page_pool: Fix inconsistent lock state warning 2018-07-19 23:23:01 -07:00
pktgen.c pktgen: Fix fall-through annotation 2018-09-13 15:36:41 -07:00
ptp_classifier.c ptp: Change ptp_class to a proper bitmask 2015-11-03 11:08:22 -05:00
request_sock.c ipv4: Namespaceify tcp_max_syn_backlog knob 2016-12-29 11:38:31 -05:00
rtnetlink.c net: core: dev: Add extack argument to __dev_change_flags() 2018-12-06 13:26:07 -08:00
scm.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> 2017-03-02 08:42:29 +01:00
secure_seq.c infiniband: i40iw, nes: don't use wall time for TCP sequence numbers 2018-07-11 12:10:19 -06:00
skbuff.c skbuff: Rename 'offload_mr_fwd_mark' to 'offload_l3_fwd_mark' 2018-12-04 08:36:36 -08:00
skmsg.c tls: convert to generic sk_msg interface 2018-10-15 12:23:19 -07:00
sock_diag.c net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd() 2018-08-14 10:01:24 -07:00
sock_map.c bpf: skmsg, fix psock create on existing kcm/tls port 2018-10-20 00:40:45 +02:00
sock_reuseport.c sctp: add sock_reuseport for the sock in __sctp_hash_endpoint 2018-11-12 09:09:51 -08:00
sock.c udp: msg_zerocopy 2018-12-03 15:58:32 -08:00
stream.c tcp: reduce POLLOUT events caused by TCP_NOTSENT_LOWAT 2018-12-04 21:21:18 -08:00
sysctl_net_core.c bpf: add bpf_jit_limit knob to restrict unpriv allocations 2018-10-25 17:11:42 -07:00
timestamping.c net: skb_defer_rx_timestamp should check for phydev before setting up classify 2015-07-09 14:17:15 -07:00
tso.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
utils.c net: Remove some unneeded semicolon 2018-08-04 13:05:39 -07:00
xdp.c xdp: remove redundant variable 'headroom' 2018-09-01 01:35:53 +02:00