android_kernel_xiaomi_sm8450/net/sched
Vladimir Oltean c44e96d6c3 Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs"
commit af7b29b1deaac6da3bb7637f0e263dfab7bfc7a3 upstream.

taprio_attach() has this logic at the end, which should have been
removed with the blamed patch (which is now being reverted):

	/* access to the child qdiscs is not needed in offload mode */
	if (FULL_OFFLOAD_IS_ENABLED(q->flags)) {
		kfree(q->qdiscs);
		q->qdiscs = NULL;
	}

because otherwise, we make use of q->qdiscs[] even after this array was
deallocated, namely in taprio_leaf(). Therefore, whenever one would try
to attach a valid child qdisc to a fully offloaded taprio root, one
would immediately dereference a NULL pointer.

$ tc qdisc replace dev eno0 handle 8001: parent root taprio \
	num_tc 8 \
	map 0 1 2 3 4 5 6 7 \
	queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
	max-sdu 0 0 0 0 0 200 0 0 \
	base-time 200 \
	sched-entry S 80 20000 \
	sched-entry S a0 20000 \
	sched-entry S 5f 60000 \
	flags 2
$ max_frame_size=1500
$ data_rate_kbps=20000
$ port_transmit_rate_kbps=1000000
$ idleslope=$data_rate_kbps
$ sendslope=$(($idleslope - $port_transmit_rate_kbps))
$ locredit=$(($max_frame_size * $sendslope / $port_transmit_rate_kbps))
$ hicredit=$(($max_frame_size * $idleslope / $port_transmit_rate_kbps))
$ tc qdisc replace dev eno0 parent 8001:7 cbs \
	idleslope $idleslope \
	sendslope $sendslope \
	hicredit $hicredit \
	locredit $locredit \
	offload 0

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
pc : taprio_leaf+0x28/0x40
lr : qdisc_leaf+0x3c/0x60
Call trace:
 taprio_leaf+0x28/0x40
 tc_modify_qdisc+0xf0/0x72c
 rtnetlink_rcv_msg+0x12c/0x390
 netlink_rcv_skb+0x5c/0x130
 rtnetlink_rcv+0x1c/0x2c

The solution is not as obvious as the problem. The code which deallocates
q->qdiscs[] is in fact copied and pasted from mqprio, which also
deallocates the array in mqprio_attach() and never uses it afterwards.

Therefore, the identical cleanup logic of priv->qdiscs[] that
mqprio_destroy() has is deceptive because it will never take place at
qdisc_destroy() time, but just at raw ops->destroy() time (otherwise
said, priv->qdiscs[] do not last for the entire lifetime of the mqprio
root), but rather, this is just the twisted way in which the Qdisc API
understands error path cleanup should be done (Qdisc_ops :: destroy() is
called even when Qdisc_ops :: init() never succeeded).

Side note, in fact this is also what the comment in mqprio_init() says:

	/* pre-allocate qdisc, attachment can't fail */

Or reworded, mqprio's priv->qdiscs[] scheme is only meant to serve as
data passing between Qdisc_ops :: init() and Qdisc_ops :: attach().

[ this comment was also copied and pasted into the initial taprio
  commit, even though taprio_attach() came way later ]

The problem is that taprio also makes extensive use of the q->qdiscs[]
array in the software fast path (taprio_enqueue() and taprio_dequeue()),
but it does not keep a reference of its own on q->qdiscs[i] (you'd think
that since it creates these Qdiscs, it holds the reference, but nope,
this is not completely true).

To understand the difference between taprio_destroy() and mqprio_destroy()
one must look before commit 13511704f8d7 ("net: taprio offload: enforce
qdisc to netdev queue mapping"), because that just muddied the waters.

In the "original" taprio design, taprio always attached itself (the root
Qdisc) to all netdev TX queues, so that dev_qdisc_enqueue() would go
through taprio_enqueue().

It also called qdisc_refcount_inc() on itself for as many times as there
were netdev TX queues, in order to counter-balance what tc_get_qdisc()
does when destroying a Qdisc (simplified for brevity below):

	if (n->nlmsg_type == RTM_DELQDISC)
		err = qdisc_graft(dev, parent=NULL, new=NULL, q, extack);

qdisc_graft(where "new" is NULL so this deletes the Qdisc):

	for (i = 0; i < num_q; i++) {
		struct netdev_queue *dev_queue;

		dev_queue = netdev_get_tx_queue(dev, i);

		old = dev_graft_qdisc(dev_queue, new);
		if (new && i > 0)
			qdisc_refcount_inc(new);

		qdisc_put(old);
		~~~~~~~~~~~~~~
		this decrements taprio's refcount once for each TX queue
	}

	notify_and_destroy(net, skb, n, classid,
			   rtnl_dereference(dev->qdisc), new);
			   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
			   and this finally decrements it to zero,
			   making qdisc_put() call qdisc_destroy()

The q->qdiscs[] created using qdisc_create_dflt() (or their
replacements, if taprio_graft() was ever to get called) were then
privately freed by taprio_destroy().

This is still what is happening after commit 13511704f8d7 ("net: taprio
offload: enforce qdisc to netdev queue mapping"), but only for software
mode.

In full offload mode, the per-txq "qdisc_put(old)" calls from
qdisc_graft() now deallocate the child Qdiscs rather than decrement
taprio's refcount. So when notify_and_destroy(taprio) finally calls
taprio_destroy(), the difference is that the child Qdiscs were already
deallocated.

And this is exactly why the taprio_attach() comment "access to the child
qdiscs is not needed in offload mode" is deceptive too. Not only the
q->qdiscs[] array is not needed, but it is also necessary to get rid of
it as soon as possible, because otherwise, we will also call qdisc_put()
on the child Qdiscs in qdisc_destroy() -> taprio_destroy(), and this
will cause a nasty use-after-free/refcount-saturate/whatever.

In short, the problem is that since the blamed commit, taprio_leaf()
needs q->qdiscs[] to not be freed by taprio_attach(), while qdisc_destroy()
-> taprio_destroy() does need q->qdiscs[] to be freed by taprio_attach()
for full offload. Fixing one problem triggers the other.

All of this can be solved by making taprio keep its q->qdiscs[i] with a
refcount elevated at 2 (in offloaded mode where they are attached to the
netdev TX queues), both in taprio_attach() and in taprio_graft(). The
generic qdisc_graft() would just decrement the child qdiscs' refcounts
to 1, and taprio_destroy() would give them the final coup de grace.

However the rabbit hole of changes is getting quite deep, and the
complexity increases. The blamed commit was supposed to be a bug fix in
the first place, and the bug it addressed is not so significant so as to
justify further rework in stable trees. So I'd rather just revert it.
I don't know enough about multi-queue Qdisc design to make a proper
judgement right now regarding what is/isn't idiomatic use of Qdisc
concepts in taprio. I will try to study the problem more and come with a
different solution in net-next.

Fixes: 1461d212ab27 ("net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs")
Reported-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Reported-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20221004220100.1650558-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-25 11:55:04 +01:00
..
act_api.c net/sched: act_api: Notify user space if any actions were flushed before error 2022-07-07 17:52:18 +02:00
act_bpf.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_connmark.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_csum.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_ct.c netfilter: conntrack: Fix data-races around ct mark 2022-12-02 17:40:00 +01:00
act_ctinfo.c net/sched: act_ctinfo: use percpu stats 2023-02-22 12:55:59 +01:00
act_gact.c net_sched: defer tcf_idr_insert() in tcf_action_init_1() 2020-09-24 19:46:21 -07:00
act_gate.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_ife.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_ipt.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_meta_mark.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbprio.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_meta_skbtcindex.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
act_mirred.c net: sched: act_mirred: drop dst for the direction from egress to ingress 2021-11-26 10:39:16 +01:00
act_mpls.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_nat.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_pedit.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_police.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_sample.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_simple.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_skbedit.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_skbmod.c flow_offload: fill flags to action structure 2023-02-22 12:55:59 +01:00
act_tunnel_key.c net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels 2020-10-20 21:10:41 -07:00
act_vlan.c net/sched: act_vlan: Fix modify to allow 0 2021-07-14 16:56:19 +02:00
cls_api.c net: sched: fix possible refcount leak in tc_new_tfilter() 2022-09-28 11:10:37 +02:00
cls_basic.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_bpf.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_cgroup.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_flow.c Remove uninitialized_var() macro for v5.9-rc1 2020-08-04 13:49:43 -07:00
cls_flower.c net/sched: flower: fix parsing of ethertype following VLAN header 2022-04-20 09:23:11 +02:00
cls_fw.c net_sched: fix ops->bind_class() implementations 2020-01-27 10:51:43 +01:00
cls_matchall.c net: qos offload add flow status with dropped count 2020-06-19 12:53:30 -07:00
cls_route.c net_sched: cls_route: disallow handle of 0 2022-08-21 15:16:26 +02:00
cls_rsvp6.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_rsvp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
cls_rsvp.h cls_rsvp: fix rsvp_policy 2020-02-01 12:25:06 -08:00
cls_tcindex.c net/sched: tcindex: search key must be 16 bits 2023-02-22 12:55:59 +01:00
cls_u32.c net/sched: cls_u32: fix possible leak in u32_init_knode() 2022-04-27 13:53:50 +02:00
em_canid.c net: sched: kerneldoc fixes 2020-07-13 17:20:40 -07:00
em_cmp.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
em_ipset.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_ipt.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_meta.c sched: consistently handle layer3 header accesses in the presence of VLANs 2020-07-03 14:34:53 -07:00
em_nbyte.c net: sched: Replace zero-length array with flexible-array member 2020-02-29 21:27:02 -08:00
em_text.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
em_u32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
ematch.c net_sched: reject TCF_EM_SIMPLE case for complex ematch module 2023-01-14 10:16:12 +01:00
Kconfig net: sched: allow act_ct to be built without NF_NAT 2022-12-02 17:40:01 +01:00
Makefile net: qos: introduce a gate control flow action 2020-05-01 16:08:19 -07:00
sch_api.c net: sched: disallow noqueue for qdisc classes 2023-01-14 10:16:52 +01:00
sch_atm.c net: sched: atm: dont intepret cls results when asked to drop 2023-01-14 10:16:47 +01:00
sch_blackhole.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_cake.c net: sched: cake: fix null pointer access issue when cake_init() fails 2022-10-30 09:41:17 +01:00
sch_cbq.c net: sched: cbq: dont intepret cls results when asked to drop 2023-01-14 10:16:47 +01:00
sch_cbs.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_choke.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_codel.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_drr.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_dsmark.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_etf.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_ets.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_fifo.c net_sched: fix NULL deref in fifo_set_limit() 2021-10-13 10:04:26 +02:00
sch_fq_codel.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_fq_pie.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_fq.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_generic.c net/sched: fix netdevice reference leaks in attach_default_qdiscs() 2022-09-08 11:11:36 +02:00
sch_gred.c net: sched: validate stab values 2021-03-30 14:31:57 +02:00
sch_hfsc.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_hhf.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_htb.c net: sched: sch: Fix off by one in htb_activate_prios() 2023-02-22 12:56:00 +01:00
sch_ingress.c net: sched: Pass ingress block to tcf_classify_ingress 2020-02-19 17:49:48 -08:00
sch_mq.c net: sched: update default qdisc visibility after Tx queue cnt changes 2021-11-18 14:03:53 +01:00
sch_mqprio.c net: sched: update default qdisc visibility after Tx queue cnt changes 2021-11-18 14:03:53 +01:00
sch_multiq.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_netem.c net/sched: sch_netem: Fix arithmetic in netem_dump() for 32-bit platforms 2022-06-29 08:59:47 +02:00
sch_pie.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_plug.c Revert "net: sched: Pass root lock to Qdisc_ops.enqueue" 2020-07-16 16:48:34 -07:00
sch_prio.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_qfq.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_red.c net: sched: Fix use after free in red_enqueue() 2022-11-10 18:14:18 +01:00
sch_sfb.c net: sched: sfb: fix null pointer access issue when sfb_init() fails 2022-10-30 09:41:17 +01:00
sch_sfq.c net: sched: validate stab values 2021-03-30 14:31:57 +02:00
sch_skbprio.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_taprio.c Revert "net/sched: taprio: make qdisc_leaf() see the per-netdev-queue pfifo child qdiscs" 2023-02-25 11:55:04 +01:00
sch_tbf.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00
sch_teql.c net: sched: delete duplicate cleanup of backlog and qlen 2022-10-30 09:41:17 +01:00