Commit Graph

5974 Commits

Author SHA1 Message Date
Greg Kroah-Hartman
c7f89f1b6b This is the 5.4.249 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmSb7V4ACgkQONu9yGCS
 aT5vLxAA0yhg7h210wyMLrPNgQHrIItxkvcosoAG04WziImnvTT84XYpvthKlQrZ
 jzLGwdrH8ggdZIq+jPblmGvfvpGuM7MjKw1F8tgmviMnMyfKziGO/kIEzkNPaHSt
 sRFuGniXx2Q/m2IVblhC8pqJG6SRgkBbNgg3by7SpTRSEHBjpxaOVxvGC53Bdlkb
 ep90ox3iVbA4Q45rGCn5UfJM22wEnUYbzRv04085fzWaPDEZyHi5S6a3rHepVbrq
 7ElDQgUgHKlLm7rd1ngB8Ac+EdfavVcPok789pbEmQwf6jsAetl43yPUSEE6xFXb
 5FZAA7uUUa+E7P+140+iWBCZwQX9g+WglEkOxJV8gOMtWoiFZjpPcJxyWnvz/7ch
 XFz88WW/Ub4+bpg62TJ2F3dboeF0x1rN5kB8/ylb+Gf9vACT2gPLDbFaeG24DZEr
 s1hdsRx1Q3m8ffOYbsuTTn3bfGv8TfycV4Cwy+v+QPwJF/WPdMUnIDRY7VgWJ6fO
 scRdhkgMer9MLDrcSwxgS3tyn6JObQMp5A40H1Yb6ZVwN+q2BRC/B4Gqi6BmUNKr
 uU0BRMeyExyyQfKYCgvcf0M23qUf5L4PDpk1MX38pU+AHm8rPHlE36/pNFG4PG0g
 p6vBTlKzYeHKh12VAdPJjiWICloaz2ixf3K85xJ+vH56jXfjbSY=
 =3Pqk
 -----END PGP SIGNATURE-----

Merge 5.4.249 into android11-5.4-lts

Changes in 5.4.249
	nilfs2: reject devices with insufficient block count
	mm: rewrite wait_on_page_bit_common() logic
	list: add "list_del_init_careful()" to go with "list_empty_careful()"
	epoll: ep_autoremove_wake_function should use list_del_init_careful
	tracing: Add tracing_reset_all_online_cpus_unlocked() function
	x86/purgatory: remove PGO flags
	tick/common: Align tick period during sched_timer setup
	media: dvbdev: Fix memleak in dvb_register_device
	media: dvbdev: fix error logic at dvb_register_device()
	media: dvb-core: Fix use-after-free due to race at dvb_register_device()
	nilfs2: fix buffer corruption due to concurrent device reads
	Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
	PCI: hv: Fix a race condition bug in hv_pci_query_relations()
	cgroup: Do not corrupt task iteration when rebinding subsystem
	mmc: meson-gx: remove redundant mmc_request_done() call from irq context
	ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
	writeback: fix dereferencing NULL mapping->host on writeback_page_template
	nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
	cifs: Clean up DFS referral cache
	cifs: Get rid of kstrdup_const()'d paths
	cifs: Introduce helpers for finding TCP connection
	cifs: Merge is_path_valid() into get_normalized_path()
	cifs: Fix potential deadlock when updating vol in cifs_reconnect()
	x86/mm: Avoid using set_pgd() outside of real PGD pages
	rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
	ieee802154: hwsim: Fix possible memory leaks
	xfrm: Linearize the skb after offloading if needed.
	net: qca_spi: Avoid high load if QCA7000 is not available
	mmc: mtk-sd: fix deferred probing
	mmc: mvsdio: convert to devm_platform_ioremap_resource
	mmc: mvsdio: fix deferred probing
	mmc: omap: fix deferred probing
	mmc: omap_hsmmc: fix deferred probing
	mmc: sdhci-acpi: fix deferred probing
	mmc: sh_mmcif: fix deferred probing
	mmc: usdhi60rol0: fix deferred probing
	ipvs: align inner_mac_header for encapsulation
	net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch
	be2net: Extend xmit workaround to BE3 chip
	netfilter: nf_tables: disallow element updates of bound anonymous sets
	netfilter: nfnetlink_osf: fix module autoload
	Revert "net: phy: dp83867: perform soft reset and retain established link"
	sch_netem: acquire qdisc lock in netem_change()
	scsi: target: iscsi: Prevent login threads from racing between each other
	HID: wacom: Add error check to wacom_parse_and_register()
	arm64: Add missing Set/Way CMO encodings
	media: cec: core: don't set last_initiator if tx in progress
	nfcsim.c: Fix error checking for debugfs_create_dir
	usb: gadget: udc: fix NULL dereference in remove()
	s390/cio: unregister device when the only path is gone
	ASoC: nau8824: Add quirk to active-high jack-detect
	ARM: dts: Fix erroneous ADS touchscreen polarities
	drm/exynos: vidi: fix a wrong error return
	drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
	drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
	x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
	i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
	mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
	mm: make wait_on_page_writeback() wait for multiple pending writebacks
	xfs: verify buffer contents when we skip log replay
	Linux 5.4.249

Change-Id: I3f7cf3804fddac70b4c1accef1c7374b184b1ea3
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-28 09:54:38 +00:00
Darrick J. Wong
c874390551 xfs: verify buffer contents when we skip log replay
commit 22ed903eee23a5b174e240f1cdfa9acf393a5210 upstream.

syzbot detected a crash during log recovery:

XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
XFS (loop0): Starting recovery (logdev: internal)
==================================================================
BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074

CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
 xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
 xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
 xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
 xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
 xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
 xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
 xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
 xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
 xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f89fa3f4aca
Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
 </TASK>

The fuzzed image contains an AGF with an obviously garbage
agf_refcount_level value of 32, and a dirty log with a buffer log item
for that AGF.  The ondisk AGF has a higher LSN than the recovered log
item.  xlog_recover_buf_commit_pass2 reads the buffer, compares the
LSNs, and decides to skip replay because the ondisk buffer appears to be
newer.

Unfortunately, the ondisk buffer is corrupt, but recovery just read the
buffer with no buffer ops specified:

	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
			buf_f->blf_len, buf_flags, &bp, NULL);

Skipping the buffer leaves its contents in memory unverified.  This sets
us up for a kernel crash because xfs_refcount_recover_cow_leftovers
reads the buffer (which is still around in XBF_DONE state, so no read
verification) and creates a refcountbt cursor of height 32.  This is
impossible so we run off the end of the cursor object and crash.

Fix this by invoking the verifier on all skipped buffers and aborting
log recovery if the ondisk buffer is corrupt.  It might be smarter to
force replay the log item atop the buffer and then see if it'll pass the
write verifier (like ext4 does) but for now let's go with the
conservative option where we stop immediately.

Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28 10:18:42 +02:00
Greg Kroah-Hartman
6d6982b563 Merge 5.4.246 into android11-5.4-lts
Changes in 5.4.246
	RDMA/efa: Fix unsupported page sizes in device
	RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series
	RDMA/bnxt_re: Refactor queue pair creation code
	RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
	iommu/rockchip: Fix unwind goto issue
	iommu/amd: Don't block updates to GATag if guest mode is on
	dmaengine: pl330: rename _start to prevent build error
	net/mlx5: fw_tracer, Fix event handling
	netrom: fix info-leak in nr_write_internal()
	af_packet: Fix data-races of pkt_sk(sk)->num.
	amd-xgbe: fix the false linkup in xgbe_phy_status
	mtd: rawnand: ingenic: fix empty stub helper definitions
	af_packet: do not use READ_ONCE() in packet_bind()
	tcp: deny tcp_disconnect() when threads are waiting
	tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
	net/sched: sch_ingress: Only create under TC_H_INGRESS
	net/sched: sch_clsact: Only create under TC_H_CLSACT
	net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs
	net/sched: Prohibit regrafting ingress or clsact Qdiscs
	net: sched: fix NULL pointer dereference in mq_attach
	ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
	net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
	udp6: Fix race condition in udp6_sendmsg & connect
	net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
	net: dsa: mv88e6xxx: Increase wait after reset deactivation
	mtd: rawnand: marvell: ensure timing values are written
	mtd: rawnand: marvell: don't set the NAND frequency select
	watchdog: menz069_wdt: fix watchdog initialisation
	mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()
	ARM: 9295/1: unwind:fix unwind abort for uleb128 case
	media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
	fbdev: modedb: Add 1920x1080 at 60 Hz video mode
	fbdev: stifb: Fix info entry in sti_struct on error path
	nbd: Fix debugfs_create_dir error checking
	ASoC: dwc: limit the number of overrun messages
	xfrm: Check if_id in inbound policy/secpath match
	ASoC: ssm2602: Add workaround for playback distortions
	media: dvb_demux: fix a bug for the continuity counter
	media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
	media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
	media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
	media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
	media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
	media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
	media: netup_unidvb: fix irq init by register it at the end of probe
	media: dvb_ca_en50221: fix a size write bug
	media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
	media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
	media: dvb-core: Fix use-after-free due on race condition at dvb_net
	media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
	media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
	wifi: rtl8xxxu: fix authentication timeout due to incorrect RCR value
	ARM: dts: stm32: add pin map for CAN controller on stm32f7
	arm64/mm: mark private VM_FAULT_X defines as vm_fault_t
	scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed
	wifi: b43: fix incorrect __packed annotation
	netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
	ALSA: oss: avoid missing-prototype warnings
	atm: hide unused procfs functions
	mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
	iio: adc: mxs-lradc: fix the order of two cleanup operations
	HID: google: add jewel USB id
	HID: wacom: avoid integer overflow in wacom_intuos_inout()
	iio: light: vcnl4035: fixed chip ID check
	iio: dac: mcp4725: Fix i2c_master_send() return value handling
	iio: dac: build ad5758 driver when AD5758 is selected
	net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818
	usb: gadget: f_fs: Add unbind event before functionfs_unbind
	misc: fastrpc: return -EPIPE to invocations on device removal
	misc: fastrpc: reject new invocations during device removal
	scsi: stex: Fix gcc 13 warnings
	ata: libata-scsi: Use correct device no in ata_find_dev()
	flow_dissector: work around stack frame size warning
	x86/boot: Wrap literal addresses in absolute_pointer()
	ACPI: thermal: drop an always true check
	gcc-12: disable '-Wdangling-pointer' warning for now
	eth: sun: cassini: remove dead code
	kernel/extable.c: use address-of operator on section symbols
	treewide: Remove uninitialized_var() usage
	lib/dynamic_debug.c: use address-of operator on section symbols
	wifi: rtlwifi: remove always-true condition pointed out by GCC 12
	mmc: vub300: fix invalid response handling
	tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK
	selinux: don't use make's grouped targets feature yet
	tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
	ext4: add EA_INODE checking to ext4_iget()
	ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()
	ext4: disallow ea_inodes with extended attributes
	ext4: add lockdep annotations for i_data_sem for ea_inode's
	fbcon: Fix null-ptr-deref in soft_cursor
	test_firmware: fix the memory leak of the allocated firmware buffer
	regmap: Account for register length when chunking
	scsi: dpt_i2o: Remove broken pass-through ioctl (I2OUSERCMD)
	scsi: dpt_i2o: Do not process completions with invalid addresses
	RDMA/bnxt_re: Remove set but not used variable 'dev_attr'
	RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
	drm/edid: Fix uninitialized variable in drm_cvt_modes()
	wifi: rtlwifi: 8192de: correct checking of IQK reload
	drm/edid: fix objtool warning in drm_cvt_modes()
	Linux 5.4.246

Change-Id: I8721e40543af31c56dbbd47910dd3b474e3a79ab
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-06-20 19:13:58 +00:00
Kees Cook
0638dcc7e7 treewide: Remove uninitialized_var() usage
commit 3f649ab728cda8038259d8f14492fe400fbab911 upstream.

Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-09 10:29:01 +02:00
Greg Kroah-Hartman
734577d21e This is the 5.4.242 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRI7cMACgkQONu9yGCS
 aT4KYhAA11Pj8JrPzyk0LSUHlFmZ6LRUA1/0aAog2V4wIY0tSODHiYMqcqsgiay9
 SGg+eGlHU5+6FL36BGHneg8iLVwX/HHERPdRrOrTJO8YUe/GumFkgz0h2O7+ILOa
 QAao9FB9OBCNaR0guQjTZk+8QQtgfj3X381mBbzoFIIprNUw9/P3lYowyVPSWjHZ
 Ax+ptL2oxfBTGylrU543q7Vfzvy56DSNeDnCNyAHHVAXpQU3YhsWqW5pXcxQhtDh
 aEt21YB0CVBs4BF2sdSpUSXlQYS9NsH814lK3T1pEYONRJ/Osxl9QpdZ077avhTA
 7pS9oKUMNfak8BoK4aRsSy51UrKBnb6IOflJVQiXYRft0GvF630qra47hwZLeZOf
 Sl1tIW0cwlzJcv7H5tsg2eWXsZBCfmvd2MtDHjdYD/fxED7NSvPXjCyO4Qs1UqB9
 vq2cTU0CiCBQPL0UzKeuhPXfSwpbDXYmCMpH2GfHVtlYA1g+zPOlLA9HfGBR0pzH
 X1tdZ9hjUYSBdD5PoaTJzMcg+cME+tPq2kN09dECM+cTdlB/s5cKj8OcNLIv6jVD
 PfFXjJCL1CJk93C4UaHlO7efdQbferEVViu6hmfhGIAYOd3S400/5p71b22rfViH
 9CwkhhlafZ25/AF2NCfhvK5gjRTFH+gOtw0JmG2AQ4f1z44joWw=
 =4sA8
 -----END PGP SIGNATURE-----

Merge 5.4.242 into android11-5.4-lts

Changes in 5.4.242
	ARM: dts: rockchip: fix a typo error for rk3288 spdif node
	arm64: dts: meson-g12-common: specify full DMC range
	netfilter: br_netfilter: fix recent physdev match breakage
	regulator: fan53555: Explicitly include bits header
	net: sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
	virtio_net: bugfix overflow inside xdp_linearize_page()
	netfilter: nf_tables: fix ifdef to also consider nf_tables=m
	i40e: fix accessing vsi->active_filters without holding lock
	i40e: fix i40e_setup_misc_vector() error handling
	mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
	bpf: Fix incorrect verifier pruning due to missing register precision taints
	e1000e: Disable TSO on i219-LM card to increase speed
	f2fs: Fix f2fs_truncate_partial_nodes ftrace event
	Input: i8042 - add quirk for Fujitsu Lifebook A574/H
	selftests: sigaltstack: fix -Wuninitialized
	scsi: megaraid_sas: Fix fw_crash_buffer_show()
	scsi: core: Improve scsi_vpd_inquiry() checks
	net: dsa: b53: mmap: add phy ops
	s390/ptrace: fix PTRACE_GET_LAST_BREAK error handling
	nvme-tcp: fix a possible UAF when failing to allocate an io queue
	xen/netback: use same error messages for same errors
	iio: light: tsl2772: fix reading proximity-diodes from device tree
	nilfs2: initialize unused bytes in segment summary blocks
	memstick: fix memory leak if card device is never registered
	mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
	MIPS: Define RUNTIME_DISCARD_EXIT in LD script
	x86/purgatory: Don't generate debug info for purgatory.ro
	Revert "ext4: fix use-after-free in ext4_xattr_set_entry"
	ext4: remove duplicate definition of ext4_xattr_ibody_inline_set()
	ext4: fix use-after-free in ext4_xattr_set_entry
	udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM).
	tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct().
	inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().
	dccp: Call inet6_destroy_sock() via sk->sk_destruct().
	sctp: Call inet6_destroy_sock() via sk->sk_destruct().
	xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
	pwm: meson: Explicitly set .polarity in .get_state()
	iio: adc: at91-sama5d2_adc: fix an error code in at91_adc_allocate_trigger()
	ASN.1: Fix check for strdup() success
	Linux 5.4.242

Change-Id: Id72de17865bf93d2a6d39009fbc46cb3f8f7b6ca
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-26 13:21:48 +00:00
Gao Xiang
7f2b8046da xfs: fix forkoff miscalculation related to XFS_LITINO(mp)
commit ada49d64fb3538144192181db05de17e2ffc3551 upstream.

Currently, commit e9e2eae89ddb dropped a (int) decoration from
XFS_LITINO(mp), and since sizeof() expression is also involved,
the result of XFS_LITINO(mp) is simply as the size_t type
(commonly unsigned long).

Considering the expression in xfs_attr_shortform_bytesfit():
  offset = (XFS_LITINO(mp) - bytes) >> 3;
let "bytes" be (int)340, and
    "XFS_LITINO(mp)" be (unsigned long)336.

on 64-bit platform, the expression is
  offset = ((unsigned long)336 - (int)340) >> 3 =
           (int)(0xfffffffffffffffcUL >> 3) = -1

but on 32-bit platform, the expression is
  offset = ((unsigned long)336 - (int)340) >> 3 =
           (int)(0xfffffffcUL >> 3) = 0x1fffffff
instead.

so offset becomes a large positive number on 32-bit platform, and
cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0.

Therefore, one result is
  "ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));"

assertion failure in xfs_idata_realloc(), which was also the root
cause of the original bugreport from Dennis, see:
   https://bugzilla.redhat.com/show_bug.cgi?id=1894177

And it can also be manually triggered with the following commands:
  $ touch a;
  $ setfattr -n user.0 -v "`seq 0 80`" a;
  $ setfattr -n user.1 -v "`seq 0 80`" a

on 32-bit platform.

Fix the case in xfs_attr_shortform_bytesfit() by bailing out
"XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading
comment together with this bugfix suggested by Darrick. It seems the
other users of XFS_LITINO(mp) are not impacted.

Fixes: e9e2eae89ddb ("xfs: only check the superblock version for dinode size calculation")
Cc: <stable@vger.kernel.org> # 5.7+
Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-26 11:24:06 +02:00
Greg Kroah-Hartman
da8b283c08 This is the 5.4.241 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmRBDvQACgkQONu9yGCS
 aT58+g/+PmTvsCcjSlTvpHw+kFsIu2F2pIjlJNlX+YHSzEYMG/G2mn7XD1rUE7JB
 mQQ7hFXEeQZTuN4zsFnviotzn1xd6bV26Od598m+yFZ3Fty07NXy3OD1/08ZdP0G
 k+U9u4TDr47BV3WdokGWIdXAjnlvC0kwjc+EWznOmXquhW8hSrGvz0IUJQWgDue1
 IpOkuRs60T85dUHoPYEL8T23XRyhFhdeIrlx6Hqzlp1fGg7dSpfh0yYJmDLF6Y21
 CXO2WcADhMMszk+CFEXgfWVem5mheUuVsfmBUg34ZGzzNos0rPyHG+VmwZxnsPYE
 SVLVxLL649CzAuJdpXflTghdBUU/WKT/ZZ0aQddohPjij55My2oCV6/FpHXg7kuo
 ZTXN0ByyeAbV4DlY2+ooY6Vhr5lzQG1l15Ap9xj4ioQO8U3jeCCQ5wT9L01ig6a9
 /9U9wb6CYp3thrY94x1WapJF5utiIigbiS+rioRfAwHAzj5JiLfQvdH1nVgBNz+9
 DWCEGIiUONmHMRrt+X6Nnu8KkHWOkDFF2lisphXUsaY140gFG0+d3xnRArsU4ZDi
 j1zT0ErqigV6vzA39ic898EW8wkNqVCWPwYRLVRSBuPCDKK7SjnOuStGi7eIoYp2
 zI2NV9UUMm5Qxub5zUNktpWDFFn9knN3E7Gc4wAKWkHkWc/DJ+U=
 =ffeb
 -----END PGP SIGNATURE-----

Merge 5.4.241 into android11-5.4-lts

Changes in 5.4.241
	scsi: ses: Handle enclosure with just a primary component gracefully
	x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
	cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
	Revert "treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()"
	treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()
	smb3: fix problem with null cifs super block with previous patch
	pinctrl: amd: Use irqchip template
	pinctrl: amd: disable and mask interrupts on probe
	pinctrl: amd: Disable and mask interrupts on resume
	pwm: cros-ec: Explicitly set .polarity in .get_state()
	pwm: sprd: Explicitly set .polarity in .get_state()
	wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
	icmp: guard against too small mtu
	net: don't let netpoll invoke NAPI if in xmit context
	sctp: check send stream number after wait_for_sndbuf
	ipv6: Fix an uninit variable access bug in __ip6_make_skb()
	gpio: davinci: Add irq chip flag to skip set wake
	sunrpc: only free unix grouplist after RCU settles
	NFSD: callback request does not use correct credential for AUTH_SYS
	xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
	USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
	usb: typec: altmodes/displayport: Fix configure initial pin assignment
	USB: serial: option: add Telit FE990 compositions
	USB: serial: option: add Quectel RM500U-CN modem
	iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
	iio: dac: cio-dac: Fix max DAC write value check for 12-bit
	tty: serial: sh-sci: Fix transmit end interrupt handler
	tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
	tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
	nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
	nilfs2: fix sysfs interface lifetime
	ALSA: hda/realtek: Add quirk for Clevo X370SNW
	perf/core: Fix the same task check in perf_event_set_output
	ftrace: Mark get_lock_parent_ip() __always_inline
	can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
	tracing: Free error logs of tracing instances
	net_sched: prevent NULL dereference if default qdisc setup failed
	drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
	ring-buffer: Fix race while reader and writer are on the same page
	mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
	irqdomain: Look for existing mapping only once
	irqdomain: Refactor __irq_domain_alloc_irqs()
	irqdomain: Fix mapping-creation race
	Revert "pinctrl: amd: Disable and mask interrupts on resume"
	ALSA: emu10k1: fix capture interrupt handler unlinking
	ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
	ALSA: i2c/cs8427: fix iec958 mixer control deactivation
	ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
	ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
	Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
	Bluetooth: Fix race condition in hidp_session_thread
	btrfs: print checksum type and implementation at mount time
	btrfs: fix fast csum implementation detection
	mtdblock: tolerate corrected bit-flips
	mtd: rawnand: meson: fix bitmask for length in command word
	mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
	9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
	niu: Fix missing unwind goto in niu_alloc_channels()
	qlcnic: check pci_reset_function result
	sctp: fix a potential overflow in sctp_ifwdtsn_skip
	RDMA/core: Fix GID entry ref leak when create_ah fails
	udp6: fix potential access to stale information
	net: macb: fix a memory corruption in extended buffer descriptor mode
	power: supply: cros_usbpd: reclassify "default case!" as debug
	i2c: imx-lpi2c: clean rx/tx buffers upon new message
	efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
	drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
	verify_pefile: relax wrapper length check
	asymmetric_keys: log on fatal failures in PE/pkcs7
	ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
	mtd: ubi: wl: Fix a couple of kernel-doc issues
	ubi: Fix deadlock caused by recursively holding work_sem
	i2c: ocores: generate stop condition after timeout in polling mode
	watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
	coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
	xfs: show the proper user quota options
	xfs: merge the projid fields in struct xfs_icdinode
	xfs: ensure that the inode uid/gid match values match the icdinode ones
	xfs: remove the icdinode di_uid/di_gid members
	xfs: remove the kuid/kgid conversion wrappers
	xfs: add a new xfs_sb_version_has_v3inode helper
	xfs: only check the superblock version for dinode size calculation
	xfs: simplify di_flags2 inheritance in xfs_ialloc
	xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
	xfs: remove the di_version field from struct icdinode
	xfs: fix up non-directory creation in SGID directories
	xfs: set inode size after creating symlink
	xfs: report corruption only as a regular error
	xfs: shut down the filesystem if we screw up quota reservation
	xfs: consider shutdown in bmapbt cursor delete assert
	xfs: don't reuse busy extents on extent trim
	xfs: force log and push AIL to clear pinned inodes when aborting mount
	Linux 5.4.241

Change-Id: I428eec45c4ac9796104683d40b7cb0d38d4c8015
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-04-21 13:34:52 +00:00
Darrick J. Wong
8795936437 xfs: force log and push AIL to clear pinned inodes when aborting mount
commit d336f7ebc65007f5831e2297e6f3383ae8dbf8ed upstream.

[ Slightly modify fs/xfs/xfs_mount.c to resolve merge conflicts ]

If we allocate quota inodes in the process of mounting a filesystem but
then decide to abort the mount, it's possible that the quota inodes are
sitting around pinned by the log.  Now that inode reclaim relies on the
AIL to flush inodes, we have to force the log and push the AIL in
between releasing the quota inodes and kicking off reclaim to tear down
all the incore inodes.  Do this by extracting the bits we need from the
unmount path and reusing them.  As an added bonus, failed writes during
a failed mount will not retry forever now.

This was originally found during a fuzz test of metadata directories
(xfs/1546), but the actual symptom was that reclaim hung up on the quota
inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Brian Foster
c76dd36875 xfs: don't reuse busy extents on extent trim
commit 06058bc40534530e617e5623775c53bb24f032cb upstream.

Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.

As of commit 97d3ac75e5 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.

generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it. Finally, the filesystem crashes while in a
state where the initial metadata block free had not committed to the
on-disk log. After recovery, the original metadata block is in its
original location as expected, but has been corrupted by the
aforementioned dio.

This demonstrates that it is fundamentally unsafe to modify busy
extent state for extents that are not guaranteed to be allocated.
This applies to pretty much all of the code paths that currently
trim busy extents for one reason or another. Therefore to address
this problem, drop the reuse mechanism from the busy extent trim
path. This code already knows how to return partial non-busy ranges
of the targeted free extent and higher level code tracks the busy
state of the allocation attempt. If a block allocation fails where
one or more candidate extents is busy, we force the log and retry
the allocation.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Brian Foster
4679b73a8e xfs: consider shutdown in bmapbt cursor delete assert
commit 1cd738b13ae9b29e03d6149f0246c61f76e81fcf upstream.

[ Slightly modify fs/xfs/libxfs/xfs_btree.c to resolve merge conflicts ]

The assert in xfs_btree_del_cursor() checks that the bmapbt block
allocation field has been handled correctly before the cursor is
freed. This field is used for accurate calculation of indirect block
reservation requirements (for delayed allocations), for example.
generic/019 reproduces a scenario where this assert fails because
the filesystem has shutdown while in the middle of a bmbt record
insertion. This occurs after a bmbt block has been allocated via the
cursor but before the higher level bmap function (i.e.
xfs_bmap_add_extent_hole_real()) completes and resets the field.

Update the assert to accommodate the transient state if the
filesystem has shutdown. While here, clean up the indentation and
comments in the function.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Darrick J. Wong
9355fd118b xfs: shut down the filesystem if we screw up quota reservation
commit 2a4bdfa8558ca2904dc17b83497dc82aa7fc05e9 upstream.

If we ever screw up the quota reservations enough to trip the
assertions, something's wrong with the quota code.  Shut down the
filesystem when this happens, because this is corruption.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Darrick J. Wong
48f75df5b3 xfs: report corruption only as a regular error
commit 6519f708cc355c4834edbe1885c8542c0e7ef907 uptream.

[ Slightly modify fs/xfs/xfs_linux.h to resolve merge conflicts ]

Redefine XFS_IS_CORRUPT so that it reports corruptions only via
xfs_corruption_report.  Since these are on-disk contents (and not checks
of internal state), we don't ever want to panic the kernel.  This also
amends the corruption report to recommend unmounting and running
xfs_repair.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Jeffrey Mitchell
3cce34ceb2 xfs: set inode size after creating symlink
commit 8aa921a95335d0a8c8e2be35a44467e7c91ec3e4 upstream.

When XFS creates a new symlink, it writes its size to disk but not to the
VFS inode. This causes i_size_read() to return 0 for that symlink until
it is re-read from disk, for example when the system is rebooted.

I found this inconsistency while protecting directories with eCryptFS.
The command "stat path/to/symlink/in/ecryptfs" will report "Size: 0" if
the symlink was created after the last reboot on an XFS root.

Call i_size_write() in xfs_symlink()

Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:38 +02:00
Christoph Hellwig
e76bd6da51 xfs: fix up non-directory creation in SGID directories
commit 01ea173e103edd5ec41acec65b9261b87e123fc2 upstream.

XFS always inherits the SGID bit if it is set on the parent inode, while
the generic inode_init_owner does not do this in a few cases where it can
create a possible security problem, see commit 0fa3ecd878
("Fix up non-directory creation in SGID directories") for details.

Switch XFS to use the generic helper for the normal path to fix this,
just keeping the simple field inheritance open coded for the case of the
non-sgid case with the bsdgrpid mount option.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
ad6613c984 xfs: remove the di_version field from struct icdinode
commit 6471e9c5e7a109a952be8e3e80b8d9e262af239d upstream.

We know the version is 3 if on a v5 file system.   For earlier file
systems formats we always upgrade the remaining v1 inodes to v2 and
thus only use v2 inodes.  Use the xfs_sb_version_has_large_dinode
helper to check if we deal with small or large dinodes, and thus
remove the need for the di_version field in struct icdinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
ca4533c951 xfs: simplify a check in xfs_ioctl_setattr_check_cowextsize
commit 5e28aafe708ba3e388f92a7148093319d3521c2f upstream.

Only v5 file systems can have the reflink feature, and those will
always use the large dinode format.  Remove the extra check for the
inode version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
e078b3de3e xfs: simplify di_flags2 inheritance in xfs_ialloc
commit b3d1d37544d8c98be610df0ed66c884ff18748d5 upstream.

di_flags2 is initialized to zero for v4 and earlier file systems.  This
means di_flags2 can only be non-zero for a v5 file systems, in which
case both the parent and child inodes can store the field.  Remove the
extra di_version check, and also remove the rather pointless local
di_flags2 variable while at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
0c553917b6 xfs: only check the superblock version for dinode size calculation
commit e9e2eae89ddb658ea332295153fdca78c12c1e0d upstream.

The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
90aab52d06 xfs: add a new xfs_sb_version_has_v3inode helper
commit b81b79f4eda2ea98ae5695c0b6eb384c8d90b74d upstream.

Add a new wrapper to check if a file system supports the v3 inode format
with a larger dinode core.  Previously we used xfs_sb_version_hascrc for
that, which is technically correct but a little confusing to read.

Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode
so that we have one place that documents the superblock version to
inode version relationship.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
edd36a57b4 xfs: remove the kuid/kgid conversion wrappers
commit ba8adad5d036733d240fa8a8f4d055f3d4490562 upstream.

Remove the XFS wrappers for converting from and to the kuid/kgid types.
Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but
in a few spots the calls to the conversion functions is open coded.
To match the use of sb->s_user_ns in the helpers and other file systems,
sb->s_user_ns is also used in the quota code.  The ACL code already does
the conversion in a grotty layering violation in the VFS xattr code,
so it keeps using init_user_ns for the identity mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
3ef81874f7 xfs: remove the icdinode di_uid/di_gid members
commit 542951592c99ff7b15c050954c051dd6dd6c0f97 upstream.

Use the Linux inode i_uid/i_gid members everywhere and just convert
from/to the scalar value when reading or writing the on-disk inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
cc508a41ae xfs: ensure that the inode uid/gid match values match the icdinode ones
commit 3d8f2821502d0b60bac2789d0bea951fda61de0c upstream.

Instead of only synchronizing the uid/gid values in xfs_setup_inode,
ensure that they always match to prepare for removing the icdinode
fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Christoph Hellwig
7a9dc79771 xfs: merge the projid fields in struct xfs_icdinode
commit de7a866fd41b227b0aa6e9cbeb0dae221c12f542 upstream.

There is no point in splitting the fields like this in an purely
in-memory structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Kaixu Xia
4f3252e7e1 xfs: show the proper user quota options
commit 237d7887ae723af7d978e8b9a385fdff416f357b upstream.

The quota option 'usrquota' should be shown if both the XFS_UQUOTA_ACCT
and XFS_UQUOTA_ENFD flags are set. The option 'uqnoenforce' should be
shown when only the XFS_UQUOTA_ACCT flag is set. The current code logic
seems wrong, Fix it and show proper options.

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-20 12:07:37 +02:00
Greg Kroah-Hartman
abc4ede193 This is the 5.4.232 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmP2AZoACgkQONu9yGCS
 aT4Kog//cMOCPvc+9yam5NCZj76k9jzIfteKMZzvSyxjV/ShPGynLIwcR26vE4j1
 CtEB0aknuxgpqthfCahjjf51POhYLJD62H62UtTfxgIkWxnETd8F6y2xvuVXsds5
 mC0LUzQ9md6slgTTIobQF9ilIGAt/yKPOg89fUXNYzNsO2us46XZCmZOXg5MVwlI
 hXYQuVBze1VhWt40J8TYDFckjoQLUgH6lBawWHC8/r2MBBydzX1cZEyL2TXhDfFS
 7t9gWXKteAFE6GWfgAY6MrtqGx+X25Xe7qds4V8v6FgxR2MFeo94+k3DbhXRnjjY
 gA6czJBurGzhiXWo2E4laYlEMfsY0qkl17M49C/LwkJhZCSjF60b0Vo0dNfLLogZ
 cWsG6qcrfV8/js5h97kFSluWZ4VM7xTgcJQ/qtU/O8IprRQioCERjvm4Dl+/emXI
 ycFaiZOP3RvYdHxADIsItm46C7WzpzqZpjjs+9jHEaACrnOQfepGGFmgImMd9P8r
 kkU5KUtPQoAgSFfPz1tvJgyQiazONRAKtg1UprPOnLN0PsBQrE8ekCOk9lDoW60l
 t+G2lC0dJBYkcKC+4jHa9y18U7wz/eYdYE+K/u8kUENYFLSBfYxIqbXPxQZcq6aO
 TnyVr1n+Dd3HXtLX58+vDE2RUjosvCXctBGrE6Q56d8AKXh6FvM=
 =rk4j
 -----END PGP SIGNATURE-----

Merge 5.4.232 into android11-5.4-lts

Changes in 5.4.232
	firewire: fix memory leak for payload of request subaction to IEC 61883-1 FCP region
	bus: sunxi-rsb: Fix error handling in sunxi_rsb_init()
	ASoC: Intel: bytcr_rt5651: Drop reference count of ACPI device after use
	ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path()
	arm64: dts: imx8mm: Fix pad control for UART1_DTE_RX
	scsi: Revert "scsi: core: map PQ=1, PDT=other values to SCSI_SCAN_TARGET_PRESENT"
	WRITE is "data source", not destination...
	fix iov_iter_bvec() "direction" argument
	fix "direction" argument of iov_iter_kvec()
	netrom: Fix use-after-free caused by accept on already connected socket
	netfilter: br_netfilter: disable sabotage_in hook after first suppression
	squashfs: harden sanity check in squashfs_read_xattr_id_table
	net: phy: meson-gxl: Add generic dummy stubs for MMD register access
	can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate
	ata: libata: Fix sata_down_spd_limit() when no link speed is reported
	selftests: net: udpgso_bench_rx: Fix 'used uninitialized' compiler warning
	selftests: net: udpgso_bench_rx/tx: Stop when wrong CLI args are provided
	selftests: net: udpgso_bench: Fix racing bug between the rx/tx programs
	selftests: net: udpgso_bench_tx: Cater for pending datagrams zerocopy benchmarking
	virtio-net: Keep stop() to follow mirror sequence of open()
	net: openvswitch: fix flow memory leak in ovs_flow_cmd_new
	efi: fix potential NULL deref in efi_mem_reserve_persistent
	scsi: target: core: Fix warning on RT kernels
	scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
	i2c: rk3x: fix a bunch of kernel-doc warnings
	net/x25: Fix to not accept on connected socket
	iio: adc: stm32-dfsdm: fill module aliases
	usb: dwc3: dwc3-qcom: Fix typo in the dwc3 vbus override API
	usb: dwc3: qcom: enable vbus override when in OTG dr-mode
	usb: gadget: f_fs: Fix unbalanced spinlock in __ffs_ep0_queue_wait
	vc_screen: move load of struct vc_data pointer in vcs_read() to avoid UAF
	Input: i8042 - move __initconst to fix code styling warning
	Input: i8042 - merge quirk tables
	Input: i8042 - add TUXEDO devices to i8042 quirk tables
	Input: i8042 - add Clevo PCX0DX to i8042 quirk table
	fbcon: Check font dimension limits
	watchdog: diag288_wdt: do not use stack buffers for hardware data
	watchdog: diag288_wdt: fix __diag288() inline assembly
	efi: Accept version 2 of memory attributes table
	iio: hid: fix the retval in accel_3d_capture_sample
	iio: adc: berlin2-adc: Add missing of_node_put() in error path
	iio:adc:twl6030: Enable measurements of VUSB, VBAT and others
	parisc: Fix return code of pdc_iodc_print()
	parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat case
	riscv: disable generation of unwind tables
	mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
	fpga: stratix10-soc: Fix return value check in s10_ops_write_init()
	mm/swapfile: add cond_resched() in get_swap_pages()
	Squashfs: fix handling and sanity checking of xattr_ids count
	nvmem: core: fix cell removal on error
	mm: swap: properly update readahead statistics in unuse_pte_range()
	xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()
	serial: 8250_dma: Fix DMA Rx completion race
	serial: 8250_dma: Fix DMA Rx rearm race
	powerpc/imc-pmu: Revert nest_init_lock to being a mutex
	fbdev: smscufx: fix error handling code in ufx_usb_probe
	f2fs: fix to do sanity check on i_extra_isize in is_alive()
	wifi: brcmfmac: Check the count value of channel spec to prevent out-of-bounds reads
	iio:adc:twl6030: Enable measurement of VAC
	btrfs: limit device extents to the device size
	btrfs: zlib: zero-initialize zlib workspace
	ALSA: emux: Avoid potential array out-of-bound in snd_emux_xg_control()
	tracing: Fix poll() and select() do not work on per_cpu trace_pipe and trace_pipe_raw
	can: j1939: do not wait 250 ms if the same addr was already claimed
	IB/hfi1: Restore allocated resources on failed copyout
	IB/IPoIB: Fix legacy IPoIB due to wrong number of queues
	iommu: Add gfp parameter to iommu_ops::map
	RDMA/usnic: use iommu_map_atomic() under spin_lock()
	xfrm: fix bug with DSCP copy to v6 from v4 tunnel
	bonding: fix error checking in bond_debug_reregister()
	net: phy: meson-gxl: use MMD access dummy stubs for GXL, internal PHY
	ionic: clean interrupt before enabling queue to avoid credit race
	ice: Do not use WQ_MEM_RECLAIM flag for workqueue
	rds: rds_rm_zerocopy_callback() use list_first_entry()
	selftests: forwarding: lib: quote the sysctl values
	ALSA: pci: lx6464es: fix a debug loop
	pinctrl: aspeed: Fix confusing types in return value
	pinctrl: single: fix potential NULL dereference
	pinctrl: intel: Restore the pins that used to be in Direct IRQ mode
	net: USB: Fix wrong-direction WARNING in plusb.c
	usb: core: add quirk for Alcor Link AK9563 smartcard reader
	usb: typec: altmodes/displayport: Fix probe pin assign check
	ceph: flush cap releases when the session is flushed
	riscv: Fixup race condition on PG_dcache_clean in flush_icache_pte
	arm64: dts: meson-gx: Make mmc host controller interrupts level-sensitive
	arm64: dts: meson-g12-common: Make mmc host controller interrupts level-sensitive
	arm64: dts: meson-axg: Make mmc host controller interrupts level-sensitive
	nvme-pci: Move enumeration by class to be last in the table
	bpf: Always return target ifindex in bpf_fib_lookup
	migrate: hugetlb: check for hugetlb shared PMD in node migration
	selftests/bpf: Verify copy_register_state() preserves parent/live fields
	ASoC: cs42l56: fix DT probe
	tools/virtio: fix the vringh test for virtio ring changes
	net/rose: Fix to not accept on connected socket
	net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC
	net: sched: sch: Bounds check priority
	s390/decompressor: specify __decompress() buf len to avoid overflow
	nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
	aio: fix mremap after fork null-deref
	btrfs: free device in btrfs_close_devices for a single device filesystem
	netfilter: nft_tproxy: restrict to prerouting hook
	xfs: remove the xfs_efi_log_item_t typedef
	xfs: remove the xfs_efd_log_item_t typedef
	xfs: remove the xfs_inode_log_item_t typedef
	xfs: factor out a xfs_defer_create_intent helper
	xfs: merge the ->log_item defer op into ->create_intent
	xfs: merge the ->diff_items defer op into ->create_intent
	xfs: turn dfp_intent into a xfs_log_item
	xfs: refactor xfs_defer_finish_noroll
	xfs: log new intent items created as part of finishing recovered intent items
	xfs: fix finobt btree block recovery ordering
	xfs: proper replay of deferred ops queued during log recovery
	xfs: xfs_defer_capture should absorb remaining block reservations
	xfs: xfs_defer_capture should absorb remaining transaction reservation
	xfs: clean up bmap intent item recovery checking
	xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
	xfs: fix an incore inode UAF in xfs_bui_recover
	xfs: change the order in which child and parent defer ops are finished
	xfs: periodically relog deferred intent items
	xfs: expose the log push threshold
	xfs: only relog deferred intent items if free space in the log gets low
	xfs: fix missing CoW blocks writeback conversion retry
	xfs: ensure inobt record walks always make forward progress
	xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
	xfs: prevent UAF in xfs_log_item_in_current_chkpt
	xfs: sync lazy sb accounting on quiesce of read-only mounts
	Revert "ipv4: Fix incorrect route flushing when source address is deleted"
	ipv4: Fix incorrect route flushing when source address is deleted
	mmc: sdio: fix possible resource leaks in some error paths
	mmc: mmc_spi: fix error handling in mmc_spi_probe()
	ALSA: hda/conexant: add a new hda codec SN6180
	ALSA: hda/realtek - fixed wrong gpio assigned
	sched/psi: Fix use-after-free in ep_remove_wait_queue()
	hugetlb: check for undefined shift on 32 bit architectures
	Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
	net: Fix unwanted sign extension in netdev_stats_to_stats64()
	revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
	ixgbe: allow to increase MTU to 3K with XDP enabled
	i40e: add double of VLAN header when computing the max MTU
	net: bgmac: fix BCM5358 support by setting correct flags
	sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
	dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.
	net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
	net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
	bnxt_en: Fix mqprio and XDP ring checking logic
	net: stmmac: Restrict warning on disabling DMA store and fwd mode
	net: mpls: fix stale pointer if allocation fails during device rename
	ixgbe: add double of VLAN header when computing the max MTU
	ipv6: Fix datagram socket connection with DSCP.
	ipv6: Fix tcp socket connection with DSCP.
	i40e: Add checking for null for nlmsg_find_attr()
	kvm: initialize all of the kvm_debugregs structure before sending it to userspace
	nilfs2: fix underflow in second superblock position calculations
	ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
	net: sched: sch: Fix off by one in htb_activate_prios()
	iommu/amd: Pass gfp flags to iommu_map_page() in amd_iommu_map()
	Linux 5.4.232

Change-Id: I607aaac0f8477eb9a0f059e0a9d2f5c037fb19fc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-02-22 12:30:48 +00:00
Brian Foster
85eda80883 xfs: sync lazy sb accounting on quiesce of read-only mounts
commit 50d25484bebe94320c49dd1347d3330c7063bbdb upstream.

[ Modify xfs_log_unmount_write() to return zero when the log is in a read-only
state ]

xfs_log_sbcount() syncs the superblock specifically to accumulate
the in-core percpu superblock counters and commit them to disk. This
is required to maintain filesystem consistency across quiesce
(freeze, read-only mount/remount) or unmount when lazy superblock
accounting is enabled because individual transactions do not update
the superblock directly.

This mechanism works as expected for writable mounts, but
xfs_log_sbcount() skips the update for read-only mounts. Read-only
mounts otherwise still allow log recovery and write out an unmount
record during log quiesce. If a read-only mount performs log
recovery, it can modify the in-core superblock counters and write an
unmount record when the filesystem unmounts without ever syncing the
in-core counters. This leaves the filesystem with a clean log but in
an inconsistent state with regard to lazy sb counters.

Update xfs_log_sbcount() to use the same logic
xfs_log_unmount_write() uses to determine when to write an unmount
record. This ensures that lazy accounting is always synced before
the log is cleaned. Refactor this logic into a new helper to
distinguish between a writable filesystem and a writable log.
Specifically, the log is writable unless the filesystem is mounted
with the norecovery mount option, the underlying log device is
read-only, or the filesystem is shutdown. Drop the freeze state
check because the update is already allowed during the freezing
process and no context calls this function on an already frozen fs.
Also, retain the shutdown check in xfs_log_unmount_write() to catch
the case where the preceding log force might have triggered a
shutdown.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:38 +01:00
Darrick J. Wong
fb8ee907c1 xfs: prevent UAF in xfs_log_item_in_current_chkpt
commit f8d92a66e810acbef6ddbc0bd0cbd9b117ce8acd upstream.

[ Continue to interpret xfs_log_item->li_seq as an LSN rather than a CIL sequence
  number. ]

While I was running with KASAN and lockdep enabled, I stumbled upon an
KASAN report about a UAF to a freed CIL checkpoint.  Looking at the
comment for xfs_log_item_in_current_chkpt, it seems pretty obvious to me
that the original patch to xfs_defer_finish_noroll should have done
something to lock the CIL to prevent it from switching the CIL contexts
while the predicate runs.

For upper level code that needs to know if a given log item is new
enough not to need relogging, add a new wrapper that takes the CIL
context lock long enough to sample the current CIL context.  This is
kind of racy in that the CIL can switch the contexts immediately after
sampling, but that's ok because the consequence is that the defer ops
code is a little slow to relog items.

 ==================================================================
 BUG: KASAN: use-after-free in xfs_log_item_in_current_chkpt+0x139/0x160 [xfs]
 Read of size 8 at addr ffff88804ea5f608 by task fsstress/527999

 CPU: 1 PID: 527999 Comm: fsstress Tainted: G      D      5.16.0-rc4-xfsx #rc4
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x59
  print_address_description.constprop.0+0x1f/0x140
  kasan_report.cold+0x83/0xdf
  xfs_log_item_in_current_chkpt+0x139/0x160
  xfs_defer_finish_noroll+0x3bb/0x1e30
  __xfs_trans_commit+0x6c8/0xcf0
  xfs_reflink_remap_extent+0x66f/0x10e0
  xfs_reflink_remap_blocks+0x2dd/0xa90
  xfs_file_remap_range+0x27b/0xc30
  vfs_dedupe_file_range_one+0x368/0x420
  vfs_dedupe_file_range+0x37c/0x5d0
  do_vfs_ioctl+0x308/0x1260
  __x64_sys_ioctl+0xa1/0x170
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f2c71a2950b
 Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffe8c0e03c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00005600862a8740 RCX: 00007f2c71a2950b
 RDX: 00005600862a7be0 RSI: 00000000c0189436 RDI: 0000000000000004
 RBP: 000000000000000b R08: 0000000000000027 R09: 0000000000000003
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000005a
 R13: 00005600862804a8 R14: 0000000000016000 R15: 00005600862a8a20
  </TASK>

 Allocated by task 464064:
  kasan_save_stack+0x1e/0x50
  __kasan_kmalloc+0x81/0xa0
  kmem_alloc+0xcd/0x2c0 [xfs]
  xlog_cil_ctx_alloc+0x17/0x1e0 [xfs]
  xlog_cil_push_work+0x141/0x13d0 [xfs]
  process_one_work+0x7f6/0x1380
  worker_thread+0x59d/0x1040
  kthread+0x3b0/0x490
  ret_from_fork+0x1f/0x30

 Freed by task 51:
  kasan_save_stack+0x1e/0x50
  kasan_set_track+0x21/0x30
  kasan_set_free_info+0x20/0x30
  __kasan_slab_free+0xed/0x130
  slab_free_freelist_hook+0x7f/0x160
  kfree+0xde/0x340
  xlog_cil_committed+0xbfd/0xfe0 [xfs]
  xlog_cil_process_committed+0x103/0x1c0 [xfs]
  xlog_state_do_callback+0x45d/0xbd0 [xfs]
  xlog_ioend_work+0x116/0x1c0 [xfs]
  process_one_work+0x7f6/0x1380
  worker_thread+0x59d/0x1040
  kthread+0x3b0/0x490
  ret_from_fork+0x1f/0x30

 Last potentially related work creation:
  kasan_save_stack+0x1e/0x50
  __kasan_record_aux_stack+0xb7/0xc0
  insert_work+0x48/0x2e0
  __queue_work+0x4e7/0xda0
  queue_work_on+0x69/0x80
  xlog_cil_push_now.isra.0+0x16b/0x210 [xfs]
  xlog_cil_force_seq+0x1b7/0x850 [xfs]
  xfs_log_force_seq+0x1c7/0x670 [xfs]
  xfs_file_fsync+0x7c1/0xa60 [xfs]
  __x64_sys_fsync+0x52/0x80
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae

 The buggy address belongs to the object at ffff88804ea5f600
  which belongs to the cache kmalloc-256 of size 256
 The buggy address is located 8 bytes inside of
  256-byte region [ffff88804ea5f600, ffff88804ea5f700)
 The buggy address belongs to the page:
 page:ffffea00013a9780 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88804ea5ea00 pfn:0x4ea5e
 head:ffffea00013a9780 order:1 compound_mapcount:0
 flags: 0x4fff80000010200(slab|head|node=1|zone=1|lastcpupid=0xfff)
 raw: 04fff80000010200 ffffea0001245908 ffffea00011bd388 ffff888004c42b40
 raw: ffff88804ea5ea00 0000000000100009 00000001ffffffff 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff88804ea5f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88804ea5f580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff88804ea5f600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                       ^
  ffff88804ea5f680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff88804ea5f700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ==================================================================

Fixes: 4e919af7827a ("xfs: periodically relog deferred intent items")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:38 +01:00
Darrick J. Wong
7c07806ab0 xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
commit a5336d6bb2d02d0e9d4d3c8be04b80b8b68d56c8 upstream.

In commit 27c14b5daa82 we started tracking the last inode seen during an
inode walk to avoid infinite loops if a corrupt inobt record happens to
have a lower ir_startino than the record preceeding it.  Unfortunately,
the assertion trips over the case where there are completely empty inobt
records (which can happen quite easily on 64k page filesystems) because
we advance the tracking cursor without actually putting the empty record
into the processing buffer.  Fix the assert to allow for this case.

Reported-by: zlang@redhat.com
Fixes: 27c14b5daa82 ("xfs: ensure inobt record walks always make forward progress")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Zorro Lang <zlang@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:38 +01:00
Darrick J. Wong
313699d505 xfs: ensure inobt record walks always make forward progress
commit 27c14b5daa82861220d6fa6e27b51f05f21ffaa7 upstream.

[ In xfs_iwalk_ag(), Replace a call to XFS_IS_CORRUPT() with a call to
  ASSERT() ]

The aim of the inode btree record iterator function is to call a
callback on every record in the btree.  To avoid having to tear down and
recreate the inode btree cursor around every callback, it caches a
certain number of records in a memory buffer.  After each batch of
callback invocations, we have to perform a btree lookup to find the
next record after where we left off.

However, if the keys of the inode btree are corrupt, the lookup might
put us in the wrong part of the inode btree, causing the walk function
to loop forever.  Therefore, we add extra cursor tracking to make sure
that we never go backwards neither when performing the lookup nor when
jumping to the next inobt record.  This also fixes an off by one error
where upon resume the lookup should have been for the inode /after/ the
point at which we stopped.

Found by fuzzing xfs/460 with keys[2].startino = ones causing bulkstat
and quotacheck to hang.

Fixes: a211432c27 ("xfs: create simplified inode walk function")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:38 +01:00
Darrick J. Wong
7f9309a9f5 xfs: fix missing CoW blocks writeback conversion retry
commit c2f09217a4305478c55adc9a98692488dd19cd32 upstream.

[ Set xfs_writepage_ctx->fork to XFS_DATA_FORK since 5.4.y tracks current
  extent's fork in this variable ]

In commit 7588cbeec6, we tried to fix a race stemming from the lack of
coordination between higher level code that wants to allocate and remap
CoW fork extents into the data fork.  Christoph cites as examples the
always_cow mode, and a directio write completion racing with writeback.

According to the comments before the goto retry, we want to restart the
lookup to catch the extent in the data fork, but we don't actually reset
whichfork or cow_fsb, which means the second try executes using stale
information.  Up until now I think we've gotten lucky that either
there's something left in the CoW fork to cause cow_fsb to be reset, or
either data/cow fork sequence numbers have advanced enough to force a
fresh lookup from the data fork.  However, if we reach the retry with an
empty stable CoW fork and a stable data fork, neither of those things
happens.  The retry foolishly re-calls xfs_convert_blocks on the CoW
fork which fails again.  This time, we toss the write.

I've recently been working on extending reflink to the realtime device.
When the realtime extent size is larger than a single block, we have to
force the page cache to CoW the entire rt extent if a write (or
fallocate) are not aligned with the rt extent size.  The strategy I've
chosen to deal with this is derived from Dave's blocksize > pagesize
series: dirtying around the write range, and ensuring that writeback
always starts mapping on an rt extent boundary.  This has brought this
race front and center, since generic/522 blows up immediately.

However, I'm pretty sure this is a bug outright, independent of that.

Fixes: 7588cbeec6 ("xfs: retry COW fork delalloc conversion when no extent was found")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:38 +01:00
Darrick J. Wong
6246b3a18f xfs: only relog deferred intent items if free space in the log gets low
commit 74f4d6a1e065c92428c5b588099e307a582d79d9 upstream.

Now that we have the ability to ask the log how far the tail needs to be
pushed to maintain its free space targets, augment the decision to relog
an intent item so that we only do it if the log has hit the 75% full
threshold.  There's no point in relogging an intent into the same
checkpoint, and there's no need to relog if there's plenty of free space
in the log.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
09d6181447 xfs: expose the log push threshold
commit ed1575daf71e4e21d8ae735b6e687c95454aaa17 upstream.

Separate the computation of the log push threshold and the push logic in
xlog_grant_push_ail.  This enables higher level code to determine (for
example) that it is holding on to a logged intent item and the log is so
busy that it is more than 75% full.  In that case, it would be desirable
to move the log item towards the head to release the tail, which we will
cover in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
5d711e4136 xfs: periodically relog deferred intent items
commit 4e919af7827a6adfc28e82cd6c4ffcfcc3dd6118 upstream.

[ Modify xfs_{bmap|extfree|refcount|rmap}_item.c to fix merge conflicts ]

There's a subtle design flaw in the deferred log item code that can lead
to pinning the log tail.  Taking up the defer ops chain examples from
the previous commit, we can get trapped in sequences like this:

Caller hands us a transaction t0 with D0-D3 attached.  The defer ops
chain will look like the following if the transaction rolls succeed:

t1: D0(t0), D1(t0), D2(t0), D3(t0)
t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

In transaction 9, we finish d9 and try to roll to t10 while holding onto
an intent item for D3 that we logged in t0.

The previous commit changed the order in which we place new defer ops in
the defer ops processing chain to reduce the maximum chain length.  Now
make xfs_defer_finish_noroll capable of relogging the entire chain
periodically so that we can always move the log tail forward.  Most
chains will never get relogged, except for operations that generate very
long chains (large extents containing many blocks with different sharing
levels) or are on filesystems with small logs and a lot of ongoing
metadata updates.

Callers are now required to ensure that the transaction reservation is
large enough to handle logging done items and new intent items for the
maximum possible chain length.  Most callers are careful to keep the
chain lengths low, so the overhead should be minimal.

The decision to relog an intent item is made based on whether the intent
was logged in a previous checkpoint, since there's no point in relogging
an intent into the same checkpoint.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
870e7d7108 xfs: change the order in which child and parent defer ops are finished
commit 27dada070d59c28a441f1907d2cec891b17dcb26 upstream.

The defer ops code has been finishing items in the wrong order -- if a
top level defer op creates items A and B, and finishing item A creates
more defer ops A1 and A2, we'll put the new items on the end of the
chain and process them in the order A B A1 A2.  This is kind of weird,
since it's convenient for programmers to be able to think of A and B as
an ordered sequence where all the sub-tasks for A must finish before we
move on to B, e.g. A A1 A2 D.

Right now, our log intent items are not so complex that this matters,
but this will become important for the atomic extent swapping patchset.
In order to maintain correct reference counting of extents, we have to
unmap and remap extents in that order, and we want to complete that work
before moving on to the next range that the user wants to swap.  This
patch fixes defer ops to satsify that requirement.

The primary symptom of the incorrect order was noticed in an early
performance analysis of the atomic extent swap code.  An astonishingly
large number of deferred work items accumulated when userspace requested
an atomic update of two very fragmented files.  The cause of this was
traced to the same ordering bug in the inner loop of
xfs_defer_finish_noroll.

If the ->finish_item method of a deferred operation queues new deferred
operations, those new deferred ops are appended to the tail of the
pending work list.  To illustrate, say that a caller creates a
transaction t0 with four deferred operations D0-D3.  The first thing
defer ops does is roll the transaction to t1, leaving us with:

t1: D0(t0), D1(t0), D2(t0), D3(t0)

Let's say that finishing each of D0-D3 will create two new deferred ops.
After finish D0 and roll, we'll have the following chain:

t2: D1(t0), D2(t0), D3(t0), d4(t1), d5(t1)

d4 and d5 were logged to t1.  Notice that while we're about to start
work on D1, we haven't actually completed all the work implied by D0
being finished.  So far we've been careful (or lucky) to structure the
dfops callers such that D1 doesn't depend on d4 or d5 being finished,
but this is a potential logic bomb.

There's a second problem lurking.  Let's see what happens as we finish
D1-D3:

t3: D2(t0), D3(t0), d4(t1), d5(t1), d6(t2), d7(t2)
t4: D3(t0), d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3)
t5: d4(t1), d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)

Let's say that d4-d11 are simple work items that don't queue any other
operations, which means that we can complete each d4 and roll to t6:

t6: d5(t1), d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
t7: d6(t2), d7(t2), d8(t3), d9(t3), d10(t4), d11(t4)
...
t11: d10(t4), d11(t4)
t12: d11(t4)
<done>

When we try to roll to transaction #12, we're holding defer op d11,
which we logged way back in t4.  This means that the tail of the log is
pinned at t4.  If the log is very small or there are a lot of other
threads updating metadata, this means that we might have wrapped the log
and cannot get roll to t11 because there isn't enough space left before
we'd run into t4.

Let's shift back to the original failure.  I mentioned before that I
discovered this flaw while developing the atomic file update code.  In
that scenario, we have a defer op (D0) that finds a range of file blocks
to remap, creates a handful of new defer ops to do that, and then asks
to be continued with however much work remains.

So, D0 is the original swapext deferred op.  The first thing defer ops
does is rolls to t1:

t1: D0(t0)

We try to finish D0, logging d1 and d2 in the process, but can't get all
the work done.  We log a done item and a new intent item for the work
that D0 still has to do, and roll to t2:

t2: D0'(t1), d1(t1), d2(t1)

We roll and try to finish D0', but still can't get all the work done, so
we log a done item and a new intent item for it, requeue D0 a second
time, and roll to t3:

t3: D0''(t2), d1(t1), d2(t1), d3(t2), d4(t2)

If it takes 48 more rolls to complete D0, then we'll finally dispense
with D0 in t50:

t50: D<fifty primes>(t49), d1(t1), ..., d102(t50)

We then try to roll again to get a chain like this:

t51: d1(t1), d2(t1), ..., d101(t50), d102(t50)
...
t152: d102(t50)
<done>

Notice that in rolling to transaction #51, we're holding on to a log
intent item for d1 that was logged in transaction #1.  This means that
the tail of the log is pinned at t1.  If the log is very small or there
are a lot of other threads updating metadata, this means that we might
have wrapped the log and cannot roll to t51 because there isn't enough
space left before we'd run into t1.  This is of course problem #2 again.

But notice the third problem with this scenario: we have 102 defer ops
tied to this transaction!  Each of these items are backed by pinned
kernel memory, which means that we risk OOM if the chains get too long.

Yikes.  Problem #1 is a subtle logic bomb that could hit someone in the
future; problem #2 applies (rarely) to the current upstream, and problem

This is not how incremental deferred operations were supposed to work.
The dfops design of logging in the same transaction an intent-done item
and a new intent item for the work remaining was to make it so that we
only have to juggle enough deferred work items to finish that one small
piece of work.  Deferred log item recovery will find that first
unfinished work item and restart it, no matter how many other intent
items might follow it in the log.  Therefore, it's ok to put the new
intents at the start of the dfops chain.

For the first example, the chains look like this:

t2: d4(t1), d5(t1), D1(t0), D2(t0), D3(t0)
t3: d5(t1), D1(t0), D2(t0), D3(t0)
...
t9: d9(t7), D3(t0)
t10: D3(t0)
t11: d10(t10), d11(t10)
t12: d11(t10)

For the second example, the chains look like this:

t1: D0(t0)
t2: d1(t1), d2(t1), D0'(t1)
t3: d2(t1), D0'(t1)
t4: D0'(t1)
t5: d1(t4), d2(t4), D0''(t4)
...
t148: D0<50 primes>(t147)
t149: d101(t148), d102(t148)
t150: d102(t148)
<done>

This actually sucks more for pinning the log tail (we try to roll to t10
while holding an intent item that was logged in t1) but we've solved
problem #1.  We've also reduced the maximum chain length from:

    sum(all the new items) + nr_original_items

to:

    max(new items that each original item creates) + nr_original_items

This solves problem #3 by sharply reducing the number of defer ops that
can be attached to a transaction at any given time.  The change makes
the problem of log tail pinning worse, but is improvement we need to
solve problem #2.  Actually solving #2, however, is left to the next
patch.

Note that a subsequent analysis of some hard-to-trigger reflink and COW
livelocks on extremely fragmented filesystems (or systems running a lot
of IO threads) showed the same symptoms -- uncomfortably large numbers
of incore deferred work items and occasional stalls in the transaction
grant code while waiting for log reservations.  I think this patch and
the next one will also solve these problems.

As originally written, the code used list_splice_tail_init instead of
list_splice_init, so change that, and leave a short comment explaining
our actions.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
f5af1d5c2d xfs: fix an incore inode UAF in xfs_bui_recover
commit ff4ab5e02a0447dd1e290883eb6cd7d94848e590 upstream.

In xfs_bui_item_recover, there exists a use-after-free bug with regards
to the inode that is involved in the bmap replay operation.  If the
mapping operation does not complete, we call xfs_bmap_unmap_extent to
create a deferred op to finish the unmapping work, and we retain a
pointer to the incore inode.

Unfortunately, the very next thing we do is commit the transaction and
drop the inode.  If reclaim tears down the inode before we try to finish
the defer ops, we dereference garbage and blow up.  Therefore, create a
way to join inodes to the defer ops freezer so that we can maintain the
xfs_inode reference until we're done with the inode.

Note: This imposes the requirement that there be enough memory to keep
every incore inode in memory throughout recovery.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
efcdc2e70e xfs: clean up xfs_bui_item_recover iget/trans_alloc/ilock ordering
commit 64a3f3315bc60f710a0a25c1798ac0ea58c6fa1f upstream.

In most places in XFS, we have a specific order in which we gather
resources: grab the inode, allocate a transaction, then lock the inode.
xfs_bui_item_recover doesn't do it in that order, so fix it to be more
consistent.  This also makes the error bailout code a bit less weird.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
abad319dee xfs: clean up bmap intent item recovery checking
commit 919522e89f8e71fc6a8f8abe17be4011573c6ea0 upstream.

The bmap intent item checking code in xfs_bui_item_recover is spread all
over the function.  We should check the recovered log item at the top
before we allocate any resources or do anything else, so do that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
6601531db8 xfs: xfs_defer_capture should absorb remaining transaction reservation
commit 929b92f64048d90d23e40a59c47adf59f5026903 upstream.

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the transaction reservation type
from the old transaction so that when we continue the dfops chain, we
still use the same reservation parameters.

Doing this means that the log item recovery functions get to determine
the transaction reservation instead of abusing tr_itruncate in yet
another part of xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:37 +01:00
Darrick J. Wong
411b14e68c xfs: xfs_defer_capture should absorb remaining block reservations
commit 4f9a60c48078c0efa3459678fa8d6e050e8ada5d upstream.

When xfs_defer_capture extracts the deferred ops and transaction state
from a transaction, it should record the remaining block reservations so
that when we continue the dfops chain, we can reserve the same number of
blocks to use.  We capture the reservations for both data and realtime
volumes.

This adds the requirement that every log intent item recovery function
must be careful to reserve enough blocks to handle both itself and all
defer ops that it can queue.  On the other hand, this enables us to do
away with the handwaving block estimation nonsense that was going on in
xlog_finish_defer_ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Darrick J. Wong
3324249e6e xfs: proper replay of deferred ops queued during log recovery
commit e6fff81e487089e47358a028526a9f63cdbcd503 upstream.

When we replay unfinished intent items that have been recovered from the
log, it's possible that the replay will cause the creation of more
deferred work items.  As outlined in commit 509955823c ("xfs: log
recovery should replay deferred ops in order"), later work items have an
implicit ordering dependency on earlier work items.  Therefore, recovery
must replay the items (both recovered and created) in the same order
that they would have been during normal operation.

For log recovery, we enforce this ordering by using an empty transaction
to collect deferred ops that get created in the process of recovering a
log intent item to prevent them from being committed before the rest of
the recovered intent items.  After we finish committing all the
recovered log items, we allocate a transaction with an enormous block
reservation, splice our huge list of created deferred ops into that
transaction, and commit it, thereby finishing all those ops.

This is /really/ hokey -- it's the one place in XFS where we allow
nested transactions; the splicing of the defer ops list is is inelegant
and has to be done twice per recovery function; and the broken way we
handle inode pointers and block reservations cause subtle use-after-free
and allocator problems that will be fixed by this patch and the two
patches after it.

Therefore, replace the hokey empty transaction with a structure designed
to capture each chain of deferred ops that are created as part of
recovering a single unfinished log intent.  Finally, refactor the loop
that replays those chains to do so using one transaction per chain.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Dave Chinner
1c89c04305 xfs: fix finobt btree block recovery ordering
commit 671459676ab0e1d371c8d6b184ad1faa05b6941e upstream.

[ In 5.4.y, xlog_recover_get_buf_lsn() is defined inside
  fs/xfs/xfs_log_recover.c ]

Nathan popped up on #xfs and pointed out that we fail to handle
finobt btree blocks in xlog_recover_get_buf_lsn(). This means they
always fall through the entire magic number matching code to "recover
immediately". Whilst most of the time this is the correct behaviour,
occasionally it will be incorrect and could potentially overwrite
more recent metadata because we don't check the LSN in the on disk
metadata at all.

This bug has been present since the finobt was first introduced, and
is a potential cause of the occasional xfs_iget_check_free_state()
failures we see that indicate that the inode btree state does not
match the on disk inode state.

Fixes: aafc3c2465 ("xfs: support the XFS_BTNUM_FINOBT free inode btree type")
Reported-by: Nathan Scott <nathans@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Darrick J. Wong
6678b2787b xfs: log new intent items created as part of finishing recovered intent items
commit 93293bcbde93567efaf4e6bcd58cad270e1fcbf5 upstream.

[Slightly edit fs/xfs/xfs_bmap_item.c & fs/xfs/xfs_refcount_item.c to resolve
merge conflicts]

During a code inspection, I found a serious bug in the log intent item
recovery code when an intent item cannot complete all the work and
decides to requeue itself to get that done.  When this happens, the
item recovery creates a new incore deferred op representing the
remaining work and attaches it to the transaction that it allocated.  At
the end of _item_recover, it moves the entire chain of deferred ops to
the dummy parent_tp that xlog_recover_process_intents passed to it, but
fail to log a new intent item for the remaining work before committing
the transaction for the single unit of work.

xlog_finish_defer_ops logs those new intent items once recovery has
finished dealing with the intent items that it recovered, but this isn't
sufficient.  If the log is forced to disk after a recovered log item
decides to requeue itself and the system goes down before we call
xlog_finish_defer_ops, the second log recovery will never see the new
intent item and therefore has no idea that there was more work to do.
It will finish recovery leaving the filesystem in a corrupted state.

The same logic applies to /any/ deferred ops added during intent item
recovery, not just the one handling the remaining work.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Christoph Hellwig
562da8e704 xfs: refactor xfs_defer_finish_noroll
commit bb47d79750f1a68a75d4c7defc2da934ba31de14 upstream.

Split out a helper that operates on a single xfs_defer_pending structure
to untangle the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Christoph Hellwig
42a2406f90 xfs: turn dfp_intent into a xfs_log_item
commit 13a8333339072b8654c1d2c75550ee9f41ee15de upstream.

All defer op instance place their own extension of the log item into
the dfp_intent field.  Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Christoph Hellwig
e11f1516fc xfs: merge the ->diff_items defer op into ->create_intent
commit d367a868e46b025a8ced8e00ef2b3a3c2f3bf732 upstream.

This avoids a per-item indirect call, and also simplifies the interface
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Christoph Hellwig
e84096edf8 xfs: merge the ->log_item defer op into ->create_intent
commit c1f09188e8de0ae65433cb9c8ace4feb66359bcc upstream.

These are aways called together, and my merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:36 +01:00
Christoph Hellwig
64b21eaa33 xfs: factor out a xfs_defer_create_intent helper
commit e046e949486ec92d83b2ccdf0e7e9144f74ef028 upstream.

Create a helper that encapsulates the whole logic to create a defer
intent.  This reorders some of the work that was done, but none of
that has an affect on the operation as only fields that don't directly
interact are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:35 +01:00
Christoph Hellwig
d24633f3c2 xfs: remove the xfs_inode_log_item_t typedef
commit fd9cbe51215198ccffa64169c98eae35b0916088 upstream.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:35 +01:00
Christoph Hellwig
e0373eeaaa xfs: remove the xfs_efd_log_item_t typedef
commit c84e819090f39e96e4d432c9047a50d2424f99e0 upstream.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:35 +01:00
Christoph Hellwig
94e0639992 xfs: remove the xfs_efi_log_item_t typedef
commit 82ff450b2d936d778361a1de43eb078cc043c7fe upstream.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-02-22 12:50:35 +01:00