Pull x86 fixes from Peter Anvin:
"This is a collection of minor fixes for x86, plus the IRET information
leak fix (forbid the use of 16-bit segments in 64-bit mode)"
NOTE! We may have to relax the "forbid the use of 16-bit segments in
64-bit mode" part, since there may be people who still run and depend on
16-bit Windows binaries under Wine.
But I'm taking this in the current unconditional form for now to see who
(if anybody) screams bloody murder. Maybe nobody cares. And maybe
we'll have to update it with some kind of runtime enablement (like our
vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already
need to tweak).
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
efi: Pass correct file handle to efi_file_{read,close}
x86/efi: Correct EFI boot stub use of code32_start
x86/efi: Fix boot failure with EFI stub
x86/platform/hyperv: Handle VMBUS driver being a module
x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround
x86, CMCI: Add proper detection of end of CMCI storms
firmware was reading random values from the stack because we were
passing a pointer to the wrong object type.
* Kernel corruption has been reported when booting with the EFI boot
stub which was tracked down to setting a bogus value for
bp->hdr.code32_start, resulting in corruption during relocation.
* Olivier Martin reported that the wrong file handles were being passed
to efi_file_(read|close), which works for x86 by luck due to the way
that the FAT driver is implemented, but doesn't work on ARM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJTR5QIAAoJEC84WcCNIz1VMn0P/01GF8A2frSK+NuCJCkmZoAa
fcOvcHmQajNwG3WAVsVWlS/i2QsYwK1jAgameEusn+FFrnWIwaZ9qb1TjEMbJylu
4odaRc1YYiLOJi9UD2jRB644374jJwgwteKGs0Vt99g4pa8HsgSbXTR6oF8PUDWr
1HZUV9tq8O1eAzpQdMADEgWYieylnldfvHk+ArPTJyR5fTNx8xCYALlCthc6Tv+A
cpi0rQj/YzNh+vqZF1YYZ8xqktvV1di2Hvmy3UVt05y1kwkaTquNY9478ZRF5UHm
oUk3nAYyA9M/1gxVnvUfyLgUtrWtyF02N+iDTxLoz05KxeK5wVdKaIPZfSAUrglt
hOvnL+5EOss6w9gG19zpPD4FVHCd696W+iCIBoqooWJqX8AqOVRr81GTYb3q3YDr
EIH0wLipuV4XI4sdN8JMH9fIbfkRdAvaGUR2lPSYFq2Cm7nn2hs820UdKFYeH0wT
fdgtGpWAdXhEq/SUW4KRZMCXLDz4XuNF3d/JREcC28CyiRgdjKFD/PMbZEShpisF
fYE16+IiAq8UMgfgUDqlrSP2UMqkyZ2kp5itvJBrLbTD6rWzEcpK+CMXqykWTOwV
ONzPAfZEbUmFuU3JhKOTFO5uf7dM9EG5BDKduWR6Wjl8VIVTQlD8R1OB5o1lbZPN
ecFWo1eIQGZjeoMm36EM
=rovT
-----END PGP SIGNATURE-----
Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent
Pull EFI fixes from Matt Fleming:
"* Fix EFI boot regression introduced during the merge window where the
firmware was reading random values from the stack because we were
passing a pointer to the wrong object type.
* Kernel corruption has been reported when booting with the EFI boot
stub which was tracked down to setting a bogus value for
bp->hdr.code32_start, resulting in corruption during relocation.
* Olivier Martin reported that the wrong file handles were being passed
to efi_file_(read|close), which works for x86 by luck due to the way
that the FAT driver is implemented, but doesn't work on ARM."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We're currently passing the file handle for the root file system to
efi_file_read() and efi_file_close(), instead of the file handle for the
file we wish to read/close.
While this has worked up until now, it seems that it has only been by
pure luck. Olivier explains,
"The issue is the UEFI Fat driver might return the same function for
'fh->read()' and 'h->read()'. While in our case it does not work with
a different implementation of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. In our
case, we return a different pointer when reading a directory and
reading a file."
Fixing this actually clears up the two functions because we can drop one
of the arguments, and instead only pass a file 'handle' argument.
Reported-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Olivier Martin <olivier.martin@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Pull x86 platform driver updates from Matthew Garrett:
"Support for the new keyboard features on the Thinkpad Carbon, a bunch
of updates for the Sony and Toshiba drivers, a new driver for upcoming
Alienware hardware and a few misc fixes. There's a couple of patches
that got Acked today but aren't invasive, so I'll send a further PR
for them next week"
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86: (28 commits)
alienware-wmi: cover some scenarios where memory allocations would fail
Add WMI driver for controlling AlienFX features on some Alienware products
fujitsu-tablet: add support for Lifebook T901 and T902
x86, platform: Make HP_WIRELESS option text more descriptive
x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
save and restore adaptive keyboard mode for suspend and,resume
support Thinkpad X1 Carbon 2nd generation's adaptive keyboard
toshiba_acpi: Fix whitespace
toshiba_acpi: Update version and copyright info
toshiba_acpi: Add accelerometer support
toshiba_acpi: Add ECO mode led support
toshiba_acpi: Add touchpad enable/disable support-
toshiba_acpi: Add keyboard backlight support
toshiba_acpi: Adapt Illumination code to use SCI
toshiba_acpi: Add System Configuration Interface
thinkpad_acpi: Fix inconsistent mute LED after resume
sonypi: Simplify dependencies
Revert "X86 platform: New BayTrail IOSF-SB MBI driver"
sony-laptop: remove useless sony-laptop versioning
sony-laptop: add smart connect control function
...
Pull block layer fixes from Jens Axboe:
"A small collection of fixes that should go in before -rc1. The pull
request contains:
- A two patch fix for a regression with block enabled tagging caused
by a commit in the initial pull request. One patch is from Martin
and ensures that SCSI doesn't truncate 64-bit block flags, the
other one is from me and prevents us from double using struct
request queuelist for both completion and busy tags. This caused
anything from a boot crash for some, to crashes under load.
- A blk-mq fix for a potential soft stall when hot unplugging CPUs
with busy IO.
- percpu_counter fix is listed in here, that caused a suspend issue
with virtio-blk due to percpu counters having an inconsistent state
during CPU removal. Andrew sent this in separately a few days ago,
but it's here. JFYI.
- A few fixes for block integrity from Martin.
- A ratelimit fix for loop from Mike Galbraith, to avoid spewing too
much in error cases"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix regression with block enabled tagging
scsi: Make sure cmd_flags are 64-bit
block: Ensure we only enable integrity metadata for reads and writes
block: Fix integrity verification
block: Fix for_each_bvec()
drivers/block/loop.c: ratelimit error messages
blk-mq: fix potential stall during CPU unplug with IO pending
percpu_counter: fix bad counter state during suspend
Pull thermal management updates from Zhang Rui:
"We only have a couple of fixes/cleanups for platform thermal drivers
this time.
Specifics:
- rcar thermal driver: avoid updating the thermal zone in case an IRQ
was triggered but the temperature didn't effectively change. From
Patrick Titiano.
- update the imx thermal driver' formula of converting thermal
sensor' raw date to real temperature in degree C. From Anson
Huang.
- trivial code cleanups of ti soc thermal and rcar thermal driver
from Jingoo Han and Patrick Titiano"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: rcar-thermal: update thermal zone only when temperature changes
thermal: rcar-thermal: fix same mask applied twice
thermal: ti-soc-thermal: Use SIMPLE_DEV_PM_OPS macro
thermal: imx: update formula for thermal sensor
Intel test builder caught a few instances that should test if kzalloc failed to
allocate memory as well as a scenario that platform_driver wasn't properly
initialized.
Signed-off-by: Mario Limonciello <mario_limonciello@dell.com>
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Pull LED updates from Bryan Wu:
"This cycle we got:
- new driver for leds-mc13783
- bug fixes
- code cleanup"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
leds: make sure we unregister a trigger only once
leds: leds-pwm: properly clean up after probe failure
leds: clevo-mail: Make probe function __init
leds-ot200: Fix dependencies
leds-gpio: of: introduce MODULE_DEVICE_TABLE for module autoloading
leds: clevo-mail: remove __initdata marker
leds: leds-ss4200: remove __initdata marker
leds: blinkm: remove unnecessary spaces
leds: lp5562: remove unnecessary parentheses
leds: leds-ss4200: remove DEFINE_PCI_DEVICE_TABLE macro
leds: leds-s3c24xx: Trivial cleanup in header file
drivers/leds: delete non-required instances of include <linux/init.h>
leds: leds-gpio: add retain-state-suspended property
leds: leds-mc13783: Add devicetree support
leds: leds-mc13783: Remove unnecessary cleaning of registers on exit
leds: leds-mc13783: Use proper "max_brightness" value fo LEDs
leds: leds-mc13783: Use LED core PM functions
leds: leds-mc13783: Add MC34708 LED support
leds: Turn off led if blinking is disabled
ledtrig-cpu: Handle CPU hot(un)plugging
Pull slave-dmaengine updates from Vinod Koul:
- New driver for Qcom bam dma
- New driver for RCAR peri-peri
- New driver for FSL eDMA
- Various odd fixes and updates thru the subsystem
* 'for-linus' of git://git.infradead.org/users/vkoul/slave-dma: (29 commits)
dmaengine: add Qualcomm BAM dma driver
shdma: add R-Car Audio DMAC peri peri driver
dmaengine: sirf: enable generic dt binding for dma channels
dma: omap-dma: Implement device_slave_caps callback
dmaengine: qcom_bam_dma: Add device tree binding
dma: dw: Add suspend and resume handling for PCI mode DW_DMAC.
dma: dw: allocate memory in two stages in probe
Add new line to test result strings produced in verbose mode
dmaengine: pch_dma: use tasklet_kill in teardown
dmaengine: at_hdmac: use tasklet_kill in teardown
dma: cppi41: start tear down only if channel is busy
usb: musb: musb_cppi41: Dont reprogram DMA if tear down is initiated
dmaengine: s3c24xx-dma: make phy->irq signed for error handling
dma: imx-dma: Add missing module owner field
dma: imx-dma: Replace printk with dev_*
dma: fsl-edma: fix static checker warning of NULL dereference
dma: Remove comment about embedding dma_slave_config into custom structs
dma: mmp_tdma: move to generic device tree binding
dma: mmp_pdma: add IRQF_SHARED when request irq
dma: edma: Fix memory leak in edma_prep_dma_cyclic()
...
cmd_flags in struct request is now 64 bits wide but the scsi_execute
functions truncated arguments passed to int leading to errors. Make sure
the flags parameters are u64.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Jens Axboe <axboe@fb.com>
CC: Jan Kara <jack@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull i2c updates from Wolfram Sang:
"Here is the pull request from the i2c subsystem. It got a little
delayed because I needed to wait for a dependency to be included
(commit b424080a9e08: "reset: Add optional resets and stubs"). Plus,
I had some email problems. All done now, the highlights are:
- drivers can now deprecate their use of i2c classes. That shouldn't
be used on embedded platforms anyhow and was often blindly
copy&pasted. This mechanism gives users time to switch away and
ultimately boot faster once the use of classes for those drivers is
gone for good.
- new drivers for QUP, Cadence, efm32
- tracepoint support for I2C and SMBus
- bigger cleanups for the mv64xxx, nomadik, and designware drivers
And the usual bugfixes, cleanups, feature additions. Most stuff has
been in linux-next for a while. Just some hot fixes and new drivers
were added a bit more recently."
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (63 commits)
i2c: cadence: fix Kconfig dependency
i2c: Add driver for Cadence I2C controller
i2c: cadence: Document device tree bindings
Documentation: i2c: improve section about flags mangling the protocol
i2c: qup: use proper type fro clk_freq
i2c: qup: off by ones in qup_i2c_probe()
i2c: efm32: fix binding doc
MAINTAINERS: update I2C web resources
i2c: qup: New bus driver for the Qualcomm QUP I2C controller
i2c: qup: Add device tree bindings information
i2c: i2c-xiic: deprecate class based instantiation
i2c: i2c-sirf: deprecate class based instantiation
i2c: i2c-mv64xxx: deprecate class based instantiation
i2c: i2c-designware-platdrv: deprecate class based instantiation
i2c: i2c-davinci: deprecate class based instantiation
i2c: i2c-bcm2835: deprecate class based instantiation
i2c: mv64xxx: Fix reset controller handling
i2c: omap: fix usage of IS_ERR_VALUE with pm_runtime_get_sync
i2c: efm32: new bus driver
i2c: exynos5: remove unnecessary cast of void pointer
...
Core:
- CONFIG_MMC_UNSAFE_RESUME=y is now default behavior.
- DT bindings for SDHCI UHS, eMMC HS200, high-speed DDR, at 1.8/1.2V.
- Add GPIO descriptor based slot-gpio card detect API.
Drivers:
- dw_mmc: Refactor SOCFPGA support as a variant inside dw_mmc-pltfm.c.
- mmci: Support HW busy detection on ux500.
- omap: Support MMC_ERASE.
- omap_hsmmc: Support MMC_PM_KEEP_POWER, MMC_PM_WAKE_SDIO_IRQ, (a)cmd23.
- rtsx: Support pre-req/post-req async.
- sdhci: Add support for Realtek RTS5250 controllers.
- sdhci-acpi: Add support for 80860F16, fix 80860F14/SDIO card detect.
- sdhci-msm: Add new driver for Qualcomm SDHCI chipset support.
- sdhci-pxav3: Add support for Marvell Armada 380 and 385 SoCs.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJTRLHLAAoJEHNBYZ7TNxYMoqEQAOULXl1SHt0aHn5I0cfdVnRm
J3i56MqarwXQOse/qJrg8/uKsggAu0ivTlQ7x1h6bpXmzHqvOtZhSoO9BqGEvxOU
WNeA9ouaKMx3gCpIAwl9Odox+d2E+91nRfxU3fZTDITy554fREXmIpWiidjFPR7n
2oHT0yvGuLjunTC8MhxSB0OsggoIDXDTVPxrcf2k+AcAZAMlCMDNirN9+JbhiVM9
PNESapMyQAbFy18BGzCt5lO2o6aRileaSdX4BFTW4lx2LSPryUVV3cnfIH4zlytW
joVDWyU5kAtQgfhoEhTsWJld+cwHsMUrl/FOfhMvBWbPMxLJnbFx8b459nKJDM5j
NUo29KQxxHgWblGYx+F5SYuTloqWtX5iQWsez9g38Z/3UtjHR++o3+auwTFsZFRe
7EusZqsXdKggx1iiW/5afgb+tFOiCe5WOOQv29YdqWurPhaSK2Nr1aprD4RRiMeT
IG9qBLhHFLl8Pv0nTdEGbJHhAhihja6w2ul+i/8JSaDOYAGFbEn47MC8JfrKAnpw
WovxkSqMroMhjI+51cwJnVtdczQWx5kpjqDY0VaJlKvOfcwyOuyTU+s2vrHVDMZS
a0HgaXeVxr5IcDTz2zo1f6UbM4k2z/Ka0LOOSPqyOYOpFuT6VkXhgOVq6fsRpnaN
/9CUirULwF5ej0oz38hk
=6S8w
-----END PGP SIGNATURE-----
Merge tag 'mmc-updates-for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
Pull MMC updates from Chris Ball:
"MMC highlights for 3.15:
Core:
- CONFIG_MMC_UNSAFE_RESUME=y is now default behavior
- DT bindings for SDHCI UHS, eMMC HS200, high-speed DDR, at 1.8/1.2V
- Add GPIO descriptor based slot-gpio card detect API
Drivers:
- dw_mmc: Refactor SOCFPGA support as a variant inside dw_mmc-pltfm.c
- mmci: Support HW busy detection on ux500
- omap: Support MMC_ERASE
- omap_hsmmc: Support MMC_PM_KEEP_POWER, MMC_PM_WAKE_SDIO_IRQ, (a)cmd23
- rtsx: Support pre-req/post-req async
- sdhci: Add support for Realtek RTS5250 controllers
- sdhci-acpi: Add support for 80860F16, fix 80860F14/SDIO card detect
- sdhci-msm: Add new driver for Qualcomm SDHCI chipset support
- sdhci-pxav3: Add support for Marvell Armada 380 and 385 SoCs"
* tag 'mmc-updates-for-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (102 commits)
mmc: sdhci-acpi: Intel SDIO has broken card detect
mmc: sdhci-pxav3: add support for the Armada 38x SDHCI controller
mmc: sdhci-msm: Add platform_execute_tuning implementation
mmc: sdhci-msm: Initial support for Qualcomm chipsets
mmc: sdhci-msm: Qualcomm SDHCI binding documentation
sdhci: only reprogram retuning timer when flag is set
mmc: rename ARCH_BCM to ARCH_BCM_MOBILE
mmc: sdhci: Allow for irq being shared
mmc: sdhci-acpi: Add device id 80860F16
mmc: sdhci-acpi: Fix broken card detect for ACPI HID 80860F14
mmc: slot-gpio: Add GPIO descriptor based CD GPIO API
mmc: slot-gpio: Split out CD IRQ request into a separate function
mmc: slot-gpio: Record GPIO descriptors instead of GPIO numbers
Revert "dts: socfpga: Add support for SD/MMC on the SOCFPGA platform"
mmc: sdhci-spear: use generic card detection gpio support
mmc: sdhci-spear: remove support for power gpio
mmc: sdhci-spear: simplify resource handling
mmc: sdhci-spear: fix platform_data usage
mmc: sdhci-spear: fix error handling paths for DT
mmc: sdhci-bcm-kona: fix build errors when built-in
...
Pull more powerpc updates from Ben Herrenschmidt:
"Here are a few more powerpc things for you.
So you'll find here the conversion of the two new firmware sysfs
interfaces to the new API for self-removing files that Greg and Tejun
introduced, so they can finally remove the old one.
I'm also reverting the hwmon driver for powernv. I shouldn't have
merged it, I got a bit carried away here. I hadn't realized it was
never CCed to the relevant maintainer(s) and list(s), and happens to
have some issues so I'm taking it out and it will come back via the
proper channels.
The rest is a bunch of LE fixes (argh, some of the new stuff was
broken on LE, I really need to start testing LE myself !) and various
random fixes here and there.
Finally one bit that's not strictly a fix, which is the HVC OPAL
change to "kick" the HVC thread when the firmware tells us there is
new incoming data. I don't feel like waiting for this one, it's
simple enough, and it makes a big difference in console responsiveness
which is good for my nerves"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (26 commits)
powerpc/powernv Adapt opal-elog and opal-dump to new sysfs_remove_file_self
Revert "powerpc/powernv: hwmon driver for power values, fan rpm and temperature"
power, sched: stop updating inside arch_update_cpu_topology() when nothing to be update
powerpc/le: Avoid creatng R_PPC64_TOCSAVE relocations for modules.
arch/powerpc: Use RCU_INIT_POINTER(x, NULL) in platforms/cell/spu_syscalls.c
powerpc/opal: Add missing include
powerpc: Convert last uses of __FUNCTION__ to __func__
powerpc: Add lq/stq emulation
powerpc/powernv: Add invalid OPAL call
powerpc/powernv: Add OPAL message log interface
powerpc/book3s: Fix mc_recoverable_range buffer overrun issue.
powerpc: Remove dead code in sycall entry
powerpc: Use of_node_init() for the fakenode in msi_bitmap.c
powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast()
powerpc/powernv: Fix endian issues with sensor code
powerpc/powernv: Fix endian issues with OPAL async code
tty/hvc_opal: Kick the HVC thread on OPAL console events
powerpc/powernv: Add opal_notifier_unregister() and export to modules
powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early
powerpc/ppc64: Gracefully handle early interrupts
...
This reverts commit 0de7f8a917b5202014430e0055c0e1db0348bd62.
This driver wasn't merged via the proper maintainers (my fault ... ooops !)
and has serious issues so let's take it out for now and have a new better
one be merged the right way
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
Pull more networking updates from David Miller:
1) If a VXLAN interface is created with no groups, we can crash on
reception of packets. Fix from Mike Rapoport.
2) Missing includes in CPTS driver, from Alexei Starovoitov.
3) Fix string validations in isdnloop driver, from YOSHIFUJI Hideaki
and Dan Carpenter.
4) Missing irq.h include in bnxw2x, enic, and qlcnic drivers. From
Josh Boyer.
5) AF_PACKET transmit doesn't statistically count TX drops, from Daniel
Borkmann.
6) Byte-Queue-Limit enabled drivers aren't handled properly in
AF_PACKET transmit path, also from Daniel Borkmann.
Same problem exists in pktgen, and Daniel fixed it there too.
7) Fix resource leaks in driver probe error paths of new sxgbe driver,
from Francois Romieu.
8) Truesize of SKBs can gradually get more and more corrupted in NAPI
packet recycling path, fix from Eric Dumazet.
9) Fix uniprocessor netfilter build, from Florian Westphal. In the
longer term we should perhaps try to find a way for ARRAY_SIZE() to
work even with zero sized array elements.
10) Fix crash in netfilter conntrack extensions due to mis-estimation of
required extension space. From Andrey Vagin.
11) Since we commit table rule updates before trying to copy the
counters back to userspace (it's the last action we perform), we
really can't signal the user copy with an error as we are beyond the
point from which we can unwind everything. This causes all kinds of
use after free crashes and other mysterious behavior.
From Thomas Graf.
12) Restore previous behvaior of div/mod by zero in BPF filter
processing. From Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (38 commits)
net: sctp: wake up all assocs if sndbuf policy is per socket
isdnloop: several buffer overflows
netdev: remove potentially harmful checks
pktgen: fix xmit test for BQL enabled devices
net/at91_ether: avoid NULL pointer dereference
tipc: Let tipc_release() return 0
at86rf230: fix MAX_CSMA_RETRIES parameter
mac802154: fix duplicate #include headers
sxgbe: fix duplicate #include headers
net: filter: be more defensive on div/mod by X==0
netfilter: Can't fail and free after table replacement
xen-netback: Trivial format string fix
net: bcmgenet: Remove unnecessary version.h inclusion
net: smc911x: Remove unused local variable
bonding: Inactive slaves should keep inactive flag's value
netfilter: nf_tables: fix wrong format in request_module()
netfilter: nf_tables: set names cannot be larger than 15 bytes
netfilter: nf_conntrack: reserve two bytes for nf_ct_ext->len
netfilter: Add {ipt,ip6t}_osf aliases for xt_osf
netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks
...
Here are some more staging patches for 3.15-rc1. They include a
late-submission of a wireless driver that a bunch of people seem to have
the hardware for now. As it's stand-alone, it should be fine (now
passes the 0-day random build bot tests.) There are also some fixes for
the unisys drivers, as they were causing havoc on a number of different
machines. To resolve all of those issues, we just mark the driver as
BROKEN now, and we can fix it up "properly" over time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEABECAAYFAlNEO1sACgkQMUfUDdst+ymMegCgy/ZpC010cymW2ZYgDxqjEEss
3xUAoLA0Psv1z59hZv4EwrfStxzTiYAQ
=waW7
-----END PGP SIGNATURE-----
Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull more staging patches from Greg KH:
"Here are some more staging patches for 3.15-rc1.
They include a late-submission of a wireless driver that a bunch of
people seem to have the hardware for now. As it's stand-alone, it
should be fine (now passes the 0-day random build bot tests).
There are also some fixes for the unisys drivers, as they were causing
havoc on a number of different machines. To resolve all of those
issues, we just mark the driver as BROKEN now, and we can fix it up
"properly" over time"
* tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723au: The 8723 only has two paths
Staging: unisys: mark drivers as BROKEN
Staging: unisys: verify that a control channel exists
staging: unisys: Add missing close parentheses in filexfer.c
staging: r8723au: Fix build problem when RFKILL is not selected
staging: r8723au: Fix randconfig build errors
staging: r8723au: Turn on build of new driver
staging: r8723au: Additional source patches
staging: r8723au: Add source files for new driver - part 4
staging: r8723au: Add source files for new driver - part 3
staging: r8723au: Add source files for new driver - part 2
staging: r8723au: Add source files for new driver - part 1
Pull second set of s390 patches from Martin Schwidefsky:
"The second part of Heikos uaccess rework, the page table walker for
uaccess is now a thing of the past (yay!)
The code change to fix the theoretical TLB flush problem allows us to
add a TLB flush optimization for zEC12, this machine has new
instructions that allow to do CPU local TLB flushes for single pages
and for all pages of a specific address space.
Plus the usual bug fixing and some more cleanup"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uaccess: rework uaccess code - fix locking issues
s390/mm,tlb: optimize TLB flushing for zEC12
s390/mm,tlb: safeguard against speculative TLB creation
s390/irq: Use defines for external interruption codes
s390/irq: Add defines for external interruption codes
s390/sclp: add timeout for queued requests
kvm/s390: also set guest pages back to stable on kexec/kdump
lcs: Add missing destroy_timer_on_stack()
s390/tape: Add missing destroy_timer_on_stack()
s390/tape: Use del_timer_sync()
s390/3270: fix crash with multiple reset device requests
s390/bitops,atomic: add missing memory barriers
s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6
There are three buffer overflows addressed in this patch.
1) In isdnloop_fake_err() we add an 'E' to a 60 character string and
then copy it into a 60 character buffer. I have made the destination
buffer 64 characters and I'm changed the sprintf() to a snprintf().
2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60
character buffer so we have 54 characters. The ->eazlist[] is 11
characters long. I have modified the code to return if the source
buffer is too long.
3) In isdnloop_command() the cbuf[] array was 60 characters long but the
max length of the string then can be up to 79 characters. I made the
cbuf array 80 characters long and changed the sprintf() to snprintf().
I also removed the temporary "dial" buffer and changed it to use "p"
directly.
Unfortunately, we pass the "cbuf" string from isdnloop_command() to
isdnloop_writecmd() which truncates anything over 60 characters to make
it fit in card->omsg[]. (It can accept values up to 255 characters so
long as there is a '\n' character every 60 characters). For now I have
just fixed the memory corruption bug and left the other problems in this
driver alone.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change to use devm_backlight_device_register() for simple cleanup.
Signed-off-by: Daniel Jeong <gshark.jeong@gmail.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
As per the comments on device_register, we shouldn't call kfree()
right after a device_register() failure. Instead call put_device(),
which in turn will call bl_device_release resulting in a kfree to the
full structure.
Signed-off-by: Levente Kurusa <levex@linux.com>
Acked-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Intel SDIO has broken card detect so add a quirk to reflect that.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Chris Ball <chris@printf.net>
Avoid updating the thermal zone in case an IRQ was triggered but the
temperature didn't effectively change.
Note this is not a driver issue.
Below is a captured debug trace illustrating the purpose of this patch:
out of 8 thermal zone updates, only 2 are actually necessary.
[ 41.120000] rcar_thermal_work(): cctemp=25000
[ 41.120000] rcar_thermal_work(): nctemp=30000
[ 41.120000] rcar_thermal_work(): temp is now 30000C, update thermal zone
[ 58.990000] rcar_thermal_work(): cctemp=30000
[ 58.990000] rcar_thermal_work(): nctemp=30000
[ 58.990000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.290000] rcar_thermal_work(): cctemp=30000
[ 59.290000] rcar_thermal_work(): nctemp=30000
[ 59.290000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.590000] rcar_thermal_work(): cctemp=30000
[ 59.590000] rcar_thermal_work(): nctemp=30000
[ 59.590000] rcar_thermal_work(): same temp, do not update thermal zone
[ 59.890000] rcar_thermal_work(): cctemp=30000
[ 59.890000] rcar_thermal_work(): nctemp=30000
[ 59.890000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.190000] rcar_thermal_work(): cctemp=30000
[ 60.190000] rcar_thermal_work(): nctemp=30000
[ 60.190000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.490000] rcar_thermal_work(): cctemp=30000
[ 60.490000] rcar_thermal_work(): nctemp=30000
[ 60.490000] rcar_thermal_work(): same temp, do not update thermal zone
[ 60.790000] rcar_thermal_work(): cctemp=30000
[ 60.790000] rcar_thermal_work(): nctemp=35000
[ 60.790000] rcar_thermal_work(): temp is now 35000C, update thermal zone
I suspect this may be due to sensor sampling accuracy / fluctuation,
but no formal proof.
Signed-off-by: Patrick Titiano <ptitiano@baylibre.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Mask is already applied preceding the if statement.
Remove the second mask.
Signed-off-by: Patrick Titiano <ptitiano@baylibre.com>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Use SIMPLE_DEV_PM_OPS macro in order to make the code simpler.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Thermal sensor used to need two calibration points which are
in fuse map to get a slope for converting thermal sensor's raw
data to real temperature in degree C. Due to the chip calibration
limitation, hardware team provides an universal formula to get
real temperature from internal thermal sensor raw data:
Slope = 0.4297157 - (0.0015976 * 25C fuse);
Update the formula, as there will be no hot point calibration
data in fuse map from now on.
Signed-off-by: Anson Huang <b20788@freescale.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Loading cursors to the LCD controller's SRAM can be corrupted when the
configured pixel clock is relatively slow. This seems to be caused
when we write back-to-back to the SRAM registers.
There doesn't appear to be any status register we can read to check
when an access has completed.
Inserting a dummy read between the writes appears to fix the problem.
Cc: <stable@vger.kernel.org> # 3.13
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The merge of irq-core-for-linus branch broke it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQEcBAABAgAGBQJTQqWqAAoJEFxbo/MsZsTRCTAH/j9JC0q5OuCPoZlvhHgrOsaI
Z8gZj9MsU5zBiGrBrp96ULHVBxe/PWSCFhPWvxykhS5d/tgpeg0l2B2n7KrsgtrE
YSepxlyGzGK5UUYZmCCiE4fjZA2GrtHtz4tET3lu5lyHWCNgHuApmGDsoIJ+PH13
wCokIJcBpVu+V7nsxSifguud1R9ZEA7gTwp1q5omSerrqNLPPYuxlwSRTDIS4pEl
5mhP5C9ej+u7Dx7h4uRpk7p2YMQy1pnPtJIq3TBH2B/l/u96b+gyXQlLhf289LSP
RwiDhurZprLZmaM8W1qrQwqsoiPY/uQQFObp0icvrarK7SfYtiot+guK3oRn1NI=
=1P/a
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen build fix from David Vrabel:
"Fix arm build of drivers/xen/events/
The merge of irq-core-for-linus branch broke it"
* tag 'stable/for-linus-3.15-tag2' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
Xen: do hv callback accounting only on x86
Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...
If the renamed symbol is defined lib/iomap.c implements ioport_map and
ioport_unmap and currently (nearly) all platforms define the port
accessor functions outb/inb and friend unconditionally. So
HAS_IOPORT_MAP is the better name for this.
Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.
The motivation for this change is to reintroduce a symbol HAS_IOPORT
that signals if outb/int et al are available. I will address that at
least one merge window later though to keep surprises to a minimum and
catch new introductions of (HAS|NO)_IOPORT.
The changes in this commit were done using:
$ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"ret" is zero here so we can remove the "!ret" part of the condition.
"uhdr" is alread a __user pointer so we can remove the cast.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes an artificial RapidIO bus root device and establishes
actual device hierarchy by providing reference to real parent devices.
It also introduces device class for RapidIO controller devices (on-chip
or an eternal bridge, known as "mport").
Existing implementation was sufficient for SoC-based platforms that have
a single RapidIO controller. With introduction of devices using
multiple RapidIO controllers and PCIe-to-RapidIO bridges the old scheme
is very limiting or does not work at all. The implemented changes allow
to properly reference platform's local RapidIO mport devices and provide
device details needed for upper layers.
This change to RapidIO device hierarchy does not break any known
existing kernel or user space interfaces.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Cc: Jerry Jacobs <jerry.jacobs@prodrive-technologies.com>
Cc: Arno Tiemersma <arno.tiemersma@prodrive-technologies.com>
Cc: Rob Landley <rob@landley.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Combine SG entries describing single contiguous memory block into one
Tsi721 BDMA descriptor. This reduces number of hardware descriptors
required for large data transfers and improves performance on the PCIe
side by reducing number of descriptor fetch requests.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zram is ram based block device and can be used by backend of filesystem.
When filesystem deletes a file, it normally doesn't do anything on data
block of that file. It just marks on metadata of that file. This
behavior has no problem on disk based block device, but has problems on
ram based block device, since we can't free memory used for data block.
To overcome this disadvantage, there is REQ_DISCARD functionality. If
block device support REQ_DISCARD and filesystem is mounted with discard
option, filesystem sends REQ_DISCARD to block device whenever some data
blocks are discarded. All we have to do is to handle this request.
This patch implements to flag up QUEUE_FLAG_DISCARD and handle this
REQ_DISCARD request. With it, we can free memory used by zram if it isn't
used.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysfs.txt documentation lists the following requirements:
- The buffer will always be PAGE_SIZE bytes in length. On i386, this
is 4096.
- show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().
- show() should always use scnprintf().
Use scnprintf() in show() functions.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we initialized zcomp with single, we couldn't change
max_comp_streams without zram reset but current interface doesn't show
any error to user and even it changes max_comp_streams's value without
any effect so it would make user very confusing.
This patch prevents max_comp_streams's change when zcomp was initialized
as single zcomp and emit the error to user(ex, echo).
[akpm@linux-foundation.org: don't return with the lock held, per Sergey]
[fengguang.wu@intel.com: fix coccinelle warnings]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of returning just NULL, return ERR_PTR from zcomp_create() if
compressing backend creation has failed. ERR_PTR(-EINVAL) for unsupported
compression algorithm request, ERR_PTR(-ENOMEM) for allocation (zcomp or
compression stream) error.
Perform IS_ERR() check of returned from zcomp_create() value in
disksize_store() and set return code to PTR_ERR().
Change suggested by Jerome Marchand.
[akpm@linux-foundation.org: clean up error recovery flow]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While fixing lockdep spew of ->init_lock reported by Sasha Levin [1],
Minchan Kim noted [2] that it's better to move compression backend
allocation (using GPF_KERNEL) out of the ->init_lock lock, same way as
with zram_meta_alloc(), in order to prevent the same lockdep spew.
[1] https://lkml.org/lkml/2014/2/27/337
[2] https://lkml.org/lkml/2014/3/3/32
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce LZ4 compression backend and make it available for selection.
LZ4 support is optional and requires user to set ZRAM_LZ4_COMPRESS config
option. The default compression backend is LZO.
TEST
(x86_64, core i5, 2 cores + 2 hyperthreading, zram disk size 1G,
ext4 file system, 3 compression streams)
iozone -t 3 -R -r 16K -s 60M -I +Z
Test LZO LZ4
----------------------------------------------
Initial write 1642744.62 1317005.09
Rewrite 2498980.88 1800645.16
Read 3957026.38 5877043.75
Re-read 3950997.38 5861847.00
Reverse Read 2937114.56 5047384.00
Stride read 2948163.19 4929587.38
Random read 3292692.69 4880793.62
Mixed workload 1545602.62 3502940.38
Random write 2448039.75 1758786.25
Pwrite 1670051.03 1338329.69
Pread 2530682.00 5097177.62
Fwrite 3232085.62 3275942.56
Fread 6306880.25 6645271.12
So on my system LZ4 is slower in write-only tests, while it performs
better in read-only and mixed (reads + writes) tests.
Official LZ4 benchmarks available here http://code.google.com/p/lz4/
(linux kernel uses revision r90).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch allows to change max_comp_streams on initialised zcomp.
Introduce zcomp set_max_streams() knob, zcomp_strm_multi_set_max_streams()
and zcomp_strm_single_set_max_streams() callbacks to change streams limit
for zcomp_strm_multi and zcomp_strm_single, accordingly. set_max_streams
for single steam zcomp does nothing.
If user has lowered the limit, then zcomp_strm_multi_set_max_streams()
attempts to immediately free extra streams (as much as it can, depending
on idle streams availability).
Note, this patch does not allow to change stream 'policy' from single to
multi stream (or vice versa) on already initialised compression backend.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released. This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr). Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel. See TEST section later in commit message for
performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations performed:
- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to wait queue. it will be
awaken by zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
Minchan Kim reported that spinlock-based locking scheme has demonstrated
a severe perfomance regression for single compression stream case,
comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)
base spinlock mutex
==Initial write ==Initial write ==Initial write
records: 5 records: 5 records: 5
avg: 1642424.35 avg: 699610.40 avg: 1655583.71
std: 39890.95(2.43%) std: 232014.19(33.16%) std: 52293.96
max: 1690170.94 max: 1163473.45 max: 1697164.75
min: 1568669.52 min: 573429.88 min: 1553410.23
==Rewrite ==Rewrite ==Rewrite
records: 5 records: 5 records: 5
avg: 1611775.39 avg: 501406.64 avg: 1684419.11
std: 17144.58(1.06%) std: 15354.41(3.06%) std: 18367.42
max: 1641800.95 max: 531356.78 max: 1706445.84
min: 1593515.27 min: 488817.78 min: 1655335.73
When only one compression stream available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why single stream implemented as a special case with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
default max_comp_streams value is 1, meaning that zram with single stream
will be initialised.
Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test base 1 strm (mutex) 3 strm (spinlock)
-----------------------------------------------------------------------
Initial write 589286.78 583518.39 718011.05
Rewrite 604837.97 596776.38 1515125.72
Random write 584120.11 595714.58 1388850.25
Pwrite 535731.17 541117.38 739295.27
Fwrite 1418083.88 1478612.72 1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is preparation patch to add multi stream support to zcomp.
Introduce struct zcomp_strm_single and a set of functions to manage
zcomp_strm stream access. zcomp_strm_single implements single compession
stream, same way as current zcomp implementation. This moves zcomp_strm
stream control and locking from zcomp, so compressing backend zcomp is not
aware of required locking.
Single and multi streams require different locking schemes. Minchan Kim
reported that spinlock-based locking scheme (which is used in multi stream
implementation) has demonstrated a severe perfomance regression for single
compression stream case, comparing to mutex-based. see
https://lkml.org/lkml/2014/2/18/16
The following set of functions added:
- zcomp_strm_single_find()/zcomp_strm_single_release()
find and release a compression stream, implement required locking
- zcomp_strm_single_create()/zcomp_strm_single_destroy()
create and destroy zcomp_strm_single
New ->strm_find() and ->strm_release() callbacks added to zcomp, which are
set to zcomp_strm_single_find() and zcomp_strm_single_release() during
initialisation. Instead of direct locking and zcomp_strm access from
zcomp_strm_find() and zcomp_strm_release(), zcomp now calls ->strm_find()
and ->strm_release() correspondingly.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ZRAM performs direct LZO compression algorithm calls, making it the one
and only option. While LZO is generally performs well, LZ4 algorithm
tends to have a faster decompression (see http://code.google.com/p/lz4/
for full report)
Name Ratio C.speed D.speed
MB/s MB/s
LZ4 (r101) 2.084 422 1820
LZO 2.06 2.106 414 600
Thus, users who have mostly read (decompress) usage scenarious or mixed
workflow (writes with relatively high read ops number) will benefit from
using LZ4 compression backend.
Introduce compressing backend abstraction zcomp in order to support
multiple compression algorithms with the following set of operations:
.create
.destroy
.compress
.decompress
Schematically zram write() usually contains the following steps:
0) preparation (decompression of partioal IO, etc.)
1) lock buffer_lock mutex (protects meta compress buffers)
2) compress (using meta compress buffers)
3) alloc and map zs_pool object
4) copy compressed data (from meta compress buffers) to object allocated by 3)
5) free previous pool page, assign a new one
6) unlock buffer_lock mutex
As we can see, compressing buffers must remain untouched from 1) to 4),
because, otherwise, concurrent write() can overwrite data. At the same
time, zram_meta must be aware of a) specific compression algorithm memory
requirements and b) necessary locking to protect compression buffers. To
remove requirement a) new struct zcomp_strm introduced, which contains a
compress/decompress `buffer' and compression algorithm `private' part.
While struct zcomp implements zcomp_strm stream handling and locking and
removes requirement b) from zram meta. zcomp ->create() and ->destroy(),
respectively, allocate and deallocate algorithm specific zcomp_strm
`private' part.
Every zcomp has zcomp stream and mutex to protect its compression stream.
Stream usage semantics remains the same -- only one write can hold stream
lock and use its buffers. zcomp_strm_find() turns caller into exclusive
user of a stream (holding stream mutex until zram release stream), and
zcomp_strm_release() makes zcomp stream available (unlock the stream
mutex). Hence no concurrent write (compression) operations possible at
the moment.
iozone -t 3 -R -r 16K -s 60M -I +Z
test base patched
--------------------------------------------------
Initial write 597992.91 591660.58
Rewrite 609674.34 616054.97
Read 2404771.75 2452909.12
Re-read 2459216.81 2470074.44
Reverse Read 1652769.66 1589128.66
Stride read 2202441.81 2202173.31
Random read 2236311.47 2276565.31
Mixed workload 1423760.41 1709760.06
Random write 579584.08 615933.86
Pwrite 597550.02 594933.70
Pread 1703672.53 1718126.72
Fwrite 1330497.06 1461054.00
Fread 3922851.00 3957242.62
Usage examples:
comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
which initialises compressing backend if requested algorithm is supported.
Compress:
zstrm = zcomp_strm_find(comp)
zcomp_compress(comp, zstrm, src, &dst_len)
[..] /* copy compressed data */
zcomp_strm_release(comp, zstrm)
Decompress:
zcomp_decompress(comp, src, src_len, dst);
Free compessing backend and its zcomp stream:
zcomp_destroy(comp)
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
allocate new `zram_meta' in disksize_store() only for uninitialised zram
device, saving a number of allocations and deallocations in case if
disksize_store() was called on currently used device. at the same time
zram_meta stack variable is not necessary, because we can set ->meta
directly. there is also no need in setting QUEUE_FLAG_NONROT queue on
every disksize_store(), set it once during device creation.
[minchan@kernel.org: handle zram->meta alloc fail case]
[minchan@kernel.org: prevent lockdep spew of init_lock]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>