Pull block driver updates from Jens Axboe:
"This is the block driver pull request for this merge window. It sits
on top of for-4.6/core, that was just sent out.
This contains:
- A set of fixes for lightnvm. One from Alan, fixing an overflow,
and the rest from the usual suspects, Javier and Matias.
- A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
for correct usage of the signed 64-bit divider.
- A set of bug fixes for the Micron mtip32xx, from Asai.
- A fix for the brd discard handling from Bart.
- Update the maintainers entry for cciss, since that hardware has
transferred ownership.
- Three bug fixes for bcache from Eric Wheeler.
- Set of fixes for xen-blk{back,front} from Jan and Konrad.
- Removal of the cpqarray driver. It has been disabled in Kconfig
since 2013, and we were initially scheduled to remove it in 3.15.
- Various updates and fixes for NVMe, with the most important being:
- Removal of the per-device NVMe thread, replacing that with a
watchdog timer instead. From Christoph.
- Exposing the namespace WWID through sysfs, from Keith.
- Set of cleanups from Ming Lin.
- Logging the controller device name instead of the underlying
PCI device name, from Sagi.
- And a bunch of fixes and optimizations from the usual suspects
in this area"
* 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
NVMe: Expose ns wwid through single sysfs entry
drivers:block: cpqarray clean up
brd: Fix discard request processing
cpqarray: remove it from the kernel
cciss: update MAINTAINERS
NVMe: Remove unused sq_head read in completion path
bcache: fix cache_set_flush() NULL pointer dereference on OOM
bcache: cleaned up error handling around register_cache()
bcache: fix race of writeback thread starting before complete initialization
NVMe: Create discard zero quirk white list
nbd: use correct div_s64 helper
mtip32xx: remove unneeded variable in mtip_cmd_timeout()
lightnvm: generalize rrpc ppa calculations
lightnvm: remove struct nvm_dev->total_blocks
lightnvm: rename ->nr_pages to ->nr_sects
lightnvm: update closed list outside of intr context
xen/blback: Fit the important information of the thread in 17 characters
lightnvm: fold get bb tbl when using dual/quad plane mode
lightnvm: fix up nonsensical configure overrun checking
xen-blkback: advertise indirect segment support earlier
...
Pull crypto update from Herbert Xu:
"Here is the crypto update for 4.6:
API:
- Convert remaining crypto_hash users to shash or ahash, also convert
blkcipher/ablkcipher users to skcipher.
- Remove crypto_hash interface.
- Remove crypto_pcomp interface.
- Add crypto engine for async cipher drivers.
- Add akcipher documentation.
- Add skcipher documentation.
Algorithms:
- Rename crypto/crc32 to avoid name clash with lib/crc32.
- Fix bug in keywrap where we zero the wrong pointer.
Drivers:
- Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
- Add PIC32 hwrng driver.
- Support BCM6368 in bcm63xx hwrng driver.
- Pack structs for 32-bit compat users in qat.
- Use crypto engine in omap-aes.
- Add support for sama5d2x SoCs in atmel-sha.
- Make atmel-sha available again.
- Make sahara hashing available again.
- Make ccp hashing available again.
- Make sha1-mb available again.
- Add support for multiple devices in ccp.
- Improve DMA performance in caam.
- Add hashing support to rockchip"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
crypto: qat - remove redundant arbiter configuration
crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
crypto: qat - Change the definition of icp_qat_uof_regtype
hwrng: exynos - use __maybe_unused to hide pm functions
crypto: ccp - Add abstraction for device-specific calls
crypto: ccp - CCP versioning support
crypto: ccp - Support for multiple CCPs
crypto: ccp - Remove check for x86 family and model
crypto: ccp - memset request context to zero during import
lib/mpi: use "static inline" instead of "extern inline"
lib/mpi: avoid assembler warning
hwrng: bcm63xx - fix non device tree compatibility
crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
crypto: qat - The AE id should be less than the maximal AE number
lib/mpi: Endianness fix
crypto: rockchip - add hash support for crypto engine in rk3288
crypto: xts - fix compile errors
crypto: doc - add skcipher API documentation
crypto: doc - update AEAD AD handling
...
An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq->special.
dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL
pointer deference because its call to tio_from_request() returned NULL.
Fix this by calling rq_end_stats() _before_ dm_unprep_request()
Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: e262f34741 ("dm stats: add support for request-based DM devices")
Cc: stable@vger.kernel.org # 4.2+
Commit 0a927c2f02 ("dm thin: return -ENOSPC when erroring retry list due
to out of data space") was a step in the right direction but didn't go
far enough.
Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when
the pool runs of of data space. This fixes cell_error() and
error_retry_list() to not blindly return -EIO.
We cannot rely on the 'error_if_no_space' feature flag since it is
transient (in that it can be reset once space is added, plus it only
controls whether errors are issued, it doesn't reflect whether the
pool is actually out of space).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Otherwise operations may be attempted that will only ever go on to crash
(since the metadata device is either missing or unreliable if 'fail_io'
is set).
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
clone_bio() now checks if bio_integrity_clone() returned an error rather
than just drop it on the floor.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a transaction abort has failed then we can no longer use the metadata
device. Typically this happens if the superblock is unreadable.
This fix addresses a crash seen during metadata device failure testing.
Fixes: 8a01a6af75 ("dm thin: prefetch missing metadata pages")
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
smq seems to be performing better than the old mq policy in all
situations, as well as using a quarter of the memory.
Make 'mq' an alias for 'smq' when choosing a cache policy. The tunables
that were present for the old mq are faked, and have no effect. mq
should be considered deprecated now.
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
md->queue and q are the same thing in dm_old_init_request_queue() and
dm_mq_init_request_queue().
Also drop the temporary 'struct request_queue *q' in
dm_old_init_request_queue().
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Saves 16 bytes by eliminating 4 4byte holes but more importantly:
numerous members that crossed cachelines were fixed.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Change the map pointer in 'struct mapped_device' from 'struct dm_table
__rcu *' to 'void __rcu *' to avoid the need for the dummy definition.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allows user to control which NUMA node the memory for DM device
structures (e.g. mapped_device, request_queue, gendisk, blk_mq_tag_set)
is allocated from.
Defaults to NUMA_NO_NODE (-1). Allowable range is from -1 until the
last online NUMA node id.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
fail_path() will print a "Failing path ..." message but reinstate_path()
doesn't print a "Reinstating path ...". Add that message to
reinstate_path() to add symmetry and aid system debugging.
Remove reinstate_path()'s check for the path_selector providing
.reinstate_path hook. All path selectors provide this and any future
ones must too.
activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but
pg_init_done() doesn't expicitly handle it in its swicth statement. Add
SCSI_DH_DEV_OFFLINED to the default case.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
When bch_cache_set_alloc() fails to kzalloc the cache_set, the
asyncronous closure handling tries to dereference a cache_set that
hadn't yet been allocated inside of cache_set_flush() which is called
by __cache_set_unregister() during cleanup. This appears to happen only
during an OOM condition on bcache_register.
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Cc: stable@vger.kernel.org
The bch_writeback_thread might BUG_ON in read_dirty() if
dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
its related initialization. This patch downs the dc->writeback_lock until
after initialization is complete, thus preventing bch_writeback_thread
from proceeding prematurely.
See this thread:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that dm-mpath core is lockless in the per-IO fast path it is
critical, for performance, to have the .select_path hook
(rr_select_path) also be as lockless as possible.
The new percpu members of 'struct selector' allow for lockless support
of 'repeat_count' governed repeat use of a previously selected path. If
a path fails while it is 'current_path' the worst case is concurrent IO
might be mapped to the failed path until the .fail_path hook
(rr_fail_path) is called.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
If a path selector has any use for a repeat_count it should be handled
locally and not depend on the dm-mpath core to be concerned with it.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Proper locking of the lists used by the path selectors should be handled
within the selectors (relying on dm-mpath.c code's use of the m->lock
spinlock was reckless).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Preparation for making __multipath_map() avoid taking the m->lock
spinlock -- in favor of using RCU locking.
repeat_count was primarily for bio-based DM multipath's benefit. There
is really no need for it anymore now that DM multipath is request-based.
As such, repeat_count > 1 is no longer honored and a warning is
displayed if the user attempts to use a value > 1. This is a temporary
change for the round-robin path-selector (as a later commit will restore
its support for repeat_count > 1).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
There isn't any need to support both old .request_fn and blk-mq paths
in the blk-mq specific portion of __multipath_map(). Call
blk_mq_alloc_request() directly rather than use blk_get_request().
Similarly, call blk_mq_free_request(), rather than blk_put_request(), in
multipath_release_clone().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Allow the multipath target to avoid making small allocations for each
'struct dm_mpath_io' that is needed for each request.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This will allow DM multipath to use a portion of the blk-mq pdu space
for target data (e.g. struct dm_mpath_io).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Rename various methods to have either a "dm_old" or "dm_mq" prefix.
Improve code comments to assist with understanding the duality of code
that handles both "dm_old" and "dm_mq" cases.
It is no much easier to quickly look at the code and _know_ that a given
method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Remove all fiddley code that propped up this support for a blk-mq
request-queue ontop of all .request_fn devices.
Testing has proven this niche request-based dm-mq mode to be buggy, when
testing fault tolerance with DM multipath, and there is no point trying
to preserve it.
Should help improve efficiency of pure dm-mq code and make code
maintenance less delicate.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
old_stop_queue() was checking blk_queue_stopped() without holding the
q->queue_lock.
dm_requeue_original_request() needed to check blk_queue_stopped(), with
q->queue_lock held, before calling blk_mq_kick_requeue_list(). And a
side-effect of that change is start_queue() must also call
blk_mq_kick_requeue_list().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The blk_mq_tag_set is only needed for dm-mq support. There is point
wasting space in 'struct mapped_device' for non-dm-mq devices.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
Allow user to change these values via module params or sysfs.
'dm_mq_nr_hw_queues' defaults to 1 (max 32).
'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
small under moderate sized workloads -- the dm-multipath device would
continuously block waiting for tags (requests) to become available).
The maximum is BLK_MQ_MAX_DEPTH (currently 10240).
Keep in mind the total number of pre-allocated requests per
request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
(currently 2048).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM multipath is the only request-based DM target -- which only supports
tables with a single target that is immutable. Leverage this fact in
dm_request_fn().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
DM multipath is the only dm-mq target. But that aside, request-based DM
only supports tables with a single target that is immutable. Leverage
this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
the mapped_device when the table was made active. This saves the need
to even take the read-side of the SRCU via dm_{get,put}_live_table.
If the active DM table does not have an immutable target (e.g. "error"
target was swapped in) then fallback to the slow-path where the target
is looked up from the live table.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The DM_TARGET_WILDCARD feature indicates that the "error" target may
replace any target; even immutable targets. This feature will be useful
to preserve the ability to replace the "multipath" target even once it
is formally converted over to having the DM_TARGET_IMMUTABLE feature.
Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
.map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
in the target_type.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The request-based DM support for checking queue congestion doesn't
require access to the live DM table.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly. One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true. This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request. The ftrace
function_graph tracer showed:
kworker-2013 => fio-12190
fio-12190 => kworker-2013
...
kworker-2013 => fio-12190
fio-12190 => kworker-2013
...
Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.
In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:
1) don't blk_mq_run_hw_queues in blk-mq request completion
Motivated by desire to reduce overhead of dm-mq, punting to kblockd
just increases context switches.
In my testing against a really fast null_blk device there was no benefit
to running blk_mq_run_hw_queues() on completion (and no other blk-mq
driver does this). So hopefully this change doesn't induce the need for
yet another revert like commit 621739b00e !
2) use blk_mq_complete_request() in dm_complete_request()
blk_complete_request() doesn't offer the traditional q->mq_ops vs
.request_fn branching pattern that other historic block interfaces
do (e.g. blk_get_request). Using blk_mq_complete_request() for
blk-mq requests is important for performance. It should be noted
that, like blk_complete_request(), blk_mq_complete_request() doesn't
natively handle partial completions -- but the request-based
DM-multipath target does provide the required partial completion
support by dm.c:end_clone_bio() triggering requeueing of the request
via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.
dm-mq fix#2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After: cpu : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472
With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case. The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.
Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Cc: stable@vger.kernel.org # 4.1+
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
do the dm_{get,put}_live_table() rather than split those operations.
The dm_grab_bdev_for_ioctl() callers only care about the block_device
associated with a singleton DM device so there isn't any need to retain
a reference to the live DM table. It is sufficient to:
1) dm_get_live_table()
2) bdgrab() the bdev associated with the singleton table's target
3) dm_put_live_table()
4) bdput() the bdev
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
None of the callers actually used the returned target.
Also, just reuse bdev pointer passed to dm_blk_ioctl().
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
This patch replaces uses of ablkcipher with skcipher, and the long
obsolete hash interface with ahash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>