android_kernel_samsung_sm8650

Author	SHA1	Message	Date
Ingo Molnar	bb65a764de	Merge branch 'mce-ripvfix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce Merge memory fault handling fix from Tony Luck. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-07-11 22:37:48 +02:00
Tony Luck	6751ed65dc	x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine check recovery if it is safe") we fixed mce_notify_process() to force a signal to the current process if it was not restartable (RIPV bit not set in MCG_STATUS). But doing it here means that the process doesn't get told the virtual address of the fault via siginfo_t->si_addr. This would prevent application level recovery from the fault. Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so that we will provide the right information with the signal. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org # 3.4+	2012-07-11 10:20:47 -07:00
David Rientjes	737b719ed6	mm, slub: ensure irqs are enabled for kmemcheck kmemcheck_alloc_shadow() requires irqs to be enabled, so wait to disable them until after its called for __GFP_WAIT allocations. This fixes a warning for such allocations: WARNING: at kernel/lockdep.c:2739 lockdep_trace_alloc+0x14e/0x1c0() Acked-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-10 22:43:52 +03:00
Christoph Lameter	20cea9683e	mm, sl[aou]b: Move kmem_cache_create mutex handling to common code Move the mutex handling into the common kmem_cache_create() function. Then we can also move more checks out of SLAB's kmem_cache_create() into the common code. Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-09 12:13:42 +03:00
Christoph Lameter	18004c5d40	mm, sl[aou]b: Use a common mutex definition Use the mutex definition from SLAB and make it the common way to take a sleeping lock. This has the effect of using a mutex instead of a rw semaphore for SLUB. SLOB gains the use of a mutex for kmem_cache_create serialization. Not needed now but SLOB may acquire some more features later (like slabinfo / sysfs support) through the expansion of the common code that will need this. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-09 12:13:41 +03:00
Christoph Lameter	97d0660915	mm, sl[aou]b: Common definition for boot state of the slab allocators All allocators have some sort of support for the bootstrap status. Setup a common definition for the boot states and make all slab allocators use that definition. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-09 12:13:35 +03:00
Christoph Lameter	039363f38b	mm, sl[aou]b: Extract common code for kmem_cache_create() Kmem_cache_create() does a variety of sanity checks but those vary depending on the allocator. Use the strictest tests and put them into a slab_common file. Make the tests conditional on CONFIG_DEBUG_VM. This patch has the effect of adding sanity checks for SLUB and SLOB under CONFIG_DEBUG_VM and removes the checks in SLAB for !CONFIG_DEBUG_VM. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-09 12:13:30 +03:00
Julia Lawall	068ce415be	slub: remove invalid reference to list iterator variable If list_for_each_entry, etc complete a traversal of the list, the iterator variable ends up pointing to an address at an offset from the list head, and not a meaningful structure. Thus this value should not be used after the end of the iterator. The patch replaces s->name by al->name, which is referenced nearby. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-09 12:04:34 +03:00
Andy Lutomirski	9ab4233dd0	mm: Hold a file reference in madvise_remove Otherwise the code races with munmap (causing a use-after-free of the vma) or with close (causing a use-after-free of the struct file). The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix mmap_sem i_mutex deadlock") Cc: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-07-06 10:34:38 -07:00
Rabin Vincent	6a6dccba2f	mm: cma: don't replace lowmem pages with highmem The filesystem layer expects pages in the block device's mapping to not be in highmem (the mapping's gfp mask is set in bdget()), but CMA can currently replace lowmem pages with highmem pages, leading to crashes in filesystem code such as the one below: Unable to handle kernel NULL pointer dereference at virtual address 00000400 pgd = c0c98000 [00000400] pgd=00c91831, pte=00000000, *ppte=00000000 Internal error: Oops: 817 [#1] PREEMPT SMP ARM CPU: 0 Not tainted (3.5.0-rc5+ #80) PC is at __memzero+0x24/0x80 ... Process fsstress (pid: 323, stack limit = 0xc0cbc2f0) Backtrace: [<c010e3f0>] (ext4_getblk+0x0/0x180) from [<c010e58c>] (ext4_bread+0x1c/0x98) [<c010e570>] (ext4_bread+0x0/0x98) from [<c0117944>] (ext4_mkdir+0x160/0x3bc) r4:c15337f0 [<c01177e4>] (ext4_mkdir+0x0/0x3bc) from [<c00c29e0>] (vfs_mkdir+0x8c/0x98) [<c00c2954>] (vfs_mkdir+0x0/0x98) from [<c00c2a60>] (sys_mkdirat+0x74/0xac) r6:00000000 r5:c152eb40 r4:000001ff r3:c14b43f0 [<c00c29ec>] (sys_mkdirat+0x0/0xac) from [<c00c2ab8>] (sys_mkdir+0x20/0x24) r6:beccdcf0 r5:00074000 r4:beccdbbc [<c00c2a98>] (sys_mkdir+0x0/0x24) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30) Fix this by replacing only highmem pages with highmem. Reported-by: Laura Abbott <lauraa@codeaurora.org> Signed-off-by: Rabin Vincent <rabin@rab.in> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2012-07-06 11:07:04 +02:00
Linus Torvalds	a3da2c6913	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block bits from Jens Axboe: "As vacation is coming up, thought I'd better get rid of my pending changes in my for-linus branch for this iteration. It contains: - Two patches for mtip32xx. Killing a non-compliant sysfs interface and moving it to debugfs, where it belongs. - A few patches from Asias. Two legit bug fixes, and one killing an interface that is no longer in use. - A patch from Jan, making the annoying partition ioctl warning a bit less annoying, by restricting it to !CAP_SYS_RAWIO only. - Three bug fixes for drbd from Lars Ellenberg. - A fix for an old regression for umem, it hasn't really worked since the plugging scheme was changed in 3.0. - A few fixes from Tejun. - A splice fix from Eric Dumazet, fixing an issue with pipe resizing." * 'for-linus' of git://git.kernel.dk/linux-block: scsi: Silence unnecessary warnings about ioctl to partition block: Drop dead function blk_abort_queue() block: Mitigate lock unbalance caused by lock switching block: Avoid missed wakeup in request waitqueue umem: fix up unplugging splice: fix racy pipe->buffers uses drbd: fix null pointer dereference with on-congestion policy when diskless drbd: fix list corruption by failing but already aborted reads drbd: fix access of unallocated pages and kernel panic xen/blkfront: Add WARN to deal with misbehaving backends. blkcg: drop local variable @q from blkg_destroy() mtip32xx: Create debugfs entries for troubleshooting mtip32xx: Remove 'registers' and 'flags' from sysfs blkcg: fix blkg_alloc() failure path block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED block: fix return value on cfq_init() failure mtip32xx: Remove version.h header file inclusion xen/blkback: Copy id field when doing BLKIF_DISCARD.	2012-07-03 15:45:10 -07:00
Glauber Costa	a164f89628	slab: move FULL state transition to an initcall During kmem_cache_init_late(), we transition to the LATE state, and after some more work, to the FULL state, its last state This is quite different from slub, that will only transition to its last state (previously SYSFS), in a (late)initcall, after a lot more of the kernel is ready. This means that in slab, we have no way to taking actions dependent on the initialization of other pieces of the kernel that are supposed to start way after kmem_init_late(), such as cgroups initialization. To achieve more consistency in this behavior, that patch only transitions to the UP state in kmem_init_late. In my analysis, setup_cpu_cache() should be happy to test for >= UP, instead of == FULL. It also has passed some tests I've made. We then only mark FULL state after the reap timers are in place, meaning that no further setup is expected. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-02 13:56:59 +03:00
Feng Tang	d97d476b1b	slab: Fix a typo in commit 8c138b "slab: Get rid of obj_size macro" Commit 8c138b only sits in Pekka's and linux-next tree now, which tries to replace obj_size(cachep) with cachep->object_size, but has a typo in kmem_cache_free() by using "size" instead of "object_size", which casues some regressions. Reported-and-tested-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Christoph Lameter <cl@linux.com> Acked-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-02 13:45:52 +03:00
Thierry Reding	0672aa7c23	mm, slab: Build fix for recent kmem_cache changes Commit 3b0efdf ("mm, sl[aou]b: Extract common fields from struct kmem_cache") renamed the kmem_cache structure's "next" field to "list" but forgot to update one instance in leaks_show(). Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-02 13:42:18 +03:00
Glauber Costa	a618e89f1e	slab: rename gfpflags to allocflags A consistent name with slub saves us an acessor function. In both caches, this field represents the same thing. We would like to use it from the mem_cgroup code. Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> CC: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-07-02 13:40:06 +03:00
Jiri Kosina	59f91e5dd0	Merge branch 'master' into for-next Conflicts: include/linux/mmzone.h Synced with Linus' tree so that trivial patch can be applied on top of up-to-date code properly. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>	2012-06-29 14:45:58 +02:00
Wanpeng Li	ab8704b8c6	mm/vmscan: cleanup comment error in balance_pgdat Signed-off-by: Wanpeng Li <liwp.linux@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-06-28 11:58:48 +02:00
Wanpeng Li	be7bd59db7	mm: fix page reclaim comment error Since there are five lists in LRU cache, the array nr in get_scan_count should be: nr[0] = anon inactive pages to scan; nr[1] = anon active pages to scan nr[2] = file inactive pages to scan; nr[3] = file active pages to scan Signed-off-by: Wanpeng Li <liwp.linux@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-06-28 11:54:12 +02:00
Alex Shi	597e1c3580	mm/mmu_gather: enable tlb flush range in generic mmu_gather This patch enabled the tlb flush range support in generic mmu layer. Most of arch has self tlb flush range support, like ARM/IA64 etc. X86 arch has no this support in hardware yet. But another instruction 'invlpg' can implement this function in some degree. So, enable this feather in generic layer for x86 now. and maybe useful for other archs in further. Generic mmu_gather struct is protected by micro HAVE_GENERIC_MMU_GATHER. Other archs that has flush range supported own self mmu_gather struct. So, now this change is safe for them. In future we may unify this struct and related functions on multiple archs. Thanks for Peter Zijlstra time and time reminder for multiple architecture code safe! Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-7-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:11 -07:00
Tejun Heo	a91a5ac685	mempool: add @gfp_mask to mempool_create_node() mempool_create_node() currently assumes %GFP_KERNEL. Its only user, blk_init_free_list(), is about to be updated to use other allocation flags - add @gfp_mask argument to the function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-25 11:53:47 +02:00
David Rientjes	c4c0e9e544	mm, mempolicy: fix mbind() to do synchronous migration If the range passed to mbind() is not allocated on nodes set in the nodemask, it migrates the pages to respect the constraint. The final formal of migrate_pages() is a mode of type enum migrate_mode, not a boolean. do_mbind() is currently passing "true" which is the equivalent of MIGRATE_SYNC_LIGHT. This should instead be MIGRATE_SYNC for synchronous page migration. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 22:10:42 -07:00
Greg Pearson	48c3b583bb	mm/memblock: fix overlapping allocation when doubling reserved array __alloc_memory_core_early() asks memblock for a range of memory then try to reserve it. If the reserved region array lacks space for the new range, memblock_double_array() is called to allocate more space for the array. If memblock is used to allocate memory for the new array it can end up using a range that overlaps with the range originally allocated in __alloc_memory_core_early(), leading to possible data corruption. With this patch memblock_double_array() now calls memblock_find_in_range() with a narrowed candidate range (in cases where the reserved.regions array is being doubled) so any memory allocated will not overlap with the original range that was being reserved. The range is narrowed by passing in the starting address and size of the previously allocated range. Then the range above the ending address is searched and if a candidate is not found, the range below the starting address is searched. Signed-off-by: Greg Pearson <greg.pearson@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Randy Dunlap	eb4546bbbd	mm/memory.c: fix kernel-doc warnings Fix kernel-doc warnings in mm/memory.c: Warning(mm/memory.c:1377): No description found for parameter 'start' Warning(mm/memory.c:1377): Excess function parameter 'address' description in 'zap_page_range' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
Wanpeng Li	dad7557eb7	mm: fix kernel-doc warnings Fix kernel-doc warnings such as Warning(../mm/page_cgroup.c:432): No description found for parameter 'id' Warning(../mm/page_cgroup.c:432): Excess function parameter 'mem' description in 'swap_cgroup_record' Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:36 -07:00
David Rientjes	e0897d75f0	mm, thp: print useful information when mmap_sem is unlocked in zap_pmd_range Andrea asked for addr, end, vma->vm_start, and vma->vm_end to be emitted when !rwsem_is_locked(&tlb->mm->mmap_sem). Otherwise, debugging the underlying issue is more difficult. Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:35 -07:00
Hugh Dickins	3a981f482c	memcg: fix use_hierarchy css_is_ancestor oops regression If use_hierarchy is set, reclaim testing soon oopses in css_is_ancestor() called from __mem_cgroup_same_or_subtree() called from page_referenced(): when processes are exiting, it's easy for mm_match_cgroup() to pass along a NULL memcg coming from a NULL mm->owner. Check for that in __mem_cgroup_same_or_subtree(). Return true or false? False because we cannot know if it was in the hierarchy, but also false because it's better not to count a reference from an exiting process. Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:35 -07:00
David Rientjes	61eafb00d5	mm, oom: fix and cleanup oom score calculations The divide in p->signal->oom_score_adj * totalpages / 1000 within oom_badness() was causing an overflow of the signed long data type. This adds both the root bias and p->signal->oom_score_adj before doing the normalization which fixes the issue and also cleans up the calculation. Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-20 14:39:35 -07:00
Joonsoo Kim	43d77867a4	slub: refactoring unfreeze_partials() Current implementation of unfreeze_partials() is so complicated, but benefit from it is insignificant. In addition many code in do {} while loop have a bad influence to a fail rate of cmpxchg_double_slab. Under current implementation which test status of cpu partial slab and acquire list_lock in do {} while loop, we don't need to acquire a list_lock and gain a little benefit when front of the cpu partial slab is to be discarded, but this is a rare case. In case that add_partial is performed and cmpxchg_double_slab is failed, remove_partial should be called case by case. I think that these are disadvantages of current implementation, so I do refactoring unfreeze_partials(). Minimizing code in do {} while loop introduce a reduced fail rate of cmpxchg_double_slab. Below is output of 'slabinfo -r kmalloc-256' when './perf stat -r 33 hackbench 50 process 4000 > /dev/null' is done. before Cmpxchg_double Looping ------------------------ Locked Cmpxchg Double redos 182685 Unlocked Cmpxchg Double redos 0 after Cmpxchg_double Looping ------------------------ Locked Cmpxchg Double redos 177995 Unlocked Cmpxchg Double redos 1 We can see cmpxchg_double_slab fail rate is improved slightly. Bolow is output of './perf stat -r 30 hackbench 50 process 4000 > /dev/null'. before Performance counter stats for './hackbench 50 process 4000' (30 runs): 108517.190463 task-clock # 7.926 CPUs utilized ( +- 0.24% ) 2,919,550 context-switches # 0.027 M/sec ( +- 3.07% ) 100,774 CPU-migrations # 0.929 K/sec ( +- 4.72% ) 124,201 page-faults # 0.001 M/sec ( +- 0.15% ) 401,500,234,387 cycles # 3.700 GHz ( +- 0.24% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 250,576,913,354 instructions # 0.62 insns per cycle ( +- 0.13% ) 45,934,956,860 branches # 423.297 M/sec ( +- 0.14% ) 188,219,787 branch-misses # 0.41% of all branches ( +- 0.56% ) 13.691837307 seconds time elapsed ( +- 0.24% ) after Performance counter stats for './hackbench 50 process 4000' (30 runs): 107784.479767 task-clock # 7.928 CPUs utilized ( +- 0.22% ) 2,834,781 context-switches # 0.026 M/sec ( +- 2.33% ) 93,083 CPU-migrations # 0.864 K/sec ( +- 3.45% ) 123,967 page-faults # 0.001 M/sec ( +- 0.15% ) 398,781,421,836 cycles # 3.700 GHz ( +- 0.22% ) <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 250,189,160,419 instructions # 0.63 insns per cycle ( +- 0.09% ) 45,855,370,128 branches # 425.436 M/sec ( +- 0.10% ) 169,881,248 branch-misses # 0.37% of all branches ( +- 0.43% ) 13.596272341 seconds time elapsed ( +- 0.22% ) No regression is found, but rather we can see slightly better result. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-20 10:17:45 +03:00
Joonsoo Kim	d24ac77f71	slub: use __cmpxchg_double_slab() at interrupt disabled place get_freelist(), unfreeze_partials() are only called with interrupt disabled, so __cmpxchg_double_slab() is suitable. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-20 10:13:01 +03:00
Andi Kleen	e7b691b085	slab/mempolicy: always use local policy from interrupt context slab_node() could access current->mempolicy from interrupt context. However there's a race condition during exit where the mempolicy is first freed and then the pointer zeroed. Using this from interrupts seems bogus anyways. The interrupt will interrupt a random process and therefore get a random mempolicy. Many times, this will be idle's, which noone can change. Just disable this here and always use local for slab from interrupts. I also cleaned up the callers of slab_node a bit which always passed the same argument. I believe the original mempolicy code did that in fact, so it's likely a regression. v2: send version with correct logic v3: simplify. fix typo. Reported-by: Arun Sharma <asharma@fb.com> Cc: penberg@kernel.org Cc: cl@linux.com Signed-off-by: Andi Kleen <ak@linux.intel.com> [tdmackey@twitter.com: Rework control flow based on feedback from cl@linux.com, fix logic, and cleanup current task_struct reference] Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-20 10:01:04 +03:00
Hugh Dickins	9b15b817f3	swap: fix shmem swapping when more than 8 areas Minchan Kim reports that when a system has many swap areas, and tmpfs swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read back the page cannot locate it, and the read fails with -ENOMEM. Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry( swp_entry_to_pte()) technique for determining maximum usable swap offset, without stopping to realize that that actually depends upon the pte swap encoding shifting swap offset to the higher bits and truncating it there. Whereas our radix_tree swap encoding leaves offset in the lower bits: it's swap "type" (that is, index of swap area) that was truncated. Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header(). This does not reduce the usable size of a swap area any further, it leaves it as claimed when making the original commit: no change from 3.0 on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB per swapfile on i386 with PAE. It's not a change I would have risked five years ago, but with x86_64 supported for ten years, I believe it's appropriate now. Hmm, and what if some architecture implements its swap pte with offset encoded below type? That would equally break the maximum usable swap offset check. Happily, they all follow the same tradition of encoding offset above type, but I'll prepare a check on that for next. Reported-and-Reviewed-and-Tested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-15 21:48:14 -07:00
Linus Torvalds	a95f9b6e09	Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core updates (RCU and locking) from Ingo Molnar: "Most of the diffstat comes from the RCU slow boot regression fixes, but there's also a debuggability improvements/fixes." * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: memblock: Document memblock_is_region_{memory,reserved}() rcu: Precompute RCU_FAST_NO_HZ timer offsets rcu: Move RCU_FAST_NO_HZ per-CPU variables to rcu_dynticks structure rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks rcu: RCU_FAST_NO_HZ detection of callback adoption spinlock: Indicate that a lockup is only suspected kdump: Execute kmsg_dump(KMSG_DUMP_PANIC) after smp_send_stop() panic: Make panic_on_oops configurable	2012-06-15 16:52:35 -07:00
Christoph Lameter	8c138bc009	slab: Get rid of obj_size macro The size of the slab object is frequently needed. Since we now have a size field directly in the kmem_cache structure there is no need anymore of the obj_size macro/function. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:20:19 +03:00
Christoph Lameter	3b0efdfa1e	mm, sl[aou]b: Extract common fields from struct kmem_cache Define a struct that describes common fields used in all slab allocators. A slab allocator either uses the common definition (like SLOB) or is required to provide members of kmem_cache with the definition given. After that it will be possible to share code that only operates on those fields of kmem_cache. The patch basically takes the slob definition of kmem cache and uses the field namees for the other allocators. It also standardizes the names used for basic object lengths in allocators: object_size Struct size specified at kmem_cache_create. Basically the payload expected to be used by the subsystem. size The size of memory allocator for each object. This size is larger than object_size and includes padding, alignment and extra metadata for each object (f.e. for debugging and rcu). Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:20:16 +03:00
Christoph Lameter	350260889b	slab: Remove some accessors Those are rather trivial now and its better to see inline what is really going on. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:20:05 +03:00
Christoph Lameter	e571b0ad34	slab: Use page struct fields instead of casting Add fields to the page struct so that it is properly documented that slab overlays the lru fields. This cleans up some casts in slab. Reviewed-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:19:56 +03:00
Christoph Lameter	b5568280c9	slob: Remove various small accessors Those have become so simple that they are no longer needed. Reviewed-by: Joonsoo Kim <js1304@gmail.com> Acked-by: David Rientjes <rientjes@google.com> signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:19:52 +03:00
Christoph Lameter	690d577739	slob: No need to zero mapping since it is no longer in use Reviewed-by: Joonsoo Kim <js1304@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:19:50 +03:00
Christoph Lameter	b8c24c4aef	slob: Define page struct fields used in mm_types.h Define the fields used by slob in mm_types.h and use struct page instead of struct slob_page in slob. This cleans up numerous of typecasts in slob.c and makes readers aware of slob's use of page struct fields. [Also cleans up some bitrot in slob.c. The page struct field layout in slob.c is an old layout and does not match the one in mm_types.h] Reviewed-by: Glauber Costa <glommer@parallels.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Joonsoo Kim <js1304@gmail.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>	2012-06-14 09:19:47 +03:00
Eric Dumazet	047fe36052	splice: fix racy pipe->buffers uses Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2012-06-13 21:16:42 +02:00
Sasha Levin	f9f08103eb	mm: frontswap: remove unnecessary check during initialization The check whether frontswap is enabled or not is done in the API functions in the frontswap header, before they are passed to the internal double-underscored frontswap functions. Remove the check from __frontswap_init for consistency. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:32:08 -04:00
Sasha Levin	d9674dda1c	mm: frontswap: make all branches of if statement in put page consistent Currently it has a complex structure where different things are compared at each branch. Simplify that and make both branches look similar. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:31:55 -04:00
Sasha Levin	69217b4cd0	mm: frontswap: split frontswap_shrink further to simplify locking Split frontswap_shrink to simplify the locking in the original code. Also, assert that the function that was split still runs under the swap spinlock. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:31:33 -04:00
Sasha Levin	f116695a50	mm: frontswap: split out __frontswap_unuse_pages An attempt at making frontswap_shrink shorter and more readable. This patch splits out walking through the swap list to find an entry with enough pages to unuse. Also, assert that the internal __frontswap_unuse_pages is called under swap lock, since that part of code was previously directly happen inside the lock. Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:30:57 -04:00
Sasha Levin	96253444db	mm: frontswap: split out __frontswap_curr_pages Code was duplicated in two functions, clean it up. Also, assert that the deduplicated code runs under the swap spinlock. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:30:47 -04:00
Sasha Levin	4bb3e31ef4	mm: frontswap: trivial coding convention issues Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:30:38 -04:00
Sasha Levin	ef38359741	mm: frontswap: remove casting from function calls through ops structure Removes unneeded casts. Reviewed-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2012-06-11 15:30:33 -04:00
Wanpeng Li	331cbdeede	writeback: Fix some comment errors Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>	2012-06-09 19:54:47 +08:00
Jan Kara	eb608e3a34	block: Convert BDI proportion calculations to flexible proportions Convert calculations of proportion of writeback each bdi does to new flexible proportion code. That allows us to use aging period of fixed wallclock time which gives better proportion estimates given the hugely varying throughput of different devices. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>	2012-06-09 08:37:56 +09:00
David Rientjes	1e11ad8dc4	mm, oom: fix badness score underflow If the privileges given to root threads (3% of allowable memory) or a negative value of /proc/pid/oom_score_adj happen to exceed the amount of rss of a thread, its badness score overflows as a result of commit a7f638f999ff ("mm, oom: normalize oom scores to oom_score_adj scale only for userspace"). Fix this by making the type signed and return 1, meaning the thread is still eligible for kill, if the value is negative. Reported-by: Dave Jones <davej@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-06-08 15:07:35 -07:00

... 2 3 4 5 6 ...

6254 Commits