While mm->start_stack was protected from cross-uid viewing (commit
f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged
processes")), the start_code and end_code values were not. This would
allow the text location of a PIE binary to leak, defeating ASLR.
Note that the value "1" is used instead of "0" for a protected value since
"ps", "killall", and likely other readers of /proc/pid/stat, take
start_code of "0" to mean a kernel thread and will misbehave. Thanks to
Brad Spengler for pointing this out.
Addresses CVE-2011-0726
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: <stable@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1. namelen is declared "unsigned short" which hints for "maybe space savings".
Indeed in 2.4 struct proc_dir_entry looked like:
struct proc_dir_entry {
unsigned short low_ino;
unsigned short namelen;
Now, low_ino is "unsigned int", all savings were gone for a long time.
"struct proc_dir_entry" is not that countless to worry about it's size,
anyway.
2. converting from unsigned short to int/unsigned int can only create
problems, we better play it safe.
Space is not really conserved, because of natural alignment for the next
field. sizeof(struct proc_dir_entry) remains the same.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current code fails to print the "[heap]" marking if the heap is split
into multiple mappings.
Fix the check so that the marking is displayed in all possible cases:
1. vma matches exactly the heap
2. the heap vma is merged e.g. with bss
3. the heap vma is splitted e.g. due to locked pages
Test cases. In all cases, the process should have mapping(s) with
[heap] marking:
(1) vma matches exactly the heap
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
int main (void)
{
if (sbrk(4096) != (void *)-1) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test1
check /proc/553/maps
[1] + Stopped ./test1
# cat /proc/553/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3113640 /test1
00010000-00011000 rw-p 00000000 01:00 3113640 /test1
00011000-00012000 rw-p 00000000 00:00 0 [heap]
4006f000-40070000 rw-p 00000000 00:00 0
(2) the heap vma is merged
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
char foo[4096] = "foo";
char bar[4096];
int main (void)
{
if (sbrk(4096) != (void *)-1) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test2
check /proc/556/maps
[2] + Stopped ./test2
# cat /proc/556/maps | head -4
00008000-00009000 r-xp 00000000 01:00 3116312 /test2
00010000-00012000 rw-p 00000000 01:00 3116312 /test2
00012000-00014000 rw-p 00000000 00:00 0 [heap]
4004a000-4004b000 rw-p 00000000 00:00 0
(3) the heap vma is splitted (this fails without the patch)
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int main (void)
{
if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) &&
(sbrk(4096) != (void *)-1)) {
printf("check /proc/%d/maps\n", (int)getpid());
while (1)
sleep(1);
}
return 0;
}
# ./test3
check /proc/559/maps
[1] + Stopped ./test3
# cat /proc/559/maps|head -4
00008000-00009000 r-xp 00000000 01:00 3119108 /test3
00010000-00011000 rw-p 00000000 01:00 3119108 /test3
00011000-00012000 rw-p 00000000 00:00 0 [heap]
00012000-00013000 rw-p 00000000 00:00 0 [heap]
It looks like the bug has been there forever, and since it only results in
some information missing from a procfile, it does not fulfil the -stable
"critical issue" criteria.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This file is readable for the task owner. Hide kernel addresses from
unprivileged users, leave them function names and offsets.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Acked-by: Kees Cook <kees.cook@canonical.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
minix bit operations are only used by minix filesystem and useless by
other modules. Because byte order of inode and block bitmaps is different
on each architecture like below:
m68k:
big-endian 16bit indexed bitmaps
h8300, microblaze, s390, sparc, m68knommu:
big-endian 32 or 64bit indexed bitmaps
m32r, mips, sh, xtensa:
big-endian 32 or 64bit indexed bitmaps for big-endian mode
little-endian bitmaps for little-endian mode
Others:
little-endian bitmaps
In order to move minix bit operations from asm/bitops.h to architecture
independent code in minix filesystem, this provides two config options.
CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
m32r, mips, sh, xtensa). The architectures which always use little-endian
bitmaps do not select these options.
Finally, we can remove minix bit operations from asm/bitops.h for all
architectures.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andreas Schwab <schwab@linux-m68k.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Michal Simek <monstr@monstr.eu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As the result of conversions, there are no users of ext2 non-atomic bit
operations except for ext2 filesystem itself. Now we can put them into
architecture independent code in ext2 filesystem, and remove from
asm/bitops.h for all architectures.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As a preparation for removing ext2 non-atomic bit operations from
asm/bitops.h. This converts ext2 non-atomic bit operations to
little-endian bit operations.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/adfs/adfs.h: In function 'append_filetype_suffix':
fs/adfs/adfs.h:115: warning: comparison is always false due to limited range of data type
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All of those are rw-r--r-- and all are broken for suid - if you open
a file before the target does suid-root exec, you'll be still able
to access it. For personality it's not a big deal, but for syscall
and stack it's a real problem.
Fix: check that task is tracable for you at the time of read().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With recent changes there is no longer a security hazard with writing to
/proc/pid/mem. Remove the #ifdef.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This change allows us to take advantage of access_remote_vm(), which in turn
eliminates a security issue with the mem_write() implementation.
The previous implementation of mem_write() was insecure since the target task
could exec a setuid-root binary between the permission check and the actual
write. Holding a reference to the target mm_struct eliminates this
vulnerability.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Avoid a potential race when task exec's and we get a new ->mm but check against
the old credentials in ptrace_may_access().
Holding of the mutex is implemented by factoring out the body of the code into a
helper function __check_mem_permission(). Performing this factorization now
simplifies upcoming changes and minimizes churn in the diff's.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
This change makes mem_write() observe the same constraints as mem_read(). This
is particularly important for mem_write as an accidental leak of the fd across
an exec could result in arbitrary modification of the target process' memory.
IOW, /proc/pid/mem is implicitly close-on-exec.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Morally, the presence of a gate vma is more an attribute of a particular mm than
a particular task. Moreover, dropping the dependency on task_struct will help
make both existing and future operations on mm's more flexible and convenient.
Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Now that all the infrastructure is in place, we will do the
right thing if we remove this special casing.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The filelayout driver sends LAYOUTCOMMIT only when COMMIT goes to
the data server (as opposed to the MDS) and the data server WRITE
is not NFS_FILE_SYNC.
Only whole file layout support means that there is only one IOMODE_RW layout
segment.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn>
Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn>
Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn>
Tested-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Implement all the hooks created in the previous patches.
This requires exporting quite a few functions and adding a few
structure fields.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Any COMMIT compound directed to a data server needs to have the
GETATTR calls suppressed. We here, make sure the field we are testing
(data->lseg) is set and refcounted correctly.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
We create three major hooks for the pnfs code.
pnfs_mark_request_commit() is called during writeback_done from
nfs_mark_request_commit, which gives the driver an opportunity to
claim it wants control over commiting a particular req.
pnfs_choose_commit_list() is called from nfs_scan_list
to choose which list a given req should be added to, based on
where we intend to send it for COMMIT. It is up to the driver
to have preallocated list headers for each destination it may need.
pnfs_commit_list() is how the driver actually takes control, it is
used instead of nfs_commit_list().
In order to pass information between the above functions, we create
a union in nfs_page to hold a lseg (which is possible because the req is
not on any list while in transition), and add some flags to indicate
if we need to use the pnfs code.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create a preallocated list header to hold nfs_pages for each
non-MDS COMMIT destination. Note this is not necessarily each DS,
but is basically each <DS, fh> pair.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move it up to avoid forward declaration in later patch.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create a separate support function for later use by data server
commit code.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create a separate support function for later use by data server
commit code.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a callback that the pnfs layout driver can use to do its own
error handling of the data server's COMMIT response.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Based on consensus reached in Feb 2011 interim IETF meeting regarding
use of LAYOUTCOMMIT, it has been decided that a NFS_DATA_SYNC return
from a WRITE to data server should not initiate a COMMIT.
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Some filesystems (such as ext4) can return the same cookie value for
multiple files. If we try to start a readdir with one of these cookies,
the server will return the first file found with a cookie of the same
value. This can cause the client to enter an infinite loop.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_opendir() created a context that held much more information than we
need for a readdir. This patch introduces a slimmed-down
nfs_open_dir_context that contains only the cookie and the cred used for
RPC operations. The new context will eventually be used to help detect
readdir loops.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we're doing a search by readdir cookie, we need to ensure that the
resulting f_pos is updated. To do so, we need to update the
desc->current_index, in the same way that we do in the search by
file offset case.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The map_bh() call will have already set the buffer_head to mapped.
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Instead of always creating a huge (268K) deflate_workspace with the
maximum compression parameters (windowBits=15, memLevel=8), allow the
caller to obtain a smaller workspace by specifying smaller parameter
values.
For example, when capturing oops and panic reports to a medium with
limited capacity, such as NVRAM, compression may be the only way to
capture the whole report. In this case, a small workspace (24K works
fine) is a win, whether you allocate the workspace when you need it (i.e.,
during an oops or panic) or at boot time.
I've verified that this patch works with all accepted values of windowBits
(positive and negative), memLevel, and compression level.
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d_alloc_name return NULL in case error, but we expect errno in
devpts_pty_new.
Addresses http://bugzilla.openvz.org/show_bug.cgi?id=1758
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The test program below will hang because io_getevents() uses
add_wait_queue_exclusive(), which means the wake_up() in io_destroy() only
wakes up one of the threads. Fix this by using wake_up_all() in the aio
code paths where we want to make sure no one gets stuck.
// t.c -- compile with gcc -lpthread -laio t.c
#include <libaio.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
static const int nthr = 2;
void *getev(void *ctx)
{
struct io_event ev;
io_getevents(ctx, 1, 1, &ev, NULL);
printf("io_getevents returned\n");
return NULL;
}
int main(int argc, char *argv[])
{
io_context_t ctx = 0;
pthread_t thread[nthr];
int i;
io_setup(1024, &ctx);
for (i = 0; i < nthr; ++i)
pthread_create(&thread[i], NULL, getev, ctx);
sleep(1);
io_destroy(ctx);
for (i = 0; i < nthr; ++i)
pthread_join(thread[i], NULL);
return 0;
}
Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ADFS (FileCore) storage complies with the RISC OS filetype specification
(12 bits of file type information is stored in the file load address,
rather than using a file extension). The existing driver largely ignores
this information and does not present it to the end user.
It is desirable that stored filetypes be made visible to the end user to
facilitate a precise copy of data and metadata from a hard disc (or image
thereof) into a RISC OS emulator (such as RPCEmu) or to a network share
which can be accessed by real Acorn systems.
This patch implements a per-mount filetype suffix option (use -o
ftsuffix=1) to present any filetype as a ,xyz hexadecimal suffix on each
file. This type suffix is compatible with that used by RISC OS systems
that access network servers using NFS client software and by RPCemu's host
filing system.
Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ADFS (FileCore) storage complies with the RISC OS timestamp specification
(40-bit centiseconds since 01 Jan 1900 00:00:00). It is desirable that
stored timestamp precision be maintained to facilitate a precise copy of
data and metadata from a hard disc (or image thereof) into a RISC OS
emulator (such as RPCEmu).
This patch implements a full-precision conversion from ADFS to Unix
timestamp as the existing driver, for ease of calculation with old 32-bit
compilers, uses the common trick of shifting the 40-bits representing
centiseconds around into 32-bits representing seconds thereby losing
precision.
Signed-off-by: Stuart Swales<stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kernel crashes in fs/adfs module when accessing directories with a large
number of objects on mounted Acorn ADFS E+/F+ format discs (or images) as
the existing code writes off the end of the fixed array of struct
buffer_head pointers.
Additionally, each directory access that didn't crash would leak a buffer
as nr_buffers was not adjusted correctly for E+/F+ discs (was always left
as one less than required).
The patch fixes this by allocating a dynamically-sized set of struct
buffer_head pointers if necessary for the E+/F+ case (many directories
still do in fact fit in 2048 bytes) and sets the correct nr_buffers so
that all buffers are released.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=26072
Tested by tar'ing the contents of my RISC PC's E+ format 20Gb HDD which
contains a number of large directories that previously crashed the kernel.
Signed-off-by: Stuart Swales <stuart.swales.croftnuisk@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When CONFIG_SYSCTL=n, we get the following warning:
fs/coda/sysctl.c:18: warning: `coda_tabl' defined but not used
Fix the warning by making sure coda_table and it's callee function are in
the same context. Also clean up the code by removing extra #ifdef.
[akpm@linux-foundation.org: remove unneeded stub macros]
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Jan Harkes <jaharkes@cs.cmu.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>