227638 Commits

Author SHA1 Message Date
1c2a48cf65 Merge branch 'linus' into x86/apic-cleanups
Conflicts:
	arch/x86/include/asm/io_apic.h

Merge reason: Resolve the conflict, update to a more recent -rc base

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-01-07 14:14:15 +01:00
bfbb23466a dccp: make upper bound for seq_window consistent on 32/64 bit
The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window,
which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper
bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be
sufficient - with a RTT of 1sec and 1-byte packets, a seq_window of 2^32-1
corresponds to a link speed of 34 Gbps.
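
The 34 Gbps figure can be checked with a few lines of C. This is only a worked version of the arithmetic in the message; the helper name window_link_speed_bps is made up for illustration and is not part of the patch.

```c
#include <stdint.h>

/*
 * Link speed (bits per second) needed to send a full window of
 * `window_packets` packets, each `packet_bytes` long, within one RTT.
 * Hypothetical helper for illustration only.
 */
static double window_link_speed_bps(uint64_t window_packets,
                                    uint32_t packet_bytes,
                                    double rtt_seconds)
{
    return (double)window_packets * packet_bytes * 8.0 / rtt_seconds;
}
```

With window_packets = 2^32-1, 1-byte packets and an RTT of 1 second, this gives roughly 3.44e10 bits per second, i.e. about 34 Gbps.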

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:44 +01:00
763dadd47c dccp: fix bug in updating the GSR
Currently dccp_check_seqno allows any valid packet to update the Greatest
Sequence Number Received, even if that packet's sequence number is less than
the current GSR. This patch adds a check to make sure that the new packet's
sequence number is greater than GSR.
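
A minimal sketch of such a check, assuming 48-bit circular sequence numbers as in RFC 4340. The names seq48_after and update_gsr are hypothetical and do not come from the patch, which may express the comparison differently.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* Nonzero if 48-bit sequence number a is circularly after b. */
static int seq48_after(uint64_t a, uint64_t b)
{
    /*
     * Move the 48-bit difference into the top bits of a 64-bit word so
     * that the sign bit reflects the circular ordering.
     */
    return (int64_t)(((a - b) & SEQ48_MASK) << 16) > 0;
}

/* Only advance GSR when the new sequence number is greater. */
static uint64_t update_gsr(uint64_t gsr, uint64_t seqno)
{
    return seq48_after(seqno, gsr) ? seqno : gsr;
}
```

An older packet (seqno below GSR) leaves GSR unchanged, while a packet that wraps past 2^48-1 still counts as newer.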

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:43 +01:00
2cf5be93d1 dccp: fix return value for sequence-invalid packets
Currently dccp_check_seqno returns 0 (indicating a valid packet) if the
acknowledgment number is out of bounds and the sync that RFC 4340 mandates at
this point is currently being rate-limited. This function should return -1,
indicating an invalid packet.

Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2011-01-07 12:22:43 +01:00
6d5db46687 EDAC, MCE: Fix NB error formatting
Minor formatting fixup since the information which core was associated
with the MCE is not always valid.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:26 +01:00
50adbbd8a8 EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.

drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
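
The warning arises when a constant of type long (32 bits wide on X86_32) is shifted by 32 or more. A userspace stand-in for the idea, assuming a macro that forces a 64-bit operand; MY_BIT_64 and bank_bit are illustrative names, not the kernel definitions.

```c
#include <stdint.h>

/*
 * Stand-in for a 64-bit-safe bit macro: the operand is always 64 bits
 * wide, so shift counts up to 63 are valid on 32-bit and 64-bit builds.
 * By contrast, (1UL << 40) is invalid where unsigned long is 32 bits.
 */
#define MY_BIT_64(n) (UINT64_C(1) << (n))

static uint64_t bank_bit(unsigned int n)
{
    return MY_BIT_64(n);
}
```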

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:25 +01:00
bad11e0318 EDAC, MCE: Enable MCE decoding on F15h
Now that everything is in place, enable MCE decoding on F15h. Make
initcall routine a bit more readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:24 +01:00
1b07ca47ff EDAC, MCE: Allow F15h bank 6 MCE injection
F15h adds a sixth MCE bank: adjust bank number check in the injection
code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:23 +01:00
fa7ae8cc8c EDAC, MCE: Shorten error report formatting
Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:22 +01:00
6245288232 EDAC, MCE: Overhaul error fields extraction macros
Make macro names shorter thus making code shorter and more clear.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:21 +01:00
b8f85c477b EDAC, MCE: Add F15h FP MCE decoder
Add decoder for FP MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:20 +01:00
8259a7e572 EDAC, MCE: Add F15 EX MCE decoder
Integrate the single FIROB signature into an expanded table along with
the new BD MCE types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:19 +01:00
05cd667d66 EDAC, MCE: Add an F15h NB MCE decoder
by (almost) reusing the F10h one since the signatures are the same.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:18 +01:00
b18434cad1 EDAC, MCE: No F15h LS MCE decoder
F15h BD doesn't generate LS MCEs so warn about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:17 +01:00
70fdb494aa EDAC, MCE: Add F15h CU MCE decoder
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:16 +01:00
86039cd401 EDAC, MCE: Add F15h IC MCE decoder
Add support for decoding F15h IC MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:15 +01:00
25a4f8b059 EDAC, MCE: Add F15h DC MCE decoder
Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:14 +01:00
2be64bfac7 EDAC, MCE: Select extended error code mask
F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable whose default of 0xf is
overridden on F15h.
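
A rough sketch of the mask selection, assuming the extended error code starts at bit 16 of MCi_STATUS as stated above. The function and variable names are hypothetical and the real driver may structure this differently.

```c
#include <stdint.h>

/* Default: 4-bit extended error code, MCi_STATUS[19:16]. */
static uint8_t xec_mask = 0xf;

/* Pick the mask once, based on the CPU family (0x15 == F15h). */
static void select_xec_mask(unsigned int family)
{
    xec_mask = (family == 0x15) ? 0x1f : 0xf;
}

/* Extract the extended error code from a raw MCi_STATUS value. */
static uint8_t extended_error_code(uint64_t mci_status)
{
    return (mci_status >> 16) & xec_mask;
}
```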

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:12 +01:00
a135cef79a amd64_edac: Disable DRAM ECC injection on K8
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:46 +01:00
390944439f EDAC: Fixup scrubrate manipulation
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded in setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:31 +01:00
360b7f3c60 amd64_edac: Remove two-stage initialization
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:

1. Probe PCI device: we only test ECC capabilities here and if none exit
early.

2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.

Remove "amd64_" prefix from static functions touched, while at it.

There actually should be no visible functional change resulting from
this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:03 +01:00
2299ef7114 amd64_edac: Check ECC capabilities initially
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.

While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:02 +01:00
ae7bb7c679 amd64_edac: Carve out ECC-related hw settings
This is in preparation for the init path reorganization where we want
only to

1) test whether a particular node supports ECC
2) test whether it can be enabled

and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.

There should be no functional change introduced by this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:00 +01:00
f1db274e1b amd64_edac: Remove PCI ECS enabling functions
PCI ECS has been enabled by default since 2.6.26 on AMD, so this code is
just superfluous now; remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:59 +01:00
027dbd6f5d amd64_edac: Remove explicit Kconfig PCI dependency
AMD_NB pulls in the dependency on PCI. Clarify/fix help text while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:58 +01:00
cc4d8860fc amd64_edac: Allocate driver instances dynamically
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:57 +01:00
24f9a7fe3f amd64_edac: Rework printk macros
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:56 +01:00
8d5b5d9c7b amd64_edac: Rename CPU PCI devices
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:54 +01:00
b8cfa02f83 amd64_edac: Concentrate per-family init even more
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:53 +01:00
bbd0c1f675 amd64_edac: Cleanup the CPU PCI device reservation
Shorten code and clarify comments, return proper -E* values on error.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:52 +01:00
0092b20d4c amd64_edac: Simplify CPU family detection
Concentrate CPU family detection in the per-family init function.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:51 +01:00
395ae783b3 amd64_edac: Add per-family init function
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:50 +01:00
9f56da0e3c amd64_edac: Use cached extended CPU model
... instead of computing it needlessly again.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:49 +01:00
3ab0e7dc2e amd64_edac: Remove F11h support
F11h doesn't support DRAM ECC so whack it away.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:47 +01:00
b3e19d924b fs: scale mntget/mntput
The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
and those lookups often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only take the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.
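
The last scheme can be modelled in a few lines. This is a single-threaded sketch with an ordinary array standing in for per-CPU counters and no locking at all; every name in it is hypothetical and it is not the code from the patch.

```c
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

struct mount_ref {
    int local[NR_CPUS_SKETCH]; /* per-CPU increments minus decrements */
    int longrefs;              /* slow, long-lasting references */
};

/* Short reference: touch only this CPU's counter. */
static void ref_get(struct mount_ref *m, int cpu)
{
    m->local[cpu]++;
}

/* Sum the local differences to recover the true reference count. */
static int ref_sum(const struct mount_ref *m)
{
    int sum = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += m->local[cpu];
    return sum;
}

/* Drop a short reference; returns true if it was the last one. */
static bool ref_put_and_test(struct mount_ref *m, int cpu)
{
    m->local[cpu]--;
    if (m->longrefs > 0)
        return false;       /* common case: skip the global sum */
    return ref_sum(m) == 0; /* rare case: check every CPU */
}
```

Note that a reference taken on one CPU and dropped on another leaves +1 and -1 in two different slots; only the sum is meaningful, which is why the zero check has to visit every counter.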

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:33 +11:00
c6653a838b fs: rename vfsmount counter helpers
Suggested by Andreas, mnt_ prefix is clearer namespace, follows kernel
conventions better, and is easier for tab complete. I introduced these
names so I'll admit they were not good choices.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:33 +11:00
9d55c369bb fs: implement faster dentry memcmp
The standard memcmp function on a Westmere system shows up hot in
profiles in the `git diff` workload (both parallel and single threaded),
and it is likely due to the costs associated with trapping into
microcode, and little opportunity to improve memory access (dentry
name is not likely to take up more than a cacheline).

So replace it with an open-coded byte comparison. This increases code
size by 8 bytes in the critical __d_lookup_rcu function, but the
speedup is huge, averaging 10 runs of each:

git diff st   user   sys   elapsed  CPU
before        1.15   2.57  3.82      97.1
after         1.14   2.35  3.61      96.8

git diff mt   user   sys   elapsed  CPU
before        1.27   3.85  1.46     349
after         1.26   3.54  1.43     333

Elapsed time for single threaded git diff at 95.0% confidence:
        -0.21  +/- 0.01
        -5.45% +/- 0.24%

It's -0.66% +/- 0.06% elapsed time on my Opteron, so rep cmp costs on the
fam10h seem to be relatively smaller, but there is still a win.
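
For illustration, an open-coded byte comparison can look like the following. This is a generic sketch, not the function added by the patch; the name name_cmp and its return convention (0 for equal, 1 for different) are assumptions.

```c
#include <stddef.h>

/*
 * Compare the first n bytes of two names one byte at a time.
 * Returns 0 if they are equal and 1 otherwise. Unlike memcmp, it does
 * not report ordering, which a lookup does not need.
 */
static int name_cmp(const unsigned char *a, const unsigned char *b, size_t n)
{
    while (n--) {
        if (*a++ != *b++)
            return 1;
    }
    return 0;
}
```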

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:32 +11:00
e1bb578263 fs: prefetch inode data in dcache lookup
This makes single threaded git diff -1.25% +/- 0.05% elapsed time on my
2s12c24t Westmere system, and -0.86% +/- 0.05% on my 2s8c Barcelona, by
prefetching the important first cacheline of the inode while we do the
actual name compare and other operations on the dentry.

There was no measurable slowdown in the single file stat case, or the creat
case (where negative dentries would be common).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:32 +11:00
4b936885ab fs: improve scalability of pseudo filesystems
Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem to need connected
dentries because by definition they are disconnected.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:32 +11:00
873feea09e fs: dcache per-inode inode alias locking
dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:31 +11:00
ceb5bdc2d2 fs: dcache per-bucket dcache hash locking
We can turn the dcache hash locking from a global dcache_hash_lock into
per-bucket locking.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:31 +11:00
626d607435 bit_spinlock: add required includes
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:31 +11:00
4e35e6070b kernel: add bl_list
Introduce a type of hlist that can support the use of the lowest bit in the
hlist_head. This will be subsequently used to implement per-bucket bit spinlock
for inode and dentry hashes, and may be useful in other cases such as network
hashes.
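
The idea of using the lowest bit of the list head can be sketched as follows. Pointers to list nodes are at least 2-byte aligned, so bit 0 is free to carry a lock flag. This userspace model uses plain, non-atomic operations and hypothetical names; a real bit spinlock needs atomic bit operations.

```c
#include <stdbool.h>
#include <stdint.h>

struct bl_node_sketch {
    struct bl_node_sketch *next;
};

/* Head stores the first-node pointer with the lock flag in bit 0. */
struct bl_head_sketch {
    uintptr_t first;
};

#define BL_LOCK_BIT ((uintptr_t)1)

/* Return the first node with the lock bit masked off. */
static struct bl_node_sketch *bl_first(const struct bl_head_sketch *h)
{
    return (struct bl_node_sketch *)(h->first & ~BL_LOCK_BIT);
}

/* Replace the first node while preserving the current lock bit. */
static void bl_set_first(struct bl_head_sketch *h, struct bl_node_sketch *n)
{
    h->first = (uintptr_t)n | (h->first & BL_LOCK_BIT);
}

static bool bl_is_locked(const struct bl_head_sketch *h)
{
    return (h->first & BL_LOCK_BIT) != 0;
}

static void bl_mark_locked(struct bl_head_sketch *h)
{
    h->first |= BL_LOCK_BIT;
}

static void bl_mark_unlocked(struct bl_head_sketch *h)
{
    h->first &= ~BL_LOCK_BIT;
}
```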

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:31 +11:00
880566e17c xfs: provide simple rcu-walk ACL implementation
This simple implementation just checks for no ACLs on the inode, and
if so, then the rcu-walk may proceed, otherwise fail it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:30 +11:00
258a5aa8df btrfs: provide simple rcu-walk ACL implementation
This simple implementation just checks for no ACLs on the inode, and
if so, then the rcu-walk may proceed, otherwise fail it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:30 +11:00
73598611ad ext2,3,4: provide simple rcu-walk ACL implementation
This simple implementation just checks for no ACLs on the inode, and
if so, then the rcu-walk may proceed, otherwise fail it.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:30 +11:00
1e1743ebe3 fs: provide simple rcu-walk generic_check_acl implementation
This simple implementation just checks for no ACLs on the inode, and
if so, then the rcu-walk may proceed, otherwise fail it.

This could easily be extended to put acls under RCU and check them
under seqlock, if need be. But this implementation is enough to show
the rcu-walk aware permissions code for path lookups is working, and
will handle cases where there are no ACLs or ACLs in just the final
element.
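
The check described above amounts to something like the sketch below. The struct, the function name, and the use of -ECHILD as the "fail the rcu-walk" signal are assumptions made for illustration; the message itself only says that the walk is failed when ACLs are present.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; acl == NULL means no ACL. */
struct inode_sketch {
    const void *acl;
};

/*
 * In rcu-walk mode, let the walk proceed only when the inode has no ACL.
 * Returns 0 to proceed and -ECHILD (assumed here) to make the caller
 * fall back to the slower, reference-taking walk.
 */
static int check_acl_rcu_sketch(const struct inode_sketch *inode, bool rcu_walk)
{
    if (rcu_walk && inode->acl != NULL)
        return -ECHILD;
    return 0;
}
```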

This patch implicitly converts tmpfs to rcu-aware permission check.
Subsequent patches convert ext*, xfs, and btrfs. Each of these uses
acl/permission code in a different way, so convert them all to provide
templates and proof of concept.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
b74c79e993 fs: provide rcu-walk aware permission i_ops
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
34286d6662 fs: rcu-walk aware d_revalidate method
Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.
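
The "simple push down" has roughly the following shape. The flag value and function name here are placeholders rather than the kernel's definitions, and the return value of 1 for "still valid" is an assumption of this sketch.

```c
#include <errno.h>

/* Placeholder flag value; the real LOOKUP_RCU constant may differ. */
#define LOOKUP_RCU_SKETCH 0x0040u

/*
 * A revalidate method that cannot run in rcu-walk mode: it bails out
 * with -ECHILD so the caller can retry outside rcu-walk.
 */
static int d_revalidate_sketch(unsigned int lookup_flags)
{
    if (lookup_flags & LOOKUP_RCU_SKETCH)
        return -ECHILD;

    /* ... the filesystem's normal, possibly blocking, check goes here ... */
    return 1;
}
```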

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:29 +11:00
44a7d7a878 fs: cache optimise dentry and inode for rcu-walk
Put dentry and inode fields into top of data structure.  This allows RCU path
traversal to perform an RCU dentry lookup in a path walk by touching only the
first 56 bytes of the dentry.

We also fit in 8 bytes of inline name in the first 64 bytes, so for short
names, only 64 bytes need to be touched to perform the lookup. We should
get rid of the hash->prev pointer from the first 64 bytes, and fit 16 bytes
of name in there, which will take care of 81% rather than 32% of the kernel
tree.

inode is also rearranged so that RCU lookup will only touch a single cacheline
in the inode, plus one in the i_ops structure.

This is important for directory component lookups in RCU path walking. In the
kernel source, directory names average around 6 chars, so this works.

When we reach the last element of the lookup, we need to lock it and take its
refcount which requires another cacheline access.

Align dentry and inode operations structs, so members will be at predictable
offsets and we can group common operations into head of structure.

Signed-off-by: Nick Piggin <npiggin@kernel.dk>
2011-01-07 17:50:28 +11:00