From 12b101555f4a67db67a66966a516075bd477741f Mon Sep 17 00:00:00 2001
From: Phil Oester <kernel@linuxace.com>
Date: Fri, 21 Mar 2008 15:01:50 -0700
Subject: [PATCH 01/23] [IPV4]: Fix null dereference in ip_defrag

Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
Offending line in ip_defrag is here:

	net = skb->dev->nd_net

where dev is NULL.  Bisected the problem down to commit
ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the
inet_frag_queue lookup work in namespaces).

Below patch (idea from Patrick McHardy) fixes the problem for me.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ip_fragment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a2e92f9709db..3b2e5adca838 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -568,7 +568,7 @@ int ip_defrag(struct sk_buff *skb, u32 user)
 
 	IP_INC_STATS_BH(IPSTATS_MIB_REASMREQDS);
 
-	net = skb->dev->nd_net;
+	net = skb->dev ? skb->dev->nd_net : skb->dst->dev->nd_net;
 	/* Start by cleaning up the memory. */
 	if (atomic_read(&net->ipv4.frags.mem) > net->ipv4.frags.high_thresh)
 		ip_evictor(net);

From 1233823b0847190976d69a86d7bb1287992ba2c7 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Fri, 21 Mar 2008 15:40:47 -0700
Subject: [PATCH 02/23] [SCTP]: Fix build warnings with IPV6 disabled.

Introduced by 270637abff0cdf848b910b9f96ad342e1da61c66
("[SCTP]: Fix a race between module load and protosw access")

Reported by Gabriel C:

In file included from net/sctp/sm_statetable.c:50:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
In file included from net/sctp/sm_statefuns.c:62:
include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
 ...

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/sctp/sctp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 57ed3e323d97..ea806732b084 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -389,7 +389,7 @@ void sctp_v6_del_protocol(void);
 
 #else /* #ifdef defined(CONFIG_IPV6) */
 
-static inline void sctp_v6_pf_init(void) { return 0; }
+static inline void sctp_v6_pf_init(void) { return; }
 static inline void sctp_v6_pf_exit(void) { return; }
 static inline int sctp_v6_protosw_init(void) { return 0; }
 static inline void sctp_v6_protosw_exit(void) { return; }

From 7512cbf6efc97644812f137527a54b8e92b6a90a Mon Sep 17 00:00:00 2001
From: Pavel Emelyanov <xemul@openvz.org>
Date: Fri, 21 Mar 2008 15:58:52 -0700
Subject: [PATCH 03/23] [DLCI]: Fix tiny race between module unload and
 sock_ioctl.

This is a narrow pedantry :) but the dlci_ioctl_hook check and call
should not be parted with the mutex lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/socket.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index b6d35cd72a50..9d3fbfbc8535 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -909,11 +909,10 @@ static long sock_ioctl(struct file *file, unsigned cmd, unsigned long arg)
 			if (!dlci_ioctl_hook)
 				request_module("dlci");
 
-			if (dlci_ioctl_hook) {
-				mutex_lock(&dlci_ioctl_mutex);
+			mutex_lock(&dlci_ioctl_mutex);
+			if (dlci_ioctl_hook)
 				err = dlci_ioctl_hook(cmd, argp);
-				mutex_unlock(&dlci_ioctl_mutex);
-			}
+			mutex_unlock(&dlci_ioctl_mutex);
 			break;
 		default:
 			err = sock->ops->ioctl(sock, cmd, arg);

From 64658743fdd40021e3ac91e8ff260ad06578dd23 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Fri, 21 Mar 2008 17:01:38 -0700
Subject: [PATCH 04/23] [SPARC64]: Remove most limitations to kernel image
 size.

Currently kernel images are limited to 8MB in size, and this causes
problems especially when enabling features that take up a lot of
kernel image space such as lockdep.

The code now will align the kernel image size up to 4MB and map that
many locked TLB entries.  So, the only practical limitation is the
number of available locked TLB entries which is 16 on Cheetah and 64
on pre-Cheetah sparc64 cpus.  Niagara cpus don't actually have hw
locked TLB entry support.  Rather, the hypervisor transparently
provides support for "locked" TLB entries since it runs with physical
addressing and does the initial TLB miss processing.

Fully utilizing this change requires some help from SILO, a patch for
which will be submitted to the maintainer.  Essentially, SILO will
only currently map up to 8MB for the kernel image and that needs to be
increased.

Note that neither this patch nor the SILO bits will help with network
booting.  The openfirmware code will only map up to a certain amount
of kernel image during a network boot and there isn't much we can to
about that other than to implemented a layered network booting
facility.  Solaris has this, and calls it "wanboot" and we may
implement something similar at some point.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc64/kernel/head.S       |   8 +-
 arch/sparc64/kernel/smp.c        |  17 +--
 arch/sparc64/kernel/trampoline.S | 190 +++++++++++--------------------
 arch/sparc64/mm/init.c           |  38 +++----
 include/asm-sparc64/hvtramp.h    |   2 +-
 include/asm-sparc64/spitfire.h   |   2 +
 6 files changed, 97 insertions(+), 160 deletions(-)

diff --git a/arch/sparc64/kernel/head.S b/arch/sparc64/kernel/head.S
index 44b105c04dd3..34f8ff57c56b 100644
--- a/arch/sparc64/kernel/head.S
+++ b/arch/sparc64/kernel/head.S
@@ -288,8 +288,12 @@ sun4v_chip_type:
 	/* Leave arg2 as-is, prom_mmu_ihandle_cache */
 	mov	-1, %l3
 	stx	%l3, [%sp + 2047 + 128 + 0x28]	! arg3: mode (-1 default)
-	sethi	%hi(8 * 1024 * 1024), %l3
-	stx	%l3, [%sp + 2047 + 128 + 0x30]	! arg4: size (8MB)
+	/* 4MB align the kernel image size. */
+	set	(_end - KERNBASE), %l3
+	set	((4 * 1024 * 1024) - 1), %l4
+	add	%l3, %l4, %l3
+	andn	%l3, %l4, %l3
+	stx	%l3, [%sp + 2047 + 128 + 0x30]	! arg4: roundup(ksize, 4MB)
 	sethi	%hi(KERNBASE), %l3
 	stx	%l3, [%sp + 2047 + 128 + 0x38]	! arg5: vaddr (KERNBASE)
 	stx	%g0, [%sp + 2047 + 128 + 0x40]	! arg6: empty
diff --git a/arch/sparc64/kernel/smp.c b/arch/sparc64/kernel/smp.c
index cc454731d879..5a1126b363a4 100644
--- a/arch/sparc64/kernel/smp.c
+++ b/arch/sparc64/kernel/smp.c
@@ -284,14 +284,17 @@ static void ldom_startcpu_cpuid(unsigned int cpu, unsigned long thread_reg)
 {
 	extern unsigned long sparc64_ttable_tl0;
 	extern unsigned long kern_locked_tte_data;
-	extern int bigkernel;
 	struct hvtramp_descr *hdesc;
 	unsigned long trampoline_ra;
 	struct trap_per_cpu *tb;
 	u64 tte_vaddr, tte_data;
 	unsigned long hv_err;
+	int i;
 
-	hdesc = kzalloc(sizeof(*hdesc), GFP_KERNEL);
+	hdesc = kzalloc(sizeof(*hdesc) +
+			(sizeof(struct hvtramp_mapping) *
+			 num_kernel_image_mappings - 1),
+			GFP_KERNEL);
 	if (!hdesc) {
 		printk(KERN_ERR "ldom_startcpu_cpuid: Cannot allocate "
 		       "hvtramp_descr.\n");
@@ -299,7 +302,7 @@ static void ldom_startcpu_cpuid(unsigned int cpu, unsigned long thread_reg)
 	}
 
 	hdesc->cpu = cpu;
-	hdesc->num_mappings = (bigkernel ? 2 : 1);
+	hdesc->num_mappings = num_kernel_image_mappings;
 
 	tb = &trap_block[cpu];
 	tb->hdesc = hdesc;
@@ -312,13 +315,11 @@ static void ldom_startcpu_cpuid(unsigned int cpu, unsigned long thread_reg)
 	tte_vaddr = (unsigned long) KERNBASE;
 	tte_data = kern_locked_tte_data;
 
-	hdesc->maps[0].vaddr = tte_vaddr;
-	hdesc->maps[0].tte   = tte_data;
-	if (bigkernel) {
+	for (i = 0; i < hdesc->num_mappings; i++) {
+		hdesc->maps[i].vaddr = tte_vaddr;
+		hdesc->maps[i].tte   = tte_data;
 		tte_vaddr += 0x400000;
 		tte_data  += 0x400000;
-		hdesc->maps[1].vaddr = tte_vaddr;
-		hdesc->maps[1].tte   = tte_data;
 	}
 
 	trampoline_ra = kimage_addr_to_ra(hv_cpu_startup);
diff --git a/arch/sparc64/kernel/trampoline.S b/arch/sparc64/kernel/trampoline.S
index 4ae2e525d68b..56ff55211341 100644
--- a/arch/sparc64/kernel/trampoline.S
+++ b/arch/sparc64/kernel/trampoline.S
@@ -105,7 +105,7 @@ startup_continue:
 	wr		%g2, 0, %tick_cmpr
 
 	/* Call OBP by hand to lock KERNBASE into i/d tlbs.
-	 * We lock 2 consequetive entries if we are 'bigkernel'.
+	 * We lock 'num_kernel_image_mappings' consequetive entries.
 	 */
 	sethi		%hi(prom_entry_lock), %g2
 1:	ldstub		[%g2 + %lo(prom_entry_lock)], %g1
@@ -119,6 +119,29 @@ startup_continue:
 	add		%l2, -(192 + 128), %sp
 	flushw
 
+	/* Setup the loop variables:
+	 * %l3: VADDR base
+	 * %l4: TTE base
+	 * %l5: Loop iterator, iterates from 0 to 'num_kernel_image_mappings'
+	 * %l6: Number of TTE entries to map
+	 * %l7: Highest TTE entry number, we count down
+	 */
+	sethi		%hi(KERNBASE), %l3
+	sethi		%hi(kern_locked_tte_data), %l4
+	ldx		[%l4 + %lo(kern_locked_tte_data)], %l4
+	clr		%l5
+	sethi		%hi(num_kernel_image_mappings), %l6
+	lduw		[%l6 + %lo(num_kernel_image_mappings)], %l6
+	add		%l6, 1, %l6
+
+	mov		15, %l7
+	BRANCH_IF_ANY_CHEETAH(g1,g5,2f)
+
+	mov		63, %l7
+2:
+
+3:
+	/* Lock into I-MMU */
 	sethi		%hi(call_method), %g2
 	or		%g2, %lo(call_method), %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x00]
@@ -132,63 +155,26 @@ startup_continue:
 	sethi		%hi(prom_mmu_ihandle_cache), %g2
 	lduw		[%g2 + %lo(prom_mmu_ihandle_cache)], %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x20]
-	sethi		%hi(KERNBASE), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x28]
-	sethi		%hi(kern_locked_tte_data), %g2
-	ldx		[%g2 + %lo(kern_locked_tte_data)], %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x30]
 
-	mov		15, %g2
-	BRANCH_IF_ANY_CHEETAH(g1,g5,1f)
+	/* Each TTE maps 4MB, convert index to offset.  */
+	sllx		%l5, 22, %g1
 
-	mov		63, %g2
-1:
+	add		%l3, %g1, %g2
+	stx		%g2, [%sp + 2047 + 128 + 0x28]	! VADDR
+	add		%l4, %g1, %g2
+	stx		%g2, [%sp + 2047 + 128 + 0x30]	! TTE
+
+	/* TTE index is highest minus loop index.  */
+	sub		%l7, %l5, %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x38]
+
 	sethi		%hi(p1275buf), %g2
 	or		%g2, %lo(p1275buf), %g2
 	ldx		[%g2 + 0x08], %o1
 	call		%o1
 	 add		%sp, (2047 + 128), %o0
 
-	sethi		%hi(bigkernel), %g2
-	lduw		[%g2 + %lo(bigkernel)], %g2
-	brz,pt		%g2, do_dtlb
-	 nop
-
-	sethi		%hi(call_method), %g2
-	or		%g2, %lo(call_method), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x00]
-	mov		5, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x08]
-	mov		1, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x10]
-	sethi		%hi(itlb_load), %g2
-	or		%g2, %lo(itlb_load), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x18]
-	sethi		%hi(prom_mmu_ihandle_cache), %g2
-	lduw		[%g2 + %lo(prom_mmu_ihandle_cache)], %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x20]
-	sethi		%hi(KERNBASE + 0x400000), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x28]
-	sethi		%hi(kern_locked_tte_data), %g2
-	ldx		[%g2 + %lo(kern_locked_tte_data)], %g2
-	sethi		%hi(0x400000), %g1
-	add		%g2, %g1, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x30]
-
-	mov		14, %g2
-	BRANCH_IF_ANY_CHEETAH(g1,g5,1f)
-
-	mov		62, %g2
-1:
-	stx		%g2, [%sp + 2047 + 128 + 0x38]
-	sethi		%hi(p1275buf), %g2
-	or		%g2, %lo(p1275buf), %g2
-	ldx		[%g2 + 0x08], %o1
-	call		%o1
-	 add		%sp, (2047 + 128), %o0
-
-do_dtlb:
+	/* Lock into D-MMU */
 	sethi		%hi(call_method), %g2
 	or		%g2, %lo(call_method), %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x00]
@@ -202,65 +188,30 @@ do_dtlb:
 	sethi		%hi(prom_mmu_ihandle_cache), %g2
 	lduw		[%g2 + %lo(prom_mmu_ihandle_cache)], %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x20]
-	sethi		%hi(KERNBASE), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x28]
-	sethi		%hi(kern_locked_tte_data), %g2
-	ldx		[%g2 + %lo(kern_locked_tte_data)], %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x30]
 
-	mov		15, %g2
-	BRANCH_IF_ANY_CHEETAH(g1,g5,1f)
+	/* Each TTE maps 4MB, convert index to offset.  */
+	sllx		%l5, 22, %g1
 
-	mov		63, %g2
-1:
+	add		%l3, %g1, %g2
+	stx		%g2, [%sp + 2047 + 128 + 0x28]	! VADDR
+	add		%l4, %g1, %g2
+	stx		%g2, [%sp + 2047 + 128 + 0x30]	! TTE
 
+	/* TTE index is highest minus loop index.  */
+	sub		%l7, %l5, %g2
 	stx		%g2, [%sp + 2047 + 128 + 0x38]
+
 	sethi		%hi(p1275buf), %g2
 	or		%g2, %lo(p1275buf), %g2
 	ldx		[%g2 + 0x08], %o1
 	call		%o1
 	 add		%sp, (2047 + 128), %o0
 
-	sethi		%hi(bigkernel), %g2
-	lduw		[%g2 + %lo(bigkernel)], %g2
-	brz,pt		%g2, do_unlock
+	add		%l5, 1, %l5
+	cmp		%l5, %l6
+	bne,pt		%xcc, 3b
 	 nop
 
-	sethi		%hi(call_method), %g2
-	or		%g2, %lo(call_method), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x00]
-	mov		5, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x08]
-	mov		1, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x10]
-	sethi		%hi(dtlb_load), %g2
-	or		%g2, %lo(dtlb_load), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x18]
-	sethi		%hi(prom_mmu_ihandle_cache), %g2
-	lduw		[%g2 + %lo(prom_mmu_ihandle_cache)], %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x20]
-	sethi		%hi(KERNBASE + 0x400000), %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x28]
-	sethi		%hi(kern_locked_tte_data), %g2
-	ldx		[%g2 + %lo(kern_locked_tte_data)], %g2
-	sethi		%hi(0x400000), %g1
-	add		%g2, %g1, %g2
-	stx		%g2, [%sp + 2047 + 128 + 0x30]
-
-	mov		14, %g2
-	BRANCH_IF_ANY_CHEETAH(g1,g5,1f)
-
-	mov		62, %g2
-1:
-
-	stx		%g2, [%sp + 2047 + 128 + 0x38]
-	sethi		%hi(p1275buf), %g2
-	or		%g2, %lo(p1275buf), %g2
-	ldx		[%g2 + 0x08], %o1
-	call		%o1
-	 add		%sp, (2047 + 128), %o0
-
-do_unlock:
 	sethi		%hi(prom_entry_lock), %g2
 	stb		%g0, [%g2 + %lo(prom_entry_lock)]
 	membar		#StoreStore | #StoreLoad
@@ -269,47 +220,36 @@ do_unlock:
 	 nop
 
 niagara_lock_tlb:
+	sethi		%hi(KERNBASE), %l3
+	sethi		%hi(kern_locked_tte_data), %l4
+	ldx		[%l4 + %lo(kern_locked_tte_data)], %l4
+	clr		%l5
+	sethi		%hi(num_kernel_image_mappings), %l6
+	lduw		[%l6 + %lo(num_kernel_image_mappings)], %l6
+	add		%l6, 1, %l6
+
+1:
 	mov		HV_FAST_MMU_MAP_PERM_ADDR, %o5
-	sethi		%hi(KERNBASE), %o0
+	sllx		%l5, 22, %g2
+	add		%l3, %g2, %o0
 	clr		%o1
-	sethi		%hi(kern_locked_tte_data), %o2
-	ldx		[%o2 + %lo(kern_locked_tte_data)], %o2
+	add		%l4, %g2, %o2
 	mov		HV_MMU_IMMU, %o3
 	ta		HV_FAST_TRAP
 
 	mov		HV_FAST_MMU_MAP_PERM_ADDR, %o5
-	sethi		%hi(KERNBASE), %o0
+	sllx		%l5, 22, %g2
+	add		%l3, %g2, %o0
 	clr		%o1
-	sethi		%hi(kern_locked_tte_data), %o2
-	ldx		[%o2 + %lo(kern_locked_tte_data)], %o2
+	add		%l4, %g2, %o2
 	mov		HV_MMU_DMMU, %o3
 	ta		HV_FAST_TRAP
 
-	sethi		%hi(bigkernel), %g2
-	lduw		[%g2 + %lo(bigkernel)], %g2
-	brz,pt		%g2, after_lock_tlb
+	add		%l5, 1, %l5
+	cmp		%l5, %l6
+	bne,pt		%xcc, 1b
 	 nop
 
-	mov		HV_FAST_MMU_MAP_PERM_ADDR, %o5
-	sethi		%hi(KERNBASE + 0x400000), %o0
-	clr		%o1
-	sethi		%hi(kern_locked_tte_data), %o2
-	ldx		[%o2 + %lo(kern_locked_tte_data)], %o2
-	sethi		%hi(0x400000), %o3
-	add		%o2, %o3, %o2
-	mov		HV_MMU_IMMU, %o3
-	ta		HV_FAST_TRAP
-
-	mov		HV_FAST_MMU_MAP_PERM_ADDR, %o5
-	sethi		%hi(KERNBASE + 0x400000), %o0
-	clr		%o1
-	sethi		%hi(kern_locked_tte_data), %o2
-	ldx		[%o2 + %lo(kern_locked_tte_data)], %o2
-	sethi		%hi(0x400000), %o3
-	add		%o2, %o3, %o2
-	mov		HV_MMU_DMMU, %o3
-	ta		HV_FAST_TRAP
-
 after_lock_tlb:
 	wrpr		%g0, (PSTATE_PRIV | PSTATE_PEF), %pstate
 	wr		%g0, 0, %fprs
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index b5c30416fdac..466fd6cffac9 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -166,7 +166,7 @@ unsigned long sparc64_kern_pri_context __read_mostly;
 unsigned long sparc64_kern_pri_nuc_bits __read_mostly;
 unsigned long sparc64_kern_sec_context __read_mostly;
 
-int bigkernel = 0;
+int num_kernel_image_mappings;
 
 #ifdef CONFIG_DEBUG_DCFLUSH
 atomic_t dcpage_flushes = ATOMIC_INIT(0);
@@ -572,7 +572,7 @@ static unsigned long kern_large_tte(unsigned long paddr);
 static void __init remap_kernel(void)
 {
 	unsigned long phys_page, tte_vaddr, tte_data;
-	int tlb_ent = sparc64_highest_locked_tlbent();
+	int i, tlb_ent = sparc64_highest_locked_tlbent();
 
 	tte_vaddr = (unsigned long) KERNBASE;
 	phys_page = (prom_boot_mapping_phys_low >> 22UL) << 22UL;
@@ -582,27 +582,20 @@ static void __init remap_kernel(void)
 
 	/* Now lock us into the TLBs via Hypervisor or OBP. */
 	if (tlb_type == hypervisor) {
-		hypervisor_tlb_lock(tte_vaddr, tte_data, HV_MMU_DMMU);
-		hypervisor_tlb_lock(tte_vaddr, tte_data, HV_MMU_IMMU);
-		if (bigkernel) {
-			tte_vaddr += 0x400000;
-			tte_data += 0x400000;
+		for (i = 0; i < num_kernel_image_mappings; i++) {
 			hypervisor_tlb_lock(tte_vaddr, tte_data, HV_MMU_DMMU);
 			hypervisor_tlb_lock(tte_vaddr, tte_data, HV_MMU_IMMU);
+			tte_vaddr += 0x400000;
+			tte_data += 0x400000;
 		}
 	} else {
-		prom_dtlb_load(tlb_ent, tte_data, tte_vaddr);
-		prom_itlb_load(tlb_ent, tte_data, tte_vaddr);
-		if (bigkernel) {
-			tlb_ent -= 1;
-			prom_dtlb_load(tlb_ent,
-				       tte_data + 0x400000, 
-				       tte_vaddr + 0x400000);
-			prom_itlb_load(tlb_ent,
-				       tte_data + 0x400000, 
-				       tte_vaddr + 0x400000);
+		for (i = 0; i < num_kernel_image_mappings; i++) {
+			prom_dtlb_load(tlb_ent - i, tte_data, tte_vaddr);
+			prom_itlb_load(tlb_ent - i, tte_data, tte_vaddr);
+			tte_vaddr += 0x400000;
+			tte_data += 0x400000;
 		}
-		sparc64_highest_unlocked_tlb_ent = tlb_ent - 1;
+		sparc64_highest_unlocked_tlb_ent = tlb_ent - i;
 	}
 	if (tlb_type == cheetah_plus) {
 		sparc64_kern_pri_context = (CTX_CHEETAH_PLUS_CTX0 |
@@ -1352,12 +1345,9 @@ void __init paging_init(void)
 	shift = kern_base + PAGE_OFFSET - ((unsigned long)KERNBASE);
 
 	real_end = (unsigned long)_end;
-	if ((real_end > ((unsigned long)KERNBASE + 0x400000)))
-		bigkernel = 1;
-	if ((real_end > ((unsigned long)KERNBASE + 0x800000))) {
-		prom_printf("paging_init: Kernel > 8MB, too large.\n");
-		prom_halt();
-	}
+	num_kernel_image_mappings = DIV_ROUND_UP(real_end - KERNBASE, 1 << 22);
+	printk("Kernel: Using %d locked TLB entries for main kernel image.\n",
+	       num_kernel_image_mappings);
 
 	/* Set kernel pgd to upper alias so physical page computations
 	 * work.
diff --git a/include/asm-sparc64/hvtramp.h b/include/asm-sparc64/hvtramp.h
index c7dd6ad056df..b2b9b947b3a4 100644
--- a/include/asm-sparc64/hvtramp.h
+++ b/include/asm-sparc64/hvtramp.h
@@ -16,7 +16,7 @@ struct hvtramp_descr {
 	__u64			fault_info_va;
 	__u64			fault_info_pa;
 	__u64			thread_reg;
-	struct hvtramp_mapping	maps[2];
+	struct hvtramp_mapping	maps[1];
 };
 
 extern void hv_cpu_startup(unsigned long hvdescr_pa);
diff --git a/include/asm-sparc64/spitfire.h b/include/asm-sparc64/spitfire.h
index 63b7040e8134..985ea7e31992 100644
--- a/include/asm-sparc64/spitfire.h
+++ b/include/asm-sparc64/spitfire.h
@@ -63,6 +63,8 @@ extern void cheetah_enable_pcache(void);
 	 SPITFIRE_HIGHEST_LOCKED_TLBENT : \
 	 CHEETAH_HIGHEST_LOCKED_TLBENT)
 
+extern int num_kernel_image_mappings;
+
 /* The data cache is write through, so this just invalidates the
  * specified line.
  */

From 69d1506731168d6845a76a303b2c45f7c05f3f2c Mon Sep 17 00:00:00 2001
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 22 Mar 2008 15:47:05 -0700
Subject: [PATCH 05/23] [TCP]: Let skbs grow over a page on fast peers

While testing the virtio-net driver on KVM with TSO I noticed
that TSO performance with a 1500 MTU is significantly worse
compared to the performance of non-TSO with a 16436 MTU.  The
packet dump shows that most of the packets sent are smaller
than a page.

Looking at the code this actually is quite obvious as it always
stop extending the packet if it's the first packet yet to be
sent and if it's larger than the MSS.  Since each extension is
bound by the page size, this means that (given a 1500 MTU) we're
very unlikely to construct packets greater than a page, provided
that the receiver and the path is fast enough so that packets can
always be sent immediately.

The fix is also quite obvious.  The push calls inside the loop
is just an optimisation so that we don't end up doing all the
sending at the end of the loop.  Therefore there is no specific
reason why it has to do so at MSS boundaries.  For TSO, the
most natural extension of this optimisation is to do the pushing
once the skb exceeds the TSO size goal.

This is what the patch does and testing with KVM shows that the
TSO performance with a 1500 MTU easily surpasses that of a 16436
MTU and indeed the packet sizes sent are generally larger than
16436.

I don't see any obvious downsides for slower peers or connections,
but it would be prudent to test this extensively to ensure that
those cases don't regress.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 071e83a894ad..39b629ac2404 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -735,7 +735,7 @@ static ssize_t do_tcp_sendpages(struct sock *sk, struct page **pages, int poffse
 		if (!(psize -= copy))
 			goto out;
 
-		if (skb->len < mss_now || (flags & MSG_OOB))
+		if (skb->len < size_goal || (flags & MSG_OOB))
 			continue;
 
 		if (forced_push(tp)) {
@@ -981,7 +981,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
 			if ((seglen -= copy) == 0 && iovlen == 0)
 				goto out;
 
-			if (skb->len < mss_now || (flags & MSG_OOB))
+			if (skb->len < size_goal || (flags & MSG_OOB))
 				continue;
 
 			if (forced_push(tp)) {

From 6440cc9e0f48ade57af7be28008cbfa6a991f287 Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <shemminger@vyatta.com>
Date: Sat, 22 Mar 2008 17:59:58 -0700
Subject: [PATCH 06/23] [IPV4] fib_trie: fix warning from rcu_assign_poinger

This gets rid of a warning caused by the test in rcu_assign_pointer.
I tried to fix rcu_assign_pointer, but that devolved into a long set
of discussions about doing it right that came to no real solution.
Since the test in rcu_assign_pointer for constant NULL would never
succeed in fib_trie, just open code instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/fib_trie.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1ff446d0fa8b..f6cdc012eec5 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -177,10 +177,13 @@ static inline struct tnode *node_parent_rcu(struct node *node)
 	return rcu_dereference(ret);
 }
 
+/* Same as rcu_assign_pointer
+ * but that macro() assumes that value is a pointer.
+ */
 static inline void node_set_parent(struct node *node, struct tnode *ptr)
 {
-	rcu_assign_pointer(node->parent,
-			   (unsigned long)ptr | NODE_TYPE(node));
+	smp_wmb();
+	node->parent = (unsigned long)ptr | NODE_TYPE(node);
 }
 
 static inline struct node *tnode_get_child(struct tnode *tn, unsigned int i)

From 421f099bc555c5f1516fdf5060de1d6bb5f51002 Mon Sep 17 00:00:00 2001
From: Julia Lawall <julia@diku.dk>
Date: Sat, 22 Mar 2008 18:04:16 -0700
Subject: [PATCH 07/23] [IPV6] net/ipv6/ndisc.c: remove unused variable

The variable hlen is initialized but never used otherwise.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
type T;
identifier i;
constant C;
@@

(
extern T i;
|
- T i;
  <+... when != i
- i = C;
  ...+>
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ndisc.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 0d33a7d32125..51557c27a0cd 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -1420,7 +1420,6 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh,
 	u8 *opt;
 	int rd_len;
 	int err;
-	int hlen;
 	u8 ha_buf[MAX_ADDR_LEN], *ha = NULL;
 
 	dev = skb->dev;
@@ -1491,7 +1490,6 @@ void ndisc_send_redirect(struct sk_buff *skb, struct neighbour *neigh,
 		return;
 	}
 
-	hlen = 0;
 
 	skb_reserve(buff, LL_RESERVED_SPACE(dev));
 	ip6_nd_hdr(sk, buff, dev, &saddr_buf, &ipv6_hdr(skb)->saddr,

From 53a6201fdfa04accc91ea1a7accce8e8bc37ef8e Mon Sep 17 00:00:00 2001
From: Julia Lawall <julia@diku.dk>
Date: Sat, 22 Mar 2008 18:05:33 -0700
Subject: [PATCH 08/23] [9P] net/9p/trans_fd.c: remove unused variable

The variable cb is initialized but never used otherwise.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
type T;
identifier i;
constant C;
@@

(
extern T i;
|
- T i;
  <+... when != i
- i = C;
  ...+>
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/9p/trans_fd.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 1aa9d5175398..4e8d4e724b96 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -861,7 +861,6 @@ static void p9_mux_free_request(struct p9_conn *m, struct p9_req *req)
 
 static void p9_mux_flush_cb(struct p9_req *freq, void *a)
 {
-	p9_conn_req_callback cb;
 	int tag;
 	struct p9_conn *m;
 	struct p9_req *req, *rreq, *rptr;
@@ -872,7 +871,6 @@ static void p9_mux_flush_cb(struct p9_req *freq, void *a)
 		freq->tcall->params.tflush.oldtag);
 
 	spin_lock(&m->lock);
-	cb = NULL;
 	tag = freq->tcall->params.tflush.oldtag;
 	req = NULL;
 	list_for_each_entry_safe(rreq, rptr, &m->req_list, req_list) {

From 2572c149a2f52232ce690ddb9c6fd0c90ffd61cd Mon Sep 17 00:00:00 2001
From: Eliezer Tamir <eliezert@broadcom.com>
Date: Sun, 23 Mar 2008 03:07:45 -0700
Subject: [PATCH 09/23] BNX2X: prevent ethtool from setting port type

On 10GBaseT boards setting the type to TP will cause the driver to try
to configure 1GBaseT.
Since there are currently no boards that support setting of the port
type, disable this for now.

Signed-off-by: Eliezer Tamir <eliezert@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/bnx2x.c | 36 ++----------------------------------
 1 file changed, 2 insertions(+), 34 deletions(-)

diff --git a/drivers/net/bnx2x.c b/drivers/net/bnx2x.c
index 8af142ccf373..de32b3fba322 100644
--- a/drivers/net/bnx2x.c
+++ b/drivers/net/bnx2x.c
@@ -63,8 +63,8 @@
 #include "bnx2x.h"
 #include "bnx2x_init.h"
 
-#define DRV_MODULE_VERSION      "1.40.22"
-#define DRV_MODULE_RELDATE      "2007/11/27"
+#define DRV_MODULE_VERSION      "1.42.3"
+#define DRV_MODULE_RELDATE      "2008/3/9"
 #define BNX2X_BC_VER    	0x040200
 
 /* Time in jiffies before concluding the transmitter is hung. */
@@ -8008,38 +8008,6 @@ static int bnx2x_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
 	   cmd->duplex, cmd->port, cmd->phy_address, cmd->transceiver,
 	   cmd->autoneg, cmd->maxtxpkt, cmd->maxrxpkt);
 
-	switch (cmd->port) {
-	case PORT_TP:
-		if (!(bp->supported & SUPPORTED_TP)) {
-			DP(NETIF_MSG_LINK, "TP not supported\n");
-			return -EINVAL;
-		}
-
-		if (bp->phy_flags & PHY_XGXS_FLAG) {
-			bnx2x_link_reset(bp);
-			bnx2x_link_settings_supported(bp, SWITCH_CFG_1G);
-			bnx2x_phy_deassert(bp);
-		}
-		break;
-
-	case PORT_FIBRE:
-		if (!(bp->supported & SUPPORTED_FIBRE)) {
-			DP(NETIF_MSG_LINK, "FIBRE not supported\n");
-			return -EINVAL;
-		}
-
-		if (!(bp->phy_flags & PHY_XGXS_FLAG)) {
-			bnx2x_link_reset(bp);
-			bnx2x_link_settings_supported(bp, SWITCH_CFG_10G);
-			bnx2x_phy_deassert(bp);
-		}
-		break;
-
-	default:
-		DP(NETIF_MSG_LINK, "Unknown port type\n");
-		return -EINVAL;
-	}
-
 	if (cmd->autoneg == AUTONEG_ENABLE) {
 		if (!(bp->supported & SUPPORTED_Autoneg)) {
 			DP(NETIF_MSG_LINK, "Aotoneg not supported\n");

From da990a2402aeaee84837f29054c4628eb02f7493 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Sun, 23 Mar 2008 03:35:12 -0700
Subject: [PATCH 10/23] [SUNGEM]: Fix NAPI assertion failure.

As reported by Johannes Berg:

I started getting this warning with recent kernels:

[  773.908927] ------------[ cut here ]------------
[  773.908954] Badness at net/core/dev.c:2204
 ...

If we loop more than once in gem_poll(), we'll
use more than the real budget in our gem_rx()
calls, thus eventually trigger the caller's
assertions in net_rx_action().

Subtract "work_done" from "budget" for the second
arg to gem_rx() to fix the bug.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/sungem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/sungem.c b/drivers/net/sungem.c
index 97212799c513..4291458955ef 100644
--- a/drivers/net/sungem.c
+++ b/drivers/net/sungem.c
@@ -912,7 +912,7 @@ static int gem_poll(struct napi_struct *napi, int budget)
 		 * rx ring - must call napi_disable(), which
 		 * schedule_timeout()'s if polling is already disabled.
 		 */
-		work_done += gem_rx(gp, budget);
+		work_done += gem_rx(gp, budget - work_done);
 
 		if (work_done >= budget)
 			return work_done;

From dbee0d3f4603b9d0e56234a0743321fe4dad31ca Mon Sep 17 00:00:00 2001
From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Sun, 23 Mar 2008 21:45:36 -0700
Subject: [PATCH 11/23] [ATM]: When proc_create() fails, do some error handling
 work and return -ENOMEM.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/atm/clip.c | 19 ++++++++++++++++---
 net/atm/lec.c  |  4 ++++
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/net/atm/clip.c b/net/atm/clip.c
index d30167c0b48e..2ab1e36098fd 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -947,6 +947,8 @@ static const struct file_operations arp_seq_fops = {
 };
 #endif
 
+static void atm_clip_exit_noproc(void);
+
 static int __init atm_clip_init(void)
 {
 	neigh_table_init_no_netlink(&clip_tbl);
@@ -963,18 +965,22 @@ static int __init atm_clip_init(void)
 		struct proc_dir_entry *p;
 
 		p = proc_create("arp", S_IRUGO, atm_proc_root, &arp_seq_fops);
+		if (!p) {
+			printk(KERN_ERR "Unable to initialize "
+			       "/proc/net/atm/arp\n");
+			atm_clip_exit_noproc();
+			return -ENOMEM;
+		}
 	}
 #endif
 
 	return 0;
 }
 
-static void __exit atm_clip_exit(void)
+static void atm_clip_exit_noproc(void)
 {
 	struct net_device *dev, *next;
 
-	remove_proc_entry("arp", atm_proc_root);
-
 	unregister_inetaddr_notifier(&clip_inet_notifier);
 	unregister_netdevice_notifier(&clip_dev_notifier);
 
@@ -1005,6 +1011,13 @@ static void __exit atm_clip_exit(void)
 	clip_tbl_hook = NULL;
 }
 
+static void __exit atm_clip_exit(void)
+{
+	remove_proc_entry("arp", atm_proc_root);
+
+	atm_clip_exit_noproc();
+}
+
 module_init(atm_clip_init);
 module_exit(atm_clip_exit);
 MODULE_AUTHOR("Werner Almesberger");
diff --git a/net/atm/lec.c b/net/atm/lec.c
index 0e450d12f035..a2efa7ff41f1 100644
--- a/net/atm/lec.c
+++ b/net/atm/lec.c
@@ -1250,6 +1250,10 @@ static int __init lane_module_init(void)
 	struct proc_dir_entry *p;
 
 	p = proc_create("lec", S_IRUGO, atm_proc_root, &lec_seq_fops);
+	if (!p) {
+		printk(KERN_ERR "Unable to initialize /proc/net/atm/lec\n");
+		return -ENOMEM;
+	}
 #endif
 
 	register_atm_ioctl(&lane_ioctl_ops);

From 4b1b366721101f2f0d2350fbdccb679f7909cf57 Mon Sep 17 00:00:00 2001
From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Sun, 23 Mar 2008 21:51:12 -0700
Subject: [PATCH 12/23] connector: convert to single-threaded workqueue

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

We don't need one cqueue thread for each CPU.  cqueue is used for
receiving userspace datagrams, which are very rare and thus will
happily live with a single queue.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/connector/cn_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/connector/cn_queue.c b/drivers/connector/cn_queue.c
index 5732ca3259f9..b6fe7e7a2c2f 100644
--- a/drivers/connector/cn_queue.c
+++ b/drivers/connector/cn_queue.c
@@ -146,7 +146,7 @@ struct cn_queue_dev *cn_queue_alloc_dev(char *name, struct sock *nls)
 
 	dev->nls = nls;
 
-	dev->cn_queue = create_workqueue(dev->name);
+	dev->cn_queue = create_singlethread_workqueue(dev->name);
 	if (!dev->cn_queue) {
 		kfree(dev);
 		return NULL;

From 8f3ea33a5078a09eba12bfe57424507809367756 Mon Sep 17 00:00:00 2001
From: Martin Devera <devik@cdi.cz>
Date: Sun, 23 Mar 2008 22:00:38 -0700
Subject: [PATCH 13/23] sch_htb: fix "too many events" situation

HTB is event driven algorithm and part of its work is to apply
scheduled events at proper times. It tried to defend itself from
livelock by processing only limited number of events per dequeue.
Because of faster computers some users already hit this hardcoded
limit.

This patch limits processing up to 2 jiffies (why not 1 jiffie ?
because it might stop prematurely when only fraction of jiffie
remains).

Signed-off-by: Martin Devera <devik@cdi.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sched/sch_htb.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 795c761ad99f..66148cc4759e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -711,9 +711,11 @@ static void htb_charge_class(struct htb_sched *q, struct htb_class *cl,
  */
 static psched_time_t htb_do_events(struct htb_sched *q, int level)
 {
-	int i;
-
-	for (i = 0; i < 500; i++) {
+	/* don't run for longer than 2 jiffies; 2 is used instead of
+	   1 to simplify things when jiffy is going to be incremented
+	   too soon */
+	unsigned long stop_at = jiffies + 2;
+	while (time_before(jiffies, stop_at)) {
 		struct htb_class *cl;
 		long diff;
 		struct rb_node *p = rb_first(&q->wait_pq[level]);
@@ -731,9 +733,8 @@ static psched_time_t htb_do_events(struct htb_sched *q, int level)
 		if (cl->cmode != HTB_CAN_SEND)
 			htb_add_to_wait_tree(q, cl, diff);
 	}
-	if (net_ratelimit())
-		printk(KERN_WARNING "htb: too many events !\n");
-	return q->now + PSCHED_TICKS_PER_SEC / 10;
+	/* too much load - let's continue on next jiffie */
+	return q->now + PSCHED_TICKS_PER_SEC / HZ;
 }
 
 /* Returns class->node+prio from id-tree where classe's id is >= id. NULL

From 1f17131bb46065141069dee9fbcc4bdd0e9c2a2e Mon Sep 17 00:00:00 2001
From: "Robert P. J. Day" <rpjday@crashcourse.ca>
Date: Sun, 23 Mar 2008 22:48:29 -0700
Subject: [PATCH 14/23] [SPARC64]: Use shorter list_splice_init() for brevity.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc64/kernel/ds.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/sparc64/kernel/ds.c b/arch/sparc64/kernel/ds.c
index bd76482077be..edb74f5a1186 100644
--- a/arch/sparc64/kernel/ds.c
+++ b/arch/sparc64/kernel/ds.c
@@ -972,8 +972,7 @@ static void process_ds_work(void)
 	LIST_HEAD(todo);
 
 	spin_lock_irqsave(&ds_lock, flags);
-	list_splice(&ds_work_list, &todo);
-	INIT_LIST_HEAD(&ds_work_list);
+	list_splice_init(&ds_work_list, &todo);
 	spin_unlock_irqrestore(&ds_lock, flags);
 
 	list_for_each_entry_safe(qp, tmp, &todo, list) {

From 6d008153234c4cccae7bb0170defeea18258db4a Mon Sep 17 00:00:00 2001
From: Roland McGrath <roland@redhat.com>
Date: Sun, 23 Mar 2008 22:50:16 -0700
Subject: [PATCH 15/23] [SPARC64]: exec PT_DTRACE

The PT_DTRACE flag is meaningless and obsolete.
Don't touch it.

Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 arch/sparc64/kernel/process.c     | 3 ---
 arch/sparc64/kernel/sys_sparc32.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c
index e116e38b160e..acf8c5250aa9 100644
--- a/arch/sparc64/kernel/process.c
+++ b/arch/sparc64/kernel/process.c
@@ -731,9 +731,6 @@ asmlinkage int sparc_execve(struct pt_regs *regs)
 		current_thread_info()->xfsr[0] = 0;
 		current_thread_info()->fpsaved[0] = 0;
 		regs->tstate &= ~TSTATE_PEF;
-		task_lock(current);
-		current->ptrace &= ~PT_DTRACE;
-		task_unlock(current);
 	}
 out:
 	return error;
diff --git a/arch/sparc64/kernel/sys_sparc32.c b/arch/sparc64/kernel/sys_sparc32.c
index deaba2bd0535..2455fa498876 100644
--- a/arch/sparc64/kernel/sys_sparc32.c
+++ b/arch/sparc64/kernel/sys_sparc32.c
@@ -678,9 +678,6 @@ asmlinkage long sparc32_execve(struct pt_regs *regs)
 		current_thread_info()->xfsr[0] = 0;
 		current_thread_info()->fpsaved[0] = 0;
 		regs->tstate &= ~TSTATE_PEF;
-		task_lock(current);
-		current->ptrace &= ~PT_DTRACE;
-		task_unlock(current);
 	}
 out:
 	return error;

From cfe666b145cecffe784d98e60ffe201a5dc57ac3 Mon Sep 17 00:00:00 2001
From: Paul Mackerras <paulus@samba.org>
Date: Mon, 24 Mar 2008 17:41:22 +1100
Subject: [PATCH 16/23] [POWERPC] Don't use 64k pages for ioremap on pSeries

On pSeries, the hypervisor doesn't let us map in the eHEA ethernet
adapter using 64k pages, and thus the ehea driver will fail if 64k
pages are configured.  This works around the problem by always
using 4k pages for ioremap on pSeries (but not on other platforms).
A better fix would be to check whether the partition could ever
have an eHEA adapter, and only force 4k pages if it could, but this
will do for 2.6.25.

This is based on an earlier patch by Tony Breeds.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/mm/hash_utils_64.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
index 590f1f67c874..a83dfa3cf40c 100644
--- a/arch/powerpc/mm/hash_utils_64.c
+++ b/arch/powerpc/mm/hash_utils_64.c
@@ -351,9 +351,14 @@ static void __init htab_init_page_sizes(void)
 		mmu_vmalloc_psize = MMU_PAGE_64K;
 		if (mmu_linear_psize == MMU_PAGE_4K)
 			mmu_linear_psize = MMU_PAGE_64K;
-		if (cpu_has_feature(CPU_FTR_CI_LARGE_PAGE))
-			mmu_io_psize = MMU_PAGE_64K;
-		else
+		if (cpu_has_feature(CPU_FTR_CI_LARGE_PAGE)) {
+			/*
+			 * Don't use 64k pages for ioremap on pSeries, since
+			 * that would stop us accessing the HEA ethernet.
+			 */
+			if (!machine_is(pseries))
+				mmu_io_psize = MMU_PAGE_64K;
+		} else
 			mmu_ci_restrictions = 1;
 	}
 #endif /* CONFIG_PPC_64K_PAGES */

From 1428a9fa586cb80acf98289f797f58b8bd662598 Mon Sep 17 00:00:00 2001
From: Olaf Hering <olaf@aepfle.de>
Date: Tue, 18 Mar 2008 06:53:05 +1100
Subject: [PATCH 17/23] [POWERPC] Fix crash in init_ipic_sysfs on efika

The global primary_ipic in arch/powerpc/sysdev/ipic.c can remain NULL
if ipic_init() fails, which will happen on machines that don't have an
ipic interrupt controller.  init_ipic_sysfs() will crash in that case.

Acked-by: Grant Likely <grant.likely@secretlab.ca>

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/sysdev/ipic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c
index ae0dbf4c1d66..0f2dfb0aaa6a 100644
--- a/arch/powerpc/sysdev/ipic.c
+++ b/arch/powerpc/sysdev/ipic.c
@@ -906,7 +906,7 @@ static int __init init_ipic_sysfs(void)
 {
 	int rc;
 
-	if (!primary_ipic->regs)
+	if (!primary_ipic || !primary_ipic->regs)
 		return -ENODEV;
 	printk(KERN_DEBUG "Registering ipic with sysfs...\n");
 

From b8c19eb16a7e6df57d0f6d67e42ce026e5d5930b Mon Sep 17 00:00:00 2001
From: Grant Likely <grant.likely@secretlab.ca>
Date: Sat, 22 Mar 2008 14:20:29 +1100
Subject: [PATCH 18/23] [POWERPC] mpc5200-fec: Fix possible NULL dereference in
 mdio driver

If the reg property is missing from the phy node (unlikely, but possible),
then the kernel will oops with a NULL pointer dereference.  This fixes
it by checking the pointer first.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 drivers/net/fec_mpc52xx_phy.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/fec_mpc52xx_phy.c b/drivers/net/fec_mpc52xx_phy.c
index 1837584c4504..6a3ac4ea97e9 100644
--- a/drivers/net/fec_mpc52xx_phy.c
+++ b/drivers/net/fec_mpc52xx_phy.c
@@ -109,7 +109,8 @@ static int mpc52xx_fec_mdio_probe(struct of_device *of, const struct of_device_i
 		int irq = irq_of_parse_and_map(child, 0);
 		if (irq != NO_IRQ) {
 			const u32 *id = of_get_property(child, "reg", NULL);
-			bus->irq[*id] = irq;
+			if (id)
+				bus->irq[*id] = irq;
 		}
 	}
 

From 9560aea4e9d17cb75113c6051e800222fd5c71a4 Mon Sep 17 00:00:00 2001
From: Grant Likely <grant.likely@secretlab.ca>
Date: Sat, 22 Mar 2008 14:41:05 +1100
Subject: [PATCH 19/23] [POWERPC] mpc5200: Fix null dereference if bestcomm
 fails to initialize

If the bestcomm initialization fails, calls to the task allocate
function should fail gracefully instead of oopsing with a NULL deref.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/sysdev/bestcomm/bestcomm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/sysdev/bestcomm/bestcomm.c b/arch/powerpc/sysdev/bestcomm/bestcomm.c
index f589999361e0..b18cab55a76d 100644
--- a/arch/powerpc/sysdev/bestcomm/bestcomm.c
+++ b/arch/powerpc/sysdev/bestcomm/bestcomm.c
@@ -52,6 +52,10 @@ bcom_task_alloc(int bd_count, int bd_size, int priv_size)
 	int i, tasknum = -1;
 	struct bcom_task *tsk;
 
+	/* Don't try to do anything if bestcomm init failed */
+	if (!bcom_eng)
+		return NULL;
+
 	/* Get and reserve a task num */
 	spin_lock(&bcom_eng->lock);
 

From 7ea6fd7e2df041297298b5feb5b7b78a2b1a5310 Mon Sep 17 00:00:00 2001
From: Anatolij Gustschin <agust@denx.de>
Date: Sat, 22 Mar 2008 21:49:05 +1100
Subject: [PATCH 20/23] [POWERPC] Fix Oops with TQM5200 on TQM5200

The "bestcomm-core" driver defines its of_match table as follows

static struct of_device_id mpc52xx_bcom_of_match[] = {
	{ .type = "dma-controller", .compatible = "fsl,mpc5200-bestcomm", },
	{ .type = "dma-controller", .compatible = "mpc5200-bestcomm", },
	{},
};

so while registering the driver, the driver's probe function won't be
called, because the device tree node doesn't have a device_type
property.  Thus the driver's bcom_engine structure won't be allocated.
Referencing this structure later causes observed Oops.

Checking bcom_eng pointer for NULL before referencing data pointed
by it prevents oopsing, but fec driver still doesn't work (because
of the lost bestcomm match and resulted task allocation failure).
Actually the compatible property exists and should match and so
the fec driver should work.

This removes .type = "dma-controller" from the bestcomm driver's
mpc52xx_bcom_of_match table to solve the problem.

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/sysdev/bestcomm/bestcomm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/bestcomm/bestcomm.c b/arch/powerpc/sysdev/bestcomm/bestcomm.c
index b18cab55a76d..64ec7d629363 100644
--- a/arch/powerpc/sysdev/bestcomm/bestcomm.c
+++ b/arch/powerpc/sysdev/bestcomm/bestcomm.c
@@ -488,8 +488,8 @@ mpc52xx_bcom_remove(struct of_device *op)
 }
 
 static struct of_device_id mpc52xx_bcom_of_match[] = {
-	{ .type = "dma-controller", .compatible = "fsl,mpc5200-bestcomm", },
-	{ .type = "dma-controller", .compatible = "mpc5200-bestcomm", },
+	{ .compatible = "fsl,mpc5200-bestcomm", },
+	{ .compatible = "mpc5200-bestcomm", },
 	{},
 };
 

From 92896bd9fd75b1c993b92874d339a8088bb75560 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 24 Mar 2008 11:07:15 -0700
Subject: [PATCH 21/23] Don't 'printk()' while holding xtime lock for writing

The printk() can deadlock because it can wake up klogd(), and
task enqueueing will try to read the time in order to set a hrtimer.

Reported-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Debugged-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 kernel/time/timekeeping.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 671af612b768..a3fa587c350c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -191,8 +191,12 @@ static void change_clocksource(void)
 
 	tick_clock_notify();
 
+	/*
+	 * We're holding xtime lock and waking up klogd would deadlock
+	 * us on enqueue.  So no printing!
 	printk(KERN_INFO "Time: %s clocksource has been installed.\n",
 	       clock->name);
+	 */
 }
 #else
 static inline void change_clocksource(void) { }

From b9e76a00749521f2b080fa8a4fb15f66538ab756 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Mon, 24 Mar 2008 11:22:39 -0700
Subject: [PATCH 22/23] x86-32: Pass the full resource data to ioremap()

It appears that 64-bit PCI resources cannot possibly ever have worked on
x86-32 even when the RESOURCES_64BIT config option was set, because any
driver that tried to [pci_]ioremap() the resource would have been unable
to do so because the high 32 bits would have been silently dropped on
the floor by the ioremap() routines that only used "unsigned long".

Change them to use "resource_size_t" instead, which properly encodes the
whole 64-bit resource data if RESOURCES_64BIT is enabled.

Acked-by: H. Peter Anvin <hpa@kernel.org>
Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 arch/x86/mm/ioremap.c   | 6 +++---
 include/asm-x86/io_32.h | 6 +++---
 include/asm-x86/io_64.h | 6 +++---
 lib/iomap.c             | 2 +-
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 8fe576baa148..4afaba0ed722 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -106,7 +106,7 @@ static int ioremap_change_attr(unsigned long vaddr, unsigned long size,
  * have to convert them into an offset in a page-aligned mapping, but the
  * caller shouldn't need to know that small detail.
  */
-static void __iomem *__ioremap(unsigned long phys_addr, unsigned long size,
+static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size,
 			       enum ioremap_mode mode)
 {
 	unsigned long pfn, offset, last_addr, vaddr;
@@ -193,13 +193,13 @@ static void __iomem *__ioremap(unsigned long phys_addr, unsigned long size,
  *
  * Must be freed with iounmap.
  */
-void __iomem *ioremap_nocache(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size)
 {
 	return __ioremap(phys_addr, size, IOR_MODE_UNCACHED);
 }
 EXPORT_SYMBOL(ioremap_nocache);
 
-void __iomem *ioremap_cache(unsigned long phys_addr, unsigned long size)
+void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size)
 {
 	return __ioremap(phys_addr, size, IOR_MODE_CACHED);
 }
diff --git a/include/asm-x86/io_32.h b/include/asm-x86/io_32.h
index 58d2c45cd0b1..d4d8fbd9378c 100644
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -114,13 +114,13 @@ static inline void * phys_to_virt(unsigned long address)
  * If the area you are trying to map is a PCI BAR you should have a
  * look at pci_iomap().
  */
-extern void __iomem *ioremap_nocache(unsigned long offset, unsigned long size);
-extern void __iomem *ioremap_cache(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 
 /*
  * The default ioremap() behavior is non-cached:
  */
-static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
+static inline void __iomem *ioremap(resource_size_t offset, unsigned long size)
 {
 	return ioremap_nocache(offset, size);
 }
diff --git a/include/asm-x86/io_64.h b/include/asm-x86/io_64.h
index f64a59cc396d..db0be2011a3c 100644
--- a/include/asm-x86/io_64.h
+++ b/include/asm-x86/io_64.h
@@ -158,13 +158,13 @@ extern void early_iounmap(void *addr, unsigned long size);
  * it's useful if some control registers are in such an area and write combining
  * or read caching is not desirable:
  */
-extern void __iomem *ioremap_nocache(unsigned long offset, unsigned long size);
-extern void __iomem *ioremap_cache(unsigned long offset, unsigned long size);
+extern void __iomem *ioremap_nocache(resource_size_t offset, unsigned long size);
+extern void __iomem *ioremap_cache(resource_size_t offset, unsigned long size);
 
 /*
  * The default ioremap() behavior is non-cached:
  */
-static inline void __iomem *ioremap(unsigned long offset, unsigned long size)
+static inline void __iomem *ioremap(resource_size_t offset, unsigned long size)
 {
 	return ioremap_nocache(offset, size);
 }
diff --git a/lib/iomap.c b/lib/iomap.c
index db004a9ff509..dd6ca48fe6b0 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -256,7 +256,7 @@ EXPORT_SYMBOL(ioport_unmap);
  * */
 void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
 {
-	unsigned long start = pci_resource_start(dev, bar);
+	resource_size_t start = pci_resource_start(dev, bar);
 	unsigned long len = pci_resource_len(dev, bar);
 	unsigned long flags = pci_resource_flags(dev, bar);
 

From d3073779f8362d64b804882f5f41c208c4a5e11e Mon Sep 17 00:00:00 2001
From: Roland Dreier <rdreier@cisco.com>
Date: Mon, 24 Mar 2008 12:03:03 -0400
Subject: [PATCH 23/23] SVCRDMA: Use only 1 RDMA read scatter entry for iWARP
 adapters

The iWARP protocol limits RDMA read requests to a single scatter
entry.  NFS/RDMA has code in rdma_read_max_sge() that is supposed to
limit the sge_count for RDMA read requests to 1, but the code to do
that is inside an #ifdef RDMA_TRANSPORT_IWARP block.  In the mainline
kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a
preprocessor #define, so the #ifdef'ed code is never compiled.

In my test of a kernel build with -j8 on an NFS/RDMA mount, this
problem eventually leads to trouble starting with:

    svcrdma: Error posting send = -22
    svcrdma : RDMA_READ error = -22

and things go downhill from there.

The trivial fix is to delete the #ifdef guard.  The check seems to be
a remnant of when the NFS/RDMA code was not merged and needed to
compile against multiple kernel versions, although I don't think it
ever worked as intended.  In any case now that the code is upstream
there's no need to test whether the RDMA_TRANSPORT_IWARP constant is
defined or not.

Without this patch, my kernel build on an NFS/RDMA mount using NetEffect
adapters quickly and 100% reproducibly failed with an error like:

    ld: final link failed: Software caused connection abort

With the patch applied I was able to complete a kernel build on the
same setup.

(Tom Tucker says this is "actually an _ancient_ remnant when it had to
compile against iWARP vs. non-iWARP enabled OFA trees.")

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Acked-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index ab54a736486e..971271602dd0 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -237,14 +237,12 @@ static void rdma_set_ctxt_sge(struct svc_rdma_op_ctxt *ctxt,
 
 static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
 {
-#ifdef RDMA_TRANSPORT_IWARP
 	if ((RDMA_TRANSPORT_IWARP ==
 	     rdma_node_get_transport(xprt->sc_cm_id->
 				     device->node_type))
 	    && sge_count > 1)
 		return 1;
 	else
-#endif
 		return min_t(int, sge_count, xprt->sc_max_sge);
 }