From 027476642811f8559cbe00ef6cc54db230e48a20 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Mon, 2 Dec 2013 11:08:06 +0200 Subject: [PATCH 1/8] drm/i915: Take modeset locks around intel_modeset_setup_hw_state() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some lower level things get angry if we don't have modeset locks during intel_modeset_setup_hw_state(). Actually the resume and lid_notify codepaths alreday hold the locks, but the init codepath doesn't, so fix that. Note: This slipped through since we only disable pipes if the plane/pipe linking doesn't match. Which is only relevant on older gen3 mobile machines, if the BIOS fails to set up our preferred linking. Signed-off-by: Ville Syrjälä Cc: stable@vger.kernel.org Tested-and-reported-by: Paul Bolle [danvet: Add note now that I could confirm my theory with the log files Paul Bolle provided.] Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 080f6fd4e839..114db51996bd 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11046,7 +11046,9 @@ void intel_modeset_gem_init(struct drm_device *dev) intel_setup_overlay(dev); + drm_modeset_lock_all(dev); intel_modeset_setup_hw_state(dev, false); + drm_modeset_unlock_all(dev); } void intel_modeset_cleanup(struct drm_device *dev) From edd5b13313551d6b04acfb90d3db58ed3cf3c814 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 2 Dec 2013 17:39:09 +0000 Subject: [PATCH 2/8] drm/i915: Do not clobber config status after a forced restore of hw state We call intel_modeset_setup_hw_state() along two paths, driver load/resume and after a lid event notification. During initialisation of the driver, it is imperative that we reset the config state. This correctly sets up the initial connector statuses and prepares the hardware for a thorough probing. However, during a lid event, we only want to undo the damage caused by the bios by resetting our last known mode. In this cirumstance, we do not want to clobber our desired state. In order to try and keep sanity between the config state and our own tracking, do the drm_mode_config_reset() first along the load/resume paths before reading out the hw state and apply any definite known corrections. v2: "As discussed on irc I don't think we should force the connector state to anything here: Imo connector->status should reflect what we believe to be the true output connection state, whereas connector->encoder reflects whether this connector is wired up to a pipe. And since we no longer reject modeset on disconnected connectors and never nuked the pipe if the connector gets disconnected there's no reason for that - such policy is userspace's job. This regression has been introduced in commit 2e9388923e83bc4e2726f170a984621f1d582e77 Author: Daniel Vetter Date: Thu Oct 11 20:08:24 2012 +0200 drm/i915/crt: explicitly set up HOTPLUG_BITS on resume" so sayeth Daniel. Signed-off-by: Chris Wilson Cc: Daniel Vetter Cc: stable@vger.kernel.org (v3.8 and later) Cc: Paulo Zanoni Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/intel_display.c | 3 +-- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 2e367a1c6a64..5b7b7e06cb3a 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -651,6 +651,7 @@ static int __i915_drm_thaw(struct drm_device *dev, bool restore_gtt_mappings) intel_modeset_init_hw(dev); drm_modeset_lock_all(dev); + drm_mode_config_reset(dev); intel_modeset_setup_hw_state(dev, true); drm_modeset_unlock_all(dev); diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 114db51996bd..8e44b16950cf 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -11036,8 +11036,6 @@ void intel_modeset_setup_hw_state(struct drm_device *dev, } intel_modeset_check_state(dev); - - drm_mode_config_reset(dev); } void intel_modeset_gem_init(struct drm_device *dev) @@ -11047,6 +11045,7 @@ void intel_modeset_gem_init(struct drm_device *dev) intel_setup_overlay(dev); drm_modeset_lock_all(dev); + drm_mode_config_reset(dev); intel_modeset_setup_hw_state(dev, false); drm_modeset_unlock_all(dev); } From 5ae68b413214e847a2b5c6d3c65778482542bc1a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Mon, 2 Dec 2013 11:23:39 +0200 Subject: [PATCH 3/8] drm/i915: Skip clock checks on BDW MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We don't have clock state readout support for DDI, so skip the pipe config clock checks on all DDI platforms. Signed-off-by: Ville Syrjälä Reviewed-by: Damien Lespiau Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/intel_display.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 8e44b16950cf..8b8bde7dce53 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -9135,7 +9135,7 @@ intel_pipe_config_compare(struct drm_device *dev, if (IS_G4X(dev) || INTEL_INFO(dev)->gen >= 5) PIPE_CONF_CHECK_I(pipe_bpp); - if (!IS_HASWELL(dev)) { + if (!HAS_DDI(dev)) { PIPE_CONF_CHECK_CLOCK_FUZZY(adjusted_mode.crtc_clock); PIPE_CONF_CHECK_CLOCK_FUZZY(port_clock); } From 0d1430a3f4b7cfd8779b78740a4182321f3ca7f3 Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Wed, 4 Dec 2013 14:52:06 +0000 Subject: [PATCH 4/8] drm/i915: Hold mutex across i915_gem_release Inorder to serialise the closing of the file descriptor and its subsequent release of client requests with i915_gem_free_request(), we need to hold the struct_mutex in i915_gem_release(). Failing to do so has the potential to trigger an OOPS, later with a use-after-free. Testcase: igt/gem_close_race Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70874 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71029 Reported-by: Eric Anholt Signed-off-by: Chris Wilson Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_dma.c | 2 ++ drivers/gpu/drm/i915/i915_gem_context.c | 2 -- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 0cab2d045135..ac9dac99b585 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1848,8 +1848,10 @@ void i915_driver_lastclose(struct drm_device * dev) void i915_driver_preclose(struct drm_device * dev, struct drm_file *file_priv) { + mutex_lock(&dev->struct_mutex); i915_gem_context_close(dev, file_priv); i915_gem_release(dev, file_priv); + mutex_unlock(&dev->struct_mutex); } void i915_driver_postclose(struct drm_device *dev, struct drm_file *file) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 72a3df32292f..4a05956162c1 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -347,10 +347,8 @@ void i915_gem_context_close(struct drm_device *dev, struct drm_file *file) { struct drm_i915_file_private *file_priv = file->driver_priv; - mutex_lock(&dev->struct_mutex); idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL); idr_destroy(&file_priv->context_idr); - mutex_unlock(&dev->struct_mutex); } static struct i915_hw_context * From f742a55231b950ed71ac8cc8beec0944ddda312c Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Fri, 6 Dec 2013 10:17:53 +0100 Subject: [PATCH 5/8] drm/i915: fix pm init ordering Shovel a bit more of the the code into the setup function, and call it earlier. Otherwise lockdep is unhappy since we cancel the delayed resume work before it's initialized. While at it also shovel the pc8 setup code into the same functions. I wanted to also ditch the header declaration of the hws pc8 functions, but for unfathomable reasons that stuff is in intel_display.c instead of intel_pm.c. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71980 Tested-by: Guo Jinxian Reviewed-by: Damien Lespiau Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_dma.c | 10 +--------- drivers/gpu/drm/i915/i915_drv.h | 2 -- drivers/gpu/drm/i915/intel_drv.h | 1 + drivers/gpu/drm/i915/intel_pm.c | 11 ++++++++++- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index ac9dac99b585..197db0993a24 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -1490,16 +1490,9 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags) spin_lock_init(&dev_priv->uncore.lock); spin_lock_init(&dev_priv->mm.object_stat_lock); mutex_init(&dev_priv->dpio_lock); - mutex_init(&dev_priv->rps.hw_lock); mutex_init(&dev_priv->modeset_restore_lock); - mutex_init(&dev_priv->pc8.lock); - dev_priv->pc8.requirements_met = false; - dev_priv->pc8.gpu_idle = false; - dev_priv->pc8.irqs_disabled = false; - dev_priv->pc8.enabled = false; - dev_priv->pc8.disable_count = 2; /* requirements_met + gpu_idle */ - INIT_DELAYED_WORK(&dev_priv->pc8.enable_work, hsw_enable_pc8_work); + intel_pm_setup(dev); intel_display_crc_init(dev); @@ -1603,7 +1596,6 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags) } intel_irq_init(dev); - intel_pm_init(dev); intel_uncore_sanitize(dev); /* Try to make sure MCHBAR is enabled before poking at it */ diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ccdbecca070d..6f04fa4b31fd 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1901,9 +1901,7 @@ void i915_queue_hangcheck(struct drm_device *dev); void i915_handle_error(struct drm_device *dev, bool wedged); extern void intel_irq_init(struct drm_device *dev); -extern void intel_pm_init(struct drm_device *dev); extern void intel_hpd_init(struct drm_device *dev); -extern void intel_pm_init(struct drm_device *dev); extern void intel_uncore_sanitize(struct drm_device *dev); extern void intel_uncore_early_sanitize(struct drm_device *dev); diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index a18e88b3e425..79f91f26e288 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -821,6 +821,7 @@ void intel_update_sprite_watermarks(struct drm_plane *plane, uint32_t sprite_width, int pixel_size, bool enabled, bool scaled); void intel_init_pm(struct drm_device *dev); +void intel_pm_setup(struct drm_device *dev); bool intel_fbc_enabled(struct drm_device *dev); void intel_update_fbc(struct drm_device *dev); void intel_gpu_ips_init(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 6e0d5e075b15..e0dec95c764e 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -6130,10 +6130,19 @@ int vlv_freq_opcode(int ddr_freq, int val) return val; } -void intel_pm_init(struct drm_device *dev) +void intel_pm_setup(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; + mutex_init(&dev_priv->rps.hw_lock); + + mutex_init(&dev_priv->pc8.lock); + dev_priv->pc8.requirements_met = false; + dev_priv->pc8.gpu_idle = false; + dev_priv->pc8.irqs_disabled = false; + dev_priv->pc8.enabled = false; + dev_priv->pc8.disable_count = 2; /* requirements_met + gpu_idle */ + INIT_DELAYED_WORK(&dev_priv->pc8.enable_work, hsw_enable_pc8_work); INIT_DELAYED_WORK(&dev_priv->rps.delayed_resume_work, intel_gen6_powersave_work); } From acc240d41ea1ab9c488a79219fb313b5b46265ae Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Thu, 5 Dec 2013 15:42:34 +0100 Subject: [PATCH 6/8] drm/i915: Fix use-after-free in do_switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit So apparently under ridiculous amounts of memory pressure we can get into trouble in do_switch when we try to move the old hw context backing storage object onto the active lists. With list debugging enabled that usually results in us chasing a poisoned pointer - which means we've hit upon a vma that has been removed from all lrus with list_del (and then deallocated, so it's a real use-after free). Ian Lister has done some great callchain chasing and noticed that we can reenter do_switch: i915_gem_do_execbuffer() i915_switch_context() do_switch() from = ring->last_context; i915_gem_object_pin() i915_gem_object_bind_to_gtt() ret = drm_mm_insert_node_in_range_generic(); // If the above call fails then it will try i915_gem_evict_something() // If that fails it will call i915_gem_evict_everything() ... i915_gem_evict_everything() i915_gpu_idle() i915_switch_context(DEFAULT_CONTEXT) Like with everything else where the shrinker or eviction code can invalidate pointers we need to reload relevant state. Note that there's no need to recheck whether a context switch is still required because: - Doing a switch to the same context is harmless (besides wasting a bit of energy). - This can only happen with the default context. But since that one's pinned we'll never call down into evict_everything under normal circumstances. Note that there's a little driver bringup fun involved namely that we could recourse into do_switch for the initial switch. Atm we're fine since we assign the context pointer only after the call to do_switch at driver load or resume time. And in the gpu reset case we skip the entire setup sequence (which might be a bug on its own, but definitely not this one here). Cc'ing stable since apparently ChromeOS guys are seeing this in the wild (and not just on artificial stress tests), see the reference. Note that in upstream code doesn't calle evict_everything directly from evict_something, that's an extension in this product branch. But we can still hit upon this bug (and apparently we do, see the linked backtraces). I've noticed this while trying to construct a testcase for this bug and utterly failed to provoke it. It looks like we need to driver the system squarly into the lowmem wall and provoke the shrinker to evict the context object by doing the last-ditch evict_everything call. Aside: There's currently no means to get a badly-fragmenting hw context object away from a bad spot in the upstream code. We should fix this by at least adding some code to evict_something to handle hw contexts. References: https://code.google.com/p/chromium/issues/detail?id=248191 Reported-by: Ian Lister Cc: Ian Lister Cc: stable@vger.kernel.org Cc: Ben Widawsky Cc: Stéphane Marchesin Cc: Bloomfield, Jon Tested-by: Rafael Barbalho Reviewed-by: Ian Lister Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 4a05956162c1..b0f42b9ca037 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -421,11 +421,21 @@ static int do_switch(struct i915_hw_context *to) if (ret) return ret; - /* Clear this page out of any CPU caches for coherent swap-in/out. Note + /* + * Pin can switch back to the default context if we end up calling into + * evict_everything - as a last ditch gtt defrag effort that also + * switches to the default context. Hence we need to reload from here. + */ + from = ring->last_context; + + /* + * Clear this page out of any CPU caches for coherent swap-in/out. Note * that thanks to write = false in this call and us not setting any gpu * write domains when putting a context object onto the active list * (when switching away from it), this won't block. - * XXX: We need a real interface to do this instead of trickery. */ + * + * XXX: We need a real interface to do this instead of trickery. + */ ret = i915_gem_object_set_to_gtt_domain(to->obj, false); if (ret) { i915_gem_object_unpin(to->obj); From ad071acb53110c8efd26ff1e5b5d57449b43833b Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 9 Dec 2013 10:37:24 +0000 Subject: [PATCH 7/8] drm/i915: Repeat eviction search after idling the GPU With the advent of hw context support, we gained some objects that are pinned for the duration of their request. That is we can make aperture space available by idling the GPU and in the process performing a context switch back to the always-pinned default context. As such, we should not conclude that there is no space in the aperture for the current object until we have unpinned any such context objects. Note that we also have the problem of outstanding pageflips preventing eviction of their framebuffer objects to resolve. Testcase: igt/gem_ctx_exec/eviction Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72507 Signed-off-by: Chris Wilson Tested-by: lu hua Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_gem_evict.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index b7376533633d..8f3adc7d0dc8 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -88,6 +88,7 @@ i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm, } else drm_mm_init_scan(&vm->mm, min_size, alignment, cache_level); +search_again: /* First see if there is a large enough contiguous idle region... */ list_for_each_entry(vma, &vm->inactive_list, mm_list) { if (mark_free(vma, &unwind_list)) @@ -115,10 +116,17 @@ none: list_del_init(&vma->exec_list); } - /* We expect the caller to unpin, evict all and try again, or give up. - * So calling i915_gem_evict_vm() is unnecessary. + /* Can we unpin some objects such as idle hw contents, + * or pending flips? */ - return -ENOSPC; + ret = nonblocking ? -ENOSPC : i915_gpu_idle(dev); + if (ret) + return ret; + + /* Only idle the GPU and repeat the search once */ + i915_gem_retire_requests(dev); + nonblocking = true; + goto search_again; found: /* drm_mm doesn't allow any other other operations while From 6c719faca2aceca72f1bf5b1645c1734ed3e9447 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Tue, 10 Dec 2013 13:20:59 +0100 Subject: [PATCH 8/8] drm/i915: don't update the dri1 breadcrumb with modesetting The update is horribly racy since it doesn't protect at all against concurrent closing of the master fd. And it can't really since that requires us to grab a mutex. Instead of jumping through hoops and offloading this to a worker thread just block this bit of code for the modesetting driver. Note that the race is fairly easy to hit since we call the breadcrumb function for any interrupt. So the vblank interrupt (which usually keeps going for a bit) is enough. But even if we'd block this and only update the breadcrumb for user interrupts from the CS we could hit this race with kms/gem userspace: If a non-master is waiting somewhere (and hence has interrupts enabled) and the master closes its fd (probably due to crashing). v2: Add a code comment to explain why fixing this for real isn't really worth it. Also improve the commit message a bit. v3: Fix the spelling in the comment. Reported-by: Eugene Shatokhin Cc: Eugene Shatokhin Cc: stable@vger.kernel.org Acked-by: Chris Wilson Tested-by: Eugene Shatokhin Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_dma.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 197db0993a24..5c648425c1e0 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -83,6 +83,14 @@ void i915_update_dri1_breadcrumb(struct drm_device *dev) drm_i915_private_t *dev_priv = dev->dev_private; struct drm_i915_master_private *master_priv; + /* + * The dri breadcrumb update races against the drm master disappearing. + * Instead of trying to fix this (this is by far not the only ums issue) + * just don't do the update in kms mode. + */ + if (drm_core_check_feature(dev, DRIVER_MODESET)) + return; + if (dev->primary->master) { master_priv = dev->primary->master->driver_priv; if (master_priv->sarea_priv)