Chris Wilson noticed that since commit 1f83fee08d625f8d0130f9fe5ef7b17c2e022f3c [v3.9] Author: Daniel Vetter <daniel.vetter@ffwll.ch> Date: Thu Nov 15 17:17:22 2012 +0100 drm/i915: clear up wedged transitions X can again get -EIO when it does not expect it. And even worse score a SIGBUS when accessing gtt mmaps. The established ABI is that we _only_ return an -EIO from execbuf - all other ioctls should just work. And since the reset code moves all bos out of gpu domains and clears out all the last_seqno/ring tracking there really shouldn't be any reason for non-execbuf code to ever touch the hw and see an -EIO. After some extensive discussions we've noticed that these spurios -EIO are caused by i915_gem_wait_for_error: http://www.mail-archive.com/intel-gfx@lists.freedesktop.org/msg20540.html That is easy to fix by returning 0 instead of -EIO, since grabbing the dev->struct_mutex does not yet mean that we actually want to touch the hw. And so there is no reason at all to fail with -EIO. But that's not the entire since, since often (at least it's easily googleable) dmesg indicates that the reset fails and we declare the gpu wedged. Then, quite a bit later X wakes up with the "Timed out waiting for the gpu reset to complete" DRM_ERROR message in wait_for_errror and brings down the desktop with an -EIO/SIGBUS. So clearly we're missing a wakeup somewhere, since the gpu reset just doesn't take 10 seconds to complete. And indeed we're do handle the terminally wedged state wrong. Fix this all up. References: https://bugs.freedesktop.org/show_bug.cgi?id=63921 References: https://bugs.freedesktop.org/show_bug.cgi?id=64073 Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Damien Lespiau <damien.lespiau@intel.com> Cc: stable@vger.kernel.org Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
************************************************************ * For the very latest on DRI development, please see: * * http://dri.freedesktop.org/ * ************************************************************ The Direct Rendering Manager (drm) is a device-independent kernel-level device driver that provides support for the XFree86 Direct Rendering Infrastructure (DRI). The DRM supports the Direct Rendering Infrastructure (DRI) in four major ways: 1. The DRM provides synchronized access to the graphics hardware via the use of an optimized two-tiered lock. 2. The DRM enforces the DRI security policy for access to the graphics hardware by only allowing authenticated X11 clients access to restricted regions of memory. 3. The DRM provides a generic DMA engine, complete with multiple queues and the ability to detect the need for an OpenGL context switch. 4. The DRM is extensible via the use of small device-specific modules that rely extensively on the API exported by the DRM module. Documentation on the DRI is available from: http://dri.freedesktop.org/wiki/Documentation http://sourceforge.net/project/showfiles.php?group_id=387 http://dri.sourceforge.net/doc/ For specific information about kernel-level support, see: The Direct Rendering Manager, Kernel Support for the Direct Rendering Infrastructure http://dri.sourceforge.net/doc/drm_low_level.html Hardware Locking for the Direct Rendering Infrastructure http://dri.sourceforge.net/doc/hardware_locking_low_level.html A Security Analysis of the Direct Rendering Infrastructure http://dri.sourceforge.net/doc/security_low_level.html