xf86-video-fbturbo
6 months ago: Had to build this for Slackware. [master]
Pete [Fri, 10 Nov 2023 18:56:58 +0000 (10:56 -0800)]
Had to build this for Slackware.

Not great to do this without hooking into autotools.

14 months ago: further probe y forward mitigation metrics [origin/HEAD, origin/master, v0.2]
Yatao Li [Thu, 23 Feb 2023 19:13:54 +0000 (19:13 +0000)]
further probe y forward mitigation metrics

14 months ago: wrap up for release [v0.1]
Yatao Li [Wed, 22 Feb 2023 03:24:27 +0000 (03:24 +0000)]
wrap up for release

14 months ago: add json compile database
Yatao Li [Wed, 22 Feb 2023 03:15:28 +0000 (03:15 +0000)]
add json compile database

14 months ago: figured out g2d bitblt y forward error pattern
Yatao Li [Wed, 22 Feb 2023 02:22:54 +0000 (02:22 +0000)]
figured out g2d bitblt y forward error pattern

14 months ago: implement corner fixup
Yatao Li [Tue, 21 Feb 2023 18:22:58 +0000 (18:22 +0000)]
implement corner fixup

14 months ago: figured out g2d bitblt x forward error pattern
Yatao Li [Tue, 21 Feb 2023 02:53:17 +0000 (02:53 +0000)]
figured out g2d bitblt x forward error pattern

19 months ago: fix g2d bitblt return values
Yatao Li [Sat, 17 Sep 2022 16:15:29 +0000 (00:15 +0800)]
fix g2d bitblt return values

19 months ago: g2d rotation is working. g2d bitblt is not (glitches), disabled that.
Yatao Li [Sat, 17 Sep 2022 15:49:44 +0000 (23:49 +0800)]
g2d rotation is working. g2d bitblt is not (glitches), disabled that.

20 months ago: add fullscreen rotation test
Yatao Li [Tue, 13 Sep 2022 10:39:47 +0000 (18:39 +0800)]
add fullscreen rotation test

20 months ago: benchmarking new g2d
Yatao Li [Tue, 13 Sep 2022 09:56:12 +0000 (17:56 +0800)]
benchmarking new g2d

20 months ago: add test-display2 playground:
Yatao Li [Tue, 13 Sep 2022 07:55:01 +0000 (15:55 +0800)]
add test-display2 playground:

20 months ago: update ioctl headers
Yatao Li [Sun, 11 Sep 2022 21:35:05 +0000 (05:35 +0800)]
update ioctl headers

20 months ago: remove neon assembly, backing_store_tuner, mali, ump
Yatao Li [Sat, 10 Sep 2022 14:30:40 +0000 (22:30 +0800)]
remove neon assembly, backing_store_tuner, mali, ump

8 years ago: Merge pull request #45 from jacmet/drop-dri2-include
Siarhei Siamashka [Tue, 6 Oct 2015 22:08:01 +0000 (01:08 +0300)]
Merge pull request #45 from jacmet/drop-dri2-include

sunxi_x_g2d: drop unused dri2 include

8 years ago: sunxi_x_g2d: drop unused dri2 include
Peter Korsgaard [Sat, 3 Oct 2015 17:01:38 +0000 (19:01 +0200)]
sunxi_x_g2d: drop unused dri2 include

The driver doesn't use DRI for anything.

Signed-off-by: Peter Korsgaard <peter@korsgaard.com>
9 years ago: sunxi: probe both screen layers at disp probing
Jerome Oufella [Thu, 5 Mar 2015 20:31:07 +0000 (15:31 -0500)]
sunxi: probe both screen layers at disp probing

Some boards use an inverted screen layer configuration, making the
original code unable to enable disp layers functionality properly.

This commit adds a fallback mechanism to the actual disp probing
sequence, allowing those cases to be properly handled.

Signed-off-by: Jérôme Oufella <jerome.oufella@savoirfairelinux.com>
9 years ago: Check if the kernel framebuffer driver returns errors on bad ioctls
Siarhei Siamashka [Sat, 20 Sep 2014 21:25:42 +0000 (00:25 +0300)]
Check if the kernel framebuffer driver returns errors on bad ioctls

When probing for the copyarea ioctl, we want to be sure that the
kernel does not simply return 0 (success) for any unsupported ioctl.
The Rockchip vendor kernels have been reported to have this issue.

If support for the Raspberry Pi specific copyarea ioctl was detected
by mistake, moving windows or scrolling was broken.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
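
Editorial sketch of the probing idea described above, not the driver's actual code. The request numbers, the `ioctl_fn` seam and both stand-in kernels are hypothetical and exist only so the logic can be exercised without a framebuffer device. The point is the order of checks: a request that nothing should implement is tried first, and a "success" reply to it means the copyarea result cannot be trusted.

```c
#include <assert.h>
#include <stddef.h>

/* Signature of an ioctl-like call. A real driver would pass a thin
 * wrapper around ioctl(2); this seam is an assumption for illustration. */
typedef int (*ioctl_fn)(int fd, unsigned long request, void *arg);

/* Hypothetical request numbers, not the driver's real ones. */
#define DEMO_COPYAREA_REQUEST 0x1001UL
#define DEMO_BOGUS_REQUEST    0x7FFFUL

/* Returns 1 only if a bogus request is rejected AND the copyarea
 * request succeeds. */
static int copyarea_ioctl_is_trustworthy(ioctl_fn do_ioctl, int fd)
{
    assert(do_ioctl != NULL);
    if (do_ioctl(fd, DEMO_BOGUS_REQUEST, NULL) == 0)
        return 0;   /* kernel reports success for anything: distrust it */
    return do_ioctl(fd, DEMO_COPYAREA_REQUEST, NULL) == 0;
}

/* Stand-in for a kernel that rejects unknown requests. */
static int strict_kernel(int fd, unsigned long request, void *arg)
{
    (void)fd; (void)arg;
    return request == DEMO_COPYAREA_REQUEST ? 0 : -1;
}

/* Stand-in for a kernel that never reports an error. */
static int lenient_kernel(int fd, unsigned long request, void *arg)
{
    (void)fd; (void)request; (void)arg;
    return 0;
}
```

With the lenient stand-in the probe reports the ioctl as untrustworthy, so the caller would fall back to CPU copies instead of issuing requests the kernel silently ignores.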
10 years ago: vdpau: Report DRI2 VDPAU driver name as 'sunxi' on Allwinner hardware
Siarhei Siamashka [Sun, 12 Jan 2014 02:42:36 +0000 (04:42 +0200)]
vdpau: Report DRI2 VDPAU driver name as 'sunxi' on Allwinner hardware

Try to load the 'sunxi_cedar_mod' kernel module. If it loads
successfully, report the DRI2 VDPAU name as 'sunxi'. This allows
libvdpau-sunxi to be used without setting the VDPAU_DRIVER environment
variable.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Don't use hardware cursor for rotated desktop
Siarhei Siamashka [Sat, 29 Mar 2014 22:28:01 +0000 (00:28 +0200)]
Don't use hardware cursor for rotated desktop

Fixes https://github.com/ssvb/xf86-video-fbturbo/issues/30

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Post-release version bump to 0.5.1
Siarhei Siamashka [Sun, 12 Jan 2014 02:36:59 +0000 (04:36 +0200)]
Post-release version bump to 0.5.1

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Pre-release version bump to 0.4.0
Siarhei Siamashka [Sat, 11 Jan 2014 19:43:36 +0000 (21:43 +0200)]
Pre-release version bump to 0.4.0

It makes sense to make a formal release. Providing the
pre-generated 'configure' script should make it less
likely for people to mess with autotools and encounter
troubles:

    https://github.com/ssvb/xf86-video-fbturbo/issues/28
    https://github.com/ssvb/xf86-video-fbturbo/issues/25

Also it's likely that this particular xf86-video-fbturbo
git master snapshot was used in:

    http://www.raspberrypi.org/archives/5580

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: autotools: add missing headers to Makefile.am to fix 'make distcheck'
Siarhei Siamashka [Sat, 11 Jan 2014 20:22:42 +0000 (22:22 +0200)]
autotools: add missing headers to Makefile.am to fix 'make distcheck'

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: mali: detect and workaround mismatch between back and front buffers
Siarhei Siamashka [Sun, 8 Dec 2013 23:44:36 +0000 (01:44 +0200)]
mali: detect and workaround mismatch between back and front buffers

After window creation or resize, the mali blob on the client side
requests two dri2 buffers (for back and front) from the ddx. The
problem is that the 'swap' and 'get_buffer' operations are executed
out of order relative to each other and we may have different
possible patterns of dri2 communication:

1. swap swap swap swap get_buffer swap get_buffer swap swap ...
2. swap swap swap get_buffer swap swap get_buffer swap swap ...

A major annoyance is that both the mali blob on the client side and
the ddx driver in the xserver need to have the same idea about which one
of these two buffers goes to front and which goes to back. Older
commit https://github.com/ssvb/xf86-video-fbturbo/commit/30b4ca27d1c4
tried to address this problem in a mostly empirical way and managed
to solve it at least for the synthetic test gles-rgb-cycle-demo and
for most of the real programs (such as Qt5 applications, etc.)

However, it appears that this heuristic is not 100% reliable in all
cases. The Extreme Tux Racer game run in glshim manages to trigger
the back and front buffers mismatch, which manifests itself as
erratic penguin movement.

This patch adds a special check, which now randomly samples certain
bytes from the dri2 buffers to see which one of them has been
modified by the client application between buffer swaps. If we see
that the rendering actually happens to the front buffer instead of
the back buffer, then we just change the roles of these buffers.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
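
Editorial sketch of the sampling check described above, under stated assumptions: the sample count, the struct layout and all function names are hypothetical, and the real driver works on UMP-backed DRI2 buffers rather than plain arrays. It shows the idea only: remember a few bytes at random offsets of each buffer, then see which buffer the client actually touched between swaps.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_SAMPLES 16   /* hypothetical sample count */

typedef struct {
    size_t        offset[NUM_SAMPLES];
    unsigned char value[NUM_SAMPLES];
} buffer_samples;

/* Record bytes at pseudo-random offsets of a buffer. */
static void take_samples(buffer_samples *s, const unsigned char *buf,
                         size_t size)
{
    assert(s != NULL && buf != NULL && size > 0);
    for (int i = 0; i < NUM_SAMPLES; i++) {
        s->offset[i] = (size_t)rand() % size;
        s->value[i]  = buf[s->offset[i]];
    }
}

/* Returns 1 if any sampled byte differs from the recorded value. */
static int samples_changed(const buffer_samples *s, const unsigned char *buf)
{
    for (int i = 0; i < NUM_SAMPLES; i++)
        if (buf[s->offset[i]] != s->value[i])
            return 1;
    return 0;
}

/* The roles look swapped if the client rendered into what we call the
 * front buffer while leaving the supposed back buffer untouched. */
static int roles_are_swapped(const buffer_samples *front_s,
                             const unsigned char *front,
                             const buffer_samples *back_s,
                             const unsigned char *back)
{
    return samples_changed(front_s, front) && !samples_changed(back_s, back);
}
```

Sampling can miss a change when consecutive frames happen to be identical at every sampled offset, so a check like this can only ever add evidence for a swap, never prove that the roles are correct.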
10 years ago: Auto-detect mali DRI device path
Daniel Drake [Mon, 11 Nov 2013 21:54:38 +0000 (15:54 -0600)]
Auto-detect mali DRI device path

When using exynos_drm, /dev/dri/card0 is now the exynos-drm node, and
/dev/dri/card1 is mali.

Instead of hardcoding mali at card0, use libdrm to automatically
provide the correct device node path.

Signed-off-by: Daniel Drake <drake@endlessm.com>
10 years ago: Set proper ELF attributes for the ARM assembly functions
Siarhei Siamashka [Sat, 26 Oct 2013 15:14:51 +0000 (18:14 +0300)]
Set proper ELF attributes for the ARM assembly functions

Fixes linking related fragility, which could result in crashes
when doing Thumb2->ARM function calls.

Reported-by: Luc Verhaegen <libv@skynet.be>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: ump: set ump_alternative_fb_secure_id to invalid
Luc Verhaegen [Sat, 19 Oct 2013 14:47:47 +0000 (16:47 +0200)]
ump: set ump_alternative_fb_secure_id to invalid

This avoids a kernel oops due to the badly implemented and badly
checked ump interface.

Signed-off-by: Luc Verhaegen <libv@skynet.be>
10 years ago: configure: check for ump/ump.h
Luc Verhaegen [Sat, 19 Oct 2013 14:46:06 +0000 (16:46 +0200)]
configure: check for ump/ump.h

And disable building ump when it is not there.

Signed-off-by: Luc Verhaegen <libv@skynet.be>
10 years ago: fix up dri driverName to select the lima driver
Luc Verhaegen [Sat, 19 Oct 2013 00:35:31 +0000 (02:35 +0200)]
fix up dri driverName to select the lima driver

The binary driver is unaffected by it; it only has an effect when
mesa-dri is fully installed.

Signed-off-by: Luc Verhaegen <libv@skynet.be>
10 years ago: Fix the 'forgotten else' regression to use NEON on Cortex-A8 again
Siarhei Siamashka [Thu, 17 Oct 2013 17:06:30 +0000 (20:06 +0300)]
Fix the 'forgotten else' regression to use NEON on Cortex-A8 again

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Use ARM LDM instead of VFP for uncached reads on Marvell PJ4
Siarhei Siamashka [Wed, 16 Oct 2013 23:59:36 +0000 (02:59 +0300)]
Use ARM LDM instead of VFP for uncached reads on Marvell PJ4

The Marvell PJ4 core used in the CuBox handles VFP uncached reads
from the framebuffer very poorly. Using WMMX or ARM LDM reads is much
faster, with LDM instructions having a minor advantage. This
improves framebuffer read performance from ~50MB/s to ~100MB/s.

WMMX runtime detection and PJ4 core identification are also added
as part of this fix.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: RPi: implement threshold for deciding between CPU and DMA blits
Siarhei Siamashka [Tue, 8 Oct 2013 14:27:53 +0000 (17:27 +0300)]
RPi: implement threshold for deciding between CPU and DMA blits

Benchmarking was done with x11perf, modified to support a wider range
of sizes for the scroll operation. Tests were run at the stock 700MHz
CPU clock frequency with a 1280x720 32bpp desktop.

$ DISPLAY=:0 ./x11perf -scroll5 -scroll10 -scroll15 -scroll20 \
                       -scroll30 -scroll50 -scroll100

== CPU ==

1000000 trep @   0.0289 msec ( 34600.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0387 msec ( 25800.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0459 msec ( 21800.0/sec): Scroll 15x15 pixels
 450000 trep @   0.0576 msec ( 17300.0/sec): Scroll 20x20 pixels
 350000 trep @   0.0817 msec ( 12200.0/sec): Scroll 30x30 pixels
 200000 trep @   0.1564 msec (  6390.0/sec): Scroll 50x50 pixels
 100000 trep @   0.4446 msec (  2250.0/sec): Scroll 100x100 pixels

== fb_copyarea (DMA) acceleration ==

1000000 trep @   0.0307 msec ( 32500.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0353 msec ( 28300.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0397 msec ( 25200.0/sec): Scroll 15x15 pixels
1000000 trep @   0.0464 msec ( 21600.0/sec): Scroll 20x20 pixels
 400000 trep @   0.0645 msec ( 15500.0/sec): Scroll 30x30 pixels
 250000 trep @   0.1177 msec (  8500.0/sec): Scroll 50x50 pixels
 100000 trep @   0.2783 msec (  3590.0/sec): Scroll 100x100 pixels

This shows that the ioctl overhead and the DMA setup cost are not so
significant on the Raspberry Pi. DMA already becomes a bit faster
than CPU at the 10x10 size of the blit operation.

Even though there is no significant difference between CPU and DMA
for extremely small operations (the other overhead clearly
dominates), setting a threshold is not going to harm:

== mixed CPU / fb_copyarea (DMA) with 90 pixels threshold ==

1000000 trep @   0.0291 msec ( 34300.0/sec): Scroll 5x5 pixels
1000000 trep @   0.0345 msec ( 29000.0/sec): Scroll 10x10 pixels
1000000 trep @   0.0395 msec ( 25300.0/sec): Scroll 15x15 pixels
1000000 trep @   0.0466 msec ( 21400.0/sec): Scroll 20x20 pixels
 400000 trep @   0.0650 msec ( 15400.0/sec): Scroll 30x30 pixels
 250000 trep @   0.1181 msec (  8470.0/sec): Scroll 50x50 pixels
 100000 trep @   0.2784 msec (  3590.0/sec): Scroll 100x100 pixels

If some other ARM devices also implement a Raspberry Pi compatible
accelerated fb_copyarea ioctl, then the threshold selection may
be reconsidered.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
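
Editorial sketch of the threshold decision, assuming that "90 pixels" refers to the blit area (width times height) and that blits below it take the CPU path. That reading matches the mixed-mode numbers above, where 5x5 (25 pixels) tracks the CPU timing and 10x10 (100 pixels) tracks the DMA timing, but the macro and function names here are hypothetical.

```c
#include <assert.h>

#define BLIT_AREA_THRESHOLD 90L   /* pixels, taken from the commit message */

typedef enum { BLIT_CPU, BLIT_DMA } blit_path;

/* Pick the copy path for a width x height blit: small areas stay on
 * the CPU, everything else goes through the fb_copyarea (DMA) ioctl. */
static blit_path choose_blit_path(int width, int height)
{
    assert(width > 0 && height > 0);
    /* widen before multiplying so large blits cannot overflow int */
    return ((long)width * height < BLIT_AREA_THRESHOLD) ? BLIT_CPU
                                                        : BLIT_DMA;
}
```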
10 years ago: Allow to disable Raspberry Pi fb_copyarea acceleration via xorg.conf
Siarhei Siamashka [Mon, 7 Oct 2013 17:47:28 +0000 (20:47 +0300)]
Allow to disable Raspberry Pi fb_copyarea acceleration via xorg.conf

Now acceleration is only used if the AccelMethod option is not set
(so that it is assumed to be the default choice) or when it is
explicitly set to "COPYAREA". Any other value (for example "CPU")
disables acceleration.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
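
Editorial sketch of the option semantics described above. The function name is hypothetical, and the case-insensitive comparison is an assumption about how the option value is matched; the driver itself reads the value through the Xorg option machinery rather than from a plain string.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* accel_method is the AccelMethod value from xorg.conf, or NULL when
 * the option is not set at all. Returns 1 if fb_copyarea acceleration
 * should be used. */
static int copyarea_accel_enabled(const char *accel_method)
{
    if (accel_method == NULL)
        return 1;                       /* unset: default to acceleration */
    return strcasecmp(accel_method, "COPYAREA") == 0;
}
```

Any value other than "COPYAREA", including "CPU", therefore disables acceleration rather than being treated as an error.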
10 years ago: Mention Raspberry Pi in the 2D acceleration section of the README
Siarhei Siamashka [Thu, 3 Oct 2013 22:24:50 +0000 (01:24 +0300)]
Mention Raspberry Pi in the 2D acceleration section of the README

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Support DMA-optimized fb_copyarea from the Raspberry Pi kernel
Siarhei Siamashka [Mon, 17 Jun 2013 15:28:13 +0000 (18:28 +0300)]
Support DMA-optimized fb_copyarea from the Raspberry Pi kernel

This provides basic 2D acceleration support for Raspberry Pi
to speed up moving windows and scrolling.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Rename the driver from "xf86-video-sunxifb" to "xf86-video-fbturbo"
Siarhei Siamashka [Sun, 22 Sep 2013 17:08:53 +0000 (20:08 +0300)]
Rename the driver from "xf86-video-sunxifb" to "xf86-video-fbturbo"

Because a wide range of embedded ARM devices are actually supported
(Allwinner A1X/A20, Raspberry Pi, ODROID-X, Rockchip, ...) and
are getting some sort of performance improvement and/or hardware
acceleration, the DDX driver needs a vendor neutral name.

Resolves https://github.com/ssvb/xf86-video-fbturbo/issues/10

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: mali: /var/log/Xorg.0.log warning about insufficient framebuffer size
Siarhei Siamashka [Mon, 9 Sep 2013 23:25:42 +0000 (02:25 +0300)]
mali: /var/log/Xorg.0.log warning about insufficient framebuffer size

If the framebuffer reservation size is too small for efficient use
of the hardware overlays and zero-copy buffer flipping, log a hint
about fixing this problem in /var/log/Xorg.0.log

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: mali: added a sanity check for the UMP framebuffer wrappers size
Siarhei Siamashka [Mon, 9 Sep 2013 22:33:46 +0000 (01:33 +0300)]
mali: added a sanity check for the UMP framebuffer wrappers size

Even though we are primarily using the UMP buffer obtained by the
GET_UMP_SECURE_ID_SUNXI_FB ioctl, another UMP buffer obtained by
the GET_UMP_SECURE_ID_BUF1 ioctl should also span over the whole
framebuffer. Otherwise we may have trouble with the window resize
bug recovery and buffer flipping.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: A big README update
Siarhei Siamashka [Sat, 7 Sep 2013 22:19:42 +0000 (01:19 +0300)]
A big README update

The instructions, links, etc.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: sunxi: workaround a negative YUV overlay position bug
Siarhei Siamashka [Sat, 7 Sep 2013 01:10:07 +0000 (04:10 +0300)]
sunxi: workaround a negative YUV overlay position bug

The Allwinner A10/A13 display controller hardware is expected to
support negative coordinates of the top left corners of the layers.
But there is some bug, either in the kernel driver or in the hardware,
which messes up the picture on screen when the Y coordinate is negative
for a YUV layer. Negative X coordinates are not affected. RGB formats
are not affected either (no matter whether the RGB layer is scaled or not).

We fix this by recalculating which part of the buffer in memory
corresponds to Y=0 on screen and adjusting the input buffer settings.

Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/16

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
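
Editorial sketch of the recalculation described above, under stated assumptions: the struct, its field names and the linear mapping from hidden screen rows to source rows are hypothetical, and chroma plane alignment for subsampled YUV formats is ignored. It shows how a layer whose top edge lies above the screen can be re-expressed as a layer starting at Y=0 that reads from further down the source buffer.

```c
#include <assert.h>

typedef struct {
    int dst_y;   /* Y of the layer's top edge on screen (may be < 0) */
    int dst_h;   /* layer height on screen, in rows */
    int src_y;   /* first source row read from the buffer */
    int src_h;   /* number of source rows read */
} overlay_rows;

/* Move a layer with negative dst_y to dst_y == 0 by skipping the
 * source rows that would have been off screen anyway. */
static overlay_rows fix_negative_y(overlay_rows r)
{
    assert(r.dst_h > 0 && r.src_h > 0 && r.src_y >= 0);
    if (r.dst_y < 0) {
        int hidden_dst = -r.dst_y;
        if (hidden_dst > r.dst_h)
            hidden_dst = r.dst_h;
        /* scale hidden screen rows back to source rows */
        int hidden_src = (int)((long)hidden_dst * r.src_h / r.dst_h);
        r.src_y += hidden_src;
        r.src_h -= hidden_src;
        r.dst_h -= hidden_dst;
        r.dst_y  = 0;
    }
    return r;
}
```

A layer that is entirely above the screen ends up with zero height here; a real driver would more likely disable the layer in that case.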
10 years ago: mali: support sunxi hardware overlay also with r5g6b5 format
Siarhei Siamashka [Sat, 7 Sep 2013 00:13:37 +0000 (03:13 +0300)]
mali: support sunxi hardware overlay also with r5g6b5 format

Now zero-copy and tear-free buffer swapping is also supported
for a 16bpp desktop.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: sunxi: Only enable scaler for the layer when it is really necessary
Siarhei Siamashka [Fri, 6 Sep 2013 23:22:37 +0000 (02:22 +0300)]
sunxi: Only enable scaler for the layer when it is really necessary

Now the scaler is enabled for the sunxi disp layer only when we want
to use it for YUV format with XV. Whenever the layer is configured
for RGB format or deactivated, the scaler gets disabled.

This should make the driver more friendly to the other potential
scaled layer users. The total number of available scalers is only
2 for Allwinner A10 and only 1 for Allwinner A13.

The potential drawback is that now we may get an error when trying
to enable the scaler (if somebody else has used up all the available
scalers) instead of always having it reserved and ready for use.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Fix the kernel oops regression when DRI2HWOverlay=false
Siarhei Siamashka [Tue, 13 Aug 2013 00:03:39 +0000 (03:03 +0300)]
DRI2: Fix the kernel oops regression when DRI2HWOverlay=false

Recent changes broke the configuration when "DRI2HWOverlay" option
is set to "false". This patch adds the missing UMP secure ids
initialization and resolves the problem.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Rename all SunxiMaliDRI2 instances to 'mali' for clarity
Siarhei Siamashka [Sun, 4 Aug 2013 20:17:34 +0000 (23:17 +0300)]
DRI2: Rename all SunxiMaliDRI2 instances to 'mali' for clarity

Do this to keep the variable naming style consistent across the
source file (earlier these variables had different names like
'self', 'drvpriv', 'private').

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Ensure correct ordering of frames after window resize
Siarhei Siamashka [Sun, 4 Aug 2013 02:33:06 +0000 (05:33 +0300)]
DRI2: Ensure correct ordering of frames after window resize

In double buffer mode, explicitly mark the buffers as designated
for odd or even frame position when putting them into the queue. And
when swapping the buffers, use these flags to re-synchronize if
necessary. This prevents problems after window resize (when
gles-rgb-cycle-demo could expose a mismatch between the color
name in the window title and the actual window color).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: test: use spacebar as a slow motion hotkey for gles-rgb-cycle-demo
Siarhei Siamashka [Sun, 4 Aug 2013 00:07:08 +0000 (03:07 +0300)]
test: use spacebar as a slow motion hotkey for gles-rgb-cycle-demo

Whenever something goes wrong in high fps mode, it may be interesting
to slow down the demo to check whether the actual background color
matches the expected color (shown in the window title).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Debugging code for testing the frames order correctness
Siarhei Siamashka [Sat, 3 Aug 2013 08:08:05 +0000 (11:08 +0300)]
DRI2: Debugging code for testing the frames order correctness

If DEBUG_WITH_RGB_PATTERN is defined, then we check that the
frames colors are changed as "R -> G -> B -> R -> G -> ..."
pattern and print debugging messages when this is not the
case. Such color change pattern can be generated by the
"test/gles-rgb-cycle-demo.c" program.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
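
Editorial sketch of the kind of check that DEBUG_WITH_RGB_PATTERN is described as enabling. The type and function names are hypothetical, and the assumption that frames are pure red, green or blue in an x8r8g8b8 layout is mine; the commit message only states the expected "R -> G -> B -> R" order.

```c
#include <stdint.h>

typedef enum { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_OTHER } frame_color;

/* Classify an x8r8g8b8 pixel; the top byte is ignored. */
static frame_color classify_pixel(uint32_t xrgb)
{
    switch (xrgb & 0x00FFFFFFu) {
    case 0x00FF0000u: return COLOR_RED;
    case 0x0000FF00u: return COLOR_GREEN;
    case 0x000000FFu: return COLOR_BLUE;
    default:          return COLOR_OTHER;
    }
}

/* Returns 1 if `next` is the frame expected right after `prev` in the
 * cycle R -> G -> B -> R, and 0 for any other transition. */
static int is_expected_successor(frame_color prev, frame_color next)
{
    if (prev == COLOR_OTHER || next == COLOR_OTHER)
        return 0;
    return (int)next == ((int)prev + 1) % 3;
}
```

A caller would sample one pixel per displayed frame and print a debugging message whenever `is_expected_successor` returns 0, which flags both dropped and reordered frames.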
10 years ago: DRI2: erase the offscreen framebuffer part on first buffer allocation
Siarhei Siamashka [Wed, 31 Jul 2013 22:51:44 +0000 (01:51 +0300)]
DRI2: erase the offscreen framebuffer part on first buffer allocation

Do this mostly for security reasons. We don't want any application
to see whatever was last rendered by the previous GLES application
by just peeking into a freshly allocated DRI2 buffer.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Don't waste overlay on a strange 1x1 window created by gnome-shell
Siarhei Siamashka [Wed, 31 Jul 2013 22:29:06 +0000 (01:29 +0300)]
DRI2: Don't waste overlay on a strange 1x1 window created by gnome-shell

We manage only a single hardware overlay. That's a precious shared
resource, which we want to use for zero-copy fullscreen compositing
in gnome-shell. The strange 1x1 window does not really need it.
Fixes https://github.com/ssvb/xf86-video-sunxifb/issues/2

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Added new "SwapbuffersWait" option for xorg.conf
Siarhei Siamashka [Wed, 31 Jul 2013 00:36:52 +0000 (03:36 +0300)]
DRI2: Added new "SwapbuffersWait" option for xorg.conf

When enabled, it tries to avoid tearing in OpenGL ES applications.
Works on sunxi hardware if the hardware overlay (sunxi disp layer)
is used for a DRI2 window. The name of this option and the
description in the man page have been borrowed from the intel and
radeon drivers.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Implemented double buffering when using the hardware overlay
Siarhei Siamashka [Tue, 30 Jul 2013 21:25:33 +0000 (00:25 +0300)]
DRI2: Implemented double buffering when using the hardware overlay

That's the right thing to do and fixes issues such as
    https://github.com/ssvb/xf86-video-sunxifb/issues/6

As a result, the framebuffer size may now need to be larger in
order to accommodate two DRI2 buffers in the offscreen part of
the framebuffer. The users of sunxi hardware are advised to
increase the value of the fb0_framebuffer_num variable in the fex
file to 3 for 32bpp mode and to 5 for 16bpp mode.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Explicitly include "gcstruct.h" for GCOps
Siarhei Siamashka [Mon, 29 Jul 2013 16:10:02 +0000 (19:10 +0300)]
Explicitly include "gcstruct.h" for GCOps

Should fix https://github.com/ssvb/xf86-video-sunxifb/issues/14
and prevent FTBFS on some systems.

Reported-by: Fred Chien <cfsghost@gmail.com>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Rely less on the information from DRI2BufferRec
Siarhei Siamashka [Mon, 29 Jul 2013 14:06:20 +0000 (17:06 +0300)]
DRI2: Rely less on the information from DRI2BufferRec

When moving further to our own DRI2 buffers bookkeeping, we can't
really trust the information from DRI2BufferRec anymore. So just add
a copy of all the missing bits of information to UMPBufferInfoRec
and use it instead.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: test: configurable delay between frames in gles-rgb-cycle-demo
Siarhei Siamashka [Sat, 27 Jul 2013 13:51:58 +0000 (16:51 +0300)]
test: configurable delay between frames in gles-rgb-cycle-demo

Allowing the delay between frames to be set with millisecond
precision on the command line lets us use it to test vsync.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: CPU copy fallback path does not drop half of the frames anymore
Siarhei Siamashka [Fri, 26 Jul 2013 23:17:06 +0000 (02:17 +0300)]
DRI2: CPU copy fallback path does not drop half of the frames anymore

The recent commit 9e0a87319b90e3e364fde7cffd24662926f5a4fa (the part
of it that suppressed buffer reuse in the Xorg DRI2 framework)
introduced a regression. Half of the frames stopped reaching the
screen on the CPU copy fallback path because the Mali blob now ended
up rendering them to the "wrong" buffer.

It just confirms that we need to completely move from the standard
DRI2 framework in the Xorg server to our own buffers bookkeeping
logic. This patch fixes the regression by introducing a single UMP
buffer per window, which is shared between back and front DRI2
buffers. We can do this because double buffering does not make much
sense on the fallback path at the moment (we can't set scanout from
this buffer and anyway have to copy this data elsewhere immediately
after we get it from Mali).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: only pay attention to back buffers requests
Siarhei Siamashka [Fri, 26 Jul 2013 13:11:25 +0000 (16:11 +0300)]
DRI2: only pay attention to back buffers requests

Bail out earlier for the uninteresting types of DRI2 buffer
requests (by just returning a dummy null UMP buffer). This makes
the code a bit simpler on the common path.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: test: new gles-rgb-cycle-demo for testing the correctness of DRI2
Siarhei Siamashka [Wed, 24 Jul 2013 22:40:05 +0000 (01:40 +0300)]
test: new gles-rgb-cycle-demo for testing the correctness of DRI2

The test program cycles through 3 colors (red, green, blue), so
it is easier to see if we get the color change pattern wrong.
Also the X11 window title is updated to indicate the current
color information. If we have any problems with window
decorations handling, they are likely to be exposed.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Refine the workaround for Mali r3p0 window resizing issue
Siarhei Siamashka [Wed, 24 Jul 2013 22:19:35 +0000 (01:19 +0300)]
DRI2: Refine the workaround for Mali r3p0 window resizing issue

Using the secure id 1 (framebuffer) to trick the Mali blob into
requesting DRI2 buffers again was not a very good idea. The problem
is that the blob still writes something there and corrupts the
framebuffer. So instead we try to assign secure id 2 to a dummy
4KiB UMP buffer allocated in memory and use it for the same purpose.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: DRI2: Workaround window resize bug in Mali r3p0 blob
Siarhei Siamashka [Wed, 24 Jul 2013 02:25:47 +0000 (05:25 +0300)]
DRI2: Workaround window resize bug in Mali r3p0 blob

The Mali blob is doing something like this:

 1. Request BackLeft DRI2 buffer (buffer A) and render to it
 2. Swap buffers
 3. Request BackLeft DRI2 buffer (buffer B)
 4. Check window size, and if it has changed - go back to step 1.
 5. Render to the current back buffer (either buffer A or B)
 6. Swap buffers
 7. Go back to step 4

A very serious show stopper problem is that the Mali blob ignores
DRI2-InvalidateBuffers events and just uses GetGeometry polling
to check whether the window size has changed. Unfortunately this
is racy and we may end up with a size mismatch between buffer A
and buffer B. This is particularly easy to trigger when the window
size changes exactly between steps 1 and 3.

See test/gles-yellow-blue-flip.c program which demonstrates this.
Qt5 applications also trigger this bug.

We work around the issue by explicitly tracking the requests for
BackLeft buffers and checking whether the sizes of these buffers
match at step 1 and step 3. However, the real challenge here is
notifying the client application that these buffers are no good,
so that it can request them again. As DRI2-InvalidateBuffers
events are ignored, we are in a pretty difficult situation.
Fortunately I remembered a weird behaviour observed earlier:

    https://groups.google.com/forum/#!msg/linux-sunxi/qnxpVaqp1Ys/aVTq09DVih0J

Actually if we return UMP secure ID value 1 for the second DRI2
buffer request, the blob responds to this by spitting out the
following error message:

    [EGL-X11] [2274] DETECTED ONLY ONE FRAMEBUFFER - FORCING A RESIZE
    [EGL-X11] [2274] DRI2 UMP ID 0x3 retrieved
    [EGL-X11] [2274] DRI2 WINDOW UMP SECURE ID CHANGED (0x3 -> 0x3)

And then it proceeds by re-trying to request a pair of DRI2 buffers.
But that's exactly the behaviour we want! As a downside, some ugly
flashing can be seen on screen when this workaround kicks in, but
then everything normalizes. Unfortunately, the race condition is
still not totally eliminated because the blob is apparently getting
DRI2 buffer sizes from the separate GetGeometry requests instead of
using the information provided by DRI2GetBuffers. But now the
problem is at least very hard to trigger.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
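
Editorial sketch of the tracking logic described above, under stated assumptions: the tracker struct and function name are hypothetical, BackLeft requests are assumed to arrive in pairs (steps 1 and 3), and the real driver keeps this state per window. On a size mismatch within a pair the sketch hands back secure ID 1, the value this commit message reports as provoking the blob into requesting its buffers again.

```c
#include <assert.h>
#include <stddef.h>

/* Secure ID that makes the blob force a resize, per the commit message. */
#define FORCE_RESIZE_SECURE_ID 1u

typedef struct {
    int have_first;        /* first buffer of the pair already requested */
    int first_w, first_h;  /* size seen at that first request */
} backleft_tracker;

/* Called for every BackLeft request. Returns the secure ID to report:
 * real_id normally, or FORCE_RESIZE_SECURE_ID when the second buffer
 * of a pair does not match the size of the first. */
static unsigned on_backleft_request(backleft_tracker *t, int w, int h,
                                    unsigned real_id)
{
    assert(t != NULL);
    if (!t->have_first) {            /* step 1: remember buffer A's size */
        t->have_first = 1;
        t->first_w = w;
        t->first_h = h;
        return real_id;
    }
    t->have_first = 0;               /* step 3: buffer B closes the pair */
    if (w != t->first_w || h != t->first_h)
        return FORCE_RESIZE_SECURE_ID;
    return real_id;
}
```

After a forced resize the tracker is back in its initial state, so the next two requests are again treated as a fresh pair.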
10 years ago: Fix XV border artifacts when using gstreamer 1.0
Harm Hanemaaijer [Sat, 20 Jul 2013 07:30:19 +0000 (09:30 +0200)]
Fix XV border artifacts when using gstreamer 1.0

Since version 1.0, gstreamer (when using xvimagesink) often
allocates a larger XV image for the video with padding on all
four sides and then calls XvPutImage() to render a part of this
image. With the current XV implementation this results in
artifacts on the borders of the image, with a green bar at the
bottom.

I am observing this when playing a 1280x720 video on a 1920x1080
screen at 32bpp; the size of the video window doesn't matter.

This problem seems to be an exaggeration of the one described in
https://bugzilla.gnome.org/show_bug.cgi?id=685305.

The solution appears to be to use the source area dimensions as
requested in the XvPutImage() call, as opposed to the dimensions
of the originally allocated image, and to honour the offsets
(src_x, src_y) when setting the source region on the display
controller. With this relatively simple change, the problem seems
to go away, and gstreamer 1.0 (which is faster than gstreamer 0.10
due to a zero-copy strategy) provides an acceptable solution for
video playback.

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
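
Editorial sketch of the fix described above: take the source region from the XvPutImage() arguments instead of from the allocated image size. The `rect` type and function name are hypothetical, and the clipping to the image bounds is an added safety assumption that the commit message does not mention.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } rect;

/* Source region to program into the display controller: the
 * (src_x, src_y, src_w, src_h) area requested by XvPutImage(),
 * clipped to the allocated image_w x image_h image. */
static rect xv_source_region(int image_w, int image_h,
                             int src_x, int src_y, int src_w, int src_h)
{
    assert(image_w > 0 && image_h > 0);
    rect r = { src_x, src_y, src_w, src_h };
    if (r.x < 0) { r.w += r.x; r.x = 0; }
    if (r.y < 0) { r.h += r.y; r.y = 0; }
    if (r.x + r.w > image_w) r.w = image_w - r.x;
    if (r.y + r.h > image_h) r.h = image_h - r.y;
    if (r.w < 0) r.w = 0;
    if (r.h < 0) r.h = 0;
    return r;
}
```

For example, with an illustrative 1312x752 allocation that pads a 1280x720 video by 16 pixels on each side, the region becomes (16, 16, 1280, 720) rather than the whole padded image, so the padding never reaches the screen.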
10 years ago: Don't initialize XV if we can't reserve a scalable sunxi-disp layer
Siarhei Siamashka [Fri, 19 Jul 2013 00:45:42 +0000 (03:45 +0300)]
Don't initialize XV if we can't reserve a scalable sunxi-disp layer

If an attempt to reserve a scalable sunxi-disp layer fails, don't
initialize XV at all. Otherwise any attempt to use the XV overlay
is not going to work correctly and just results in the following
dmesg spam:

[  728.280000] [DISP] not supported yuv channel format:18 in img_sw_para_to_reg

This may happen on Allwinner A13 if scaler mode is enabled in the
.fex file (A13 only has one DEFE scaler). Allwinner A10 can also
have similar trouble in a dual-head configuration if scaler mode
is enabled for one or both screens (A10 has two DEFE scalers).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years ago: Update man page and README to reflect diverse platform support
Harm Hanemaaijer [Thu, 18 Jul 2013 22:35:14 +0000 (00:35 +0200)]
Update man page and README to reflect diverse platform support

Update the man page and bring it up-to-date, reflecting the fact
that the driver also supports non-sunxi platforms. Add description
of the "XVHWOverlay" option.

Also a small update to the README for similar reasons.

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
10 years ago: Add option to disable XV hardware overlay
Harm Hanemaaijer [Thu, 18 Jul 2013 19:00:21 +0000 (21:00 +0200)]
Add option to disable XV hardware overlay

Add the "XVHWOverlay" boolean xorg.conf option to make it possible
to disable the XV acceleration feature using display layers on
sunxi hardware.

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
10 years ago: configure.ac: workaround libump/pthreads build issue
Siarhei Siamashka [Wed, 17 Jul 2013 14:34:21 +0000 (17:34 +0300)]
configure.ac: workaround libump/pthreads build issue

On some systems the libump library is built without an explicit
pthreads dependency. As the issue has already been confirmed to affect
both sunxi and odroid users (and maybe the users of other mali400
based hardware), it is easier to just work around the problem locally.
Otherwise we would need to hunt down all the libump packagers and
beg for the fix.

More details are at https://github.com/ssvb/xf86-video-sunxifb/issues/11

Reported-by: Patrick Wood
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoDefine ARRAY_SIZE macro if it is not provided by Xorg headers
Siarhei Siamashka [Wed, 17 Jul 2013 11:02:45 +0000 (14:02 +0300)]
Define ARRAY_SIZE macro if it is not provided by Xorg headers
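
A fallback definition of this kind usually looks like the sketch below.
The exact macro body used by the driver is not shown in this log, so the
definition and the sample_formats table are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Define ARRAY_SIZE only when no earlier header has provided it. */
#ifndef ARRAY_SIZE
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#endif

/* Hypothetical table, present only to demonstrate the macro. */
static const int sample_formats[] = { 16, 24, 32 };

/* Returns the number of entries in sample_formats. */
size_t sample_format_count(void)
{
    size_t n = ARRAY_SIZE(sample_formats);
    assert(n > 0);
    return n;
}
```

The #ifndef guard keeps the definition from clashing with Xorg headers
that already provide the macro.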

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoAdded initial XV extension support for sunxi hardware
Siarhei Siamashka [Tue, 16 Jul 2013 02:51:31 +0000 (05:51 +0300)]
Added initial XV extension support for sunxi hardware

Proper layer sharing between XV and DRI2 still needs to be implemented.
Additionally we still need NEON and/or G2D "textured overlay" as a
fallback solution for the composited desktop (NEON optimized XV is going
to be useful for a wide range of ARM devices). A bit of performance
tuning is also necessary.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agosunxi: disp ioctl wrappers for YUV overlay and color key support
Siarhei Siamashka [Thu, 11 Jul 2013 21:56:19 +0000 (00:56 +0300)]
sunxi: disp ioctl wrappers for YUV overlay and color key support

They are needed for a basic XV extension implementation.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoAdd CPU optimization for PutImage
Harm Hanemaaijer [Fri, 7 Jun 2013 00:06:00 +0000 (02:06 +0200)]
Add CPU optimization for PutImage

Benchmark tests reveal that xorg's fb layer PutImage implementation
does not follow an optimal code path for requests without special
raster operations, which is due to the use of a slower general blit
function instead of the pixman library. This affects Xlib PutImage
requests and some ShmPutImage requests. In the case of ShmPutImage,
xorg directs ShmPutImage requests to PutImage only if the width of
the part of the image to be copied is equal to the full width of
the image, resulting in relatively poor performance. If the width
of the part of the image that is copied is smaller than the full
image, then xorg uses CopyArea which results in the use of the
already optimal pixman blit functions. The sub-optimal path is
commonly triggered by applications such as window managers and web
browsers.

To fix this unnecessary performance flaw, PutImage is replaced with
a version that uses pixman for the common case of GXcopy with all
plane mask bits set. This change is device-independent and only uses
pixman CPU blit functions that are already present in the xorg server.
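
The "common case" test can be sketched as follows. This is a minimal
illustration, not the code from the patch: the sketch_ names are
hypothetical, and only the value of GXcopy (0x3) is taken from the core
X11 headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GXcopy has the value 0x3 in the core X11 protocol headers. */
#define SKETCH_GXCOPY 0x3

/* Mask with one bit set for every plane of a drawable of this depth. */
uint32_t sketch_full_planemask(int depth)
{
    assert(depth >= 1 && depth <= 32);
    return depth == 32 ? 0xFFFFFFFFu : ((1u << depth) - 1u);
}

/* True when a PutImage request is a plain copy that a pixman blit can
 * handle: raster op GXcopy and no plane of the drawable masked out. */
bool sketch_putimage_is_plain_copy(int alu, uint32_t planemask, int depth)
{
    uint32_t full = sketch_full_planemask(depth);
    return alu == SKETCH_GXCOPY && (planemask & full) == full;
}
```

Requests that fail this test would keep using the general blit function.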

Using the low-level benchmark program benchx
(https://github.com/hglm/benchx.git), the following speed-ups were
measured (1920x1080x32bpp) on an Allwinner A10 device:

ShmPutImageFullWidth (5 x 5): Speed up 9%
ShmPutImageFullWidth (7 x 7): Slow down 5%
ShmPutImageFullWidth (22 x 22): Speed up 8%
ShmPutImageFullWidth (49 x 49): Speed up 19%
ShmPutImageFullWidth (73 x 73): Speed up 55%
ShmPutImageFullWidth (109 x 109): Speed up 50%
ShmPutImageFullWidth (163 x 163): Speed up 37%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 77%
ShmPutImageFullWidth (549 x 549): Speed up 92%
AlignedShmPutImageFullWidth (5 x 5): Slow down 14%
AlignedShmPutImageFullWidth (7 x 7): Slow down 6%
AlignedShmPutImageFullWidth (15 x 15): Speed up 10%
AlignedShmPutImageFullWidth (22 x 22): Speed up 9%
AlignedShmPutImageFullWidth (33 x 33): Speed up 21%
AlignedShmPutImageFullWidth (49 x 49): Speed up 28%
AlignedShmPutImageFullWidth (73 x 73): Speed up 30%
AlignedShmPutImageFullWidth (109 x 109): Speed up 47%
AlignedShmPutImageFullWidth (163 x 163): Speed up 38%
AlignedShmPutImageFullWidth (244 x 244): Speed up 63%
AlignedShmPutImageFullWidth (366 x 366): Speed up 84%
AlignedShmPutImageFullWidth (549 x 549): Speed up 89%

At 16bpp the speed-up is even greater:

ShmPutImageFullWidth (5 x 5): Slow down 8%
ShmPutImageFullWidth (7 x 7): Slow down 8%
ShmPutImageFullWidth (10 x 10): Slow down 6%
ShmPutImageFullWidth (22 x 22): Speed up 9%
ShmPutImageFullWidth (33 x 33): Speed up 20%
ShmPutImageFullWidth (49 x 49): Speed up 27%
ShmPutImageFullWidth (73 x 73): Speed up 69%
ShmPutImageFullWidth (109 x 109): Speed up 74%
ShmPutImageFullWidth (163 x 163): Speed up 100%
ShmPutImageFullWidth (244 x 244): Speed up 111%
ShmPutImageFullWidth (366 x 366): Speed up 133%
ShmPutImageFullWidth (549 x 549): Speed up 123%
AlignedShmPutImageFullWidth (5 x 5): Speed up 6%
AlignedShmPutImageFullWidth (7 x 7): Slow down 9%
AlignedShmPutImageFullWidth (10 x 10): Slow down 10%
AlignedShmPutImageFullWidth (33 x 33): Speed up 17%
AlignedShmPutImageFullWidth (49 x 49): Speed up 34%
AlignedShmPutImageFullWidth (73 x 73): Speed up 49%
AlignedShmPutImageFullWidth (109 x 109): Speed up 53%
AlignedShmPutImageFullWidth (163 x 163): Speed up 69%
AlignedShmPutImageFullWidth (244 x 244): Speed up 82%
AlignedShmPutImageFullWidth (366 x 366): Speed up 116%
AlignedShmPutImageFullWidth (549 x 549): Speed up 110%

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
10 years agoCPU: use VFP overlapped blit on VFP-capable hardware by default
Siarhei Siamashka [Wed, 12 Jun 2013 18:35:43 +0000 (21:35 +0300)]
CPU: use VFP overlapped blit on VFP-capable hardware by default

This should be useful for Raspberry Pi. When reading uncached source buffers,
the VFP optimized overlapped two-pass blit is roughly 2-3 times slower than
memcpy in cached memory. This makes it reasonably competitive compared to
ShadowFB (considering that ShadowFB allocates an extra buffer, does extra
memory copies which take time and thrash L2 cache, etc.). It even provides
a slight performance advantage in a more or less realistic use case
(scrolling in xterm), which needs reads from the framebuffer:

==== Before (xf86-video-fbdev with ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m50.245s
user    0m1.750s
sys     0m0.800s

==== After (xf86-video-sunxifb without ShadowFB) ====

$ time DISPLAY=:0 xterm +j -maximized -e cat longtext.txt

real    1m27.709s
user    0m1.690s
sys     0m0.920s

We get decent results even when reading from the framebuffer. However,
in many typical workloads (excluding scrolling and dragging windows)
the framebuffer is primarily write-only. In write-only use cases
ShadowFB is just pure overhead, so getting rid of it is a very good
idea, as this improves overall graphics performance.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoFix segfault on exit (introduced by the new backing store code)
Siarhei Siamashka [Wed, 12 Jun 2013 20:04:47 +0000 (23:04 +0300)]
Fix segfault on exit (introduced by the new backing store code)

A small typo in a function argument and a C compiler happily accepting
void pointers instead of something else are a dangerous combo. Need to
be more careful.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoBacking store heuristics for improving windows dragging performance
Siarhei Siamashka [Wed, 12 Jun 2013 16:52:41 +0000 (19:52 +0300)]
Backing store heuristics for improving windows dragging performance

This patch implements a heuristic which enables backing store for some
windows. When backing store is enabled for a window, the window gets a
backing pixmap (via automatic redirection provided by the composite
extension). It acts a bit like ShadowFB, but for individual windows.

The advantage of backing store is that we can avoid "expose event -> redraw"
animated trail in the exposed area when dragging another window on top of it.
Dragging windows becomes much smoother and faster.

But the disadvantage of backing store is the same as for ShadowFB: a
loss of precious RAM, an extra buffer copy when somebody tries to update
window content, and potentially some skipped frames on fast animation
(they just do not reach the screen). Also, hardware accelerated scrolling
does not currently work for windows with backing store enabled.

We try to make the best use of backing store by enabling it for
all the windows that are direct children of root, except the one which has
keyboard focus (either directly or via one of its children). In practice this
heuristic seems to provide nearly perfect results:
 1) dragging windows is fast and smooth.
 2) the top level window with the keyboard focus (typically the application
    that a user is working with) is G2D accelerated and does not suffer from
    any intermediate buffer copy overhead.
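
The selection rule above can be sketched with a toy window tree. The
sketch_window type and both functions are hypothetical and are not the
xserver data structures; they only illustrate "all direct children of
root except the focused one".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_window {
    struct sketch_window *parent;   /* NULL for the root window */
    bool backing_store;
} sketch_window;

/* Walk up from win to the direct child of root that contains it. */
sketch_window *sketch_toplevel_of(sketch_window *win)
{
    if (win == NULL || win->parent == NULL)
        return NULL;                /* no window, or the root itself */
    while (win->parent->parent != NULL)
        win = win->parent;
    return win;
}

/* Enable backing store for every top-level window except the one that
 * holds keyboard focus, either directly or via one of its children. */
void sketch_tune_backing_store(sketch_window **toplevels, size_t count,
                               sketch_window *focus)
{
    sketch_window *focused_top = sketch_toplevel_of(focus);
    for (size_t i = 0; i < count; i++) {
        assert(toplevels[i] != NULL);
        toplevels[i]->backing_store = (toplevels[i] != focused_top);
    }
}
```

Re-running the function on every focus change moves the "no backing
store" exemption to the newly focused top-level window.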

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoDRI2: Move DebugMsg macro to a common header
Siarhei Siamashka [Mon, 10 Jun 2013 19:24:25 +0000 (22:24 +0300)]
DRI2: Move DebugMsg macro to a common header

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoEnable G2D acceleration by default on sun4i hardware
Siarhei Siamashka [Fri, 7 Jun 2013 22:56:05 +0000 (01:56 +0300)]
Enable G2D acceleration by default on sun4i hardware

With the fallback to the CPU backend for unsupported blits and the
threshold for avoiding small blits, G2D should now always provide
the best overall performance.

The users of recent versions of xf86-video-sunxifb are supposed
to also have a reasonably recent version of the linux-sunxi kernel,
which includes the following fix:
  https://github.com/linux-sunxi/linux-sunxi/commit/3d49345343a1535b

The users of old kernels are going to see screen corruption when
dragging windows and scrolling. They should just upgrade :)

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoG2D: Fallback to NEON optimized CPU backend for unsupported blits
Siarhei Siamashka [Fri, 7 Jun 2013 22:02:55 +0000 (01:02 +0300)]
G2D: Fallback to NEON optimized CPU backend for unsupported blits

The G2D driver only supports framebuffer->framebuffer blits and
also can't be used to accelerate dragging windows to the right
(without hacking the kernel driver to do a two-pass blit there).
This patch adds a fallback to the NEON optimized CPU backend instead
of resorting to the poorly performing fbBlt in these cases.

Note: we assume that ioctls normally do not fail (even if they
      do, the slow old style fallback to fbBlt is not the worst
      thing to worry about).
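
A rough sketch of the backend choice described above. The struct, the
enum and the overlap test are assumptions made for illustration; in
particular, "dragging to the right" is modelled here as an overlapping
blit whose destination lies to the right of its source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_USE_G2D, SKETCH_USE_CPU } sketch_backend;

typedef struct {
    bool src_in_fb, dst_in_fb;  /* is each buffer inside the framebuffer? */
    int src_x, src_y;           /* framebuffer coordinates of the source */
    int dst_x, dst_y;           /* framebuffer coordinates of the destination */
    int w, h;
} sketch_blit;

/* True when source and destination rectangles share at least one pixel. */
bool sketch_rects_overlap(const sketch_blit *b)
{
    return b->src_x < b->dst_x + b->w && b->dst_x < b->src_x + b->w &&
           b->src_y < b->dst_y + b->h && b->dst_y < b->src_y + b->h;
}

/* Pick G2D only for blits it can handle; everything else goes to the
 * CPU backend instead of the generic fbBlt path. */
sketch_backend sketch_pick_backend(const sketch_blit *b)
{
    assert(b != NULL && b->w > 0 && b->h > 0);
    if (!b->src_in_fb || !b->dst_in_fb)
        return SKETCH_USE_CPU;          /* not framebuffer->framebuffer */
    if (b->dst_x > b->src_x && sketch_rects_overlap(b))
        return SKETCH_USE_CPU;          /* overlapping move to the right */
    return SKETCH_USE_G2D;
}
```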

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoCPU: Added ARM VFP two-pass overlapped blit implementation
Siarhei Siamashka [Wed, 5 Jun 2013 00:07:41 +0000 (03:07 +0300)]
CPU: Added ARM VFP two-pass overlapped blit implementation

Using VFP, we can load up to 128 bytes with a single VLDM instruction.
But before this patch, only a NEON implementation was available, just
because it showed better results on Allwinner A10 compared to VFP,
and this DDX driver used to primarily target just sunxi hardware.
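
For readers unfamiliar with the term, one plausible reading of a
"two-pass overlapped blit" is sketched below in portable C: each row is
first read into a small temporary buffer and then written to its
destination. This is an assumption about the general technique, not the
VFP assembly from this patch; all names are hypothetical and x/w are
given in bytes to keep pixel formats out of the picture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ROW_BYTES 8192

/* Overlap-safe copy of a w x h rectangle inside a single buffer. */
void sketch_two_pass_blit(uint8_t *bits, int stride,
                          int src_x, int src_y,
                          int dst_x, int dst_y, int w, int h)
{
    uint8_t tmp[SKETCH_MAX_ROW_BYTES];
    int y = 0, step = 1;

    assert(w >= 0 && w <= SKETCH_MAX_ROW_BYTES && stride > 0);
    if (dst_y > src_y) {        /* moving down: start from the bottom row */
        y = h - 1;
        step = -1;
    }
    for (int i = 0; i < h; i++, y += step) {
        const uint8_t *s = bits + (size_t)(src_y + y) * (size_t)stride + src_x;
        uint8_t *d = bits + (size_t)(dst_y + y) * (size_t)stride + dst_x;
        memcpy(tmp, s, (size_t)w);  /* pass 1: framebuffer -> temporary */
        memcpy(d, tmp, (size_t)w);  /* pass 2: temporary -> framebuffer */
    }
}
```

Going through the temporary buffer means the copy direction within a
row no longer matters, so aligned forward reads can always be used.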

But it looks like it makes sense to also target other devices (at least
ODROID-X, which has the same Mali400 GPU and can use the same DRI2
integration for EGL and GLESv2 support). And on the other ARM devices,
VFP aligned reads generally work better than NEON. The benchmark
results are listed below:

            1280x720, 32bpp, testing "x11perf -scroll500"

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement disabled ==

NEON : 10000 trep @   3.7101 msec (   270.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.6678 msec (   375.0/sec): Scroll 500x500 pixels

== Exynos 5250, Cortex-A15, Non-cacheable streaming enhancement enabled ==

NEON : 15000 trep @   2.2568 msec (   443.0/sec): Scroll 500x500 pixels
VFP  : 15000 trep @   2.3016 msec (   434.0/sec): Scroll 500x500 pixels

== Exynos 4412, Cortex-A9 ==

NEON : 10000 trep @   4.5125 msec (   222.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   2.7015 msec (   370.0/sec): Scroll 500x500 pixels

== TI DM3730, Cortex-A8 ==

NEON : 15000 trep @   2.2303 msec (   448.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0670 msec (   326.0/sec): Scroll 500x500 pixels

== Allwinner A10, Cortex-A8 ==

NEON : 10000 trep @   2.5559 msec (   391.0/sec): Scroll 500x500 pixels
VFP  : 10000 trep @   3.0580 msec (   327.0/sec): Scroll 500x500 pixels

== Raspberry Pi, BCM2708, ARM1176 ==

VFP  :  3000 trep @   8.7699 msec (   114.0/sec): Scroll 500x500 pixels

The benchmark numbers in this particular test setup roughly represent
memory copy bandwidth measured in MB/s (when doing overlapped blits
inside of a writecombine mapped framebuffer).
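
The arithmetic behind this remark: a 500x500 scroll at 32bpp copies
500 * 500 * 4 = 1,000,000 bytes, so one operation per second is about
1 MB/s. A small helper (hypothetical name, decimal megabytes assumed)
makes the conversion explicit:

```c
#include <assert.h>

/* Convert an x11perf rate (operations per second) into MB/s copied,
 * using decimal megabytes (1 MB = 1,000,000 bytes). */
double sketch_scroll_mb_per_sec(double ops_per_sec, int w, int h,
                                int bytes_per_pixel)
{
    double bytes_per_op;

    assert(w > 0 && h > 0 && bytes_per_pixel > 0);
    bytes_per_op = (double)w * (double)h * (double)bytes_per_pixel;
    return ops_per_sec * bytes_per_op / 1e6;
}
```

For example, the 375.0/sec VFP result on Exynos 5250 corresponds to
roughly 375 MB/s of copied data.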

-----------------------------------------------------------------------

Note: the use of VFP two-pass overlapped copy instead of ShadowFB is
      still not enabled by default when running on Raspberry Pi
      because the performance results are not so great.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoCPU: add ARM memcpy assembly function
Siarhei Siamashka [Mon, 3 Jun 2013 00:17:11 +0000 (03:17 +0300)]
CPU: add ARM memcpy assembly function

This is my old ARM9E/ARM11 memcpy code from
    https://garage.maemo.org/projects/mplayer/
with some tuning for Raspberry Pi (aligned prefetch added).

Will be used by VFP optimized overlapped blt function.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
10 years agoG2D: Implement "double speed" 16bpp blits
Harm Hanemaaijer [Fri, 24 May 2013 18:56:06 +0000 (20:56 +0200)]
G2D: Implement "double speed" 16bpp blits

When source and destination coordinates allow it, a 16bpp
screen-to-screen blit is divided into up to three segments: two
optional one pixel wide edges and an aligned middle segment that is
copied in 32-bit mode.

This patch adds the low-level function sunxi_g2d_blit_r5g6b5_in_three
and adds logic to the general blit function to use it for 16bpp to
16bpp blits if the source and destination coordinates allow it. This
patch automatically enables the use of this optimization in the
sunxi G2D X driver. The area threshold for using G2D for
16bpp-to-16bpp blits was introduced in a previous patch.
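
The split can be sketched as below. This is not the code of
sunxi_g2d_blit_r5g6b5_in_three; the names are hypothetical, and the
condition "coordinates allow it" is assumed to mean that source and
destination x have the same parity (so that both become 32-bit aligned
together) and that the stride is a multiple of 4 bytes.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool possible;      /* false when src and dst x differ in parity */
    bool left_edge;     /* one-pixel column at the left side */
    bool right_edge;    /* one-pixel column at the right side */
    int mid_src_x;      /* middle segment, in 16bpp pixels (even) */
    int mid_dst_x;
    int mid_w;          /* even; equals mid_w / 2 pixels in 32-bit mode */
} sketch_split;

sketch_split sketch_split_r5g6b5(int src_x, int dst_x, int w)
{
    sketch_split s = { false, false, false, 0, 0, 0 };
    int lead, remaining;

    assert(w > 0 && src_x >= 0 && dst_x >= 0);
    if ((src_x & 1) != (dst_x & 1))
        return s;                       /* cannot align both sides */
    s.possible = true;
    s.left_edge = (src_x & 1) != 0;     /* odd start: peel one pixel */
    lead = s.left_edge ? 1 : 0;
    remaining = w - lead;
    s.right_edge = (remaining & 1) != 0; /* odd remainder: peel one pixel */
    s.mid_w = remaining - (s.right_edge ? 1 : 0);
    s.mid_src_x = src_x + lead;
    s.mid_dst_x = dst_x + lead;
    return s;
}
```

The middle segment then moves two 16bpp pixels per 32-bit pixel, which
is where the "double speed" comes from.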

Benchmarks:

1920x1080x16bpp@60Hz, ShadowFB disabled:

x11perf -scroll100
Before:
 350000 trep @   0.0881 msec ( 11400.0/sec): Scroll 100x100 pixels
After:
 350000 trep @   0.0819 msec ( 12200.0/sec): Scroll 100x100 pixels

x11perf -scroll500
Before:
  20000 trep @   1.3547 msec (   738.0/sec): Scroll 500x500 pixels
After:
  35000 trep @   0.8005 msec (  1250.0/sec): Scroll 500x500 pixels

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
10 years agoG2D: Implement an area threshold for using G2D blits.
Harm Hanemaaijer [Fri, 24 May 2013 16:38:04 +0000 (18:38 +0200)]
G2D: Implement an area threshold for using G2D blits.

Due to the overhead of G2D for small screen-to-screen blits, CPU blits
are faster for small areas. This patch introduces an area threshold below
which CPU blits are triggered. It is currently set to 1000 for 32bpp
and 2500 for 16bpp based on test results.
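
A minimal sketch of such a check, using the two threshold values from
this message. The function name is hypothetical, and the area is assumed
to be measured in pixels (width * height).

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_G2D_AREA_THRESHOLD_32BPP 1000
#define SKETCH_G2D_AREA_THRESHOLD_16BPP 2500

/* True when the blit is large enough for G2D to beat a CPU blit. */
bool sketch_area_allows_g2d(int w, int h, int bpp)
{
    long area, threshold;

    assert(w >= 0 && h >= 0);
    assert(bpp == 16 || bpp == 32);
    area = (long)w * (long)h;
    threshold = (bpp == 16) ? SKETCH_G2D_AREA_THRESHOLD_16BPP
                            : SKETCH_G2D_AREA_THRESHOLD_32BPP;
    return area >= threshold;
}
```

With these values a 10x10 scroll stays on the CPU, which matches the
benchmarks below.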

Some benchmarks:

1920x1080x16bpp@60Hz, ShadowFB disabled:

x11perf -scroll10

Before:
1500000 trep @   0.0239 msec ( 41800.0/sec): Scroll 10x10 pixels
After:
2500000 trep @   0.0110 msec ( 90900.0/sec): Scroll 10x10 pixels

x11perf -copywinwin10

Before:
1200000 trep @   0.0247 msec ( 40500.0/sec): Copy 10x10 from window to window
After:
1800000 trep @   0.0146 msec ( 68600.0/sec): Copy 10x10 from window to window

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
11 years agotest: race condition on DRI2 buffers allocation when going fullscreen
Siarhei Siamashka [Mon, 22 Apr 2013 20:19:54 +0000 (23:19 +0300)]
test: race condition on DRI2 buffers allocation when going fullscreen

This test program exposes a problem related to window resizing
(or going fullscreen), which may happen exactly between "back"
and "front" DRI2 buffers allocation.

The xtrace log with some annotations:

000:<:004c:  8: DRI2-Request(151,3): CreateDrawable drawable=0x02200001
000:<:004d: 16: DRI2-Request(151,5): GetBuffers drawable=0x02200001 attachments={attachment=BackLeft(0x00000001)};
000:>:004d:52: Reply to GetBuffers: width=480 height=480 buffers={attachment=BackLeft(0x00000001)
                                    name=0x00000157 pitch=1920 cpp=4 flags=0x00000000};

Get the BackLeft buffer.

000:<:004e:  4: Request(43): GetInputFocus
000:>:004e:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:004f: 24: Request(16): InternAtom only-if-exists=false(0x00) name='_NET_WM_STATE'
000:>:004f:32: Reply to InternAtom: atom=0xff("_NET_WM_STATE")
000:<:0050: 32: Request(16): InternAtom only-if-exists=false(0x00) name='_NET_WM_STATE_FULLSCREEN'
000:>:0050:32: Reply to InternAtom: atom=0x102("_NET_WM_STATE_FULLSCREEN")
000:<:0051: 44: Request(25): SendEvent propagate=false(0x00) destination=0x00000170 event-mask=SubstructureNotify,SubstructureRedirect
                             ClientMessage(33) format=0x20 window=0x02200001 type=0xff("_NET_WM_STATE")
                             data=0x01,0x00,0x00,0x00,0x02,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00;
000:<:0052:  4: Request(43): GetInputFocus
000:>:0052: Event DRI2-InvalidateBuffers(102) drawable=0x02200001

Here the X server attempts to notify the client side DRI2 code in the Mali blob
that the DRI2 buffer must be requested again. But this event gets happily ignored.

000:>:0052: Event Expose(12) window=0x02200001 x=0 y=0 width=1920 height=1080 count=0x0000
000:>:0052:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:0053:  8: Request(3): GetWindowAttributes window=0x02200001
000:<:0054:  8: Request(14): GetGeometry drawable=0x02200001
000:>:0053:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
                                             bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
                                             backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
                                             map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
                                             all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
                                             your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:0054:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
001:<:000c: 12: Request(98): QueryExtension name='DRI2'
001:>:000c:32: Reply to QueryExtension: present=true(0x01) major-opcode=151 first-event=101 first-error=0
001:<:000d: 32: DRI2-Request(151,8): SwapBuffers drawable=0x02200001 target_msc_hi=0 target_msc_lo=0
                                     divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:>:000d: Event DRI2-BufferSwapComplete(101) drawable=0x00000002 ust_hi=35651585 ust_lo=0 msc_hi=0 msc_lo=0 sbc_hi=0 sbc_lo=1

Here the DRI2 code from the Mali blob tries to swap buffers (with the
hope that the allocated BackLeft would go to front).

001:>:000d:32: Reply to SwapBuffers: swap_hi=0 swap_lo=4096
000:<:0055:  8: DRI2-Request(151,3): CreateDrawable drawable=0x02200001
000:<:0056: 16: DRI2-Request(151,5): GetBuffers drawable=0x02200001 attachments={attachment=BackLeft(0x00000001)};
000:>:0056:52: Reply to GetBuffers: width=1920 height=1080 buffers={attachment=BackLeft(0x00000001)
                                    name=0x00000159 pitch=7680 cpp=4 flags=0x00000000};

And then it requests the new BackLeft DRI2 buffer.

000:<:0057:  4: Request(43): GetInputFocus
000:>:0057:32: Reply to GetInputFocus: revert-to=PointerRoot(0x01) focus=0x02200001
000:<:0058:  8: Request(3): GetWindowAttributes window=0x02200001
000:<:0059:  8: Request(14): GetGeometry drawable=0x02200001
000:>:0058:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
                                             bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
                                             backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
                                             map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
                                             all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
                                             your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:0059:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
000:<:005a:  8: Request(3): GetWindowAttributes window=0x02200001
000:<:005b:  8: Request(14): GetGeometry drawable=0x02200001
000:>:005a:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
                                             bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
                                             backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
                                             map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
                                             all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
                                             your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:005b:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0
001:<:000e: 32: DRI2-Request(151,8): SwapBuffers drawable=0x02200001 target_msc_hi=0 target_msc_lo=0
                                     divisor_hi=0 divisor_lo=0 remainder_hi=0 remainder_lo=0
001:>:000e: Event DRI2-BufferSwapComplete(101) drawable=0x00000002 ust_hi=35651585 ust_lo=0 msc_hi=0 msc_lo=0 sbc_hi=0 sbc_lo=2

And here it is simply swapping the buffers.

001:>:000e:32: Reply to SwapBuffers: swap_hi=0 swap_lo=4096
000:<:005c:  8: Request(3): GetWindowAttributes window=0x02200001
000:<:005d:  8: Request(14): GetGeometry drawable=0x02200001
000:>:005c:44: Reply to GetWindowAttributes: backing-store=NotUseful(0x00) visual=0x00000021 class=InputOutput(0x0001)
                                             bit-gravity=Forget(0x00) win-gravity=NorthWest(0x01) backing-planes=0xffffffff
                                             backing-pixel=0x00000000 save-under=false(0x00) map-is-installed=true(0x01)
                                             map-state=Viewable(0x02) override-redirect=false(0x00) colormap=0x00000020
                                             all-event-masks=PointerMotion,Exposure,StructureNotify,FocusChange,PropertyChange
                                             your-event-mask=PointerMotion,Exposure do-not-propagate-mask=0 unused=0x0000
000:>:005d:32: Reply to GetGeometry: depth=0x18 root=0x00000170 x=0 y=0 width=1920 height=1080 border-width=0

And now it is polling for the change of window geometry. The same
"SwapBuffers -> GetGeometry -> SwapBuffers" pattern keeps repeating.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoCPU: Added ARM NEON optimized CopyWindow/CopyArea implementation
Siarhei Siamashka [Sat, 30 Mar 2013 03:49:38 +0000 (05:49 +0200)]
CPU: Added ARM NEON optimized CopyWindow/CopyArea implementation

Should be useful for better performance when moving windows
and scrolling on devices without a dedicated 2D hardware
accelerator (Allwinner A13).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agosunxi: Fix segfault when there is no "fbdev" option in xorg.conf
Siarhei Siamashka [Wed, 27 Mar 2013 19:10:38 +0000 (21:10 +0200)]
sunxi: Fix segfault when there is no "fbdev" option in xorg.conf

Just use "/dev/fb0" by default.
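
The fix amounts to a NULL-safe default. A sketch under assumptions: the
helper name is hypothetical, and treating an empty string like a missing
option is a choice made here, not something this message states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_DEFAULT_FBDEV "/dev/fb0"

/* Return the configured framebuffer device, or the default when the
 * "fbdev" option is absent (NULL) or empty. */
const char *sketch_fbdev_path(const char *configured)
{
    const char *path = configured;

    if (path == NULL || strlen(path) == 0)
        path = SKETCH_DEFAULT_FBDEV;
    assert(path != NULL);   /* callers may now dereference safely */
    return path;
}
```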

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoG2D: Now sunxi_x_g2d.c code does not require sunxi disp anymore
Siarhei Siamashka [Tue, 26 Mar 2013 20:03:26 +0000 (22:03 +0200)]
G2D: Now sunxi_x_g2d.c code does not require sunxi disp anymore

The sunxi_x_g2d.c file contains the midlayer code for hooking the
G2D optimized blit into xserver. But in fact it does not strictly
need to depend on anything sunxi specific.

So now we introduce a simple "blt2d_i" interface struct which
specifically provides a pointer to the accelerated blit function.
And just use this interface struct instead of the whole "sunxi_disp_t".
This makes it easy to reuse the same code for other non-G2D or even
non-sunxi blit implementations in the future.
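
The shape of such an interface can be sketched as follows. The member
names, the argument list and the return convention (1 = handled, 0 =
caller should fall back) are assumptions for illustration and may differ
from the real "blt2d_i"; the CPU backend is a stand-in that only handles
non-overlapping 32bpp copies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interface: a backend-private pointer plus one blit callback. */
typedef struct {
    void *self;
    int (*blt)(void *self, const uint32_t *src_bits, uint32_t *dst_bits,
               int src_stride, int dst_stride,   /* in 32-bit pixels */
               int src_x, int src_y, int dst_x, int dst_y, int w, int h);
} sketch_blt2d_i;

/* Stand-in CPU backend; self points to a call counter. */
int sketch_cpu_blt(void *self, const uint32_t *src_bits, uint32_t *dst_bits,
                   int src_stride, int dst_stride,
                   int src_x, int src_y, int dst_x, int dst_y, int w, int h)
{
    int *calls = self;

    if (w <= 0 || h <= 0)
        return 0;
    for (int y = 0; y < h; y++) {
        const uint32_t *s = src_bits + (size_t)(src_y + y) * (size_t)src_stride + src_x;
        uint32_t *d = dst_bits + (size_t)(dst_y + y) * (size_t)dst_stride + dst_x;
        memmove(d, s, (size_t)w * sizeof(uint32_t));
    }
    if (calls != NULL)
        (*calls)++;
    return 1;
}

/* Midlayer code sees only the interface, never the concrete backend. */
int sketch_copy_area(const sketch_blt2d_i *blt,
                     const uint32_t *src_bits, uint32_t *dst_bits,
                     int src_stride, int dst_stride,
                     int src_x, int src_y, int dst_x, int dst_y, int w, int h)
{
    assert(blt != NULL && blt->blt != NULL);
    return blt->blt(blt->self, src_bits, dst_bits, src_stride, dst_stride,
                    src_x, src_y, dst_x, dst_y, w, h);
}
```

Swapping in a G2D or NEON backend then only means filling the struct
with a different function pointer.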

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoCPU: Remove unneeded test program bundled with runtime CPU detection
Siarhei Siamashka [Tue, 26 Mar 2013 18:35:14 +0000 (20:35 +0200)]
CPU: Remove unneeded test program bundled with runtime CPU detection

The 'main' function got there by accident and was not spotted
earlier because the driver itself is a shared library.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoCPU: Added code for runtime CPU features detection
Siarhei Siamashka [Tue, 26 Mar 2013 14:30:14 +0000 (16:30 +0200)]
CPU: Added code for runtime CPU features detection

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoG2D: enable accelerated blits for 16bpp color depth
Siarhei Siamashka [Fri, 22 Mar 2013 00:33:42 +0000 (02:33 +0200)]
G2D: enable accelerated blits for 16bpp color depth

This is still not perfect, because G2D can't saturate memory bandwidth
for this color depth (it is fillrate limited). We should emulate 16bpp blits
with 32bpp blits whenever it is possible.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoG2D: attempt loading 'g2d_23' kernel module
Siarhei Siamashka [Fri, 22 Mar 2013 00:27:24 +0000 (02:27 +0200)]
G2D: attempt loading 'g2d_23' kernel module

It might not be statically compiled into the kernel (for example in
Fedora), so we should try to explicitly load it.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoG2D: accelerate CopyArea between different pixmaps in framebuffer
Siarhei Siamashka [Thu, 21 Mar 2013 04:14:43 +0000 (06:14 +0200)]
G2D: accelerate CopyArea between different pixmaps in framebuffer

Now source and destination pixmaps don't need to be the same for
using G2D acceleration (as long as both of them are allocated in
the framebuffer). This allows using G2D to copy pixels from DRI2
buffers to the framebuffer on the fallback path (when the window
of an OpenGL ES application is partially overlapped by some other
windows). Though it only works when composite extension is
disabled, for example by adding the following to xorg.conf:

    Section "Extensions"
        Option "Composite" "Disable"
    EndSection

If composite extension is enabled, windows have backing pixmaps, and
we have a longer chain of copies:

   DRI2 buffer -> backing pixmap -> framebuffer

Because the backing pixmap is not allocated in physically contiguous
memory, it can't be copied using G2D yet.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoSuppress "[DISP] not supported scaler input pixel format:0" dmesg spam
Siarhei Siamashka [Tue, 19 Mar 2013 22:16:41 +0000 (00:16 +0200)]
Suppress "[DISP] not supported scaler input pixel format:0" dmesg spam

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoG2D: Hardware acceleration for XCopyArea (initially 32bpp only)
Siarhei Siamashka [Mon, 18 Mar 2013 21:34:37 +0000 (23:34 +0200)]
G2D: Hardware acceleration for XCopyArea (initially 32bpp only)

Wrap the CreateGC function to add a hook for the CopyArea operation,
which can be accelerated using G2D for buffers inside the visible
part of the framebuffer. In the future we may also try to ensure
that DRI2 buffers are copied using G2D instead of the CPU if we
hit the fallback path and can't avoid this copy.
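
The wrapping pattern can be illustrated with a toy model. None of the
types below are the real xserver ScreenRec/GC structures; they are
hypothetical stand-ins that only show the unwrap / call / rewrap
sequence and the replacement of the CopyArea entry in the GC ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_gc sketch_gc;
typedef struct sketch_screen sketch_screen;

typedef struct {
    void (*copy_area)(sketch_gc *gc, int w, int h);
} sketch_gc_ops;

struct sketch_gc {
    sketch_screen *screen;
    const sketch_gc_ops *ops;
};

struct sketch_screen {
    bool (*create_gc)(sketch_screen *screen, sketch_gc *gc);
    bool (*saved_create_gc)(sketch_screen *screen, sketch_gc *gc);
    int accelerated_copies;     /* counters, for demonstration only */
    int software_copies;
};

/* Stand-in for the server's software CopyArea. */
void sketch_sw_copy_area(sketch_gc *gc, int w, int h)
{
    (void)w; (void)h;
    gc->screen->software_copies++;
}
static const sketch_gc_ops sketch_sw_ops = { sketch_sw_copy_area };

/* Stand-in for the server's original CreateGC. */
bool sketch_base_create_gc(sketch_screen *screen, sketch_gc *gc)
{
    gc->screen = screen;
    gc->ops = &sketch_sw_ops;
    return true;
}

/* Hooked CopyArea: "accelerate" when possible, else fall back. */
void sketch_accel_copy_area(sketch_gc *gc, int w, int h)
{
    if (w > 0 && h > 0)
        gc->screen->accelerated_copies++;
    else
        sketch_sw_copy_area(gc, w, h);
}
static const sketch_gc_ops sketch_accel_ops = { sketch_accel_copy_area };

/* Wrapper: unwrap, call the original, rewrap, then install the hook. */
bool sketch_wrapped_create_gc(sketch_screen *screen, sketch_gc *gc)
{
    bool ok;

    screen->create_gc = screen->saved_create_gc;
    ok = screen->create_gc(screen, gc);
    screen->create_gc = sketch_wrapped_create_gc;
    if (ok)
        gc->ops = &sketch_accel_ops;
    return ok;
}

void sketch_install_wrapper(sketch_screen *screen)
{
    assert(screen->create_gc != NULL);
    screen->saved_create_gc = screen->create_gc;
    screen->create_gc = sketch_wrapped_create_gc;
}
```

Every GC created after the wrapper is installed picks up the hooked
CopyArea without any change to the callers.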

Benchmark using "x11perf -scroll500 -copywinwin500":

=== ShadowFB (software rendering) ===

   3000 reps @   2.0308 msec (   492.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9741 msec (   507.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9826 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9830 msec (   504.0/sec): Scroll 500x500 pixels
   3000 reps @   1.9965 msec (   501.0/sec): Scroll 500x500 pixels
  15000 trep @   1.9934 msec (   502.0/sec): Scroll 500x500 pixels

   1600 reps @   3.3054 msec (   303.0/sec): Copy 500x500 from window to window
   1600 reps @   3.3179 msec (   301.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2263 msec (   310.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2491 msec (   308.0/sec): Copy 500x500 from window to window
   1600 reps @   3.2357 msec (   309.0/sec): Copy 500x500 from window to window
   8000 trep @   3.2669 msec (   306.0/sec): Copy 500x500 from window to window

=== G2D (hardware acceleration) ===

   3000 reps @   2.1949 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1929 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1923 msec (   456.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1889 msec (   457.0/sec): Scroll 500x500 pixels
   3000 reps @   2.1941 msec (   456.0/sec): Scroll 500x500 pixels
  15000 trep @   2.1926 msec (   456.0/sec): Scroll 500x500 pixels

   2800 reps @   1.8114 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8103 msec (   552.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8160 msec (   551.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8099 msec (   553.0/sec): Copy 500x500 from window to window
   2800 reps @   1.8126 msec (   552.0/sec): Copy 500x500 from window to window
  14000 trep @   1.8120 msec (   552.0/sec): Copy 500x500 from window to window

CPU usage remains low when running this test with G2D acceleration enabled.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoDRI2: fix build problem introduced by the previous commit (stray line)
Siarhei Siamashka [Sun, 17 Mar 2013 21:31:12 +0000 (23:31 +0200)]
DRI2: fix build problem introduced by the previous commit (stray line)

Reported-by: Maurice de la Ferté <kadava@gmx.de>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoDRI2: more informative messages for /var/log/Xorg.0.log
Siarhei Siamashka [Sun, 17 Mar 2013 20:52:29 +0000 (22:52 +0200)]
DRI2: more informative messages for /var/log/Xorg.0.log

Explain that AIGLX is normally expected to fail and the users should
not really worry about it. Also provide a warning if the driver has
been compiled without libUMP support (it could be that the user
actually wanted 3D acceleration, but just has not installed
all the needed dependencies).

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoDRI2: Typo fixes (need to return NULL instead of FALSE)
Siarhei Siamashka [Sat, 16 Mar 2013 22:23:04 +0000 (00:23 +0200)]
DRI2: Typo fixes (need to return NULL instead of FALSE)

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agotest: Added missing sunxi_disp_close() to sunxi_g2d_bench
Siarhei Siamashka [Fri, 15 Mar 2013 22:30:32 +0000 (00:30 +0200)]
test: Added missing sunxi_disp_close() to sunxi_g2d_bench

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agotest: Added a simple synthetic benchmark for G2D
Siarhei Siamashka [Fri, 15 Mar 2013 16:21:38 +0000 (18:21 +0200)]
test: Added a simple synthetic benchmark for G2D

It measures MPix/s numbers for blit and fill operations done
by G2D, and also for comparison tests the performance of the
same operations done by pixman (software rendering).

G2D has clock frequency configured to be half of the RAM clock
frequency. So for 480 MHz RAM, we have G2D clocked at 240 MHz,
which means that no more than 240 MPix can be processed per
second. Unfortunately this limits the performance of a simple
operation such as solid fill.
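
The limit above as a small calculation. The helper names are
hypothetical, and "one pixel per G2D clock cycle at best" is the
assumption implied by the 240 MHz -> 240 MPix/s figure.

```c
#include <assert.h>

/* Upper bound on G2D throughput in MPix/s for a given RAM clock,
 * assuming G2D runs at half the RAM clock and handles at most one
 * pixel per cycle. */
double sketch_g2d_max_mpix_per_sec(double ram_clock_mhz)
{
    assert(ram_clock_mhz > 0.0);
    return ram_clock_mhz / 2.0;
}

/* The same bound expressed in MB/s written for a solid fill. */
double sketch_g2d_max_fill_mb_per_sec(double ram_clock_mhz,
                                      int bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return sketch_g2d_max_mpix_per_sec(ram_clock_mhz) * bytes_per_pixel;
}
```

At 16bpp the byte rate of this bound is half of the 32bpp one, since
the pixel rate is capped regardless of pixel size.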

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years agoIntroduce experimental G2D acceleration
Siarhei Siamashka [Thu, 14 Mar 2013 17:08:07 +0000 (19:08 +0200)]
Introduce experimental G2D acceleration

This initial G2D support code can speed up moving windows in XFCE. It is
currently disabled by default, but can be enabled by editing
/etc/X11/xorg.conf and adding the following line to the "Device" section:

        Option          "AccelMethod" "G2D"

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years ago Reuse the already existing xserver framebuffer mapping for sunxi_disp_t
Siarhei Siamashka [Thu, 14 Mar 2013 00:42:52 +0000 (02:42 +0200)]
Reuse the already existing xserver framebuffer mapping for sunxi_disp_t

Avoid creating a new mapping because that's a waste of the virtual address
space. Also we are going to use this xserver framebuffer mapping address
for testing whether window backing pixmaps are allocated in the framebuffer
and can be accelerated by G2D.
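
The framebuffer test mentioned above can be pictured as a simple address
range check. This is only a sketch under assumptions, not the driver's
actual code: the function name and parameters are hypothetical, and the
buffer is treated as accelerable only if it lies entirely inside the
mapped framebuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return 1 if [buf, buf + buf_size) lies entirely inside the mapping
 * [fb_base, fb_base + fb_size), otherwise 0.
 * Hypothetical helper: names and signature are illustrative only.
 */
int buffer_in_framebuffer(const void *fb_base, size_t fb_size,
                          const void *buf, size_t buf_size)
{
    uintptr_t base = (uintptr_t)fb_base;
    uintptr_t p = (uintptr_t)buf;
    size_t offset;

    assert(fb_base != NULL);

    if (p < base)
        return 0;

    offset = (size_t)(p - base);
    if (offset > fb_size)
        return 0;

    /* Compare against the remaining space to avoid integer overflow. */
    return buf_size <= fb_size - offset;
}
```

A pixmap whose data pointer fails this check lives in ordinary system
memory and would have to be handled by software rendering instead.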

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years ago test: use G2D acceleration in sunxi_disp_vsync_demo
Siarhei Siamashka [Wed, 13 Mar 2013 04:44:39 +0000 (06:44 +0200)]
test: use G2D acceleration in sunxi_disp_vsync_demo

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years ago Added ioctl wrappers for simple G2D fill and blit operations
Siarhei Siamashka [Wed, 13 Mar 2013 04:28:26 +0000 (06:28 +0200)]
Added ioctl wrappers for simple G2D fill and blit operations

The existing kernel driver from Allwinner for the G2D accelerator
is quite bad: its ioctls are synchronous and block the calling
thread, compromise security (the driver is basically a backdoor
for copying data in memory between arbitrary physical
addresses) and have high overhead (each individual fill or
blit operation needs an ioctl). But we need to start with
something, so use this stuff as a placeholder.

The g2d_driver.h header file is taken from linux-sunxi-3.4.

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years ago Added 'test' directory and a demo program for testing tear-free animation
Siarhei Siamashka [Wed, 13 Mar 2013 00:20:38 +0000 (02:20 +0200)]
Added 'test' directory and a demo program for testing tear-free animation

It is basically the first test program for the sunxi disp ioctl wrapper
code in "src/sunxi_disp.c".

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
11 years ago Free sunxi_disp_t struct directly from sunxi_disp_close()
Siarhei Siamashka [Tue, 12 Mar 2013 14:39:22 +0000 (16:39 +0200)]
Free sunxi_disp_t struct directly from sunxi_disp_close()

Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>