Qbasicnews.com

UGL vs DQB
Why is UGL's pset faster than DQB's in 320x200x8? (By my tests, UGL psets the same number of pixels in three-fifths of the time.)

I'm looking at the source code, and by all rights it seems like it should be slower. I guess I have more to learn about asm optimization.

EDIT: What I mean is, if you inline the proc that UGL's pset calls, it basically reduces to DQB's code (unless I'm missing something).

Blitz, anyone? Can you explain it to me?

All the relevant code is a little too much to post:
DQB 1.71 with source code (I hope it's OK that I'm linking to your site, Adigun; I'll edit it out if it's not)
DQBpset in draw.asm (line 425), which calls GetLayerSeg (line 209) in main.asm

Latest version of UGL with source code (ditto, Blitz)
uglPSet (line 35) in src\ugl\uglpixel.asm, which calls b8_pSet (line 13) in src\cfmt\b8\8pixel.asm and bnk_RdAccess (line 267) in src\dct\dctbnk.asm (I think, or possibly mem_RdAccess in dctmem.asm)
All the calls that happen internally within UGL are near calls. Near calls are much, much faster than far calls, and you'll especially notice that when the called routine is as small as a pset. Another probable factor is that GetLayerSeg is slow: it does compares to check what kind of layer it's dealing with. UGL jumps directly to the handler for the sort of memory the DC is in. Clever, huh?
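A rough C model of the dispatch difference described above, assuming a DC records which kind of memory it lives in. This is not UGL's or DQB's actual code: `mem_kind`, `make_dc`, `seg_by_compare` and the non-video segment values are hypothetical placeholders. It only illustrates that a compare chain re-tests the kind on every call, while a per-DC handler is picked once and then called directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kinds of memory a DC/layer can live in. */
enum mem_kind { MEM_VIDEO, MEM_CONV, MEM_EMS, MEM_KIND_COUNT };

typedef uint16_t (*seg_fn)(void);

/* Hypothetical handlers: 0xA000 is the mode 13h framebuffer segment,
   the other two values are placeholders for illustration only. */
static uint16_t video_seg(void) { return 0xA000; }
static uint16_t conv_seg(void)  { return 0x7000; }
static uint16_t ems_seg(void)   { return 0xE000; }

/* Compare-chain style: the kind is re-checked on every call. */
static uint16_t seg_by_compare(enum mem_kind k) {
    if (k == MEM_VIDEO) return video_seg();
    if (k == MEM_CONV)  return conv_seg();
    return ems_seg();
}

/* Direct style: the handler is chosen once when the DC is created,
   so every later call goes straight to it with no compares. */
struct dc {
    enum mem_kind kind;
    seg_fn get_seg;
};

static struct dc make_dc(enum mem_kind k) {
    static const seg_fn handlers[MEM_KIND_COUNT] = { video_seg, conv_seg, ems_seg };
    assert(k < MEM_KIND_COUNT);
    struct dc d = { k, handlers[k] };
    return d;
}
```

Both styles return the same segment for a given kind (for example, `make_dc(MEM_EMS).get_seg()` and `seg_by_compare(MEM_EMS)` agree); only the per-call work differs. In real-mode asm, a handler inside the same code segment can also be reached with a near call rather than a far one.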
Very clever indeed.

I stripped out the call to GetLayerSeg, and the speed increase was negligible (from 1.05 s to 1.04 s; UGL: 0.62 s)... Does UGL use a lookup table for scanline pointers or something?
Yeah, it does.
Ah, I guess that explains it. I wonder why none of the 13h-only libs do that; 400 bytes is a small price to pay.
It wouldn't do much; after all, you only need to calculate the address once. After that, you just do one add per scanline.
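A small C sketch of the "calculate once, then add per scanline" approach, using a plain 64000-byte array as a stand-in for the mode 13h framebuffer (an assumption for illustration; `fill_rect` is a hypothetical helper, not from either library). The start offset is computed once with a multiply, and each following scanline costs a single add of 320.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Fills a w*h rectangle in a 320x200 8-bpp buffer. The offset of the
   first row is computed once; each later row is reached with one add. */
static void fill_rect(uint8_t *vram, int x, int y, int w, int h, uint8_t color) {
    assert(x >= 0 && y >= 0 && x + w <= SCREEN_W && y + h <= SCREEN_H);
    unsigned off = (unsigned)y * SCREEN_W + (unsigned)x; /* one multiply */
    for (int row = 0; row < h; row++) {
        memset(vram + off, color, (size_t)w);
        off += SCREEN_W;                                 /* one add per scanline */
    }
}
```

The stepping only helps when consecutive scanlines are visited. A lone pset still has to work out y*320 from scratch on every call, which is where a table or a shift trick comes in.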
Changing scanline length i
Quote:Ah, I guess that explains it. I wonder why none of the 13h-only libs do that, 400 bytes is a small price to pay.

What do you mean?

I usually use this:

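; ax = y (0..199, so ah = 0), bx = x (0..319)
; result: di = y*320 + x, using y*320 = y*256 + y*64

xchg al, ah     ; ax = y*256
mov di, ax      ; di = y*256
shr di, 2       ; di = y*64
add di, ax      ; di = y*320
add di, bx      ; di = y*320 + x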

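The same arithmetic written out in C, so the identity y*320 = y*256 + y*64 can be checked for every pixel of the 320x200 screen. This is a model of the register steps above (`offset_shift_add` is a hypothetical name), not code taken from DQB or UGL.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the xchg/shr/add sequence, valid for x < 320 and y < 200.
   Every intermediate fits in 16 bits: the largest result is
   199*320 + 319 = 63999. */
static uint16_t offset_shift_add(uint16_t x, uint16_t y) {
    assert(x < 320 && y < 200);
    uint16_t t = (uint16_t)(y << 8);  /* xchg al, ah : y*256 (ah was 0) */
    uint16_t d = (uint16_t)(t >> 2);  /* mov di, ax / shr di, 2 : y*64 */
    d = (uint16_t)(d + t);            /* add di, ax : y*320 */
    d = (uint16_t)(d + x);            /* add di, bx : y*320 + x */
    return d;
}
```

The trick relies on y fitting in the low byte, so swapping al and ah is the same as multiplying by 256.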

Can you explain the use of a LUT in 13h?
Quote:;ax=y
;bx=x

Xchg al, ah
mov di, ax
shr di, 2
add di, ax
add di, bx

Hrm... that code alone has four AGI stalls. That by itself would account for the slow pixel-address calculation. It is a clever method, but four AGI stalls just aren't worth it. Consider combining that method with a LUT to speed things up.

Anyway, 320x200 8-bpp LUT code (simple, but there is an AGI stall):

; Assumes Scanline[0..199] (dw) is pregenerated with scanline offsets (y*320)
; ebx = y (upper 16 bits clear), ax = x

mov bx, Scanline[ebx*2]   ; bx = y*320 (scaled index needs a 386+)
add bx, ax                ; bx = y*320 + x
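A C model of the table approach above: `Scanline` mirrors the asm table, while `build_scanline_lut` and `offset_lut` are hypothetical helper names. The table is filled once with one add per row, and 200 two-byte entries come to the 400 bytes mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Scanline[y] holds the offset of row y, like the dw table the asm assumes. */
static uint16_t Scanline[SCREEN_H];

/* Fills the table once; each row offset is the previous one plus 320. */
static void build_scanline_lut(void) {
    uint16_t off = 0;
    for (int y = 0; y < SCREEN_H; y++) {
        Scanline[y] = off;
        off = (uint16_t)(off + SCREEN_W);
    }
}

/* Equivalent of: mov bx, Scanline[ebx*2] / add bx, ax */
static uint16_t offset_lut(uint16_t x, uint16_t y) {
    assert(x < SCREEN_W && y < SCREEN_H);
    return (uint16_t)(Scanline[y] + x);
}
```

In C the `*2` scaling from `Scanline[ebx*2]` is implicit, since indexing a `uint16_t` array already steps two bytes per entry.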
I actually put some other instructions between those two to get rid of the AGI stall. ;*)