pixel speed
#1
PSET (x, y), color is the normal pixel-setting command.
You can make a much faster version with

POKE 320& * Y + X, color
(SCREEN 13 only)

But what if that's still too slow?
What can I do?
Can someone tell me the assembly equivalent of this command?
Will it be faster?

Any help would be highly appreciated.
#2
Blit instead of plotting pixels. Especially for large areas.
#3
What Z!re says. Plotting to the screen is slower than writing to RAM, so it's better to build your image in RAM and then copy it to video memory in one go with a memory-copy routine in assembly.

Anyhow, POKE 320&*Y% + X%, C% can be sped up.

Note that you're performing a 32-bit multiplication on a 16-bit platform, which takes a lot of time. Replacing 320& with 320 would turn it into a faster 16-bit multiplication, but there's a catch in the QB IDE: for larger values of Y% (103 and up), 320*Y% doesn't fit in a signed 16-bit integer, and QB has no unsigned integers. *But* if you compile your program, it will work in the executable. So POKE 320*Y%+X%, C% raises an overflow error in the IDE whenever 320*Y% > 32767, but it works compiled.
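To see where the signed 16-bit limit kicks in, here is a small sketch in C (C only because it compiles anywhere; the function names are made up for this example, and it assumes the usual 320x200 layout of screen 13). It checks whether an offset still fits in a signed 16-bit integer and finds the first row where 320*Y alone no longer does.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the offset 320*y + x fits in a signed 16-bit integer,
   0 otherwise. The arithmetic is done in a long so the check itself
   cannot overflow. */
int fits_signed16(int y, int x)
{
    assert(y >= 0 && x >= 0);
    long offset = 320L * y + x;
    return offset <= INT16_MAX;
}

/* Returns the first row y for which 320*y alone exceeds 32767. */
int first_overflow_row(void)
{
    int y = 0;
    while (320L * y <= INT16_MAX) {
        y++;
    }
    return y;
}
```

With these definitions first_overflow_row() gives 103, since 320*102 = 32640 still fits and 320*103 = 32960 does not.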

Anyhow, we can optimize it even further, using shifts instead of multiplication. Multiplying a number by 2 means shifting it left once in binary. For example 4*2 = 8, and 4 is 0100 and 8 is 1000: note how it's been shifted left once. If you look closer, 320 = 256 + 64, and 256 and 64 are powers of two: 256 = 2^8 and 64 = 2^6. That means that multiplying by 256 is the same as shifting left 8 times, and multiplying by 64 is the same as shifting left 6 times.
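The 320 = 256 + 64 decomposition is easy to check outside QB. The following is a minimal sketch in C, with hypothetical function names, that computes the screen 13 offset both ways and compares them for every pixel of a 320x200 screen.

```c
#include <assert.h>

/* Offset computed with a plain multiplication. */
unsigned offset_mul(unsigned y, unsigned x)
{
    return 320u * y + x;
}

/* Same offset computed with shifts: y*256 + y*64 + x. */
unsigned offset_shift(unsigned y, unsigned x)
{
    assert(y < 200 && x < 320);   /* screen 13 coordinates only */
    return (y << 8) + (y << 6) + x;
}

/* Returns 1 if both versions agree for every pixel on the screen. */
int offsets_agree(void)
{
    for (unsigned y = 0; y < 200; y++) {
        for (unsigned x = 0; x < 320; x++) {
            if (offset_mul(y, x) != offset_shift(y, x)) {
                return 0;
            }
        }
    }
    return 1;
}
```

Whether the shifted form is actually faster depends on the compiler and the CPU; the sketch only shows that the two forms give the same address.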

QB detects multiplications and divisions by powers of two and replaces them with shifts in the compiled code. That means that it finds X%*256 and replaces it with the equivalent of X% SHL 8. So

Code:
POKE 320*Y% + X%, C%

can be done faster (in compiled code, that is) with:

Code:
POKE 256*Y% + 64*Y% + X%, C%

But beware, it won't work in the IDE because of the signed/unsigned problem mentioned above.

Anyway, in assembly it would be something like:

Code:
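mov ax, 0a000h
mov es, ax      ; a segment register can't be loaded with an immediate
mov ax, Y
shl ax, 8       ; ax = Y*256 (an immediate shift count > 1 needs a 186 or later)
mov bx, ax
shr ax, 2       ; ax = Y*64
add bx, ax      ; bx = Y*320
add bx, X       ; bx = Y*320 + X
mov al, C       ; colour in the low byte
mov es:[bx], al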

Anyhow, my assembly sucks so probably this is a very bad snippet :D
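For anyone who wants to check the shift-left-then-right idea without an assembler, here is a rough C emulation of the register arithmetic (a sketch only: the function names are invented for this example, and uint16_t stands in for the 16-bit ax and bx registers). It also shows the limit of the trick: Y*256 must fit in 16 bits, which holds for rows 0 to 199 but not for Y of 256 or more.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates: shl ax,8 / mov bx,ax / shr ax,2 / add bx,ax / add bx,X */
uint16_t offset_shl_shr(uint16_t y, uint16_t x)
{
    uint16_t ax = y;
    ax = (uint16_t)(ax << 8);    /* ax = y*256, truncated to 16 bits */
    uint16_t bx = ax;            /* bx = y*256 */
    ax = (uint16_t)(ax >> 2);    /* ax = y*64 (only if nothing was truncated) */
    bx = (uint16_t)(bx + ax);    /* bx = y*320 */
    bx = (uint16_t)(bx + x);     /* bx = y*320 + x */
    return bx;
}

/* Asserts that the sequence matches 320*y for every row of screen 13. */
void check_all_screen_rows(void)
{
    for (uint16_t y = 0; y < 200; y++) {
        assert(offset_shl_shr(y, 0) == 320u * y);
    }
}
```

For Y = 256 the left shift already wraps to 0, so the emulated result is just X instead of the true offset. On screen 13 that case never comes up.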
SCUMM (the band) on Myspace!
ComputerEmuzone Games Studio
underBASIC, homegrown musicians
#4
If you want to learn to use assembly language effectively, check out http://webster.cs.ucr.edu/AoA/index.html.
Look for the 16-bit version of Randall Hyde's free book.
It should only take about a week to read, and even if you don't find yourself coding in assembly often, you'll definitely develop a better understanding of how your programs work and a wider sense of how to accomplish any given programming task.



In short, your POKE statement can be accomplished like this. This isn't the only way, and the code syntax I'm using here is very generic. Each assembler has its own quirks. Also, this is not a full routine. It doesn't bother to save and restore the registers that it affects, and since it doesn't set up a stack frame, it won't function as a library call. What's more, I am assuming you are writing a 16-bit DOS application for use with QB.
For FreeBasic, there would be major differences but the principles would be the same.

Code:
mov ax, 0xa000
mov es, ax           ; set the es segment register to the
                     ; beginning of the video buffer
mov ax, word [Y]     ; load a word from memory location Y
                     ; into the ax register
mov bx, 320          ; mul cannot take an immediate operand,
                     ; so put the multiplier in a register
mul bx               ; dx:ax = ax * 320
                     ; this also overwrites dx, but if
                     ; your variables are in the proper
                     ; range it will simply be set to 0.
                     ; We could also use the full 32-bit
                     ; register eax
add ax, word [X]     ; add the word at memory location X
                     ; to the ax register
                     ; ax now holds the offset Y*320+X
mov di, ax           ; load the di register with the
                     ; contents of the ax register
mov al, byte [color] ; load the low byte of the ax
                     ; register with the contents of
                     ; memory location color
mov [es:di], al      ; store the low byte of the ax
                     ; register at the memory location
                     ; addressed by es:di

Now, would this be faster than a POKE statement? In principle yes, but in practice, maybe.

Consider this simple program:

DEFINT A-Z
DEF SEG= &HA000
POKE 320& * Y + X , c

Here is a listing of the assembly code generated by BC.EXE.

Code:
Offset  Data    Source Line      Microsoft (R) QuickBASIC Compiler Version 4.50

0030   0006    DEFINT A-Z
0030   0006    DEF SEG = &HA000
0030   0006    POKE 320& * Y + X, c
0030    **            I00002: mov   ax,0A000h
0033    **                    push  ax
0034    **                    call  B$DSEG
0039    **                    mov   ax,0140h
003C    **                    cwd  
003D    **                    push  dx
003E    **                    push  ax
003F    **                    mov   ax,Y%
0042    **                    cwd  
0043    **                    push  dx
0044    **                    push  ax
0045    **                    call  B$MUI4
004A    **                    mov   0F4h[bp],dx
004D    **                    mov   0F2h[bp],ax
0050    **                    mov   ax,X%
0053    **                    cwd  
0054    **                    add   ax,0F2h[bp]
0057    **                    adc   dx,0F4h[bp]
005A    **                    mov   bx,ax
005C    **                    mov   ax,C%
005F    **                    mov   es,__bseg%
0063    **                    es:  
0064    **                    mov   [bx],al
0066   000C    
0066   000C    
0066   000C    
0066    **                    call  B$CENP
006B   000C

The first thing you should notice is that QB treats all 16-bit integers as signed, which means that offsets up to 320*200 are out of range, so it uses both the dx and ax registers during the calculation (that's why you used the & to perform long integer calculations). Also, notice that it does not implement the multiplication with plain opcodes, but calls the run-time routine B$MUI4, which most likely performs error checking.
All of this would be slower than the code I presented above. On the other hand, actually using assembly code in QB requires some overhead. The best-case scenario involves assembling a library routine and calling that from within QB, in which case the code would have to be amended to handle a stack frame, manage parameters on the stack, and preserve registers. The calling program would have to convert the parameters to the right format, and all of this takes time.

But even the code I've given is far from optimal. The multiplication instruction is time-consuming compared to a well-known trick for computing addresses in screen mode 13: Y * 320 is equivalent to (Y*64)+(Y*256). Since 64 and 256 are both powers of two, these multiplications can be handled by bit-shifting instructions.

Code:
mov ax, word [Y]
mov bx, ax
shl ax, 8          ; ax = Y*256 = Y*(2^8)
shl bx, 6          ; bx = Y*64 = Y*(2^6)
add ax, bx         ; ax = Y*320
add ax, word [X]   ; ax = Y*320 + X

Then again, I have no idea how modern microprocessors implement multiplication, so the old bit-shifting trick may just be folklore at this point. Also, any additional functionality would require significant changes. Boundary checking, transparent pixel setting... each one requires a different approach.

The main thing I want to stress is that if you really want to resort to assembly language, take a little time and learn assembly language. Assembly code is not a cure-all, and more importantly, it is not general. Each solution is good only for the problem it is designed to solve. Like Z!re said, if you need to plot many pixels at the same time, a blitting routine will perform much better because the x86 instruction set has opcodes specifically for that task. If you want general, reusable code, stick with high-level languages. That's part of why they were invented.
#5
Quote: Anyhow, my assembly sucks so probably this is a very bad snippet :D

If I'd known you were posting this while I was composing my reply, I could have saved some time :)

Your code's not that bad, and I like the way you shifted left, then right, rather than using two left shifts like I did.
#6
Quote: Blit instead of plotting pixels. Especially for large areas.
It's scrolling a background, so yes, a large area.
The only problem is, I don't have a clue what you're talking about.
(Blitting?) :oops:
I'm sorry, but could you explain how to do this, please? :???:

#7
She means this:

Quote: What Z!re says. Plotting to the screen is slower than writing to RAM, so it's better to build your image in RAM and then copy it to video memory in one go with a memory-copy routine in assembly.
#8
Or use FreeBASIC, which is faster... :D
#9
Double buffer.

Do all your graphics commands on a big block of memory in ordinary RAM, which is faster to access than video memory. Then copy the whole buffer to video memory.
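As a rough illustration of the idea (not QB code, and not a drop-in routine), here is a sketch in C. Everything in it is an assumption made for the example: the names are invented, and a plain array called fake_vram stands in for the real screen 13 video memory at segment A000h so the sketch can run anywhere. All drawing goes to back_buffer, and present() copies the whole frame across in a single memcpy call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Stand-in for video memory (hypothetical; real code would target A000:0000). */
static uint8_t fake_vram[SCREEN_W * SCREEN_H];
/* Off-screen buffer in ordinary RAM where all drawing happens. */
static uint8_t back_buffer[SCREEN_W * SCREEN_H];

/* Fill the off-screen buffer with one colour. */
void clear_buffer(uint8_t color)
{
    memset(back_buffer, color, sizeof back_buffer);
}

/* Plot one pixel into the off-screen buffer, never into video memory. */
void put_pixel(int x, int y, uint8_t color)
{
    assert(x >= 0 && x < SCREEN_W && y >= 0 && y < SCREEN_H);
    back_buffer[(size_t)y * SCREEN_W + (size_t)x] = color;
}

/* Copy the finished frame to "video memory" in one block copy. */
void present(void)
{
    memcpy(fake_vram, back_buffer, sizeof back_buffer);
}

/* Read back what is currently "on screen". */
uint8_t vram_at(int x, int y)
{
    return fake_vram[(size_t)y * SCREEN_W + (size_t)x];
}
```

A frame would then be clear_buffer(0), any number of put_pixel calls, and one present() at the end. Nothing shows up "on screen" until present() runs, which is the whole point of the technique.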
If you play a Microsoft CD backwards you can hear demonic voices. The scary part is that if you play it forwards it installs Windows.
#10
Sorry about my long absence, I've been really busy.
So, what memory-copying functions can you use?
If I build my graphics screen in a block of memory with POKE, then
POKE it into the video buffer one byte at a time, I doubt it'll be faster.
I guess there is an ASM routine for it?
Can someone explain it to me?
My deadline's coming up fast.
I'm kinda stressed out here.

Thanks for all the help so far though, at least I know what has to be done generally.