Qbasicnews.com


Yay, I've had the bright idea to make an HTML decrypter. It's not much now, but it's something I'm playing around with for fun. Take a look...
(Reads .TXT and .HTM files, but fails to read files with the .HTML extension)

Code:
DECLARE SUB PARAread ()

DECLARE SUB BODYRead ()

DECLARE SUB HEADRead ()

DECLARE SUB TTLPRT ()

DECLARE SUB Overflow ()

DECLARE SUB TAGCheck ()

DECLARE SUB OpenHTML ()

DECLARE SUB HTMLread ()

DECLARE SUB HTMLCheck ()

DIM SHARED html$, cont, htm$, para$, header$, html2$

CLS

INPUT "Enter .HTM Document:", htm$

CLS

CALL HTMLCheck

CALL OpenHTML

CALL TAGCheck

cont = 0

CALL HTMLread

PRINT cont

SUB BODYRead

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 3) = "<P>" THEN cont = cont + 2: CALL PARAread

LOOP

END SUB

SUB HEADRead

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 7) = "<TITLE>" THEN cont = cont + 6: CALL TTLPRT

LOOP

END SUB

SUB HTMLCheck

cont = 0

DO

cont = cont + 1

IF cont = LEN(htm$) THEN PRINT " Not .HTM/.TXT Document ": END

IF MID$(UCASE$(htm$), cont, 3) = "TXT" THEN EXIT DO

IF MID$(UCASE$(htm$), cont, 3) = "HTM" THEN EXIT DO

LOOP

END SUB

SUB HTMLread

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 6) = "<HEAD>" THEN CALL HEADRead

IF MID$(UCASE$(html$), cont, 6) = "<BODY>" THEN CALL BODYRead

IF MID$(UCASE$(html$), cont, 7) = "</HTML>" THEN END

LOOP

END SUB

SUB OpenHTML

OPEN htm$ FOR INPUT AS #1

DO

IF EOF(1) = -1 THEN EXIT DO

LINE INPUT #1, a$

html$ = html$ + a$

IF LEN(html$) = 199 THEN PRINT " File To Big! ": END

LOOP

CLOSE #1

END SUB

SUB PARAread

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 4) = "</P>" THEN EXIT DO

phs$ = phs$ + MID$(html$, cont, 1)

LOOP

PRINT

PRINT phs$

CALL HTMLread

END SUB

SUB TAGCheck

cont = 0

DO

cont = cont + 1

IF cont = LEN(html$) THEN PRINT " No <html> tag located ": END

IF MID$(UCASE$(html$), cont, 6) = "<HTML>" THEN EXIT DO

LOOP

END SUB

SUB TTLPRT

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 8) = "</TITLE>" THEN EXIT DO

ttl$ = ttl$ + MID$(html$, cont, 1)

LOOP

PRINT ttl$

PRINT

CALL HTMLread

END SUB

Text file or HTM file:

Code:
<HTML>

<HEAD>

<TITLE>QB HTML READER!</TITLE>

</HEAD>

<BODY>

<P>This is a HTML paragraph being read by QBasic Yay!!</P>

</BODY>

</HTML>

It is limited to <html>, <head>, <body>, <p>, <title>, and their closing tags. Also, so far the file has to be less than 199 characters. :D

This is just a decrypter for now, not a viewer, so there is no mouse support.

Note: If you'd like to mess around with it and improve on it, feel free to do so. It's a great introduction to using the MID$() function for scanning files for information. :wink:

Code:
IF LEN(html$) = 199 THEN PRINT " File To Big! ": END

199 chars? Why? :o

Pretty neat code, but you don't need to use CALL. Just leave it out:

Code:
CALL foo(a, b)

is the same as:

Code:
foo a, b

I was under the impression that QB strings can only hold 200 chars, so I used 199 to be safe. :wink:

Yeah, Mitth told me that too, but typing CALL has become a habit. I do it without thinking, so it doesn't bother me.

Thanks, heh heh. I was helping Nathan1993 with a bug in his parser system, learned a lot about MID$, and now I'm playing around with it. The simple code I've posted only reads one <P> tag: it drops out of BODYRead before checking for more. That can be fixed by changing

Code:
CALL HTMLread

to

CALL BODYread

in the PARAread SUB (leave out the CALL if you wish). Say, how many chars can a string hold anyway? I haven't got around to checking it yet. :wink:

QB strings can handle up to 32K characters, i.e. 32767 if defined correctly.

I'd suggest that you don't read everything into a variable and then work with that variable, but instead read the file as you interpret it. Then 2 gigs will be your limit. :)
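
A minimal sketch of that read-as-you-go idea, not the posted program: it prints the text of each <P> element while reading one line at a time, so the whole document never has to fit in a single string. It assumes that no tag is split across two lines; inPara, p% and q% are names made up for this example, and htm$ is the file name variable from the posted code.

Code:
' Sketch: print every <P>...</P> while reading the file line by line.
' Assumption: a tag such as <P> or </P> never straddles a line break.
DIM inPara AS INTEGER
INPUT "Enter .HTM/.TXT Document:", htm$
OPEN htm$ FOR INPUT AS #1
inPara = 0                          ' 0 = outside a paragraph, -1 = inside
DO WHILE NOT EOF(1)
  LINE INPUT #1, a$
  u$ = UCASE$(a$)                   ' upper-case copy for tag matching
  p% = 1
  DO WHILE p% <= LEN(a$)
    IF inPara = 0 THEN
      q% = INSTR(p%, u$, "<P>")
      IF q% = 0 THEN EXIT DO        ' no paragraph starts on the rest of this line
      inPara = -1
      p% = q% + 3                   ' skip past "<P>"
    ELSE
      q% = INSTR(p%, u$, "</P>")
      IF q% = 0 THEN
        PRINT MID$(a$, p%); " ";    ' paragraph continues on the next line
        EXIT DO
      END IF
      PRINT MID$(a$, p%, q% - p%)   ' text up to the closing tag
      PRINT
      inPara = 0
      p% = q% + 4                   ' skip past "</P>"
    END IF
  LOOP
LOOP
CLOSE #1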

Yeah thanks, that's what I read in QB help (finally took the time to look it up :lol: ). Here is an update on the HTML decrypter that holds the 32767 chars and reads more than one <P> tag now.

This is about as far as it goes for the moment. Any more and I'll have to switch to graphics-mode text to display <Hx> tags. It's still just a small decrypter, and I don't plan to make it a viewer any time soon.

Code:
DECLARE SUB PARAread ()

DECLARE SUB BODYread ()

DECLARE SUB HEADRead ()

DECLARE SUB TTLPRT ()

DECLARE SUB TAGCheck ()

DECLARE SUB OpenHTML ()

DECLARE SUB HTMLread ()

DECLARE SUB HTMLCheck ()

DIM SHARED html$, cont, htm$, para$, header$, html2$

CLS

INPUT "Enter .HTM/.TXT Document:", htm$

CLS

CALL HTMLCheck

CALL OpenHTML

CALL TAGCheck

cont = 0

CALL HTMLread

PRINT cont

SUB BODYread

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 3) = "<P>" THEN cont = cont + 2: CALL PARAread

LOOP

END SUB

SUB HEADRead

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 7) = "<TITLE>" THEN cont = cont + 6: CALL TTLPRT

LOOP

END SUB

SUB HTMLCheck

cont = 0

DO

cont = cont + 1

IF cont = LEN(htm$) THEN PRINT " Not .HTM/.TXT Document ": END

IF MID$(UCASE$(htm$), cont, 3) = "TXT" THEN EXIT DO

IF MID$(UCASE$(htm$), cont, 3) = "HTM" THEN EXIT DO

LOOP

END SUB

SUB HTMLread

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 6) = "<HEAD>" THEN CALL HEADRead

IF MID$(UCASE$(html$), cont, 5) = "<BODY" THEN CALL BODYread

IF MID$(UCASE$(html$), cont, 7) = "</HTML>" THEN END

LOOP

END SUB

SUB OpenHTML

OPEN htm$ FOR INPUT AS #1

DO

IF EOF(1) = -1 THEN EXIT DO

LINE INPUT #1, a$

html$ = html$ + a$

IF LEN(html$) = 32767 THEN PRINT " File To Big! ": END

LOOP

CLOSE #1

END SUB

SUB PARAread

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 4) = "</P>" THEN EXIT DO

phs$ = phs$ + MID$(html$, cont, 1)

LOOP

PRINT

PRINT phs$

CALL BODYread

END SUB

SUB TAGCheck

cont = 0

DO

cont = cont + 1

IF cont = LEN(html$) THEN PRINT " No <html> tag located ": END

IF MID$(UCASE$(html$), cont, 6) = "<HTML>" THEN EXIT DO

LOOP

END SUB

SUB TTLPRT

DO

cont = cont + 1

IF cont = LEN(html$) THEN END

IF MID$(UCASE$(html$), cont, 8) = "</TITLE>" THEN EXIT DO

ttl$ = ttl$ + MID$(html$, cont, 1)

LOOP

PRINT ttl$

PRINT

CALL HTMLread

END SUB

Text document or HTM document:

Code:
<HTML>

<HEAD>

<TITLE>QB HTML READER!</TITLE>

</HEAD>

<BODY>

<P>This is a HTML paragraph being read by QBasic, Yay!!</P>

<P>This is the second paragraph in QB!!</P>

<P>What the heck lets go for THREE!!</P>

<P>Stressing the string$ size it holds up 32767 chars 

for reading most HTML code as it is doing now! 

Enjoy this program. Its great for those who would 

like to learn how to use MID$() to scan files!</P>

</BODY>

</HTML>

Well, there it is. Like I said before, I think this is a good example for anyone interested in using MID$() to read and sort information. And anyone who likes the idea can improve on it. :)

This is still limited to <html>, <head>, <body>, <title>, <p>, and their closing tags, but I moved the character limit up. :wink: And if you haven't noticed, this reads more than one <p>. It's also case insensitive, so you can use <HTML>, <HtMl>, or <html> and it will still work. :D

As a next step, you should acknowledge encoded characters, since this is easy to do. For example, the < and > characters are encoded as &lt; and &gt;. If you view the source of this page, you'll see them encoded, and the &gt; will look like &amp;gt; to prevent it from being rendered.
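
A rough sketch of how decoding those characters might look in QB. Decode$ and ReplaceAll are hypothetical names invented for this example, and only &lt;, &gt; and &amp; are handled. &amp; is replaced last so that "&amp;lt;" comes out as the literal text "&lt;" rather than "<". With the sample string below it should print 1 < 2 && 3 > 2.

Code:
DECLARE FUNCTION Decode$ (s$)
DECLARE SUB ReplaceAll (s$, find$, repl$)

PRINT Decode$("1 &lt; 2 &amp;&amp; 3 &gt; 2")

FUNCTION Decode$ (s$)
  t$ = s$                           ' work on a copy, leave the caller's string alone
  ReplaceAll t$, "&lt;", "<"
  ReplaceAll t$, "&gt;", ">"
  ReplaceAll t$, "&amp;", "&"       ' must come last (see note above)
  Decode$ = t$
END FUNCTION

' Replaces every occurrence of find$ in s$ with repl$ (s$ is changed in place).
SUB ReplaceAll (s$, find$, repl$)
  p% = INSTR(s$, find$)
  DO WHILE p% > 0
    s$ = LEFT$(s$, p% - 1) + repl$ + MID$(s$, p% + LEN(find$))
    p% = INSTR(p% + LEN(repl$), s$, find$)   ' resume after the replacement
  LOOP
END SUB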

To be 'future-proof', a markup parser should ignore any elements it doesn't understand and render their content as if it belonged to the parent element.
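
One way to sketch that rule, assuming the only tag the program understands is </P> (treated as a line break): every other tag is dropped, but the text inside it is kept as part of the surrounding paragraph. RenderText$ is a made-up name for this example, not something from the posted program.

Code:
DECLARE FUNCTION RenderText$ (s$)

PRINT RenderText$("<P>Some <BLINK>flashy</BLINK> text</P>")

FUNCTION RenderText$ (s$)
  inTag% = 0                        ' -1 while between "<" and ">"
  FOR i% = 1 TO LEN(s$)
    c$ = MID$(s$, i%, 1)
    IF c$ = "<" THEN
      inTag% = -1
      tag$ = ""                     ' start collecting the tag name
    ELSEIF c$ = ">" AND inTag% THEN
      inTag% = 0
      ' Known tag: </P> ends the line. Unknown tags are simply ignored.
      IF UCASE$(tag$) = "/P" THEN text$ = text$ + CHR$(13) + CHR$(10)
    ELSEIF inTag% THEN
      tag$ = tag$ + c$              ' still inside the tag
    ELSE
      text$ = text$ + c$            ' ordinary content, always kept
    END IF
  NEXT i%
  RenderText$ = text$
END FUNCTION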

You could just make an XML parser that refuses documents that aren't well-formed. That would be nifty. And for all the valid XHTML/CSS sites out there (like mine), you'd be able to parse their markup as if it's XML.

I wrote an HTML render engine in QB a few years back, coupled it with my Twisted Sock library, and came up with a functional QB-based web browser called "SockWeb". It made me realize just how much work goes into interpreting HTML. If you can keep going with this, it would be very, very interesting. :D

Quote:
Code:
IF LEN(html$) = 199 THEN PRINT " File To Big! ": END

199 chars? why? :o

Pretty neat code, but you don't need to use CALL. Just leave it out:

Code:
CALL foo(a, b)

is the same as:

Code:
foo a, b

Well na_th_an, it is a matter of taste, but I prefer CALL. It makes it easy to identify SUBs as opposed to commands and is, IMHO, good documentation at a cost of a few characters.

CALL foo(a, b) - 14 characters
foo a, b - 8 characters

Mac

But then you lose the feel of it being a command.

I mean, you don't say

CALL print ("hello")

It's close to an XML parser too. The trick there is deciding what sort of representation to use to hold the parsed tree in memory to work with it.
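
One possible in-memory representation, offered only as a sketch: an array of records where each node stores its tag name and the index of its parent (0 meaning no parent). NodeType, nodes, cur and the sample src$ are all invented for this example. It assumes well-formed input in which every opening tag has a closing tag and no tag carries attributes. The loop at the end prints each tag indented by its depth in the tree.

Code:
TYPE NodeType
  tag AS STRING * 8
  parent AS INTEGER
END TYPE

DIM nodes(1 TO 50) AS NodeType
DIM count AS INTEGER, cur AS INTEGER, p AS INTEGER, q AS INTEGER, d AS INTEGER

src$ = "<HTML><BODY><P>Hi</P><P>Bye</P></BODY></HTML>"
p = INSTR(src$, "<")
DO WHILE p > 0
  q = INSTR(p, src$, ">")
  IF q = 0 THEN EXIT DO             ' unterminated tag: stop
  t$ = UCASE$(MID$(src$, p + 1, q - p - 1))
  IF LEFT$(t$, 1) = "/" THEN
    IF cur > 0 THEN cur = nodes(cur).parent   ' closing tag: back to the parent
  ELSE
    count = count + 1                          ' opening tag: new child of cur
    nodes(count).tag = t$
    nodes(count).parent = cur
    cur = count
  END IF
  p = INSTR(q, src$, "<")
LOOP

' Print the tree, indenting each node by its depth.
FOR i% = 1 TO count
  d = 0
  n% = nodes(i%).parent
  DO WHILE n% > 0                   ' walk up to the root, counting levels
    d = d + 1
    n% = nodes(n%).parent
  LOOP
  PRINT SPACE$(d * 2); RTRIM$(nodes(i%).tag)
NEXT i%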


Rattrapmax6

na_th_an

Rattrapmax6

na_th_an

Rattrapmax6

wizardlife

adosorken

Mac

KiZ

dilettante