Qbasicnews.com

HTML Decrypter. In QBasic!!
:) Yay, I've had the bright idea to make an HTML decrypter. It's not much now, but it's something I'm playing around with for fun. Take a look...
(Reads .TXT and .HTM files, but fails to read files with the .HTML extension)

Code:
DECLARE SUB PARAread ()
DECLARE SUB BODYRead ()
DECLARE SUB HEADRead ()
DECLARE SUB TTLPRT ()
DECLARE SUB Overflow ()
DECLARE SUB TAGCheck ()
DECLARE SUB OpenHTML ()
DECLARE SUB HTMLread ()
DECLARE SUB HTMLCheck ()
DIM SHARED html$, cont, htm$, para$, header$, html2$
CLS
INPUT "Enter .HTM Document:", htm$
CLS
CALL HTMLCheck
CALL OpenHTML
CALL TAGCheck
cont = 0
CALL HTMLread
PRINT cont

SUB BODYRead
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 3) = "<P>" THEN cont = cont + 2: CALL PARAread
LOOP
END SUB

SUB HEADRead
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 7) = "<TITLE>" THEN cont = cont + 6: CALL TTLPRT
LOOP
END SUB

SUB HTMLCheck
cont = 0
DO
cont = cont + 1
IF cont = LEN(htm$) THEN PRINT " Not .HTM/.TXT Document ": END
IF MID$(UCASE$(htm$), cont, 3) = "TXT" THEN EXIT DO
IF MID$(UCASE$(htm$), cont, 3) = "HTM" THEN EXIT DO
LOOP

END SUB

SUB HTMLread
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 6) = "<HEAD>" THEN CALL HEADRead
IF MID$(UCASE$(html$), cont, 6) = "<BODY>" THEN CALL BODYRead
IF MID$(UCASE$(html$), cont, 7) = "</HTML>" THEN END
LOOP
END SUB

SUB OpenHTML
OPEN htm$ FOR INPUT AS #1
DO
IF EOF(1) = -1 THEN EXIT DO
LINE INPUT #1, a$
html$ = html$ + a$
IF LEN(html$) = 199 THEN PRINT " File To Big! ": END
LOOP
CLOSE #1
END SUB

SUB PARAread
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 4) = "</P>" THEN EXIT DO
phs$ = phs$ + MID$(html$, cont, 1)
LOOP
PRINT
PRINT phs$
CALL HTMLread
END SUB

SUB TAGCheck
cont = 0
DO
cont = cont + 1
IF cont = LEN(html$) THEN PRINT " No <html> tag located ": END
IF MID$(UCASE$(html$), cont, 6) = "<HTML>" THEN EXIT DO
LOOP
END SUB

SUB TTLPRT
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 8) = "</TITLE>" THEN EXIT DO
ttl$ = ttl$ + MID$(html$, cont, 1)
LOOP
PRINT ttl$
PRINT
CALL HTMLread
END SUB

Sample text file or HTM file:

Code:
<HTML>
<HEAD>
<TITLE>QB HTML READER!</TITLE>
</HEAD>
<BODY>
<P>This is a HTML paragraph being read by QBasic Yay!!</P>
</BODY>
</HTML>

It is limited to <html>, <head>, <body>, <p>, <title>, and their closing tags. Also, so far the file has to be less than 199 characters. :D This is just a decrypter for now, not a viewer, so there is no mouse support.

Note: If you'd like to mess around with it and improve on it, feel free to do so. It's a great introduction to using the MID$() function for scanning files for information. :wink:

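For readers who want to try the MID$() scanning idea on its own, here is a minimal sketch. FindTag% is a hypothetical helper name, not part of the program above, and the sample string is made up. It compares a slice of the upper-cased text against the upper-cased tag at each position, the same way the SUBs above do.

```basic
DECLARE FUNCTION FindTag% (text$, tag$, start%)

sample$ = "<html><body><P>Hello</P></body></html>"
p% = FindTag%(sample$, "<p>", 1)
IF p% > 0 THEN PRINT "Found <P> at position"; p%

' Returns the position of tag$ in text$ at or after start%, ignoring
' case, or 0 if it is not found. start% must be 1 or greater.
' FindTag% is a hypothetical helper name used only for this sketch.
FUNCTION FindTag% (text$, tag$, start%)
    upperText$ = UCASE$(text$)
    upperTag$ = UCASE$(tag$)
    FindTag% = 0
    FOR i% = start% TO LEN(text$) - LEN(tag$) + 1
        IF MID$(upperText$, i%, LEN(tag$)) = upperTag$ THEN
            FindTag% = i%
            EXIT FOR
        END IF
    NEXT i%
END FUNCTION
```

The built-in INSTR function does the same search in one call; the loop is written out here only to show how MID$() slices the string.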
Code:
IF LEN(html$) = 199 THEN PRINT " File To Big! ": END

199 chars? why? :o

Pretty neat code, but you don't need to use CALL. Just leave it out:

Code:
CALL foo(a, b)

is the same as:

Code:
foo a, b
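A small runnable sketch of the two call forms, assuming a made-up SUB named ShowSum (it is not part of the program in this thread):

```basic
DECLARE SUB ShowSum (a%, b%)

CALL ShowSum(2, 3)   ' with CALL, the arguments go in parentheses
ShowSum 2, 3         ' without CALL, the parentheses are dropped

' ShowSum is a hypothetical SUB used only to show the two call styles.
SUB ShowSum (a%, b%)
    PRINT a% + b%
END SUB
```

Both lines run the same SUB with the same arguments.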

I'm under the impression that QB strings can only hold 200 chars, and I use 199 to be safe. :wink:

Yeah, Mitth told me that too, but using CALL has become a habit. I do it without thinking, so it doesn't bother me.

Thanks, heh heh. I was helping Nathan1993 with a bug in his parser system, and I learned a lot about MID$, so I'm playing around with it. The simple code I've posted only reads one <P> tag: it drops out of BODYRead before checking for more. That can be fixed by changing

Code:
CALL HTMLread

to

Code:
CALL BODYread

in the PARAread SUB, leaving out the CALL if you wish. Say, how many chars can a string hold anyway? I haven't got around to checking it yet. :wink:

QB strings can hold up to 32K characters (32767) if defined correctly.

I'd suggest that you not read everything into a variable and then work with that variable, but instead read the file as you interpret it. Then 2 gigs will be your limit :)
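One way the read-as-you-interpret idea might look is sketched below. It is not the program from this thread: it reads one character at a time with INPUT$ and keeps only the current tag name in memory. The variable names are made up, and it assumes plain <P> and </P> tags with no attributes.

```basic
' Sketch: print the text of every <P>...</P> while reading the file one
' character at a time, so the whole file never sits in a single string.
INPUT "Enter .HTM Document:", file$
OPEN file$ FOR INPUT AS #1
inTag% = 0      ' -1 while between "<" and ">"
inPara% = 0     ' -1 while inside <P>...</P>
tag$ = ""
DO UNTIL EOF(1)
    c$ = INPUT$(1, #1)
    IF inTag% THEN
        IF c$ = ">" THEN
            inTag% = 0
            SELECT CASE UCASE$(tag$)
                CASE "P"
                    inPara% = -1
                CASE "/P"
                    inPara% = 0
                    PRINT
            END SELECT
        ELSE
            ' Cap the stored tag name so a stray "<" cannot grow it forever
            IF LEN(tag$) < 16 THEN tag$ = tag$ + c$
        END IF
    ELSEIF c$ = "<" THEN
        inTag% = -1
        tag$ = ""
    ELSEIF inPara% THEN
        ' Treat line-break characters inside a paragraph as spaces
        IF c$ = CHR$(13) OR c$ = CHR$(10) THEN c$ = " "
        PRINT c$;
    END IF
LOOP
CLOSE #1
```

With this layout the string limit only applies to the tag name buffer, not to the document.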

:) Yeah, thanks, that's what I read in QB help (finally took the time to look it up :lol: ). Here is an update on the HTML decrypter that holds 32767 chars and reads more than one <P> tag now.

It's at its best at the moment. Any more and I'll have to switch to graphics-mode text to display <Hx> tags. It's still just a small decrypter, and I don't plan to make it a viewer any time soon.

Code:
DECLARE SUB PARAread ()
DECLARE SUB BODYread ()
DECLARE SUB HEADRead ()
DECLARE SUB TTLPRT ()
DECLARE SUB TAGCheck ()
DECLARE SUB OpenHTML ()
DECLARE SUB HTMLread ()
DECLARE SUB HTMLCheck ()
DIM SHARED html$, cont, htm$, para$, header$, html2$
CLS
INPUT "Enter .HTM/.TXT Document:", htm$
CLS
CALL HTMLCheck
CALL OpenHTML
CALL TAGCheck
cont = 0
CALL HTMLread
PRINT cont

SUB BODYread
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 3) = "<P>" THEN cont = cont + 2: CALL PARAread
LOOP
END SUB

SUB HEADRead
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 7) = "<TITLE>" THEN cont = cont + 6: CALL TTLPRT
LOOP
END SUB

SUB HTMLCheck
cont = 0
DO
cont = cont + 1
IF cont = LEN(htm$) THEN PRINT " Not .HTM/.TXT Document ": END
IF MID$(UCASE$(htm$), cont, 3) = "TXT" THEN EXIT DO
IF MID$(UCASE$(htm$), cont, 3) = "HTM" THEN EXIT DO
LOOP

END SUB

SUB HTMLread
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 6) = "<HEAD>" THEN CALL HEADRead
IF MID$(UCASE$(html$), cont, 5) = "<BODY" THEN CALL BODYread
IF MID$(UCASE$(html$), cont, 7) = "</HTML>" THEN END
LOOP
END SUB

SUB OpenHTML
OPEN htm$ FOR INPUT AS #1
DO
IF EOF(1) = -1 THEN EXIT DO
LINE INPUT #1, a$
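' Check the size before appending; an exact "= 32767" test is never reached
IF CLNG(LEN(html$)) + LEN(a$) + 1 > 32767 THEN PRINT " File Too Big! ": END
html$ = html$ + a$ + " "   ' a line break counts as whitespace in HTML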
LOOP
CLOSE #1
END SUB

SUB PARAread
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 4) = "</P>" THEN EXIT DO
phs$ = phs$ + MID$(html$, cont, 1)
LOOP
PRINT
PRINT phs$
CALL BODYread
END SUB

SUB TAGCheck
cont = 0
DO
cont = cont + 1
IF cont = LEN(html$) THEN PRINT " No <html> tag located ": END
IF MID$(UCASE$(html$), cont, 6) = "<HTML>" THEN EXIT DO
LOOP
END SUB

SUB TTLPRT
DO
cont = cont + 1
IF cont = LEN(html$) THEN END
IF MID$(UCASE$(html$), cont, 8) = "</TITLE>" THEN EXIT DO
ttl$ = ttl$ + MID$(html$, cont, 1)
LOOP
PRINT ttl$
PRINT
CALL HTMLread
END SUB

Sample text document or HTM document:

Code:
<HTML>
<HEAD>
<TITLE>QB HTML READER!</TITLE>
</HEAD>
<BODY>
<P>This is a HTML paragraph being read by QBasic, Yay!!</P>
<P>This is the second paragraph in QB!!</P>
<P>What the heck lets go for THREE!!</P>
<P>Stressing the string$ size it holds up 32767 chars
for reading most HTML code as it is doing now!
Enjoy this program. Its great for those who would
like to learn how to use MID$() to scan files!</P>
</BODY>
</HTML>

Well, there it is. Like I said before, for anyone interested in using MID$() to read and sort information, this is a good example, I think. And anyone who likes the idea can improve on it. :)

This is still limited to <html>, <head>, <body>, <title>, <p>, and their closing tags, but I moved the character limit up. :wink: And if you haven't noticed, this reads more than one <P>. It's also case insensitive, so you can use <HTML>, <HtMl>, or <html> and it still works. :D
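As an alternative sketch (not the code posted above), the same multi-paragraph, case-insensitive scan can be written as one loop with INSTR instead of SUBs that call each other. The sample string and variable names are made up for illustration.

```basic
html$ = "<body><p>One</p><P>Two</P></body>"
upper$ = UCASE$(html$)      ' search copy; html$ keeps its original case
scanPos% = 1
DO
    pStart% = INSTR(scanPos%, upper$, "<P>")
    IF pStart% = 0 THEN EXIT DO
    pEnd% = INSTR(pStart%, upper$, "</P>")
    IF pEnd% = 0 THEN EXIT DO
    ' Text sits between the 3-character "<P>" and the start of "</P>"
    PRINT MID$(html$, pStart% + 3, pEnd% - pStart% - 3)
    scanPos% = pEnd% + 4    ' continue after the 4-character "</P>"
LOOP
```

Searching the upper-cased copy while printing from the original keeps the paragraph text in its original case.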

As a next step, you should handle encoded characters, since this is easy to do. For example, the < and > characters are encoded as &lt; and &gt;. If you view-source this page, you'll see them encoded, and the &gt; will look like &amp;gt; to prevent it from being rendered.
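A minimal sketch of decoding such characters, assuming a hypothetical helper named DecodeEntities$ that knows only four entities. It works left to right in a single pass, so &amp;gt; comes out as the text &gt; rather than as >.

```basic
DECLARE FUNCTION DecodeEntities$ (text$)

PRINT DecodeEntities$("1 &lt; 2 &amp;&amp; 3 &gt; 2")

' Replaces a few common character references with the characters they
' stand for. DecodeEntities$ is a hypothetical helper; only &lt;, &gt;,
' &amp; and &quot; are handled, and anything else is copied unchanged.
FUNCTION DecodeEntities$ (text$)
    result$ = ""
    i% = 1
    DO WHILE i% <= LEN(text$)
        rest$ = MID$(text$, i%, 6)   ' longest entity handled is 6 chars
        IF LEFT$(rest$, 4) = "&lt;" THEN
            result$ = result$ + "<"
            i% = i% + 4
        ELSEIF LEFT$(rest$, 4) = "&gt;" THEN
            result$ = result$ + ">"
            i% = i% + 4
        ELSEIF LEFT$(rest$, 5) = "&amp;" THEN
            result$ = result$ + "&"
            i% = i% + 5
        ELSEIF LEFT$(rest$, 6) = "&quot;" THEN
            result$ = result$ + CHR$(34)
            i% = i% + 6
        ELSE
            result$ = result$ + MID$(text$, i%, 1)
            i% = i% + 1
        END IF
    LOOP
    DecodeEntities$ = result$
END FUNCTION
```

The decoding should run on text content only, after the tags have been recognized, so that an encoded &lt; is never mistaken for the start of a tag.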

To be 'future-proof', a markup parser should ignore any elements it doesn't understand and render their content as if it belonged to the parent element.
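A rough sketch of that rule, assuming a hypothetical SUB named RenderText that understands only </P> (as a line break). Every other tag is skipped, but the text inside it is still printed as part of the surrounding paragraph.

```basic
DECLARE SUB RenderText (html$)

RenderText "<P>Some <B>bold</B> and <FOO>unknown</FOO> text</P>"

' Prints the text content of html$. A closing </P> ends the line; every
' other tag, known or not, is skipped while its inner text is still
' printed. RenderText is a hypothetical name used for this sketch.
SUB RenderText (html$)
    i% = 1
    DO WHILE i% <= LEN(html$)
        c$ = MID$(html$, i%, 1)
        IF c$ = "<" THEN
            tagEnd% = INSTR(i%, html$, ">")
            IF tagEnd% = 0 THEN EXIT DO      ' unterminated tag: stop
            tag$ = UCASE$(MID$(html$, i% + 1, tagEnd% - i% - 1))
            IF tag$ = "/P" THEN PRINT
            i% = tagEnd% + 1
        ELSE
            PRINT c$;
            i% = i% + 1
        END IF
    LOOP
END SUB
```

New tags can be supported later by adding cases for them, without changing how unknown ones are treated.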

You could just make an XML parser that refuses documents that aren't well-formed. That would be nifty. And for all the valid XHTML/CSS sites out there (like mine), you'd be able to parse their markup as if it's XML.
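A sketch of one part of a well-formedness check, namely that tags nest and close in the right order. TagsBalanced% is a hypothetical helper; it assumes simple tags only (no attributes, comments, or self-closing forms) and a nesting depth of at most 50, so it is far from a full XML check.

```basic
DECLARE FUNCTION TagsBalanced% (doc$)

IF TagsBalanced%("<a><b>text</b></a>") THEN PRINT "balanced"
IF NOT TagsBalanced%("<a><b>text</a></b>") THEN PRINT "not balanced"

' Returns -1 if every opening tag is closed in the right order, else 0.
' Tag names are compared exactly, since XML is case sensitive.
FUNCTION TagsBalanced% (doc$)
    DIM openTags$(1 TO 50)          ' stack of currently open tag names
    depth% = 0
    i% = 1
    TagsBalanced% = 0
    DO
        tagStart% = INSTR(i%, doc$, "<")
        IF tagStart% = 0 THEN EXIT DO
        tagEnd% = INSTR(tagStart%, doc$, ">")
        IF tagEnd% = 0 THEN EXIT FUNCTION        ' "<" without ">"
        tag$ = MID$(doc$, tagStart% + 1, tagEnd% - tagStart% - 1)
        IF LEFT$(tag$, 1) = "/" THEN
            IF depth% = 0 THEN EXIT FUNCTION     ' close with nothing open
            IF openTags$(depth%) <> MID$(tag$, 2) THEN EXIT FUNCTION
            depth% = depth% - 1
        ELSE
            IF depth% = 50 THEN EXIT FUNCTION    ' too deep for this sketch
            depth% = depth% + 1
            openTags$(depth%) = tag$
        END IF
        i% = tagEnd% + 1
    LOOP
    IF depth% = 0 THEN TagsBalanced% = -1
END FUNCTION
```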

I wrote an HTML render engine in QB a few years back, coupled it with my Twisted Sock library, and came up with a functional QB-based web browser called "SockWeb". It made me realize just how much work goes into interpreting HTML. If you can keep going with this, it would be very, very interesting. :D

Quote:
Code:
IF LEN(html$) = 199 THEN PRINT " File To Big! ": END

199 chars? why? :o

Pretty neat code, but you don't need to use CALL. Just leave it out:

Code:
CALL foo(a, b)

is the same as:

Code:
foo a, b

Well na_th_an, it is a matter of taste, but I prefer CALL. It makes it easy to identify SUBs as opposed to commands and is, IMHO, good documentation at a cost of a few characters.

CALL foo(a, b) - 14 characters
foo a, b - 8 characters

Mac

But then you lose the feel of it being a command.

I mean, you don't say

CALL print ("hello")
It's close to an XML parser too. The trick there is deciding what sort of representation to use to hold the parsed tree in memory to work with it.
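One possible in-memory representation, sketched under the assumption that parallel arrays are acceptable: each node gets a name, the index of its parent (0 for the root), and its text. The array names and the 100-node limit are made up for illustration, and the tree is filled in by hand rather than by a parser.

```basic
' Parallel arrays as a node table: node n has a name, a parent index
' (0 for the root) and its own text. All names here are illustrative.
CONST MAXNODES = 100
DIM nodeName$(1 TO MAXNODES)
DIM nodeParent%(1 TO MAXNODES)
DIM nodeText$(1 TO MAXNODES)
nodeCount% = 0

' Build the tree for <HTML><BODY><P>Hello</P></BODY></HTML> by hand.
nodeCount% = nodeCount% + 1
nodeName$(nodeCount%) = "HTML"
nodeParent%(nodeCount%) = 0
nodeCount% = nodeCount% + 1
nodeName$(nodeCount%) = "BODY"
nodeParent%(nodeCount%) = 1
nodeCount% = nodeCount% + 1
nodeName$(nodeCount%) = "P"
nodeParent%(nodeCount%) = 2
nodeText$(nodeCount%) = "Hello"

' Walk from the last node up to the root, printing the path.
n% = nodeCount%
DO WHILE n% > 0
    PRINT nodeName$(n%);
    IF nodeParent%(n%) > 0 THEN PRINT " < ";
    n% = nodeParent%(n%)
LOOP
PRINT
```

A parser would append a node on each opening tag and set its parent to the node that is currently open.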