Kai Jasey's Homepage

P'' Lexical Analysis and Transpiler

Notice: this program is originally intended as an adaptation of the BF language. This is not a full implamentation of BF, though it IS a full implamentation of P''. Both P'' and BF have unruly nomenclature, so I'm going to refer to my version of the language as BF.

Supported Commands

+ incrament current memory value
- decrament current memory value
> move to next memory location
< move to previous memory location
. output the current memory value as a character
[ label to jump to
] if current memory location is not zero, jump to the last label. note, labels can be nested. read more about it here

The Process

Although tokenization of BF is quite simple, after writing the first iteration of this program, I quickly realized that the program code needed to be optimized. Take this code for example:

++++++++++[>+<-]

This program adds 10 to the current mmemory location, then shifts the 10 from location zero to location one. In Dartmouth BASIC the direct translation would be something like this:

5 DIM a(256) 7 i = 0 10 a(i)=a(i) + 1 20 a(i)=a(i) + 1 30 a(i)=a(i) + 1 40 a(i)=a(i) + 1 50 a(i)=a(i) + 1 60 a(i)=a(i) + 1 70 a(i)=a(i) + 1 80 a(i)=a(i) + 1 90 a(i)=a(i) + 1 100 a(i)=a(i) + 1 110 rem open brac 1 120 i=i+1 130 a(i)=a(i) + 1 140 i=i-1 150 a(i)=a(i) - 1 160 IF a(i)<>0 THEN GOTO 110

As we can see, this program is quite verbose. To solve this problem, I split up the process into three sections. Tokenization, Culling, and Formatting.

Tokenization

This section takes each BF command, and turns it into an object class I call "opcode" (not to be confused with the machine language analog). The opcode class contains a unique identifier, the code itself, a value associated with the code, and a boolean skip field.

ID helps the computer keep track of each unique command

Code is the code itself. Code strings in use are ADD, SUB, TO, FRO, OUT, IN, LAB, and JMP. They represent +-><.,[] respectively (TO meaning go forward, FRO meaning step back, LAB meaning label, and JMP meaning jump back). Note that though IN is tokenized, It's not fully implamented. I'm trying to figure a cross platform method of handling input.

Val holds a value associated with the current code. This will be very useful in the next section. Skip I will describe in the next section.

Here's what the tokenized program looks like in memory:

ID	CODE	VAL	SKIP
4875247597984	ADD	1	FALSE
3	ADD	1	FALSE
8248249	ADD	1	FALSE
595730	ADD	1	FALSE
4837133924774	ADD	1	FALSE
475862223	ADD	1	FALSE
90985771	ADD	1	FALSE
33	ADD	1	FALSE
99478203	ADD	1	FALSE
111111	ADD	1	FALSE
8487475	LAB	1	FALSE
4728274	TO	1	FALSE
983334	ADD	1	FALSE
72877274	FRO	1	FALSE
12	SUB	1	FALSE
99	JMP	8487475	FALSE

Culling

After the BF program is tokenized, we need to bundle the duplicates. We loop through the program as it appears in the table above from top to bottom, and if we encounter the same opcode twice, we add the two VAL columns together and place the result into the second opcode row, then mark the first opcode row's SKIP column as true.

After all the duplicates have been marked, the program looks like this:

ID	CODE	VAL	SKIP
4875247597984	ADD	1	TRUE
3	ADD	2	TRUE
8248249	ADD	3	TRUE
595730	ADD	4	TRUE
4837133924774	ADD	5	TRUE
475862223	ADD	6	TRUE
90985771	ADD	7	TRUE
33	ADD	8	TRUE
99478203	ADD	9	TRUE
111111	ADD	10	FALSE
8487475	LAB	1	FALSE
4728274	TO	1	FALSE
983334	ADD	1	FALSE
72877274	FRO	1	FALSE
12	SUB	1	FALSE
99	JMP	8487475	FALSE

The culling process only works for ADD, SUB, TO, and FRO opcodes. The others remain as they are.

After the culling process, the program looks like this:

ID	CODE	VAL	SKIP
111111	ADD	10	FALSE
8487475	LAB	1	FALSE
4728274	TO	1	FALSE
983334	ADD	1	FALSE
72877274	FRO	1	FALSE
12	SUB	1	FALSE
99	JMP	8487475	FALSE

The end goal is to have no duplicates and no values where SKIP == TRUE.

Formatting

After the program is culled, we now have to format the tokens into a readable format for the transpiler to read. We simply export this segment into a JSON file:

{"0":{"CODE":"ADD","VAL":"10","ID":"5612126083620"},"1":{"CODE":"LAB","VAL":"1","ID":"4349794339981"},"2":{"CODE":"TO","VAL":"1","ID":"6460291803755"},"3":{"CODE":"ADD","VAL":"1","ID":"6319243514633"},"4":{"CODE":"FRO","VAL":"1","ID":"8037930419534"},"5":{"CODE":"SUB","VAL":"1","ID":"4791340815520"},"6":{"CODE":"JMP","VAL":"4349794339981","ID":"154116386890"}}

The tokenization is complete. Now we need to transpile the code.

Transpiler

I've written a few different scripts to translate tokenized BF into a few programming languages languages.

Supported Languages:

Dartmouth BASIC
Python 3
C
Javascript

Code will be pushed to github shortly.