P'' Lexical Analysis and Transpiler

Notice: this program is originally intended as an adaptation of the BF language. This is not a full implamentation of BF, though it IS a full implamentation of P''. Both P'' and BF have unruly nomenclature, so I'm going to refer to my version of the language as BF.

Supported Commands

The Process

Although tokenization of BF is quite simple, after writing the first iteration of this program, I quickly realized that the program code needed to be optimized. Take this code for example:

++++++++++[>+<-]

This program adds 10 to the current mmemory location, then shifts the 10 from location zero to location one. In Dartmouth BASIC the direct translation would be something like this:

5 DIM a(256)
7 i = 0
10 a(i)=a(i) + 1
20 a(i)=a(i) + 1
30 a(i)=a(i) + 1
40 a(i)=a(i) + 1
50 a(i)=a(i) + 1
60 a(i)=a(i) + 1
70 a(i)=a(i) + 1
80 a(i)=a(i) + 1
90 a(i)=a(i) + 1
100 a(i)=a(i) + 1
110 rem open brac 1
120 i=i+1
130 a(i)=a(i) + 1
140 i=i-1
150 a(i)=a(i) - 1
160 IF a(i)<>0 THEN GOTO 110

As we can see, this program is quite verbose. To solve this problem, I split up the process into three sections. Tokenization, Culling, and Formatting.

Tokenization

This section takes each BF command, and turns it into an object class I call "opcode" (not to be confused with the machine language analog). The opcode class contains a unique identifier, the code itself, a value associated with the code, and a boolean skip field.

ID helps the computer keep track of each unique command

Code is the code itself. Code strings in use are ADD, SUB, TO, FRO, OUT, IN, LAB, and JMP. They represent +-><.,[] respectively (TO meaning go forward, FRO meaning step back, LAB meaning label, and JMP meaning jump back). Note that though IN is tokenized, It's not fully implamented. I'm trying to figure a cross platform method of handling input.

Val holds a value associated with the current code. This will be very useful in the next section. Skip I will describe in the next section.

Here's what the tokenized program looks like in memory:

ID CODE VAL SKIP
4875247597984 ADD 1 FALSE
3 ADD 1 FALSE
8248249 ADD 1 FALSE
595730 ADD 1 FALSE
4837133924774 ADD 1 FALSE
475862223 ADD 1 FALSE
90985771 ADD 1 FALSE
33 ADD 1 FALSE
99478203 ADD 1 FALSE
111111 ADD 1 FALSE
8487475 LAB 1 FALSE
4728274 TO 1 FALSE
983334 ADD 1 FALSE
72877274 FRO 1 FALSE
12 SUB 1 FALSE
99 JMP 8487475 FALSE

Culling

After the BF program is tokenized, we need to bundle the duplicates. We loop through the program as it appears in the table above from top to bottom, and if we encounter the same opcode twice, we add the two VAL columns together and place the result into the second opcode row, then mark the first opcode row's SKIP column as true.

After all the duplicates have been marked, the program looks like this:

ID CODE VAL SKIP
4875247597984 ADD 1 TRUE
3 ADD 2 TRUE
8248249 ADD 3 TRUE
595730 ADD 4 TRUE
4837133924774 ADD 5 TRUE
475862223 ADD 6 TRUE
90985771 ADD 7 TRUE
33 ADD 8 TRUE
99478203 ADD 9 TRUE
111111 ADD 10 FALSE
8487475 LAB 1 FALSE
4728274 TO 1 FALSE
983334 ADD 1 FALSE
72877274 FRO 1 FALSE
12 SUB 1 FALSE
99 JMP 8487475 FALSE

The culling process only works for ADD, SUB, TO, and FRO opcodes. The others remain as they are.

After the culling process, the program looks like this:

ID CODE VAL SKIP
111111 ADD 10 FALSE
8487475 LAB 1 FALSE
4728274 TO 1 FALSE
983334 ADD 1 FALSE
72877274 FRO 1 FALSE
12 SUB 1 FALSE
99 JMP 8487475 FALSE

The end goal is to have no duplicates and no values where SKIP == TRUE.

Formatting

After the program is culled, we now have to format the tokens into a readable format for the transpiler to read. We simply export this segment into a JSON file:

{"0":{"CODE":"ADD","VAL":"10","ID":"5612126083620"},"1":{"CODE":"LAB","VAL":"1","ID":"4349794339981"},"2":{"CODE":"TO","VAL":"1","ID":"6460291803755"},"3":{"CODE":"ADD","VAL":"1","ID":"6319243514633"},"4":{"CODE":"FRO","VAL":"1","ID":"8037930419534"},"5":{"CODE":"SUB","VAL":"1","ID":"4791340815520"},"6":{"CODE":"JMP","VAL":"4349794339981","ID":"154116386890"}}

The tokenization is complete. Now we need to transpile the code.

Transpiler

I've written a few different scripts to translate tokenized BF into a few programming languages languages.

Supported Languages:

Code will be pushed to github shortly.