Notice: this program is originally intended as an adaptation of the BF language. This is not a full implamentation of BF, though it IS a full implamentation of P''. Both P'' and BF have unruly nomenclature, so I'm going to refer to my version of the language as BF.
+
incrament current memory value-
decrament current memory value>
move to next memory location<
move to previous memory location.
output the current memory value as a character[
label to jump to]
if current memory location is not zero, jump to the last label. note, labels can be nested. read more about it hereAlthough tokenization of BF is quite simple, after writing the first iteration of this program, I quickly realized that the program code needed to be optimized. Take this code for example:
++++++++++[>+<-]
This program adds 10 to the current mmemory location, then shifts the 10 from location zero to location one. In Dartmouth BASIC the direct translation would be something like this:
5 DIM a(256)
7 i = 0
10 a(i)=a(i) + 1
20 a(i)=a(i) + 1
30 a(i)=a(i) + 1
40 a(i)=a(i) + 1
50 a(i)=a(i) + 1
60 a(i)=a(i) + 1
70 a(i)=a(i) + 1
80 a(i)=a(i) + 1
90 a(i)=a(i) + 1
100 a(i)=a(i) + 1
110 rem open brac 1
120 i=i+1
130 a(i)=a(i) + 1
140 i=i-1
150 a(i)=a(i) - 1
160 IF a(i)<>0 THEN GOTO 110
As we can see, this program is quite verbose. To solve this problem, I split up the process into three sections. Tokenization, Culling, and Formatting.
This section takes each BF command, and turns it into an object class I call "opcode" (not to be confused with the machine language analog). The opcode class contains a unique identifier, the code itself, a value associated with the code, and a boolean skip field.
ID helps the computer keep track of each unique command
Code is the code itself. Code strings in use are ADD, SUB, TO, FRO, OUT, IN, LAB, and JMP
. They represent +-><.,[]
respectively (TO meaning go forward, FRO meaning step back, LAB meaning label, and JMP meaning jump back). Note that though IN is tokenized, It's not fully implamented. I'm trying to figure a cross platform method of handling input.
Val holds a value associated with the current code. This will be very useful in the next section. Skip I will describe in the next section.
Here's what the tokenized program looks like in memory:
ID | CODE | VAL | SKIP |
---|---|---|---|
4875247597984 | ADD | 1 | FALSE |
3 | ADD | 1 | FALSE |
8248249 | ADD | 1 | FALSE |
595730 | ADD | 1 | FALSE |
4837133924774 | ADD | 1 | FALSE |
475862223 | ADD | 1 | FALSE |
90985771 | ADD | 1 | FALSE |
33 | ADD | 1 | FALSE |
99478203 | ADD | 1 | FALSE |
111111 | ADD | 1 | FALSE |
8487475 | LAB | 1 | FALSE |
4728274 | TO | 1 | FALSE |
983334 | ADD | 1 | FALSE |
72877274 | FRO | 1 | FALSE |
12 | SUB | 1 | FALSE |
99 | JMP | 8487475 | FALSE |
After the BF program is tokenized, we need to bundle the duplicates. We loop through the program as it appears in the table above from top to bottom, and if we encounter the same opcode twice, we add the two VAL columns together and place the result into the second opcode row, then mark the first opcode row's SKIP column as true.
After all the duplicates have been marked, the program looks like this:
ID | CODE | VAL | SKIP |
---|---|---|---|
4875247597984 | ADD | 1 | TRUE |
3 | ADD | 2 | TRUE |
8248249 | ADD | 3 | TRUE |
595730 | ADD | 4 | TRUE |
4837133924774 | ADD | 5 | TRUE |
475862223 | ADD | 6 | TRUE |
90985771 | ADD | 7 | TRUE |
33 | ADD | 8 | TRUE |
99478203 | ADD | 9 | TRUE |
111111 | ADD | 10 | FALSE |
8487475 | LAB | 1 | FALSE |
4728274 | TO | 1 | FALSE |
983334 | ADD | 1 | FALSE |
72877274 | FRO | 1 | FALSE |
12 | SUB | 1 | FALSE |
99 | JMP | 8487475 | FALSE |
The culling process only works for ADD, SUB, TO, and FRO opcodes. The others remain as they are.
After the culling process, the program looks like this:
ID | CODE | VAL | SKIP |
---|---|---|---|
111111 | ADD | 10 | FALSE |
8487475 | LAB | 1 | FALSE |
4728274 | TO | 1 | FALSE |
983334 | ADD | 1 | FALSE |
72877274 | FRO | 1 | FALSE |
12 | SUB | 1 | FALSE |
99 | JMP | 8487475 | FALSE |
The end goal is to have no duplicates and no values where SKIP == TRUE.
After the program is culled, we now have to format the tokens into a readable format for the transpiler to read. We simply export this segment into a JSON file:
{"0":{"CODE":"ADD","VAL":"10","ID":"5612126083620"},"1":{"CODE":"LAB","VAL":"1","ID":"4349794339981"},"2":{"CODE":"TO","VAL":"1","ID":"6460291803755"},"3":{"CODE":"ADD","VAL":"1","ID":"6319243514633"},"4":{"CODE":"FRO","VAL":"1","ID":"8037930419534"},"5":{"CODE":"SUB","VAL":"1","ID":"4791340815520"},"6":{"CODE":"JMP","VAL":"4349794339981","ID":"154116386890"}}
The tokenization is complete. Now we need to transpile the code.
I've written a few different scripts to translate tokenized BF into a few programming languages languages.
Supported Languages:
Code will be pushed to github shortly.