How to use JavaScript compiler

Tan Guangzhi 2020-11-10 15:10:47
use javascript compiler


A compiler is a program , Role is Translate one language into another .

for example babel It's a compiler , It will es6 Version of js Translate into es5 Version of js. From this point of view , Software translated into English will also be translated into English .

General procedure ,CPU It can't be executed directly , because CPU Only machine instructions can be recognized . So if you want to execute a program , The first step is to translate the program written in high-level language into assembly code (Java One more step , Translate high-level language into bytecode ), Then translate the assembly code into machine instructions , such CPU To identify and execute .

Because assembly language and machine language correspond one to one , And assembly language is more readable . Therefore, the textbook of computer principle will use assembly language instead of machine language to explain machine instructions .

This paper will write four operations compiler need to be 1 + 1 These four expressions are translated into machine instructions and executed . Please see the example for the specific process :

// CPU Can't recognize
10 + 5
// Translated into assembly language
push 10
push 5
add
// Finally translated into machine instructions , Assembly code corresponds to machine instructions one by one
// Machine instructions are given by 1 and 0 form , The following instructions are not real instructions , Just for demonstration
0011101001010101
1101010011100101
0010100111100001

Four compiler operations , Although the function is very simple , Only four Operational expressions can be compiled . But the front-end part of the principle of compilation almost involves : Lexical analysis 、 Syntax analysis . In addition, there is the code generation of the back-end part of the compilation principle . Whether it's simple 、 Complex compilers , The compilation steps are similar , It's just that complicated compiler implementations are more difficult .

Someone might ask , What's the advantage of learning to compile

I think that mastering the internal principles of the compilation process will make you a better advanced programmer . In addition, I quote Zhihu net friend - What you want Answer , More specific :

  1. It is easier to understand which writing methods are equivalent in a language , What are the differences
  2. We can compare the differences between different languages more objectively
  3. It is not easy to be fooled by the propagandist of a particular language
  4. Learning a new language is more efficient
  5. In fact, from the language a Switch to language b It's a general requirement , Learning the principles of compilation will make you more comfortable when dealing with such requirements

Okay , Let's take a look at how to write a quad arithmetic compiler .

Lexical analysis

A program is actually a series of characters stored in a text file , The function of lexical analysis is to decompose a series of characters into characters according to some rules (token, Also called terminator ), Ignore spaces and comments .

Example :

// Program code
10 + 5 + 6
// The result of lexical analysis is token
10
+
5
+
6

Terminator

Terminator is the basic element used in language , It can't be broken down anymore .

The terminators in the four operations include symbols and integer constants ( Unary operators and floating point operations are not supported ).

  1. Symbol + - * / ( )
  2. Integer constant :12、1000、111...

Lexical analysis code implementation

function lexicalAnalysis(expression) {
const symbol = ['(', ')', '+', '-', '*', '/']
const re = /\d/
const tokens = []
const chars = expression.trim().split('')
let token = ''
chars.forEach(c => {
if (re.test(c)) {
token += c
} else if (c == ' ' && token) {
tokens.push(token)
token = ''
} else if (symbol.includes(c)) {
if (token) {
tokens.push(token)
token = ''
}
tokens.push(c)
}
})
if (token) {
tokens.push(token)
}
return tokens
}
console.log(lexicalAnalysis('100 + 23 + 34 * 10 / 2'))
// ["100", "+", "23", "+", "34", "*", "10", "/", "2"]

The grammatical rules of four operations ( Grammar rules are hierarchical )

  1. x*, Express x Zero or more times
  2. x | y, Express x or y Will appear
  3. ( ) parentheses , Used to group words in language

The following rules look from left to right , The expression on the left can be further subdivided into the expression on the right , It's subdivided until it can't be subdivided again .

  • expression: addExpression
  • addExpression: mulExpression (op mulExpression)*
  • mulExpression: term (op term)*
  • term: '(' expression ')' | integerConstant
  • op: + - * /

addExpression Corresponding + - expression ,mulExpression Corresponding * / expression .

If you don't understand the above rules , Then put it down first , Keep looking down . See how to implement syntax analysis in code .

Syntax analysis

A process in which input text is analyzed according to grammatical rules and its grammatical structure is determined , It's called parsing .

The output of general parsing is abstract syntax tree (AST) Or parse tree (parse tree). But because the four operations are relatively simple , So the solution here is to do code generation and error reporting in real time , In this way, you don't need to save the whole program structure in memory .

Let's take a look at how to analyze a four Operational expression 1 + 2 * 3.

The first match is expression, Due to the present expression There's only one way to go down , namely addExpression, So it's broken down into addExpression.
By analogy , The next order is mulExpressionterm1(integerConstant)、+(op)、mulExpressionterm2(integerConstant)、*(op)、mulExpressionterm3(integerConstant).

As shown in the figure below :

There may be questions here , Why is an expression so complicated ,expression There is addExpression,addExpression And then there is mulExpression.
In fact, this is to consider the priority of operators ,mulExpr Than addExpr Expression operation level should be high .

1 + 2 * 3
compileExpression
| compileAddExpr
| | compileMultExpr
| | | compileTerm
| | | |_ matches integerConstant push 1
| | |_
| | matches '+'
| | compileMultExpr
| | | compileTerm
| | | |_ matches integerConstant push 2
| | | matches '*'
| | | compileTerm
| | | |_ matches integerConstant push 3
| | |_ compileOp('*') *
| |_ compileOp('+') +
|_

There are many algorithms for building parsing trees , There are only two algorithms .

Recursive descent analysis

Recursive descent analysis , Also known as top-down analysis . Analyze step by step recursively according to the rules of grammar token flow , If you encounter a non Terminator , Then go on to analyze , Until the Terminator .

LL(0) analysis

Recursive descent analysis is a simple and efficient algorithm ,LL(0) On this basis, there is one more step , When the first one token When it's not enough to determine the element type , Take... For the next character “ Check in advance ”, It's possible to solve this uncertainty .

The above is a brief introduction to these two algorithms , See the following code implementation for specific implementation .

Expression code generation

The four Operational expressions we usually use are infix expressions , But for computers, infix expressions are not easy to calculate . So in the code generation phase , To convert infix expression to suffix expression .

Postfix expression

Postfix expression , Also known as reverse Polish , Does not contain brackets , The operator is placed after two operands , All calculations are in the order in which the operators appear , Strictly from left to right ( The precedence rules for operators are no longer considered ).

Example :

Infix expression : 5 + 5 Convert to suffix expression :5 5 +, Then generate code according to the suffix expression .

// 5 + 5 Convert to 5 5 + Generate code again
push 5
push 5
add

Code implementation

The theoretical knowledge of compiler principles is like the book of heaven , It's often seen in the clouds , But really do it , You'll find that , Actually, it's quite simple .

If the above theoretical knowledge is not very clear , No problem , First look at the code implementation , And then combine it with theoretical knowledge to see .

Be careful : Here we need to introduce the lexical analysis code .

// Assembly code generator
function AssemblyWriter() {
this.output = ''
}
AssemblyWriter.prototype = {
writePush(digit) {
this.output += `push ${digit}\r\n`
},
writeOP(op) {
this.output += op + '\r\n'
},
// Output assembly code
outputStr() {
return this.output
}
}
// parsers
function Parser(tokens, writer) {
this.writer = writer
this.tokens = tokens
// tokens Array index
this.i = -1
this.opMap1 = {
'+': 'add',
'-': 'sub',
}
this.opMap2 = {
'/': 'div',
'*': 'mul'
}
this.init()
}
Parser.prototype = {
init() {
this.compileExpression()
},
compileExpression() {
this.compileAddExpr()
},
compileAddExpr() {
this.compileMultExpr()
while (true) {
this.getNextToken()
if (this.opMap1[this.token]) {
let op = this.opMap1[this.token]
this.compileMultExpr()
this.writer.writeOP(op)
} else {
// There is no corresponding operator on the match There's no match here + -
// take token Index back one bit
this.i--
break
}
}
},
compileMultExpr() {
this.compileTerm()
while (true) {
this.getNextToken()
if (this.opMap2[this.token]) {
let op = this.opMap2[this.token]
this.compileTerm()
this.writer.writeOP(op)
} else {
// There is no corresponding operator on the match There's no match here * /
// take token Index back one bit
this.i--
break
}
}
},
compileTerm() {
this.getNextToken()
if (this.token == '(') {
this.compileExpression()
this.getNextToken()
if (this.token != ')') {
throw ' Missing right bracket :)'
}
} else if (/^\d+$/.test(this.token)) {
this.writer.writePush(this.token)
} else {
throw ' FALSE token: The first ' + (this.i + 1) + ' individual token (' + this.token + ')'
}
},
getNextToken() {
this.token = this.tokens[++this.i]
},
getInstructions() {
return this.writer.outputStr()
}
}
const tokens = lexicalAnalysis('100+10*10')
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
console.log(instructions) // Output generated assembly code
/*
push 100
push 10
push 10
mul
add
*/

Simulation execution

Now let's simulate CPU The execution of machine instructions , Because assembly code corresponds to machine instructions one by one , So we can create a simulator that directly executes assembly code .
Before creating the emulator , Let's first explain the operation of the relevant instructions .

Stack

In memory , The characteristic of stack is that it can only insert and delete at the same end , That is, only push and pop Two kinds of operations .

push

push The purpose of the instruction is to push an operand onto the stack .

pop

pop The function of the instruction is to pop an operand out of the stack .

add

add The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a + b, Then the result push Go to the stack .

sub

sub The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a - b, Then the result push Go to the stack .

mul

mul The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a * b, Then the result push Go to the stack .

div

sub The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a / b, Then the result push Go to the stack .

All the instructions of the four operations have been explained , Do you think it's very simple ?

Code implementation

Be careful : Need to introduce lexical analysis and syntax analysis code

function CpuEmulator(instructions) {
this.ins = instructions.split('\r\n')
this.memory = []
this.re = /^(push)\s\w+/
this.execute()
}
CpuEmulator.prototype = {
execute() {
this.ins.forEach(i => {
switch (i) {
case 'add':
this.add()
break
case 'sub':
this.sub()
break
case 'mul':
this.mul()
break
case 'div':
this.div()
break
default:
if (this.re.test(i)) {
this.push(i.split(' ')[1])
}
}
})
},
add() {
const b = this.pop()
const a = this.pop()
this.memory.push(a + b)
},
sub() {
const b = this.pop()
const a = this.pop()
this.memory.push(a - b)
},
mul() {
const b = this.pop()
const a = this.pop()
this.memory.push(a * b)
},
div() {
const b = this.pop()
const a = this.pop()
// Floating point operations are not supported , So round it up here
this.memory.push(Math.floor(a / b))
},
push(x) {
this.memory.push(parseInt(x))
},
pop() {
return this.memory.pop()
},
getResult() {
return this.memory[0]
}
}
const tokens = lexicalAnalysis('(100+ 10)* 10-100/ 10 +8* (4+2)')
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
const emulator = new CpuEmulator(instructions)
console.log(emulator.getResult()) // 1138

A simple four arithmetic compiler has been implemented . Let's write a test function and run it , Let's see if the results are what we expected :

function assert(expression, result) {
const tokens = lexicalAnalysis(expression)
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
const emulator = new CpuEmulator(instructions)
return emulator.getResult() == result
}
console.log(assert('1 + 2 + 3', 6)) // true
console.log(assert('1 + 2 * 3', 7)) // true
console.log(assert('10 / 2 * 3', 15)) // true
console.log(assert('(10 + 10) / 2', 10)) // true

All tests are correct . Also attached Complete source code , It is suggested that the students who don't understand it should read it twice more .

Take it to the next level

For industrial compilers , The four arithmetic compilers belong to toys in toys . But you can't be a fat man at one bite , So it's better to learn compiler principles step by step . Let's introduce a more advanced compiler , This compiler can compile a Jack Language ( class Java Language ), Its grammar is like this :

class Generate {
field String str;
static String str1;
constructor Generate new(String s) {
let str = s;
return this;
}
method String getString() {
return str;
}
}
class Main {
function void main() {
var Generate str;
let str = Generate.new("this is a test");
do Output.printString(str.getString());
return;
}
}

The output of the above code is :this is a test.

Do you want to implement such a compiler ?

This compiler comes from a Book 《 Computer system elements 》, It starts from the 6 Chapter begins , All the way to 11 Chapter describes the assembly compiler ( Convert assembly language to machine language )、VM compiler ( Will be similar to bytecode VM Language is translated into assembly language )、Jack Language compiler ( High level language Jack Translate into VM Language ). Each chapter has detailed explanation and experiment of knowledge points , As long as you do the experiment step by step , We can finally implement such a compiler .

If the compiler is finished , Finally, where does machine language execute ?

This book has been considered for you , It starts from the 1 The first chapter 5 Chapter , There are five chapters . Teach you to start with logic gates , Step by step, the arithmetic logic unit is formed ALU、CPU、 Memory , And finally build a modern computer . And then let the program that you compile with a compiler run on this computer .

in addition , The first of this book 12 Chapter will teach you how to write various library functions of the operating system , for example Math library ( It involves all kinds of mathematical operations )、Keyboard library ( How to press the keyboard is output to the screen )、 Memory management and so on .

I want to have a look at the whole book 12 What happened after Zhang's experiment ? I'm here to provide a few pictures of this analog computer running program DEMO GIF, For your reference .

The top right corner of these pictures is “ Computer ” The screen of , The other parts are “ Computer ” Stack area and instruction area of .

I have done all the experiments in this book ( Spend every day 3 Hours , It can be done in two months ), The answer is in my github On , If you are interested, you can have a look at .

Reference material

版权声明
本文为[Tan Guangzhi]所创,转载请带上原文链接,感谢

  1. [front end -- JavaScript] knowledge point (IV) -- memory leakage in the project (I)
  2. This mechanism in JS
  3. Vue 3.0 source code learning 1 --- rendering process of components
  4. Learning the realization of canvas and simple drawing
  5. gin里获取http请求过来的参数
  6. vue3的新特性
  7. Get the parameters from HTTP request in gin
  8. New features of vue3
  9. vue-cli 引入腾讯地图(最新 api,rocketmq原理面试
  10. Vue 学习笔记(3,免费Java高级工程师学习资源
  11. Vue 学习笔记(2,Java编程视频教程
  12. Vue cli introduces Tencent maps (the latest API, rocketmq)
  13. Vue learning notes (3, free Java senior engineer learning resources)
  14. Vue learning notes (2, Java programming video tutorial)
  15. 【Vue】—props属性
  16. 【Vue】—创建组件
  17. [Vue] - props attribute
  18. [Vue] - create component
  19. 浅谈vue响应式原理及发布订阅模式和观察者模式
  20. On Vue responsive principle, publish subscribe mode and observer mode
  21. 浅谈vue响应式原理及发布订阅模式和观察者模式
  22. On Vue responsive principle, publish subscribe mode and observer mode
  23. Xiaobai can understand it. It only takes 4 steps to solve the problem of Vue keep alive cache component
  24. Publish, subscribe and observer of design patterns
  25. Summary of common content added in ES6 + (II)
  26. No.8 Vue element admin learning (III) vuex learning and login method analysis
  27. Write a mini webpack project construction tool
  28. Shopping cart (front-end static page preparation)
  29. Introduction to the fluent platform
  30. Webpack5 cache
  31. The difference between drop-down box select option and datalist
  32. CSS review (III)
  33. Node.js学习笔记【七】
  34. Node.js learning notes [VII]
  35. Vue Router根据后台数据加载不同的组件(思考->实现->不止于实现)
  36. Vue router loads different components according to background data (thinking - & gt; Implementation - & gt; (more than implementation)
  37. 【JQuery框架,Java编程教程视频下载
  38. [jQuery framework, Java programming tutorial video download
  39. Vue Router根据后台数据加载不同的组件(思考->实现->不止于实现)
  40. Vue router loads different components according to background data (thinking - & gt; Implementation - & gt; (more than implementation)
  41. 【Vue,阿里P8大佬亲自教你
  42. 【Vue基础知识总结 5,字节跳动算法工程师面试经验
  43. [Vue, Ali P8 teaches you personally
  44. [Vue basic knowledge summary 5. Interview experience of byte beating Algorithm Engineer
  45. 【问题记录】- 谷歌浏览器 Html生成PDF
  46. [problem record] - PDF generated by Google browser HTML
  47. 【问题记录】- 谷歌浏览器 Html生成PDF
  48. [problem record] - PDF generated by Google browser HTML
  49. 【JavaScript】查漏补缺 —数组中reduce()方法
  50. [JavaScript] leak checking and defect filling - reduce() method in array
  51. 【重识 HTML (3),350道Java面试真题分享
  52. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  53. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  54. [re recognize HTML (3) and share 350 real Java interview questions
  55. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  56. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  57. 【重识 HTML ,nginx面试题阿里
  58. 【重识 HTML (4),ELK原来这么简单
  59. [re recognize HTML, nginx interview questions]
  60. [re recognize HTML (4). Elk is so simple