How JavaScript Calculates 1 + 1 - Part 1: Creating the Source String

QCY 2021-04-07 21:42:27


I'm a compiler enthusiast who is always learning how the V8 JavaScript engine works. Of course, the best way to learn something is to write about it, which is why I'm sharing my experience here. I hope it interests others too.

Translator's note: This translation has been authorized by the author. Because I am not very familiar with some of the terminology or with C++ syntax, I have added some "translator's notes" based on my own understanding and the context. Treat them as reference material and use your own judgment.

Without a doubt, 1 + 1 = 2. But how does the V8 JavaScript engine work that out?

As an aside, one of my favorite interview questions is: "What happens between typing in a URL and the page loading?" It's a good question because it shows the depth and breadth of a person's knowledge, and the way they answer reveals which parts interest them most.

This is the first in a series of blog posts exploring everything V8 does after 1 + 1 is entered. First, we'll focus on how V8 stores the 1 + 1 string in its heap memory. It sounds simple, but it deserves a whole post!

1. The Client Application

To calculate 1 + 1, the first thing you might do is start NodeJS, or open the Chrome developer console, and simply type 1 + 1. But to show V8's internals, I decided to modify a standard sample application from the V8 source code.

I replaced the code that originally printed "Hello World" with 1 + 1:

// Create a string containing the JavaScript source code
Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");
// Compile the source code
Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();
// Run the script to get the result
Local<Value> result = script->Run(context).ToLocalChecked();
// Convert the result to a Number and print it
Local<Number> number = Local<Number>::Cast(result);
printf("%f\n", number->Value());

Translator's note: To help readers who don't know C++ follow the code, here is a description of some of the variables, plus a TypeScript-style rendering of the code (it is only an aid to understanding, not real logic!)

  • isolate - In V8, an isolate is an instance of V8. In Blink, isolates and threads have a 1:1 relationship: the main thread is associated with one isolate, and each worker thread is associated with its own isolate
  • context - A context is V8's notion of a global variable scope. Simply put, one Window object corresponds to one context. For example, a frame and its parent frame have different Window objects, so different frames have different contexts
  • Literal - A value is the value itself; a literal is a way of writing that value down. For example, 15 is a value, and that value is unique, but there are many ways to express it: the Arabic numerals 15, the Chinese 十五, the English fifteen, or the hexadecimal 0xF. 15 is the value; the rest are literals
  • The double colon :: can be read as the . in JS, so String::NewFromUtf8Literal is String.NewFromUtf8Literal
  • The arrow operator -> can also be read as the . in JS, so script->Run(context) is script.Run(context)
// Import the classes String, Script, and Number, and the type collection Local
import { String, Script, Number, Local } from 'v8'
const source: Local["String"] = String.NewFromUtf8Literal(isolate, "1 + 1");
const script: Local["Script"] = Script.Compile(context, source).ToLocalChecked();
const result: Local["Value"] = script.Run(context).ToLocalChecked();
const number: Local["Number"] = Number.Cast(result);

Skim this code to get the general idea. The C++ may look hard to follow, but the comments should help. In this post we focus on the first statement, where V8 allocates a new 1 + 1 string on the heap:

Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

To understand this code, let's start with the series of V8 modules involved. In this picture, execution flows from left to right, and the return value is passed back from right to left and stored in the source variable


  • Application - This represents the client of V8. In our case it is the sample program, but usually it is the whole Chrome browser, the NodeJS runtime, or any other software that embeds the V8 JavaScript engine

  • V8 external API - This is the client-facing API that provides access to V8's functionality. Although it is implemented in C++, the API is modeled on JavaScript concepts such as numbers, strings, arrays, functions, and objects, allowing them to be created and manipulated in a variety of ways

  • Heap factory - Inside the V8 engine (not exposed through the API) is a "factory" for creating various data objects on the heap. Surprisingly, the set of available factory methods differs greatly from the methods offered by the external API, so a lot of conversion is done inside the API layer

  • New Space - V8's heap is very complicated, but newly allocated objects are usually stored in New Space, commonly called the young generation. We won't go into detail here, but New Space is managed with Cheney's algorithm, a well-known garbage collection algorithm

Now let's take a closer look at the process. The key points are:

  • How the API layer decides what type of string to create, and where in the heap to store it
  • What the string's internal memory layout is, which depends on the range of characters in the string
  • How space is allocated from the heap. In our case, 20 bytes are needed
  • Finally, how a pointer to the string is returned to the application and tracked for future garbage collection

2. Determining How and Where the String Is Stored

As mentioned above, a lot of conversion has to happen between the client application and the heap factory (where the object is actually created). Most of that work happens in src/api/

Let's start with the call made by the client application:

String::NewFromUtf8Literal(isolate, "1 + 1");

The first argument is the isolate, V8's main internal data structure. It represents the state of the runtime system and keeps it isolated from any other V8 instances. To picture this, imagine opening multiple browser windows, each running a completely separate V8 instance with its own isolated heap. We won't say more about the isolate parameter, other than that many API calls require it

The String::NewFromUtf8Literal method (see src/api/) first performs a basic length check on the string, and it also decides how to store the string in memory. Since we passed only two arguments, the third argument, type, defaults to NewStringType::kNormal, meaning the string should be allocated on the heap as a regular object. The alternative is to pass NewStringType::kInternalized, meaning the string should be de-duplicated. That is useful for avoiding multiple copies of the same constant string

Internally, this then calls the NewString method (see src/api/), which calls factory->NewStringFromUtf8(string). Note that string has by now been mapped to an internal Vector data structure rather than an ordinary C++ string, because the heap factory has a completely different set of methods from the external API. This difference will become more obvious later, when the return value is handed back to the client application

Inside NewStringFromUtf8 (see src/heap/), V8 decides on the best format for storing the string. UTF-8 is certainly a convenient format that can represent a wide range of Unicode characters, but when only basic ASCII characters are used (as in 1 + 1), V8 stores the string in a "1-byte" format. To make this decision, the string's characters are passed to Utf8Decoder decoder(utf8_data) (declared in src/strings/unicode-decoder.h)

Now that we've decided to allocate a 1-byte string using the normal (not internalized) approach, the next step is to call NewRawOneByteString (see src/heap/). This is where the heap memory is allocated and the contents of the string are written into it
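
To make the format decision more concrete, here is a minimal sketch of the idea, not V8's actual code. It assumes the simplified rule described above: a string qualifies for the 1-byte format when every byte of its UTF-8 data is basic ASCII. The names Encoding and ChooseEncoding are hypothetical and exist only for this illustration; the real decision is made by Utf8Decoder and is more involved.

```cpp
#include <cassert>
#include <string>

// Hypothetical names for illustration only; these are not V8 types.
enum class Encoding { kOneByte, kTwoByte };

// Simplified assumption: a string can use the 1-byte format only if every
// byte of its UTF-8 data is basic ASCII (below 0x80).
Encoding ChooseEncoding(const std::string& utf8_data) {
  for (unsigned char byte : utf8_data) {
    if (byte >= 0x80) return Encoding::kTwoByte;  // part of a multi-byte sequence
  }
  return Encoding::kOneByte;
}
```

Under this rule, ChooseEncoding("1 + 1") picks the 1-byte format, while any input containing a multi-byte UTF-8 sequence falls back to the wider format.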

3. String Memory Layout

Inside V8, our 1 + 1 string is represented as an instance of the v8::internal::SeqOneByteString class (see src/objects/string.h). If you're like most object-oriented developers, you'd expect SeqOneByteString to have many public methods and some private properties, such as a character array or an integer storing the string's length. However, that is not the case! Instead, every internal object class is really just a pointer to the address in the heap where the data is stored

Translator's note: Object class - defines a named collection of properties, classified into required and optional property sets

As the code comments in src/objects/objects.h show, there are about 150 internal classes whose parent is v8::internal::Object. Each of these classes contains only a single 8-byte value (on a 64-bit machine) that points to the address of the object in the heap


The interesting parts are:

SeqOneByteString object

As mentioned earlier, this is not a full-featured string class but a pointer to the address of the string's actual contents in the heap. On a 64-bit machine this "pointer" is an 8-byte unsigned long whose type alias is Address. Note that the data on the heap (on the right of the diagram) isn't a real C++ object, so there's no point treating this Address as a pointer to a strongly typed thing (such as String *)

You may wonder why there is a layer of indirection at all, rather than accessing the heap block directly. It makes sense once you consider that garbage collection moves objects around in the heap. The important thing is that data can be moved without confusing the client application

Translator's note: Heap Block - a block of memory

Note that in generational garbage collection, objects are first allocated in the young generation (New Space) and, if they live long enough, are moved to the old generation (Old Space). To do this, the garbage collector copies the heap block to the new heap space and then updates the Address value to point to the new memory address. Because the memory address of the SeqOneByteString object itself stays exactly the same, the client software never notices the change.
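
The benefit of the indirection can be shown with a small sketch. This is a toy model under stated assumptions, not V8 code: ObjectSlot, MoveBlock, and ReadThrough are hypothetical names, and the "heap" is just whatever memory the caller supplies. The client keeps a pointer to the slot, the collector rewrites the Address inside the slot, and the client's pointer never changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

using Address = std::uintptr_t;  // the article mentions an alias with this name

// Hypothetical stand-in for an internal object: nothing but one Address.
struct ObjectSlot {
  Address ptr;
};

// Toy version of a collector moving a block: copy the bytes to the new
// location, then update the slot so that it points at the copy.
void MoveBlock(ObjectSlot& slot, void* destination, std::size_t size) {
  std::memcpy(destination, reinterpret_cast<const void*>(slot.ptr), size);
  slot.ptr = reinterpret_cast<Address>(destination);
}

// The client always reads through the slot, never through a raw heap address.
const char* ReadThrough(const ObjectSlot* handle) {
  return reinterpret_cast<const char*>(handle->ptr);
}
```

If a client holds only an `ObjectSlot*`, calling MoveBlock changes where the data lives, but ReadThrough keeps returning the same contents through the same handle.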

Compressed Pointer to Map (bytes 0-3 of the heap block)

JavaScript is a dynamically typed language, which means that _variables_ don't have types, but the _values stored in variables_ do. The "map" is how V8 associates each object in the heap with a description of its data type. After all, if an object weren't tagged with its type, the heap block would just be a meaningless run of bytes

Besides mentioning that maps are themselves stored as heap objects in _read-only space_, we won't go into more detail about the map of our 1 + 1 string. Maps (also known as shapes or hidden classes) can be very complicated, although our constant string uses a predefined map called read_only_roots().one_byte_string_map() (see src/heap/)

Translator's note: Heap object - an object that can be created or deleted at any time while the program runs. The program's virtual address space contains some free storage units, and these free units make up the so-called heap

Interestingly, although the map field is a pointer to another heap object, it makes clever use of pointer compression to store a 64-bit pointer value in a 32-bit field
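
One way to fit a 64-bit pointer into a 32-bit field is to store only an offset from a known base address. The sketch below illustrates that general idea under the assumption that every heap object lives within 4 GB of the base; CompressedHeap and its members are hypothetical names, and V8's real scheme has more details than shown here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration of base-plus-offset pointer compression.
struct CompressedHeap {
  std::uint64_t base;  // assumed start of a heap region no larger than 4 GB

  // Keep only the 32-bit offset of the full pointer from the base.
  std::uint32_t Compress(std::uint64_t full_pointer) const {
    return static_cast<std::uint32_t>(full_pointer - base);
  }

  // Rebuild the full 64-bit pointer by adding the offset back to the base.
  std::uint64_t Decompress(std::uint32_t compressed) const {
    return base + compressed;
  }
};
```

Compressing and then decompressing returns the original pointer as long as the object really does lie inside the assumed region.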

Object Hash Value (bytes 4-7 of the heap block)

Each object has an internal hash value, but in this case it defaults to kEmptyHashField (value 3), indicating that the hash has not been computed yet

String Length (bytes 8-11 of the heap block)

This is the number of bytes in the string (5): two 1s, two spaces, and one +

The Characters and the Padding (bytes 12-19 of the heap block)

As you'd expect, the 5 single-byte characters are stored next. In addition, 3 bytes of padding follow so that subsequent heap objects are aligned as the CPU architecture requires (objects are aligned to a 4-byte boundary).
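
The 20-byte total follows directly from the layout just described: three 4-byte header fields, 5 characters, and enough padding to reach the next 4-byte boundary. The sketch below reproduces that arithmetic. The constant and function names are made up for this example and are not taken from the V8 source.

```cpp
#include <cassert>
#include <cstddef>

// Layout as described in the text (names are illustrative only):
// bytes 0-3 map pointer, bytes 4-7 hash, bytes 8-11 length, then characters.
constexpr std::size_t kHeaderSize = 4 + 4 + 4;
constexpr std::size_t kAlignment = 4;

// Round value up to the next multiple of alignment.
constexpr std::size_t RoundUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Total heap block size for a one-byte string with the given length.
constexpr std::size_t OneByteStringSize(std::size_t length) {
  return RoundUp(kHeaderSize + length, kAlignment);
}

// Number of padding bytes that follow the characters.
constexpr std::size_t PaddingFor(std::size_t length) {
  return OneByteStringSize(length) - (kHeaderSize + length);
}
```

For our string, the header plus characters come to 12 + 5 = 17 bytes, which rounds up to 20, leaving 3 bytes of padding.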

4. Allocating Memory from the Heap

We mentioned briefly that the factory class allocates a block of memory from the heap (20 bytes in our case) and then fills the block with the object's data. The remaining question is _how_ those 20 bytes are allocated

In Cheney's garbage collection algorithm, New Space is divided into two semi-spaces. To allocate a block in the heap, the allocator checks whether enough bytes are available between the current semi-space's Top and its Limit. If there is enough space, the algorithm returns the address of the next block and then advances the Top pointer by the number of bytes requested

The basic picture shows the before and after states of the current semi-space: image.png

If the current semi-space runs out of available memory (Top and Limit are too close), the collection part of Cheney's algorithm kicks in. Once collection is complete, all _live_ objects will have been copied to the beginning of the second semi-space, and all _dead_ objects (left behind in the first semi-space) are discarded. Either way, a semi-space is guaranteed to have all of its _used_ space at the bottom and all of its _free_ space at the top, so it always looks like the picture above

In our case, though, the current semi-space has plenty of free memory, so we carve off 20 bytes and advance the Top pointer. No garbage collection is needed and the second semi-space isn't involved. The V8 code has many special cases to consider, but in the end the 20-byte allocation is handled by the NewSpace::AllocateFastUnaligned method in src/heap/new-spaces-inl.h
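
The fast path described in this section is often called bump-pointer allocation. Here is a minimal sketch of it, with the loud caveat that SemiSpace and Allocate are hypothetical names and that this ignores all of the special cases the real NewSpace::AllocateFastUnaligned has to handle. Returning 0 stands in for "a collection would be needed first".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy semi-space: everything below top is used, everything up to limit is free.
struct SemiSpace {
  std::uintptr_t top;    // address of the next free byte
  std::uintptr_t limit;  // end of the semi-space

  // Returns the address of the new block, or 0 when there is not enough room
  // between top and limit (the real engine would trigger a collection here).
  std::uintptr_t Allocate(std::size_t size_in_bytes) {
    if (limit - top < size_in_bytes) return 0;
    std::uintptr_t result = top;
    top += size_in_bytes;  // bump the pointer past the new block
    return result;
  }
};
```

With a semi-space that starts at 0x1000, a 20-byte request returns 0x1000 and leaves Top at 0x1014, which matches the before and after states described above.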

5. Returning a Handle

Handle is a term often used in C++ programming. It isn't a concrete, fixed data type or entity; it stands for a broad concept in programming.

A handle is generally a means of getting at another object: a generalized pointer. Its concrete form may be an integer, an object, or a real pointer, and its purpose is to establish a unique link between the accessor and the object being accessed

We now have a pointer to a heap block fully populated with the string's contents (including the length, hash value, and map), and this pointer must be returned to the client application. As you may remember, the client called this line of code:

Local<String> source = String::NewFromUtf8Literal(isolate, "1 + 1");

But what is the type of source, and what does Local<String> mean? There are two key observations here:

Converting internal classes to external classes

First, recall that V8 uses the v8::internal::SeqOneByteString class to store our string object, and that interestingly it is just a pointer to the data on the heap. However, the client application expects data of type v8::String, which is part of the V8 API

You may be surprised to learn that v8::internal::SeqOneByteString (a subclass of v8::internal::String) and v8::String are in completely different class hierarchies. In fact, all internal classes are defined in the src/objects directory using the v8::internal namespace, while the external classes are defined in include/v8.h using the v8 namespace

Going back to the NewFromUtf8Literal method we discussed earlier (see src/api/), the last step before returning the object pointer to the client application is to convert the result from v8::internal::String to v8::String:

return Utils::ToLocal(handle_result);

This conversion is defined by a macro in src/api/api-inl.h

Translator's note: A macro is essentially a code snippet used through an alias. During preprocessing, before compilation, the macro is replaced with the actual code snippet

Managing garbage collection "roots"

Second, let's talk about the meaning of Local<String> (which, by the way, is short for v8::Local<v8::String>). The concept of Local concerns how garbage collection is handled for a string object once it is no longer needed

Every JavaScript developer knows that an object is garbage collected when there are no remaining references to it. The collection algorithm starts from the "roots" and then traverses the whole heap, finding all reachable objects. A root is a non-heap reference, such as a global variable or a stack-based local variable that is still in scope. If these variables are assigned new values, or go out of scope (their enclosing function ends), the data they used to point to may now be garbage
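
The idea of tracing from the roots can be sketched as a simple graph traversal. This is a generic reachability walk on a toy heap, not the copying collector V8 uses for New Space. ToyHeap and MarkReachable are hypothetical names: object i is identified by its index, and heap[i] lists the objects it references.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy heap for illustration: heap[i] holds the indices that object i references.
using ToyHeap = std::vector<std::vector<std::size_t>>;

// Starting from the roots, mark every object that can be reached by following
// references. Anything left unmarked is unreachable and therefore garbage.
std::vector<bool> MarkReachable(const ToyHeap& heap,
                                const std::vector<std::size_t>& roots) {
  std::vector<bool> marked(heap.size(), false);
  std::vector<std::size_t> worklist(roots.begin(), roots.end());
  while (!worklist.empty()) {
    std::size_t object = worklist.back();
    worklist.pop_back();
    if (marked[object]) continue;  // already visited
    marked[object] = true;
    for (std::size_t child : heap[object]) worklist.push_back(child);
  }
  return marked;
}
```

If a root is removed from the list (because a variable was reassigned or went out of scope), the objects that were reachable only through it come back unmarked on the next traversal.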

Translator's note: A stack is a stack. The "heap" here is not the heap data structure but the heap in the sense of dynamic memory allocation: a memory area for managing data with dynamic lifetimes

In the case of the sample program, we also have pointers on the C++ stack that refer to heap objects. These pointers have no corresponding JavaScript variable names, because they exist only in the context of the C++ program (such as the sample program, or Chrome, or NodeJS). For example:

Local<String> source = ...

In this case, source is a reference to a heap object, albeit with an extra layer of indirection. This picture explains it: image.png

Translator's note: ptr to heap = pointer to heap, i.e. a pointer to the memory block

Intuitively, source points to 1 + 1. In reality, source points to ptr to heap, which points to the heap block (which contains 1 + 1)

On the left is the C++ stack, which grows from top to bottom as the program executes. On the right is the memory block we saw earlier. When the client program executes, it pushes a HandleScope object onto the local C++ stack (see src/samples/). Next, the return value of the call to String::NewFromUtf8Literal() is stored on the C++ stack as a Local<String> object

It looks like we've added yet another layer of indirection, but it has benefits:

  • Roots are easier to find - The HandleScope object is where the "handles" (that is, pointers) to heap objects are collected. As you'll remember, this is exactly what our SeqOneByteString object is: an 8-byte pointer to the underlying heap data. When garbage collection starts, V8 quickly scans the HandleScope objects to find all the root pointers. Then, if the underlying heap data is moved, it can update these pointers.

  • Local pointers are easy to manage - Compared with the fairly heavyweight HandleScope, a Local<String> object is a single 8-byte value on the C++ stack that can be used in the same contexts as any other 8-byte value (such as a pointer or an integer). In particular, it can be stored in a CPU register, passed to a function, or used as a return value. Notably, when garbage collection happens, the garbage collector doesn't need to locate or update these values

  • Scopes are easy to tear down - Finally, when a C++ function in the client application finishes, the HandleScope and Local objects on the C++ stack are removed, but only after their C++ destructors have been called. These destructors remove all the handles from the garbage collector's root list. They are no longer in scope, so the underlying heap objects may have become garbage

Translator's note: A destructor is the opposite of a constructor. When an object reaches the end of its lifetime, for example when the function in which it lives has returned, the system automatically runs its destructor. Destructors are often used for clean-up work (for example, if new was used to allocate memory when the object was created, delete automatically calls the destructor so that the memory can be freed)
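
The scope-based clean-up described above can be illustrated with a toy version of the pattern. This sketch is an assumption-laden simplification, not V8's HandleScope: RootList and ToyHandleScope are hypothetical names, and a handle here is just an index into the root list rather than a real pointer. The point is only that the constructor records the state on entry and the destructor removes every handle created since.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Address = std::uintptr_t;

// Hypothetical root list that a garbage collector would scan and update.
struct RootList {
  std::vector<Address> slots;
};

// Toy scope: remembers how many roots existed on entry and, in its
// destructor, drops every root that was registered after that point.
class ToyHandleScope {
 public:
  explicit ToyHandleScope(RootList& roots)
      : roots_(roots), saved_size_(roots.slots.size()) {}
  ~ToyHandleScope() { roots_.slots.resize(saved_size_); }
  ToyHandleScope(const ToyHandleScope&) = delete;
  ToyHandleScope& operator=(const ToyHandleScope&) = delete;

  // Registers a heap address as a root and returns the index of its slot.
  std::size_t CreateHandle(Address heap_address) {
    roots_.slots.push_back(heap_address);
    return roots_.slots.size() - 1;
  }

 private:
  RootList& roots_;
  std::size_t saved_size_;
};
```

While the scope is alive, a collector can find the slot and rewrite the address in it if the object moves. Once the scope's destructor runs, the slot is gone and the object it referred to is no longer kept alive by it.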

Finally, the source variable that refers to our 1 + 1 string is ready to be passed to the next line of our client application:

Local<Script> script =
    Script::Compile(context, source).ToLocalChecked();

Up next ...

Allocating 1 + 1 on the heap clearly involves a lot of work. I hope this has shown some parts of V8's internal architecture and how data is represented in different parts of the system. In future posts I'll look further at how our simple expression is parsed and executed, which will reveal more about how V8 works

In part 2 of this series, I'll dig into how the _compilation cache_ works to avoid compiling code more often than necessary

Appendix:

This is my first time translating an article. Thanks to DeepL, Baidu Translate, and Google Translate

Author's authorization: image.png

