Optimizing Pod creation efficiency in Serverless scenarios

Aliyunqi 2021-02-23 03:35:43
Tags: optimization, Pod, creation efficiency, Serverless

Brief introduction: As everyone knows, Kubernetes is the cornerstone of cloud native. As the infrastructure for container orchestration, it is widely used in the Serverless field. Elasticity is the core competitiveness of Serverless, and this article focuses on how to optimize Pod creation efficiency in Kubernetes-based Serverless services to improve elasticity.

Introduction to Serverless computing

Before entering the subject, let's briefly review the definition of Serverless computing.

According to Wikipedia, Serverless computing is a form of cloud computing in which the cloud vendor manages the servers, dynamically allocates machine resources to users, and bills based on the amount of resources actually used.

Users build and run services without having to manage servers, which reduces their operational burden. During business peaks, instances can be scaled out automatically through the platform's elasticity; during off-peak periods, instances are scaled in automatically, reducing resource costs.

Serverless computing platforms

The following is the typical architecture of current Serverless computing products.

The product architecture usually has two layers: a control plane and a data plane. The control plane serves developers, managing the application lifecycle and meeting developers' needs for application management. The data plane serves the application's accessors, such as the users of the developer's business, meeting the application's traffic-management and access demands.

The control plane usually uses Kubernetes for resource management and scheduling. The master typically has 3 nodes to meet high-availability requirements, and nodes access the K8s master through an intranet SLB.

At the node level, there are usually two types of nodes:

  • One type runs kubelet, such as bare-metal servers and virtual machines. These nodes use secure containers as the Pod runtime, so every Pod has an independent kernel, reducing the security risks of sharing the host kernel. At the same time, tenants' network access is isolated at the data link layer through cloud VPC networks or other network technologies. With secure containers plus layer-2 network isolation, a reliable multi-tenant runtime environment can be provided on a single node.
  • The other type is virtual nodes, which join K8s to elastic instances through VirtualKubelet. An elastic instance is a lightweight, VM-like resource form in cloud products that provides a container-group service backed by a seemingly unlimited resource pool; this container-group concept corresponds to the Pod concept in K8s. AWS offers Fargate elastic instances, and Alibaba Cloud offers ECI elastic instances.

Serverless products provide a PaaS layer on top of K8s, responsible for offering deployment, development, and other related services to developers, shielding them from K8s concepts and reducing the cost of developing and operating applications.

On the data plane, users access application instances through an SLB. The PaaS layer also usually provides traffic-management services in this plane, such as traffic graying and A/B testing, to meet developers' traffic-management needs.

Elasticity is the core competitiveness of Serverless computing platforms. They need to meet developers' demands for Pod scale, providing the capability of a seemingly unlimited resource pool, while also meeting demands for Pod creation efficiency by responding to requests in a timely manner.

Pod scale can be satisfied by adding IaaS-layer resources; next, we focus on the techniques for improving Pod creation efficiency.

Pod creation scenarios

Let's first look at the scenarios that involve Pod creation, so that the business demands can be met more effectively through technology.

There are two business scenarios that involve Pod creation:

  • The first is creating an application. This process first goes through scheduling to decide the most suitable node for the Pod, and then creates the Pod on that node.
  • The second is upgrading an application. In this process, new Pods are continuously created and old Pods are destroyed.

In Serverless services, developers focus on the application lifecycle, especially the creation and upgrade phases. Pod creation efficiency affects the overall time consumption of these two phases, and in turn the developer experience. In the face of sudden traffic, creation efficiency has a major impact on the response speed of the developer's service; in serious cases, the developer's business suffers.

Facing the above business scenarios, we next focus on how to improve Pod creation efficiency.

The Pod creation process

Looking at the stages of Pod creation as a whole, we address them in order of their impact on Pod creation efficiency.

This is a simplified Pod creation process:


When a Pod creation request arrives, scheduling happens first, selecting the most appropriate node for the Pod. On the node, the image is pulled first; once the image is ready locally, the container group is created. The image-pull phase is divided into two steps: downloading the image and decompressing the image.

We tested two types of images, with the following results:


The test results show that the proportion of decompression time in the whole image-pull process cannot be ignored. For the golang:1.10 image (about 248MB before decompression), decompressing accounts for 77.02% of the pull time; for the hadoop namenode image (about 506MB before decompression), decompressing and downloading account for about 40% and 60% of the time respectively. In other words, decompression is a significant share of the total image-pull time.

Next, we optimize the different stages of the above process, covering the overall process, image decompression, image download, and so on.

Improving image-pull efficiency


Image preheating

A quick idea is image preheating: prepare the image on the node before the Pod is scheduled there, removing the image pull from the critical path of Pod creation, as shown below:


Global preheating can be performed before scheduling, pulling the image in advance on all nodes. Preheating can also be done during the scheduling process: after the scheduled node is determined, the image is pulled on the target node.

Neither approach is wrong; choose according to the actual situation of the cluster.

The OpenKruise project in the community is about to launch an image-preheating service, which is worth following. Here is how the service is used:


An image-preheating task is issued through the ImagePullJob CRD, specifying the target image and nodes. The pull concurrency, the Job's processing timeout, and the time before the Job object is automatically recycled are all configurable. For a private image, the secret used when pulling the image can be specified. The Events of the ImagePullJob carry the status of the preheating task; consider increasing the automatic recycling time of the Job object, so that the task's processing status can still be viewed through the ImagePullJob Events.
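As a sketch, a manifest might look like this (field names follow the OpenKruise ImagePullJob v1alpha1 API at the time of writing and may differ in your version; node names and the secret name are hypothetical):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: preheat-golang
spec:
  image: golang:1.10             # target image to preheat
  parallelism: 10                # pull concurrency across nodes
  selector:                      # limit to specific nodes; omit to target all nodes
    names:
      - node-1
      - node-2
  pullSecrets:
    - my-registry-secret         # secret for a private registry
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 1200      # Job processing timeout
    ttlSecondsAfterFinished: 300     # delay before the Job object is recycled
```

Raising `ttlSecondsAfterFinished` keeps the Job object (and its Events) around longer for inspection.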

Improving decompression efficiency

From the image-pull data above, decompression accounts for a large proportion of the total pull time, up to 77% in our test cases, so we need to consider how to improve decompression efficiency.

Let's review the technical details of docker pull:


During docker pull, the process has two stages:

  • Downloading image layers in parallel
  • Unpacking image layers

When decompressing an image layer, gunzip is used by default.

Let's also take a brief look at the docker push process:

  • First, each image layer is packaged; in this process it is compressed with gzip.
  • Then the layers are uploaded in parallel.

gzip/gunzip are single-threaded compression/decompression tools. Consider using pigz/unpigz for multi-threaded compression/decompression to take advantage of multiple cores.

containerd supports pigz starting from version 1.2: once the unpigz tool is installed on the node, it is used preferentially for decompression. In this way, image decompression efficiency can be improved through the node's multi-core capability.

This process also needs to pay attention to download/upload concurrency. The docker daemon provides two parameters to control the number of image layers processed in parallel: --max-concurrent-downloads and --max-concurrent-uploads. By default, the download concurrency is 3 and the upload concurrency is 5; they can be tuned to appropriate values based on test results.
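For example, the concurrency can be raised in the daemon configuration (a sketch; the values shown are illustrative and should be tuned from your own measurements, then applied by restarting the docker daemon):

```json
{
  "max-concurrent-downloads": 6,
  "max-concurrent-uploads": 8
}
```

This goes in /etc/docker/daemon.json on the node.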

Image decompression efficiency after using unpigz:


In the same environment, the decompression efficiency of the golang:1.10 image improved by 35.88%, and that of the hadoop namenode image improved by 16.41%.

Uncompressed images

Usually the intranet bandwidth is large enough. Can we skip the decompression/compression logic entirely and spend the image-pull time only on downloading? That is, appropriately increase the download time in exchange for eliminating the decompression time.

Looking back at the docker pull/push process, in the unpack/pack stages we can consider removing the gunzip and gzip logic:

For a docker image, if the image pushed by docker push is uncompressed, then docker pull does not need to decompress it. To achieve this, the compression logic needs to be removed from docker push.

The docker daemon does not support this for the time being. We modified docker so that the image is not compressed on upload, with the following test results:


Here we focus on the decompression time. The decompression efficiency of the golang:1.10 image improved by about 50%, and that of the hadoop namenode image by about 28%. In terms of the total image-pull time, this scheme has a definite effect.

Image distribution

In small clusters, improving image-pull efficiency means focusing on decompression; downloading images is usually not the bottleneck. In large clusters, because of the large number of nodes, the bandwidth and stability of the central Image Registry also affect pull efficiency, as shown below:


The pressure of downloading images is concentrated on the central Image Registry.

Here we introduce a P2P-based method to solve the above problem, taking the CNCF DragonFly project as an example:

It has a few core components:


  • SuperNode: the central node, acting as the tracker and scheduler in the P2P network, coordinating the nodes' download tasks. It is also a caching service, caching images downloaded from the Image Registry to reduce the pressure nodes put on the Image Registry.
  • Dfget: the client that downloads images on a node. It also serves as a data provider, offering its local image data to other nodes on demand.
  • Dfdaemon: runs on each node and is essentially a proxy; it transparently intercepts the docker daemon's image-pull requests and uses Dfget to download the image.

Through the P2P network, the central Image Registry's data is cached in the ClusterManager, which coordinates the nodes' download demands for the image and spreads the download pressure across the cluster nodes. A cluster node is both a puller of image data and a provider of image data, making full use of intranet bandwidth for image distribution.

Loading images on demand

Besides the methods described above, are there other optimizations?

To create a container on a node today, all of the image's data must first be pulled locally before the container can start. Compare this with starting a virtual machine: even with a VM image of several hundred GB, starting the VM is usually a matter of seconds; you can hardly feel the impact of the VM image size.

So can similar technology be used in the container field?

Consider the paper 《Slacker: Fast Distribution with Lazy Docker Containers》 published at USENIX, which states:

Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of  
that data is read.

The paper's analysis shows that pulling the image accounts for 76% of container start time, but only 6.4% of that data is actually read at startup. In other words, the amount of image data needed to start a container is very small. We should consider loading the image on demand during the startup phase, changing the way images are used.

Instead of "the image can only start after all layers are downloaded", the image should be loaded on demand when the container starts, just like starting a virtual machine: only the data needed in the startup phase is transmitted over the network.

But the current image format is usually tar.gz or tar. A tar file has no index, and a gzip file cannot be read from an arbitrary position, so the requirement of pulling specified files on demand cannot be met. The image format needs to be changed to an indexable file format.
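To illustrate why an index matters, here is a minimal sketch using Python's standard library: once we record each member's data offset in an uncompressed tar, reading a single file becomes a plain byte-range read, which is exactly what a gzip-compressed tar cannot offer.

```python
import io
import tarfile

# Build a small uncompressed tar archive in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, data in [("a.txt", b"hello"), ("b.txt", b"world")]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
raw = buf.getvalue()

# Scan the archive once to build an index: name -> (data offset, size).
with tarfile.open(fileobj=io.BytesIO(raw)) as tar:
    index = {m.name: (m.offset_data, m.size) for m in tar.getmembers()}

# With the index, fetching one file is a byte-range read: no full scan,
# no decompression of earlier entries.
off, size = index["b.txt"]
print(raw[off:off + size])  # b'world'
```

In a registry setting, the same byte-range read would be an HTTP Range GET against the layer blob.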

Google proposed a new image format, stargz, short for "seekable tar.gz". It is compatible with the current image format, but provides a file index, so data can be read from a specified position.

A conventional .tar.gz file is generated like this: Gzip(TarF(file1) + TarF(file2) + TarF(file3) + TarFooter). Each file is tar-packed, then the whole group of files is compressed together.

A stargz file innovates as follows: Gzip(TarF(file1)) + Gzip(TarF(file2)) + Gzip(TarF(file3_chunk1)) + Gzip(F(file3_chunk2)) + Gzip(F(index of earlier files in magic file), TarFooter). Each file is packed and compressed individually, and an index file is built at the same time and compressed together with the TarFooter.

In this way, the position of the needed file can be quickly located through the index file, and the file can then be pulled starting from that position.
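The key property stargz relies on can be demonstrated with Python's gzip module (a simplified sketch of the layout, not the real stargz format): concatenated gzip members still form one valid gzip stream for ordinary readers, yet each member can be decompressed independently given its offset.

```python
import gzip

files = [b"contents of file1", b"contents of file2"]

# Compress each file as its own gzip member, then concatenate,
# mimicking stargz's Gzip(TarF(file1)) + Gzip(TarF(file2)) + ... layout.
members = [gzip.compress(data) for data in files]
blob = b"".join(members)

# A legacy gzip reader sees one valid stream containing everything,
# which is why stargz stays compatible with the current image format.
assert gzip.decompress(blob) == b"".join(files)

# With an index of member offsets, one file can be read on demand:
# seek to its offset and decompress only from there.
offset = len(members[0])
print(gzip.decompress(blob[offset:]))  # b'contents of file2'
```

The real format additionally chunks large files and appends the index as the final member.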

Then, in containerd's image-pull path, a remote snapshotter is provided for containerd. When creating the container's rootfs layers, instead of downloading the image layers first and then building, it directly mounts the remote storage layer, as shown below:


To achieve this capability, on the one hand containerd's current logic needs to be modified to recognize remote image layers in the filter stage and skip the download operation for such layers; on the other hand, a remote snapshotter needs to be implemented to support management of the remote layers.

When containerd creates a container through the remote snapshotter, there is no need to pull the image. For files needed during startup, it issues HTTP Range GET requests for the stargz-format image data, pulling only the target data.

Alibaba Cloud has implemented an accelerator called DADI with a similar idea. It is currently used in Alibaba Cloud Container Service and achieved starting 10,000 containers in 3.01s, eliminating the long wait of cold starts. Interested readers can refer to this article: https://developer.aliyun.com/article/742103

Upgrade in place

All of the above are technical solutions for the Pod creation process. For the upgrade scenario, is there room for efficiency improvement under existing technology? Can the following effect be achieved: skip the Pod creation process entirely and upgrade the Pod in place?


In the upgrade scenario, the most common case is upgrading only the image. For this scenario, K8s's own patch capability can be used. By patching the image, the Pod is not rebuilt; only the target container is rebuilt, so the full schedule + create Pod process is avoided, and only the container that needs upgrading is upgraded in place.
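For instance, an image-only update can be issued with a strategic merge patch (a cluster-dependent sketch; the pod and container names are hypothetical):

```shell
# Patch only the image field; the Pod object is kept, and the kubelet
# restarts just the affected container with the new image.
kubectl patch pod my-app-pod \
  -p '{"spec":{"containers":[{"name":"app","image":"registry.example.com/app:v2"}]}}'
```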

During an in-place upgrade, with the help of K8s's readinessGates capability, the Pod can be taken offline gracefully: the K8s Endpoint Controller proactively removes the Pod being upgraded and adds it back after the in-place upgrade completes, so traffic is lossless during the upgrade.

The CloneSet controller in the OpenKruise project provides the above capability:


Developers declare applications with CloneSet, which is used similarly to Deployment. When upgrading the image, the CloneSet controller performs the patch operation while ensuring that business traffic is not interrupted during the upgrade.
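A sketch of such a declaration (fields follow the OpenKruise v1alpha1 API at the time of writing; the application name and image are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  updateStrategy:
    # InPlaceIfPossible upgrades the container in place when only the
    # image changed, instead of recreating the whole Pod.
    type: InPlaceIfPossible
  template:
    metadata:
      labels:
        app: my-app
    spec:
      readinessGates:
        # Gates traffic through the Endpoint Controller during in-place update.
        - conditionType: InPlaceUpdateReady
      containers:
        - name: app
          image: registry.example.com/app:v1
```

Changing the template's image then triggers an in-place update rather than a rebuild.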


Starting from the business scenarios, we identified where improving Pod creation efficiency brings benefits. Then, by analyzing the Pod creation process, we made targeted optimizations for the different stages.

Through this analysis process, business needs can be met effectively through technology.

About the author

Zhang Yifei works on the Alibaba Cloud Container Service team, mainly focusing on Serverless product R&D.

Link to the original text

This article is original content of Alibaba Cloud and may not be reproduced without permission.

