Brief introduction: As everyone knows, Kubernetes is the cornerstone of cloud native. As the de facto infrastructure for container orchestration, it is widely used in the Serverless field. Elasticity is the core competitiveness of Serverless, and this article focuses on how a Kubernetes-based Serverless service can optimize Pod creation efficiency to improve elasticity.
Before entering the subject, let's briefly review the definition of Serverless computing.

According to Wikipedia, Serverless computing is a form of cloud computing in which the cloud vendor manages the servers, dynamically allocates machine resources to users, and bills based on the amount of resources actually consumed.

Users can build and run services without worrying about servers, which reduces the burden of server management. During business peaks, the platform's elasticity automatically scales instances out; during off-peak periods it automatically scales them in, reducing resource costs.
The following is the typical architecture of current Serverless computing products.

The product architecture usually has two planes: the control plane and the data plane. The control plane serves developers and manages the application lifecycle, meeting their application management needs; the data plane serves the application's consumers, such as the end users of the developer's business, meeting the application's traffic management and access demands.

The control plane usually uses Kubernetes for resource management and scheduling. The master typically runs on 3 nodes to meet high-availability requirements, and nodes reach the K8s master through an intranet SLB.
At the node level, there are usually two types of nodes:

On top of K8s, the Serverless product provides a PaaS layer responsible for services such as deployment and development, shielding developers from K8s concepts and reducing their cost of developing, operating, and maintaining applications.

On the data plane, users access application instances through an SLB. On this plane the PaaS layer usually also provides traffic management services such as traffic graying and A/B testing, meeting developers' traffic management needs.
Elasticity is the core competitiveness of Serverless computing platforms. A platform must satisfy developers' demands for Pod scale, offering what feels like an unlimited resource pool, while also meeting their demands for Pod creation efficiency so that requests are served promptly.

Pod scale can be satisfied by adding IaaS-layer resources; the rest of this article focuses on techniques for improving Pod creation efficiency.
Let's first look at the scenarios that involve Pod creation, so that business demands can be met more effectively through technology.

Two business scenarios involve Pod creation: creating or upgrading an application, and scaling out in response to traffic bursts.

In Serverless services, developers care about the application lifecycle, especially the creation and upgrade phases. Pod creation efficiency affects the overall latency of both phases and therefore the developer experience. Under sudden traffic, creation efficiency strongly influences how quickly the developer's service can respond; in severe cases the developer's business suffers.

Given these business scenarios, the rest of this article focuses on how to improve Pod creation efficiency.
Let's look at the overall stages of Pod creation and address them in order of their impact on creation efficiency.

This is a simplified Pod creation flow:

When a Pod creation request arrives, scheduling runs first to select the most suitable node for the Pod. On that node, the image is pulled first; once the image is ready locally, the container group is created. The image pull phase consists of two steps: downloading the image and decompressing it.
We tested two types of images, with the following results:

The test results show that the share of decompression time in the overall image pull process cannot be ignored. For the golang:1.10 image (about 248 MB before decompression), decompression takes 77.02% of the pull time; for the hadoop namenode image (about 506 MB before decompression), decompression and download account for roughly 40% and 60% of the time respectively. In other words, decompression's share of total pull time is significant.

Next we optimize the different stages of this flow, covering the overall process, image decompression, and image download.
A quick win is image preheating: prepare the image on the node before the Pod is scheduled there, removing the image pull from the critical path of Pod creation, as shown below:

Preheating can be done globally, pulling images in advance on all nodes before scheduling; or during scheduling, pulling the image on the target node once the scheduling decision is made.

Both approaches work; choose according to the actual situation of the cluster.
The OpenKruise project in the community is about to release an image preheating service, which is worth watching. Here is how the service is used:

An image preheating task is issued through the ImagePullJob CRD, specifying the target image and nodes. Pull concurrency, the Job's processing timeout, and the delay before the Job object is automatically reclaimed are all configurable. For private images, the secret to use for pulling can be specified. The Events of the ImagePullJob carry the status of the preheating task; consider increasing the auto-reclaim delay of the Job object so that task progress can be followed through the ImagePullJob's Events.
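A minimal sketch of such a task, based on the OpenKruise v1alpha1 API (names, labels, and values here are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: preheat-golang
spec:
  image: golang:1.10              # target image to pre-pull
  parallelism: 10                 # how many nodes pull concurrently
  selector:                       # target nodes; omit to cover all nodes
    matchLabels:
      node-pool: serverless
  pullSecrets:
  - registry-secret               # needed only for private registries
  completionPolicy:
    type: Always
    activeDeadlineSeconds: 1200   # job processing timeout
    ttlSecondsAfterFinished: 1800 # keep the Job (and its Events) around for a while
```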
From the pull-time data above, decompression accounts for a large share of total pull time, up to 77% in the test cases, so improving decompression efficiency is worth considering.

Let's review the technical details of docker pull:

During docker pull, the flow has two phases: downloading the image layers and decompressing them. When an image layer is decompressed, gunzip is used by default.

Let's also take a brief look at the docker push flow: it is roughly the reverse, with image layers packed and compressed (gzip by default) before being uploaded.
gzip/gunzip are single-threaded compression/decompression tools; consider using pigz/unpigz instead to compress/decompress with multiple threads and take advantage of multiple cores.

containerd supports pigz starting from version 1.2: if the unpigz tool is installed on a node, containerd uses it for decompression preferentially. This way, image decompression benefits from the node's multi-core capability.

The process also requires attention to download/upload concurrency. docker daemon provides two parameters, --max-concurrent-downloads and --max-concurrent-uploads, to control how many image layers are processed in parallel. By default, download concurrency is 3 and upload concurrency is 5; tune them to suitable values based on test results.
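For example, the concurrency can be raised in the daemon configuration, /etc/docker/daemon.json (the values below are placeholders to be calibrated by testing, not recommendations); restart or reload the daemon to apply:

```json
{
  "max-concurrent-downloads": 6,
  "max-concurrent-uploads": 8
}
```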
Decompression efficiency after switching to unpigz:

In the same environment, image decompression improved by 35.88% for golang:1.10 and by 16.41% for hadoop namenode.

Intranet bandwidth is usually large enough, so could we drop the decompress/compress logic altogether and concentrate the pull time on downloading? That is, moderately increase download time in exchange for eliminating decompression time.
Looking back at the docker pull/push flows, the idea is to remove the gunzip and gzip logic from the unpack/pack stages:

For docker images, if docker push uploads uncompressed layers, then docker pull does not need to decompress them; achieving this requires removing the compression logic from docker push.

docker daemon does not support this yet, so we modified docker to skip compression when uploading images. The test results are as follows:

Focusing on decompression time, golang:1.10 image decompression improved by about 50%, and hadoop namenode by about 28%. Measured against the total pull time, the scheme delivers a clear improvement.
In small clusters, improving pull efficiency means focusing on decompression, since downloading is usually not the bottleneck. In large clusters, however, with a large number of nodes, the bandwidth and stability of the central Image Registry also affect pull efficiency, as shown below:

The download pressure is concentrated on the central Image Registry.

To address this, we introduce a P2P-based approach, taking the CNCF Dragonfly project as an example:
It has a few core components:

ClusterManager: essentially a central SuperNode that acts as a tracker and scheduler in the P2P network, coordinating the nodes' download tasks. It is also a caching service, caching images downloaded from the Image Registry to reduce the pressure nodes put on the Registry.

Dfget: the client that downloads images on a node. It also serves data to its peers, providing locally available image data to other nodes on demand.

Dfdaemon: runs on every node; essentially a proxy that transparently intercepts the docker daemon's image pull requests and uses Dfget to download the images.

Through the P2P network, data from the central Image Registry is cached in the ClusterManager, which coordinates the nodes' download demands and spreads the pressure across the cluster: each node both pulls image data and provides it to others, making full use of intranet bandwidth for image distribution.
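As an illustration of how Dfdaemon plugs in transparently, the docker daemon can be pointed at the local proxy as a registry mirror (the port below is the Dfdaemon default cited in the Dragonfly v1 docs; treat it as an assumption to verify against your deployment):

```json
{
  "registry-mirrors": ["http://127.0.0.1:65001"]
}
```

With this in /etc/docker/daemon.json, docker pull goes through Dfdaemon, which fetches layers via the P2P network instead of hitting the central Registry directly.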
Besides the methods above, are there other ways to optimize?

Today, to create a container on a node, all of the image's data must first be pulled locally before the container can start. Compare this with starting a virtual machine: even with a VM image of several hundred GB, startup is usually a matter of seconds, and the image size is hardly felt.

So can similar techniques be applied to containers?

A paper published at USENIX FAST '16, "Slacker: Fast Distribution with Lazy Docker Containers", observes:
"Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read."
That is, pulling the image accounts for 76% of container startup time, yet only 6.4% of that data is actually read during startup. The amount of image data required to start a container is thus very small, which suggests loading the image on demand during the startup phase and changing how images are used.
The rule that "the container can only start after all image layers are downloaded" should be replaced with loading the image on demand at container start, as with VM startup: only the data needed during the startup phase is transferred over the network.

But the current image format is usually tar.gz or tar, and a tar file has no index, while gzip data cannot be read from an arbitrary position. This cannot satisfy on-demand pulls of specific files, so the image format needs to change to an indexable one.

Google has proposed a new image format, stargz, short for "seekable tar.gz". It is compatible with the current image format but provides a file index, so data can be read from a specified position.
A conventional .tar.gz file is produced like this: Gzip(TarF(file1) + TarF(file2) + TarF(file3) + TarFooter). Each file is packed into the tar stream, and the whole archive is then compressed at once.

stargz innovates on this layout: Gzip(TarF(file1)) + Gzip(TarF(file2)) + Gzip(TarF(file3_chunk1)) + Gzip(F(file3_chunk2)) + Gzip(F(index of earlier files in magic file), TarFooter). Each file (or file chunk) is packed and compressed individually, and an index file is produced and compressed together with the TarFooter.

This way, the index makes it possible to quickly locate the position of a needed file and then pull it starting from that exact offset.
Then, on the containerd image pull path, containerd is given a "remote snapshotter": when creating the layers of the container's rootfs, instead of downloading the image layers first and then constructing the rootfs, it directly mounts the remote storage layer, as shown below:

Implementing this requires changes on two fronts: containerd's existing logic must be modified to recognize remote image layers in the filter stage and skip downloading them, and a remote snapshotter must be implemented to manage the remote layers.

When containerd creates a container through the remote snapshotter, it skips the image pull entirely; for files needed at startup, it issues HTTP Range GET requests against the stargz-format image data to fetch exactly the target ranges.
Alibaba Cloud has implemented an accelerator called DADI along similar lines. It is currently used in Alibaba Cloud Container Service, where it started 10,000 containers in 3.01 s, eliminating the long cold-start wait. Interested readers can refer to: https://developer.aliyun.com/article/742103
All of the above are technical solutions for the Pod creation flow. For the upgrade scenario, is there room for efficiency gains with existing technology? Specifically, can we skip the Pod creation flow entirely and upgrade a Pod in place?

In the upgrade scenario, the most common case is upgrading only the image. For this, K8s' own patch capability can be used: by patching the image, the Pod is not rebuilt; only the target container is. The upgrade thus avoids the full schedule-and-create-Pod flow and rebuilds, in place, only the container that needs upgrading.
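As a minimal sketch (the Pod, container, and image names are hypothetical), the body of a strategic merge patch that changes only the image could look like this:

```yaml
# patch.yaml -- changes only the image of one container;
# apply with: kubectl patch pod <pod-name> --patch-file patch.yaml
spec:
  containers:
  - name: app                              # must match the existing container name
    image: registry.example.com/app:v2     # new image; the Pod object is not recreated
```

Since a container's image is one of the few mutable fields of a running Pod's spec, the kubelet restarts just that container with the new image.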
During an in-place upgrade, K8s' readinessGates capability can be used to take the Pod offline gracefully: the K8s Endpoint Controller removes the Pod being upgraded from the service endpoints and adds it back after the in-place upgrade finishes, so traffic is lossless throughout the upgrade.
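A sketch of how the readiness gate is declared in the Pod spec (the condition type shown is the one OpenKruise manages; treat it as illustrative): the controller sets the condition to False before restarting the container, the Pod turns NotReady, and the Endpoint Controller removes it from endpoints until the condition is set back to True.

```yaml
spec:
  readinessGates:
  - conditionType: InPlaceUpdateReady   # custom condition flipped by the upgrade controller
  containers:
  - name: app
    image: registry.example.com/app:v1
```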
The CloneSet controller in the OpenKruise project provides this capability:

Developers declare applications with a CloneSet, used much like a Deployment. When the image is upgraded, the CloneSet controller performs the patch operation and ensures business traffic is uninterrupted throughout the upgrade.
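A minimal CloneSet sketch with in-place upgrade enabled, based on the OpenKruise v1alpha1 API (names and image are illustrative):

```yaml
apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  updateStrategy:
    type: InPlaceIfPossible   # patch containers in place when only the image changes
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: app
        image: registry.example.com/app:v1   # editing this field triggers an in-place upgrade
```

With InPlaceIfPossible, changing only the template's image rebuilds just the container; other template changes fall back to recreating Pods.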
Starting from business scenarios, we identified where improving Pod creation efficiency pays off. We then analyzed the Pod creation flow and optimized each stage in a targeted way.

Through this kind of analysis, business needs can be met effectively through technology.
Zhang Yifei works on the Alibaba Cloud container service team, focusing on Serverless product R&D.

This article is original content from Alibaba Cloud and may not be reproduced without permission.