Riding the wind! Optimizing nginx with io_uring in practice

Song Baohua 2020-11-13 04:53:14


io_uring is a set of asynchronous I/O interfaces introduced in Linux kernel v5.1. It has developed rapidly, and today io_uring covers far more than pure storage I/O. Starting with Linux v5.3, io_uring added network-programming APIs, giving users asynchronous support for the sendmsg, recvmsg, accept, and connect interfaces and extending the io_uring ecosystem into the networking field.

Furthermore, starting with Linux v5.7, these asynchronous interfaces come with a FAST POLL mechanism. Users no longer need multiplexing mechanisms such as select or epoll to watch file descriptors; they simply place read/write requests on io_uring's submit queue and submit them. If the file descriptor is not yet readable or writable, the kernel actively installs a poll handler; once the descriptor becomes ready, the poll handler is invoked and the read/write request is issued again. This reduces the number of system calls and improves performance.

In the previous article we explored io_uring's network programming model and its performance under an echo-server benchmark. This article puts it into practice with a general-purpose application: nginx.

Optimizing the nginx code with io_uring

Nginx is a lightweight web server and reverse proxy. Because it uses little memory, starts quickly, and handles high concurrency well, it is widely used in Internet projects.

Architecturally, nginx consists of one master process and multiple worker processes. The workers need no locks between them; each independently handles its own client connections and network requests. A worker is a single-threaded event loop, essentially the same model as the echo server described in the previous article, "Do you think io_uring is only for storage IO? Absolutely wrong!"

The epoll-based programming model

epoll ("event poll") is nginx's default event model on Linux.

The epoll event model registers both the listen fd and the sock fds of new connections with epoll. When data becomes readable on one of these fds, the worker process blocked in epoll_wait() is woken up and calls the corresponding callback to handle it; the recv and writev requests here are all synchronous.

The io_uring-based programming model

As mentioned earlier, io_uring's FAST POLL mechanism lets a request be issued directly when the data is ready, so an ordinary fd no longer needs to be registered with epoll. In addition, the read and write requests here are sent asynchronously through io_uring. (The original post illustrates the processing flow with a diagram.)

In fact, accept() can also take the FAST POLL path: issue it directly without waiting for listen_fd to become readable, reducing the number of system calls. During debugging, however, we found that this greatly increases the probability of accept() failing, and every failed accept() causes a wasted sock memory allocation and release, which is expensive. So, for now, we still listen on the listen fd via epoll, as before; this is something we can optimize in the future.

Testing

Test environment

  • Test machine
    CPU: Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz, 64 logical cores
    server kernel cmdline adds: mitigations=on

  • nginx configuration

user root;
http {
    access_log off;
    server {
        access_log off;  # disable the access log; otherwise every request writes a log line and skews the test
        location / {
            return 200;  # do not read local files; return 200 directly
        }
    }
}
  • benchmark
    Use the lightweight HTTP performance testing tool wrk for the stress test.

  • Test commands

    Long connections:  wrk -c $connection -t $thread -d 120 $url
    Short connections: wrk -c $connection -t $thread -H "Connection: Close" -d 120 $url

Test results

Long connections

  • connection=1000, thread=200; test server performance with different numbers of workers.

With 8 or fewer workers, QPS improves by about 20%. As the number of workers increases and the CPU ceases to be the bottleneck, the gains gradually fall off.

  • server with a single worker; test performance with different numbers of client connections (thread takes the default of 2).

As you can see, in the single-worker case QPS improves by more than 20% with 500 or more connections. In terms of system calls, io_uring issues roughly a tenth or fewer of the system calls that epoll does.

Short connections

  • connection=1000, thread=200; test server performance with different numbers of workers.

In the short-connection scenario, io_uring not only brings no improvement over epoll, but in some cases even shows a 5%~10% performance degradation. Besides the overhead of the io_uring framework itself, this may also be related to the latency introduced by batching requests in the io_uring programming model.

Summary and next steps

From the author's tests so far, io_uring's networking optimizations are better suited to long-connection scenarios, where they bring improvements of up to 20% or more. Short-connection scenarios still need optimization, mainly in the following two areas:
• Reducing the overhead of the io_uring framework itself; this optimization also benefits long connections.
• Optimizations specific to short connections, e.g. for an accept() request, first checking whether there is data to read, to avoid wasted memory allocation and release; or submitting multiple accept() requests in one batch.

The nginx and echo server changes (including source code) are all open-sourced in the OpenAnolis high-performance storage SIG (openanolis.org). We welcome you to join the discussion and contribute, and to explore the road to io_uring high performance together.


This article was written by Song Baohua; please include a link to the original when reprinting. Thanks.
