Design a high-concurrency, high-availability HTTP service for IP queries based on Flask

Bright brother in the workplace 2020-11-11 10:58:09

Architecture design

The basic stack is flask + gunicorn + load balancing. Load balancing comes in two forms: Alibaba Cloud's hardware load-balancing service, or software load balancing with nginx. gunicorn is managed with supervisor.

Architecture diagram with nginx software load balancing

Architecture diagram with Alibaba Cloud's hardware load-balancing service

Because the flask app keeps the IP tree and the country/province/city dictionaries in memory, it uses a lot of memory: one gunicorn worker takes about 300 MB, while nginx's 4 workers use little memory (under 100 MB), so the whole service occupies about 1.3 GB (that is, a server with 2 GB of memory is needed). When any gunicorn node goes down or is being upgraded, the other node is still serving, so the overall service is unaffected.

IP database

An IP database (also called an IP address library) is compiled by technical specialists over a long period using a variety of techniques, and it is continuously updated, maintained, and extended by dedicated staff.

IP database parsing and query code

The implementation is based on a binary search tree.

import os
import struct
from socket import inet_aton, inet_ntoa

_unpack_V = lambda b: struct.unpack("<L", b)
_unpack_N = lambda b: struct.unpack(">L", b)
_unpack_C = lambda b: struct.unpack("B", b)


class IpTree:
    def __init__(self):
        self.ip_dict = {}
        self.country_codes = {}
        self.china_province_codes = {}
        self.china_city_codes = {}

    def load_country_codes(self, file_name):
        path = os.path.abspath(file_name)
        try:
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    data = line.rstrip("\r\n").split("\t")
                    self.country_codes[data[0]] = data[1]
        except Exception as ex:
            print("cannot open file %s: %s" % (file_name, ex))

    def load_china_province_codes(self, file_name):
        path = os.path.abspath(file_name)
        try:
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    data = line.rstrip("\r\n").split("\t")
                    self.china_province_codes[data[2]] = data[0]
        except Exception as ex:
            print("cannot open file %s: %s" % (file_name, ex))

    def load_china_city_codes(self, file_name):
        path = os.path.abspath(file_name)
        try:
            with open(path, "r", encoding="utf-8") as f:
                for line in f:
                    data = line.rstrip("\r\n").split("\t")
                    self.china_city_codes[data[3]] = data[0]
        except Exception as ex:
            print("cannot open file %s: %s" % (file_name, ex))

    def loadfile(self, file_name):
        path = os.path.abspath(file_name)
        try:
            with open(path, "rb") as f:
                local_binary0 = f.read()
            local_offset, = _unpack_N(local_binary0[:4])
            local_binary = local_binary0[4:local_offset]
            # one subtree per possible first octet: 256 nodes
            for ipdot0 in range(255, -1, -1):
                middle_ip = None
                middle_content = None
                lis = []
                # offsets into the first-octet index
                begin_offset = ipdot0 * 4
                end_offset = (ipdot0 + 1) * 4
                # index record range for this first octet
                start_index, = _unpack_V(local_binary[begin_offset:begin_offset + 4])
                start_index = start_index * 8 + 1024
                end_index, = _unpack_V(local_binary[end_offset:end_offset + 4])
                end_index = end_index * 8 + 1024
                while start_index < end_index:
                    # each 8-byte record: 4-byte IP, 3-byte content offset, 1-byte content length
                    content_offset, = _unpack_V(local_binary[start_index + 4:start_index + 7] + b"\x00")
                    content_length, = _unpack_C(local_binary[start_index + 7:start_index + 8])
                    content_offset = local_offset + content_offset - 1024
                    content = local_binary0[content_offset:content_offset + content_length]
                    if middle_content != content and middle_content is not None:
                        contents = middle_content.decode("utf-8").split("\t")
                        lis.append((middle_ip, (contents[0], self.lookup_country_code(contents[0]),
                                                contents[1], self.lookup_china_province_code(contents[1]),
                                                contents[2], self.lookup_china_city_code(contents[2]),
                                                contents[3], contents[4])))
                    middle_content = content
                    middle_ip = inet_ntoa(local_binary[start_index:start_index + 4])
                    start_index += 8
                self.ip_dict[ipdot0] = self.generate_tree(lis)
        except Exception as ex:
            print("cannot open file %s: %s" % (file_name, ex))

    def lookup_country(self, country_code):
        for item_country, item_country_code in self.country_codes.items():
            if country_code == item_country_code:
                return item_country, item_country_code
        return 'None', 'None'

    def lookup_country_code(self, country):
        try:
            return self.country_codes[country]
        except KeyError:
            return 'None'

    def lookup_china_province(self, province_code):
        for item_province, item_province_code in self.china_province_codes.items():
            if province_code == item_province_code:
                return item_province, item_province_code
        return 'None', 'None'

    def lookup_china_province_code(self, province):
        try:
            return self.china_province_codes[province]
        except KeyError:
            return 'None'

    def lookup_china_city(self, city_code):
        for item_city, item_city_code in self.china_city_codes.items():
            if city_code == item_city_code:
                return item_city, item_city_code
        return 'None', 'None'

    def lookup_china_city_code(self, city):
        try:
            return self.china_city_codes[city]
        except KeyError:
            return 'None'

    def lookup(self, ip):
        ipdot = ip.split('.')
        if len(ipdot) != 4:
            return None
        ipdot0 = int(ipdot[0])
        if ipdot0 < 0 or ipdot0 > 255:
            return None
        try:
            d = self.ip_dict[ipdot0]
        except KeyError:
            return None
        if d is not None:
            return self.lookup1(inet_aton(ip), d)
        return None

    def lookup1(self, net_ip, node):
        # node is a tuple: (packed boundary IP, content, left subtree, right subtree)
        net_ip1, content, lefts, rights = node
        if net_ip < net_ip1:
            if lefts is None:
                return content
            return self.lookup1(net_ip, lefts)
        elif net_ip > net_ip1:
            if rights is None:
                return content
            return self.lookup1(net_ip, rights)
        return content

    def generate_tree(self, ip_list):
        length = len(ip_list)
        if length > 1:
            lefts = ip_list[:length // 2]
            rights = ip_list[length // 2:]
            (ip, content) = lefts[length // 2 - 1]
            return inet_aton(ip), content, self.generate_tree(lefts), self.generate_tree(rights)
        elif length == 1:
            (ip, content) = ip_list[0]
            return inet_aton(ip), content, None, None
        return None


if __name__ == "__main__":
    ip_tree = IpTree()
    print(ip_tree.lookup(''))  # pass the IP string to query (elided in the original)
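As a quick sanity check of the tree logic, the two methods at the heart of the lookup can be exercised standalone on a few made-up ranges (the data below is purely illustrative; in the real database each boundary IP is the last address of its range):

```python
from socket import inet_aton

# Standalone copies of IpTree.generate_tree and IpTree.lookup1,
# exercised on fabricated ranges for illustration only.
def generate_tree(ip_list):
    length = len(ip_list)
    if length > 1:
        lefts = ip_list[:length // 2]
        rights = ip_list[length // 2:]
        ip, content = lefts[length // 2 - 1]
        return inet_aton(ip), content, generate_tree(lefts), generate_tree(rights)
    elif length == 1:
        ip, content = ip_list[0]
        return inet_aton(ip), content, None, None

def lookup1(net_ip, node):
    net_ip1, content, lefts, rights = node
    if net_ip < net_ip1:
        return content if lefts is None else lookup1(net_ip, lefts)
    elif net_ip > net_ip1:
        return content if rights is None else lookup1(net_ip, rights)
    return content

# Each entry is (last IP of a range, info for that range):
# range-A ends at 1.0.0.255, range-B at 1.0.3.255, range-C at 1.0.7.255.
ranges = [
    ("1.0.0.255", "range-A"),
    ("1.0.3.255", "range-B"),
    ("1.0.7.255", "range-C"),
]
tree = generate_tree(ranges)
print(lookup1(inet_aton("1.0.2.4"), tree))  # 1.0.2.4 falls in range-B
```

Packed addresses from inet_aton are in network byte order, so plain bytes comparison orders them correctly, which is what makes the binary search valid.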

HTTP requests

The service provides IP queries over both GET and POST requests.

@ip_app.route('/api/ip_query', methods=['POST'])
def ip_query():
    try:
        ip = request.json['ip']
    except KeyError as e:
        raise InvalidUsage('bad request: no key ip in your request json body. {}'.format(e), status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not an ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)


@ip_app.route('/api/ip_query', methods=['GET'])
def ip_query_get():
    ip = request.values.get('ip')
    if ip is None:
        raise InvalidUsage('bad request: no param ip in your request.', status_code=400)
    if not is_ip(ip):
        raise InvalidUsage('{} is not an ip'.format(ip), status_code=400)
    try:
        res = ip_tree.lookup(ip)
    except Exception as e:
        raise InvalidUsage('internal error: {}'.format(e), status_code=500)
    if res is not None:
        return jsonify(res)
    raise InvalidUsage('no ip info in ip db for ip: {}'.format(ip), status_code=501)
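The routes above call an is_ip helper that the excerpt does not show; a minimal sketch of such a validator (the name comes from the route code, the implementation here is my own assumption) could be:

```python
import socket

def is_ip(value):
    """Return True if value is a well-formed IPv4 dotted-quad string."""
    if not isinstance(value, str):
        return False
    try:
        socket.inet_aton(value)
    except OSError:
        return False
    # inet_aton also accepts short forms like "1.2.3"; require four parts explicitly
    return value.count(".") == 3

print(is_ip("8.8.8.8"), is_ip("1.2.3"), is_ip("abc"))
```

Validating before calling ip_tree.lookup keeps malformed input from reaching inet_aton inside the tree code and turning into a 500.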

A POST request must include a JSON field like the following in the request body:

{"ip": ""}

A GET request takes the form:

Service deployment

Install dependencies

The dependencies in requirements.txt are as follows:


Installation: pip install -r requirements.txt

Configure supervisor

vim /etc/supervisor/conf.d/ip_query_http_service.conf, with the following contents:

[program:ip_query_http_service]
directory = /root/qk_python/ip_query
command = gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent
autostart = true
startsecs = 5
autorestart = true
startretries = 3
user = root

After adding the configuration, create the directories referenced by stdout_logfile and stderr_logfile, otherwise supervisor will fail to start. Then update supervisor to start the ip_query_http_service process.

# start supervisord
supervisord -c /etc/supervisor/supervisord.conf
# pick up the new service configuration
supervisorctl update

For common supervisor operations, see the resources at the end of the article.

Install nginx

If you use the software load-balancing setup, you need to install nginx; for compiling and installing nginx, see the resources at the end of the article.

Configure nginx

vim /usr/local/nginx/nginx.conf and modify the configuration as follows:

#user nobody;
# Number of nginx worker processes; setting it equal to the number of CPU cores is recommended.
worker_processes 4;

#error_log logs/error.log;
#error_log logs/error.log notice;
# Global error log level: [ debug | info | notice | warn | error | crit ]
error_log logs/error.log info;

# PID file
pid logs/;

# Maximum number of file descriptors one nginx process may open. In theory this is
# the system limit (the value of `ulimit -n`) divided by the number of nginx processes,
# but nginx does not distribute requests evenly, so keeping it equal to `ulimit -n`
# is recommended.
worker_rlimit_nofile 65535;

events {
    # Event model; use epoll on Linux
    use epoll;
    # Maximum connections per process (total connections = worker_connections * processes)
    worker_connections 65535;
}

http {
    include mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log logs/access.log main;

    sendfile on;
    #keepalive_timeout 0;
    keepalive_timeout 65;
    tcp_nopush on;   # reduce network congestion
    tcp_nodelay on;  # reduce network congestion
    #gzip on;

    server {
        # The proxy port exposed by the aggregated service is configured here.
        listen 9000;
        server_name localhost;

        #charset koi8-r;
        #access_log logs/host.access.log main;

        location / {
            # root html;
            # index index.html index.htm;
            proxy_redirect off;
            proxy_set_header X-Real-IP $remote_addr;
            # The backend web servers can obtain the user's real IP via X-Forwarded-For
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header Host $host;
            client_max_body_size 10m;       # maximum body size allowed for a client request
            client_body_buffer_size 128k;   # maximum bytes buffered for the client request body
            proxy_buffer_size 4k;           # buffer size for response headers read from the upstream
            proxy_temp_file_write_size 64k; # size of the temp-file cache; above this, data is streamed from the upstream
        }

        #error_page 404 /404.html;
        # redirect server error pages to the static page /50x.html
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
            root html;
        }
    }
}
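Note that the config as extracted above never shows the upstream pool or the proxy_pass directive that actually forwards traffic to the gunicorn nodes; without them, nginx would serve its static root instead of load balancing. A sketch of the missing pieces (the backend addresses are placeholders, not from the original article):

```nginx
# inside the http { } block: pool of gunicorn nodes (placeholder addresses)
upstream ip_query_backend {
    server 192.168.0.10:8080;
    server 192.168.0.11:8080;
}

# inside location / { }: forward requests to the pool
proxy_pass http://ip_query_backend;
```

With two entries in the pool, nginx distributes requests round-robin by default, which matches the failover behavior described earlier: if one gunicorn node is down, the other keeps serving.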

Stress testing

To run a stress test, choosing the right tool is the prerequisite. Of the tools below, jmeter is mostly run on Windows machines; the other tools are best run on a *nix machine.

Choosing a stress-testing tool

Tool             Pros and cons                                                                 Verdict
ApacheBench(ab)  easy to use, efficient, thorough statistics, low memory pressure              recommended
locust           written in Python, inefficient, limited by the GIL, requires Python scripts   not recommended
wrk              easy to use, efficient, fine-grained statistics, few pitfalls or errors       most recommended
jmeter           Java-based, Apache open source, graphical interface, easy to operate          recommended
webbench         easy to use, but does not support POST requests                               so-so
tsung            written in Erlang, many configuration templates, rather complicated           not recommended

I have used all six tools above personally. Below, ab, wrk, and jmeter are chosen to briefly explain installation and usage; for the other tools, google as needed.



Install ab

apt-get install apache2-utils

Common options

Option   Meaning
-r       do not exit on socket receive errors
-t       maximum time to spend issuing requests
-c       concurrency: number of requests issued at once
-n       total number of requests to send
-p       postfile: file containing the POST data
-T       content-type: request body type for POST and PUT requests


Test a GET request:

ab -r -t 120 -c 5000

Test a POST request:

ab -r -t 120 -c 5000 -p /tmp/post_data.txt -T 'application/json'

where the /tmp/post_data.txt file contains the data to send, in the format given by -T; here it is JSON:

{"ip": ""}



Install wrk

apt-get install libssl-dev
git clone
cd wrk
make
cp wrk /usr/sbin

Common options

Option     Meaning
-c         number of open connections, i.e. the concurrency
-d         duration of the stress test
-t         number of threads used by the load generator
-s         Lua script to load
--latency  print latency statistics


Test a GET request:

wrk -t10 -c5000 -d120s --latency

Test a POST request:

wrk -t50 -c5000 -d120s --latency -s /tmp/wrk_post.lua

where /tmp/wrk_post.lua is the Lua script to load, specifying the path, headers, and body of the POST:

request = function()
    path = "/api/ip_query"
    wrk.headers["Content-Type"] = "application/json"
    wrk.body = "{\"ip\":\"\"}"
    return wrk.format("POST", path)
end


Install jmeter

Installing jmeter requires JDK 1.8; jmeter itself can then be downloaded from the Apache website.

The mind map above comes from a testing expert and is very detailed; the complete xmind file is available for download: jmeter-Zhang Bei.xmind

For jmeter you can also refer to the resources section at the end of the article: Using Apache JMeter for concurrent stress testing

Analysis of stress-test results

wrk GET request results

root@ubuntu:/tmp# wrk -t10 -c5000 -d60s --latency
Running 1m test @
10 threads and 5000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 897.19ms 322.83ms 1.99s 70.52%
Req/Sec 318.80 206.03 2.14k 68.84%
Latency Distribution
50% 915.29ms
75% 1.11s
90% 1.29s
99% 1.57s
187029 requests in 1.00m, 51.01MB read
Socket errors: connect 0, read 0, write 0, timeout 38
Requests/sec: 3113.27
Transfer/sec: 869.53KB

ab GET request results

root@ubuntu:/tmp# ab -r -t 60 -c 5000
This is ApacheBench, Version 2.3 <$Revision: 1796539 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd,
Licensed to The Apache Software Foundation,
Benchmarking (be patient)
Completed 5000 requests
Completed 10000 requests
Completed 15000 requests
Completed 20000 requests
Completed 25000 requests
Completed 30000 requests
Completed 35000 requests
Completed 40000 requests
Completed 45000 requests
Completed 50000 requests
Finished 50000 requests
Server Software: gunicorn/19.7.1
Server Hostname:
Server Port: 8080
Document Path: /api/ip_query?ip=
Document Length: 128 bytes
Concurrency Level: 5000
Time taken for tests: 19.617 seconds
Complete requests: 50000
Failed requests: 2
(Connect: 0, Receive: 0, Length: 1, Exceptions: 1)
Total transferred: 14050000 bytes
HTML transferred: 6400000 bytes
Requests per second: 2548.85 [#/sec] (mean)
Time per request: 1961.668 [ms] (mean)
Time per request: 0.392 [ms] (mean, across all concurrent requests)
Transfer rate: 699.44 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 597 1671.8 4 15500
Processing: 4 224 201.4 173 3013
Waiting: 4 223 200.1 172 2873
Total: 7 821 1694.4 236 15914
Percentage of the requests served within a certain time (ms)
50% 236
66% 383
75% 1049
80% 1155
90% 1476
95% 3295
98% 7347
99% 7551
100% 15914 (longest request)

jmeter GET request results

Result analysis

The stress-test results from the three tools above are basically consistent: RPS (requests per second) is around 3000. The test machine has 4 cores and 4 GB of memory, gunicorn ran 10 workers, and the memory footprint was 3.2 GB. Only about 3000 RPS on a single machine of this configuration calls for further analysis of the cause; for now, adding another machine behind the load balancer reaches more than 5000 RPS, which meets the usage requirements.
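The capacity estimate above is simple arithmetic; as a sketch (the throughput numbers are taken from the test results above):

```python
import math

single_node_rps = 3000   # measured single-machine throughput
target_rps = 5000        # required throughput

# Nodes needed behind the load balancer, assuming it distributes load evenly
nodes_needed = math.ceil(target_rps / single_node_rps)
aggregate_rps = nodes_needed * single_node_rps
print(nodes_needed, aggregate_rps)  # 2 nodes give ~6000 RPS aggregate
```

In practice you would also leave some headroom per node rather than running each at its measured maximum.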

Notes on stress testing

Number of open files

Stress testing places demands on the load generator's open-file limit: once more than 1024 files are open, the Linux open-file limit must be raised. How to raise it:

# show current limits
ulimit -a
# raise the open-file limit
ulimit -n 500000
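Besides ulimit, the limit can also be inspected from Python via the standard resource module (Unix only), which is handy for checking the environment a gunicorn worker or load generator actually runs in; a small sketch:

```python
import resource

# Current soft and hard limits on open file descriptors for this process
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open-file limit: soft=%s hard=%s" % (soft, hard))

# A test with 5000 concurrent connections needs the soft limit above that;
# a process can raise its own soft limit only up to the hard limit.
needed = 5000
if soft != resource.RLIM_INFINITY and soft < needed:
    print("soft limit too low for %d concurrent connections" % needed)
```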

SYN Flood attack protection

Linux has a parameter for this: the net.ipv4.tcp_syncookies field in the /etc/sysctl.conf configuration file. Its default value is 1, meaning the system detects SYN flood attacks and enables protection. During a stress test, if you send a large volume of repetitive data, the target machine's SYN queue overflows and SYN cookies are enabled, causing large numbers of requests to time out. Alibaba Cloud's load balancing also performs SYN flood and DDoS attack detection, so pay attention to two things when stress testing:

  • During the test, disable the net.ipv4.tcp_syncookies field on the load-balanced machines as appropriate
  • When generating test data, avoid large amounts of repetitive data, so the traffic is not identified as an attack.

gunicorn Introduction and tuning

For choosing gunicorn, you can refer to the performance test report: Python WSGI Server performance analysis.

After settling on gunicorn as the WSGI server, you need to choose a suitable number of workers for the machine and a worker-class for each worker.

Choosing the worker count

Each worker runs as a separate child process holding its own copy of the in-memory data, so every worker added or removed changes system memory usage by a large, roughly fixed amount. Initially, a single machine ran gunicorn with 3 workers and the system sustained only 1000 RPS; after expanding to 9 workers, it sustained 3000 RPS. So when there is enough memory, you can increase the worker count.
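A back-of-the-envelope way to size the worker count from the numbers in this article (300 MB per worker; the 700 MB reserve for nginx and the OS is my own assumption):

```python
def max_workers(total_mem_mb, per_worker_mb=300, reserved_mb=700):
    """How many gunicorn workers fit in memory, given each holds its own IP tree."""
    return max(1, (total_mem_mb - reserved_mb) // per_worker_mb)

# the 4-core / 4 GB test machine from the article
print(max_workers(4096))
```

This yields about 11 workers for the 4 GB machine, in the same ballpark as the 10 workers (3.2 GB footprint) used in the tests; memory, not CPU, is the binding constraint here.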

Choosing the worker-class

You can refer to the two articles in the references at the end: common gunicorn settings, and a performance comparison of several gunicorn worker classes.

Changing the gunicorn worker-class at startup from the default sync to gevent directly doubled the system's RPS.

worker-class   workers   RPS measured with ab
sync           3         573.90
gevent         3         1011.84

gevent is an extra dependency (gevent >= 0.13), so install it with pip. The gunicorn command that starts the flask application then becomes:

gunicorn -w10 -b0.0.0.0:8080 ip_query_app:ip_app --worker-class gevent
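Equivalently, these flags can live in a gunicorn config file, started with `gunicorn -c gunicorn_conf.py ip_query_app:ip_app`; the file below is a sketch mirroring the command line above (the filename is arbitrary):

```python
# gunicorn_conf.py -- mirrors: gunicorn -w10 -b0.0.0.0:8080 ... --worker-class gevent
bind = "0.0.0.0:8080"
workers = 10              # memory-bound: see the worker-count discussion above
worker_class = "gevent"   # requires gevent >= 0.13 to be installed
```

Keeping these settings in a file makes the supervisor command line shorter and lets the worker count be tuned without editing the supervisor config.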


Improve IP database accuracy

Trade some efficiency for accuracy: with a single IP database, some IPs cannot be resolved at all, and foreign IPs are generally only accurate to the country level. You can weigh the accuracy and coverage of several IP databases against each other, and fall back to another IP database when the precise address information cannot be found.
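The fallback idea can be sketched as a small chain of lookups: try databases in order of accuracy and return the first answer (the two stub databases below are placeholders for real IP libraries, and the assumption is that a real lookup returns None when it cannot resolve an address):

```python
def chained_lookup(ip, databases):
    """Query IP databases in priority order; fall back when one has no answer."""
    for db in databases:
        result = db(ip)
        if result is not None:
            return result
    return None

# Stub databases standing in for real IP libraries (illustrative data only)
primary  = lambda ip: {"1.2.3.4": "CN/Guangdong/Shenzhen"}.get(ip)
fallback = lambda ip: {"1.2.3.4": "CN", "5.6.7.8": "US/California"}.get(ip)

print(chained_lookup("5.6.7.8", [primary, fallback]))  # resolved by the fallback DB
```

The cost is the extra lookups for addresses the primary database misses, which is the efficiency-for-accuracy trade-off described above.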

Increase single-machine concurrency

From the incoming request, through the WSGI server, the application layer, and down to the IP lookup, analyze separately how many operations per second each stage can execute, then identify the system's bottleneck and raise single-machine concurrency at the root.

Reference material

