Understanding the HTTP protocol
The story behind the browser
What happens behind the scenes when we enter a domain name in the browser?
Step 1: The browser asks a DNS server to look up the domain name we entered.
Step 2: It gets back the corresponding IP address.
Step 3: Using that IP address, the browser connects to the web server and sends a request. The protocol used for this communication is HTTP.
Step 4: The web server returns the page content.
Step 5: The browser receives the response message and renders it into a page we can read.
An analogy: to call Zhang San, we first look up the name Zhang San in the address book. The name plays the role of the domain name, and the phone number plays the role of the IP address. If I speak Mandarin on the call and Zhang San speaks English, we cannot communicate. The shared language both sides agree on is the HTTP protocol.
What is HTTP?
- The Hypertext Transfer Protocol (HTTP) is a communication protocol that allows Hypertext Markup Language (HTML) documents to be delivered from a web server to the client's browser.
- HTTP is an object-oriented, application-layer protocol. Because it is simple and fast, it suits distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended through years of use and development.
The Web and HTTP
- The Web is a global, dynamic, interactive, cross-platform, distributed graphical information system based on hypertext and HTTP.
- It is a network service built on the Internet. It gives visitors a graphical, intuitive, easy-to-use interface for finding and browsing information on the Internet, and its documents and hyperlinks organize the information nodes on the Internet into an interconnected web.
The past and present of the HTTP protocol
The World Wide Web was created by Tim Berners-Lee, who is often described as the father of the modern Web.
In 1990 he published a paper proposing a system of hyperlinked documents on the Internet. In it he defined three key technologies:
- URI: Uniform Resource Identifier, the unique identifier of a resource on the Internet
- HTML: Hypertext Markup Language, which describes hypertext documents
- HTTP: Hypertext Transfer Protocol, which transfers hypertext
These three technologies laid the foundation for today's Web, which Tim named the World Wide Web.
Then, in 1991, HTTP/0.9 was born.
This version is extremely simple. It has only one command, GET. The protocol specifies that the server can only respond with an HTML-formatted string and nothing else. Once the server has sent the response, it closes the TCP connection.
Simple as it was, this prototype fully demonstrated that the Web was feasible as a service.
HTTP/1.0 mainly added the following:
- New methods such as HEAD and POST
- Response status codes
- A protocol version number
- The concept of headers
- Content-Type, so transferred data is no longer limited to text
However, HTTP/1.0 is not a standard. It is a reference document that records existing practices and patterns, more like a memo, and it is not binding.
HTTP/1.1 mainly added the following:
- New methods such as PUT, DELETE, and OPTIONS
- Cache control and management (Cache-Control)
- Explicit connection management, allowing persistent connections (keep-alive)
- Chunked responses (chunked transfer encoding), which make large files easier to transfer
- A mandatory Host header
Because HTTP/1.1 had grown too large and complicated, it was revised again in 2014 and split into six smaller documents.
These six documents add two major things:
- Better HTTP security, for example by using TLS
- Support for more kinds of applications. Four connection models are now covered:
  - The traditional short connection
  - The long-connection model that reuses a TCP connection
  - The server push model
  - The WebSocket model
HTTP/1.1 has two problems:
- Slow connections. Requests on a connection are serial and must stay in order, yet a single web page may need many resources.
- Poor performance. HTTP/1.1 messages are plain text. Compression such as gzip reduces network bandwidth, but it costs CPU time on both the front end and the back end.
In 2010 Google introduced the SPDY protocol and deployed it on its own servers. HTTP/2 is built on SPDY, and its main features are:
- Binary transfer instead of plain text
- Multiple concurrent HTTP requests over a single TCP connection, which removes the serial requests of HTTP/1.1
- Header compression with the HPACK algorithm
- Server push, which lets the server proactively send data to the client
- Better security, based on TLS
The main problem with HTTP/2 is head-of-line blocking. Several HTTP requests share one TCP connection, so when a packet is lost every request has to wait for it to be retransmitted, even requests that the lost packet does not belong to.
To address this, Google created the QUIC (Quick UDP Internet Connections) protocol, which is based on UDP.
It solves the problems as follows:
- UDP imposes no ordering, so there is no head-of-line blocking
- QUIC has its own packet-loss retransmission and congestion-control mechanisms
- An HTTPS handshake usually takes six network interactions; QUIC merges the TLS and TCP handshakes into three
HTTP through the lens of TCP/IP
HTTP is built on top of TCP/IP and is a member of the TCP/IP protocol family.
The TCP/IP protocol family
TCP/IP is really the collection of protocols associated with the Internet.
The TCP/IP protocol family is a four-layer system. The four layers are:
- Application layer
- Transport layer
- Network layer
- Data link layer
The application layer
The application layer is usually the applications we write; it determines the services offered to users. It talks to the transport layer through system calls. HTTP is one example of an application-layer protocol.
The transport layer
The transport layer provides the application layer, through system calls, with data transfer between two computers on a network connection.
The transport layer has two different protocols: TCP and UDP.
The network layer
The network layer handles the packets that travel across the network; a packet is the smallest unit of data transmitted over a network. This layer decides which path (the transmission route) leads to the other computer and delivers the packet to it.
The link layer
The link layer handles the hardware side of the network connection, including the operating system, hardware device drivers, the NIC (Network Interface Card, i.e. the network adapter), and physically visible parts such as optical fiber. Anything in the realm of hardware falls within the link layer.
The HTTP data transfer process
When the sender sends data, the data travels from the upper layers down to the lower layers, and each layer it passes through adds its own header.
When the receiver receives the data, the data travels from the lower layers up to the upper layers, and each layer removes its own header before passing the data up.
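The wrapping and unwrapping described above can be sketched with a toy model. The bracketed strings such as `[TCP]` are placeholders invented for illustration, not real protocol headers, and the function names are hypothetical.

```python
# Toy model of layered encapsulation; "[HTTP]", "[TCP]" etc. are made-up
# stand-ins for the real headers each layer would add.
LAYERS = ["HTTP", "TCP", "IP", "Ethernet"]  # upper layer first


def encapsulate(payload: str) -> str:
    """Sender side: going down the stack, each layer prepends its header."""
    for layer in LAYERS:
        payload = f"[{layer}]" + payload
    return payload


def decapsulate(frame: str) -> str:
    """Receiver side: going up the stack, each layer strips its own header."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


frame = encapsulate("hello")
print(frame)               # the outermost header belongs to the lowest layer
print(decapsulate(frame))  # the receiver recovers the original data
```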
Transport layer: the TCP three-way handshake
First handshake: The client sends a connection request segment with the SYN flag set, then enters the SYN_SENT state and waits for the server to confirm.
Second handshake: After receiving the client's SYN segment, the server must acknowledge it with an ACK. At the same time it sends its own SYN request. The server puts both into a single segment (the SYN+ACK segment) and sends it to the client, then enters the SYN_RECV state.
Third handshake: After receiving the SYN+ACK segment, the client sends an ACK segment to the server. Once this segment has been sent, both client and server enter the ESTABLISHED state and the TCP three-way handshake is complete.
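Application code never sends SYN or ACK segments itself; the operating system performs the handshake when a socket connects. A minimal sketch on the loopback interface, assuming a free local port is available:

```python
import socket

# Listen on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# connect() returns only after the kernel has exchanged SYN, SYN+ACK and ACK.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))

# accept() hands back the connection the kernel has already established.
conn, peer = server.accept()
handshake_ok = peer == client.getsockname()
print("established:", handshake_ok)

for s in (conn, client, server):
    s.close()
```

Closing the sockets at the end triggers the connection teardown in the same hidden way.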
A note on the four-way close (the "four waves"):
When the passive side receives a FIN, it only means that the active side has no more data to send. The passive side may not have finished sending its own data, so it does not close the socket immediately. It may need to send some more data first and only then send its own FIN to tell the active side that it agrees to close the connection. This is why the ACK and the FIN are, in most cases, sent separately.
- First wave: The client sends a FIN to close the client-to-server data flow and enters the FIN_WAIT_1 state.
- Second wave: After receiving the FIN, the server sends an ACK to the client whose acknowledgment number is the received sequence number + 1 (like a SYN, a FIN consumes one sequence number). The server enters the CLOSE_WAIT state.
- Third wave: The server sends a FIN to close the server-to-client data flow and enters the LAST_ACK state.
- Fourth wave: After receiving the FIN, the client enters the TIME_WAIT state and sends an ACK to the server whose acknowledgment number is the received sequence number + 1. The server then enters the CLOSED state, which completes the four-way close.
DNS domain name resolution
We have seen that HTTP is closely tied to TCP/IP. The DNS service covered next is just as closely linked to HTTP.
We usually visit a website by its host name or domain name, because a domain name is easier to remember than an IP address (a string of numbers). TCP/IP, however, addresses hosts by IP address, so some mechanism or service must convert domain names into IP addresses. DNS exists to solve exactly this problem:
DNS provides the resolution service between domain names and IP addresses.
The DNS resolution process
Browser cache: When a user visits a domain name, the browser first checks its own cache for the corresponding IP address (present if the domain has been visited before and the cache has not been cleared).
System cache: If the browser cache has no entry, the operating system's hosts file and DNS cache are checked for the domain's IP address.
Router cache: If neither the browser nor the system cache has an entry, the router's cache is checked. These first three steps are all client-side DNS caches.
ISP (Internet service provider) DNS cache: If the client side cannot find the IP address, the query goes to the ISP's DNS cache. On a telecom operator's network, for example, the lookup goes to that operator's DNS cache server.
Root name server: If none of the above succeeds, the query goes to a root name server. There are 13 root server identities (a.root-servers.net through m.root-servers.net), each served by many machines. On receiving a request, the root server checks its zone file. If it has no record, it tells the local DNS server the IP address of the name server for the relevant top-level domain (such as .com).
Top-level domain server: On receiving the request, the top-level domain server checks its zone file. If it has no record, it tells the local DNS server the IP address of the primary name server within its jurisdiction.
Primary name server: On receiving the request, the primary name server checks its own cache. If it has no record, it queries the next-level name server, and this step repeats until the correct record is found.
Save results to cache: The local DNS server stores the returned result in its cache for next time and passes the result back to the client, which then connects to the web server at that IP address.
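The lookup order above can be modelled as a chain of caches that falls back to an authoritative query. This is only a simulation: the cache dictionaries, the `authoritative_lookup` helper, and the address (taken from the 192.0.2.0/24 documentation range) are all assumptions made for illustration.

```python
# Hypothetical caches, ordered from nearest to farthest from the user.
browser_cache, system_cache, router_cache, isp_cache = {}, {}, {}, {}
CACHES = [
    ("browser", browser_cache),
    ("system", system_cache),
    ("router", router_cache),
    ("isp", isp_cache),
]


def authoritative_lookup(name):
    """Stand-in for the root -> top-level domain -> primary server walk."""
    return {"www.example.com": "192.0.2.10"}.get(name)


def resolve(name):
    """Return (ip, where_it_was_found), checking the nearest cache first."""
    for label, cache in CACHES:
        if name in cache:
            return cache[name], label
    ip = authoritative_lookup(name)
    if ip is None:
        raise LookupError(f"cannot resolve {name}")
    for _, cache in CACHES:  # save the result for next time
        cache[name] = ip
    return ip, "authoritative"


print(resolve("www.example.com"))  # first lookup walks the full chain
print(resolve("www.example.com"))  # second lookup is answered by a cache
```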
Reviewing the HTTP transaction flow
When a client visits a website, it first finds the domain's IP address through DNS. The browser then generates an HTTP request and sends it to the web server over TCP/IP. After receiving the request, the web server generates the response content accordingly and returns it to the client over TCP/IP.
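A whole transaction can be reproduced on one machine with Python's standard library. The sketch below skips the DNS step by using the loopback address directly; the `HelloHandler` class and the page body are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical web server: answers every GET with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: open a TCP connection, send the request, read the response.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status
content_type = response.getheader("Content-Type")
body = response.read()
print(status, content_type, body)

conn.close()
server.shutdown()
server.server_close()
```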
HTTP protocol structure and communication principles
Characteristics of the HTTP protocol
Client/server model
- In the client/server model, the client sends a request to the server, and the server responds to the request and provides the corresponding service.
Simple and fast
- When a client requests a service from the server, it only needs to send the request method and path.
- Common request methods include GET and POST. Each method defines a different kind of interaction between client and server.
- Because the HTTP protocol is simple, HTTP server programs are small and communication is fast.
Flexible
- HTTP allows any type of data object to be transferred.
- The type being transferred is indicated by Content-Type.
Connectionless
- Connectionless means that each connection handles only one request.
- Once the server has processed the client's request and the client has received the response, the connection is closed.
- This saves transmission time.
Stateless
- HTTP is a stateless protocol.
- Stateless means the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be sent again, which can increase the amount of data transferred per connection.
URL and URI: differences and relationship
Question: should the web address we type into the browser be called a URL or a URI?
URI: a compact string that identifies an abstract or physical resource.
URL: a subset of URI that, in addition to identifying a resource, provides its primary access mechanism so the resource can be located.
- A URI can be classified as a URL, a URN, or something that has the properties of both locators and names.
- A URN is like a person's name; a URL is like a person's address.
- In other words, a URN identifies something, and a URL provides a way to find it.
- A URL is a kind of URI, but not every URI is a URL.
- The key difference between a URI and a URL is the access mechanism.
- A URN is the uniquely identifying part: it is identity information.
Examples:
http://www.ietf.org/rfc/rfc/2396.txt: a URL
telnet://192.0.2.16:80: a URI
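The distinction can be explored with `urllib.parse.urlsplit`. In the sketch below, the `classify` helper and its rule (scheme `urn` means a name, anything else a locator) are a simplification for illustration, and the `urn:isbn:...` string is an extra example not taken from the text above.

```python
from urllib.parse import urlsplit


def classify(uri: str) -> str:
    """Rough rule of thumb: 'urn:' names a thing, other schemes locate it."""
    return "URN" if urlsplit(uri).scheme.lower() == "urn" else "URL"


for uri in (
    "http://www.ietf.org/rfc/rfc/2396.txt",
    "telnet://192.0.2.16:80",
    "urn:isbn:0451450523",  # assumed example of a URN
):
    parts = urlsplit(uri)
    # The scheme is the access mechanism; netloc says where to connect.
    print(classify(uri), parts.scheme, parts.netloc or "-", parts.path or "-")
```

All three strings are URIs; only the first two carry enough information to actually reach the resource.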
HTTP message structure
HTTP message headers fall into four categories:
- General headers
- Request headers
- Response headers
- Entity headers
HTTP/1.1 standardizes 47 header fields.
Accept
Purpose: the media types the browser can accept.
Accept: text/html means the browser can accept text/html from the server, which is what we usually call an HTML document. If the server cannot return data of type text/html, it should return a 406 (Not Acceptable) error.
Accept-Encoding
Purpose: the browser declares the encodings it accepts. This usually specifies compression: whether compression is supported and which methods (gzip, deflate).
Accept-Language
Purpose: the browser declares the languages it accepts.
For a client that prefers Chinese, the server is asked to return the Chinese version of the response when it has one, and the English version when it does not.
Connection
Connection: keep-alive means that after a web page is opened, the TCP connection that carries HTTP data between client and server is not closed. If the client requests pages from this server again, it reuses the established connection.
Connection: close means that after a request completes, the TCP connection that carries HTTP data between client and server is closed. When the client sends another request, a new TCP connection has to be established.
Host
Purpose: this request header specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL.
Referer
When the browser sends a request to the web server, it usually includes Referer to tell the server which page the request was linked from, so the server can use that information in its processing.
User-Agent
Purpose: tells the HTTP server the name and version of the operating system and browser the client is using.
Content-Type
Purpose: describes the media type of the object in the message body.
- application/xhtml+xml: XHTML format
- application/xml: XML data format
- application/atom+xml: Atom XML feed format
- application/json: JSON data format
- application/pdf: PDF format
- application/msword: Word document format
- application/octet-stream: binary stream data
- application/x-www-form-urlencoded: form submission
HTTP request methods
Common HTTP/1.1 methods:
GET
- The GET method requests access to the resource identified by a URI.
- The server parses the specified resource and returns the response content.
- GET can also be used to submit forms and other data. For example:
http://localhost/login?name=admin&password=123456. The submitted form content is easy to read straight from this URL.
POST
- POST is similar to GET in function and is generally used to transfer an entity body.
- The main purpose of POST is not to retrieve the content of the response body.
- POST data does not appear in the URL, which also avoids the URL length problem.
PUT
- The data sent from the client to the server replaces the content of the specified document.
- The biggest difference between PUT and POST is that PUT is idempotent and POST is not.
- For this reason, PUT is more often used to transfer a resource.
HEAD
- Similar to a GET request, except that the response carries no body. It is used to obtain the headers.
DELETE
- Asks the server to delete the specified resource.
OPTIONS
- Queries which methods the resource identified by the request URI supports.
TRACE
- Echoes back the request as received by the server. It is mainly used for testing or diagnosis.
CONNECT
- Opens a two-way communication channel between the client and the requested resource. It can be used to create tunnels.
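The idempotency difference between PUT and POST is easiest to see by repeating the same request. The sketch below uses an in-memory dictionary as a stand-in for a server's resources; the `put` and `post` helpers and the paths are hypothetical.

```python
import itertools

# In-memory stand-in for the resources held by a server.
store = {}
order_ids = itertools.count(1)


def put(path, body):
    """PUT replaces whatever lives at `path`; repeating it changes nothing."""
    store[path] = body
    return path


def post(collection, body):
    """POST asks the server to create a new resource under `collection`."""
    path = f"{collection}/{next(order_ids)}"
    store[path] = body
    return path


put("/users/alice", "v1")
put("/users/alice", "v1")   # same request again: still one resource
post("/orders", "book")
post("/orders", "book")     # same request again: a second resource appears
print(sorted(store))
```

This is why a client can safely retry a PUT after a timeout, while retrying a POST may create a duplicate.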
HTTP response status codes
A status code is a 3-digit code that indicates the status of a web server's HTTP response.
Status code classification
Common HTTP status codes
| Status code | Name | Description |
| --- | --- | --- |
| 200 | OK | The request succeeded. The response headers or body that the request asked for are returned with this response |
| 202 | Accepted | The request has been accepted but processing is not finished |
| 206 | Partial Content | The server successfully processed a partial GET request |
| 301 | Moved Permanently | The requested resource has been permanently moved to a new URI, which is included in the response |
| 302 | Found | Temporary move. Similar to 301, but the resource is only moved temporarily, so the client should keep using the original URI |
| 400 | Bad Request | The client's request has a syntax error that the server cannot understand |
| 401 | Unauthorized | The request requires user authentication |
| 403 | Forbidden | The server understands the client's request but refuses to carry it out |
| 404 | Not Found | The server cannot find the resource (web page) that the client requested |
| 500 | Internal Server Error | An internal server error prevented the request from being completed |
| 502 | Bad Gateway | The server, acting as a gateway or proxy, received an invalid response from the remote server |
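The first digit of a status code gives its class. A minimal sketch using Python's `http.HTTPStatus`; the `CLASSES` mapping and the `describe` helper are written for this example.

```python
from http import HTTPStatus

# The class of a status code is determined by its first digit.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 404, 502):
    print(describe(code))
```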
HTTP state management: Cookie and Session
A cookie is really just a short piece of text. When a client sends a request and the server needs to record the user's state, the server issues a cookie to the client in its response.
The client browser saves the cookie. When the browser requests the site again, it submits the requested URL together with the cookie. The server checks the cookie to identify the user's state.
How cookies work
- The client sends a request
- The server sets Set-Cookie
- The server returns the response
- The client reads Set-Cookie
- The client sends another request
- The server checks the cookie and returns the response
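The round trip above can be traced with `http.cookies.SimpleCookie`, with no network involved. The cookie name `session_id` and its value are invented for the example.

```python
from http.cookies import SimpleCookie

# Server: build the Set-Cookie header sent with the first response.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"   # hypothetical name and value
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing["session_id"].OutputString()
print("Set-Cookie:", set_cookie_header)

# Client: read Set-Cookie, store it, and send name=value on the next request.
stored = SimpleCookie()
stored.load(set_cookie_header)
cookie_header = "; ".join(f"{name}={m.value}" for name, m in stored.items())
print("Cookie:", cookie_header)

# Server: parse the Cookie header of the new request to recognise the user.
incoming = SimpleCookie()
incoming.load(cookie_header)
session_id = incoming["session_id"].value
print("recognised session:", session_id)
```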
A session is another mechanism for recording client state, and it is stored on the server. When the client browser accesses the server, the server records the client's information on the server in some form.
When the client browser visits again, the server only needs to look up that client's state in the session.
How sessions work
Ways to store the session ID
- URL rewriting
- Hidden form fields
Session validity: a session ends when
- The session times out
- The program calls HttpSession.invalidate()
- The server process stops
Differences between Cookie and Session
- Different storage locations
- Security (different privacy policies)
- Different lifetimes
- Different load on the server
HTTP protocol features and usage
Encoding and decoding
Each encoding standard has its own use cases. A glyph table stores every character that the standard can represent (for example, all Chinese characters are in the glyph table of the GBK standard). In that table every character has a binary number, and these binary numbers make up the character set. The glyph table and the character set map to each other one to one.
Different encoding standards take up different amounts of space. An encoding scheme converts a shorter binary number into the corresponding position in the character set, and the character found at that position in the glyph table is shown to the user.
Common encoding standards:
ASCII:
Purpose: English and Western European languages.
Bits: ASCII uses 7 bits and can represent 128 characters. Its extension uses 8 bits and can represent 256 characters.
Range: ASCII runs from 00 to 7F; the extension runs from 00 to FF.
An English letter (upper or lower case) takes one byte. A Chinese character takes two bytes, which requires a double-byte encoding such as GBK.
GBK:
Compatible with all Chinese characters in GB2312, GB13000-1, and BIG5, using double-byte encoding.
The encoding space is 0x8140 to 0xFEFE, 23940 code points in total.
The GBK/1 and GBK/2 areas are also the GB2312 range. It includes 21003 Chinese characters.
ISO-8859-1:
A single-byte encoding that can represent at most the range 0-255 and suits English and related languages.
For example, the letter 'a' is encoded as 0x61 = 97. ISO-8859-1 covers a narrow range of characters and cannot represent Chinese characters.
Because it is a single-byte encoding, which matches a computer's most basic unit of representation, ISO-8859-1 is still used in many situations.
Bytes in any other encoding can be decoded as ISO-8859-1 and converted back without loss. It was also long the default encoding in MySQL.
Bits: 8.
Range: 00 to FF, compatible with the ASCII character set. English takes one byte; Chinese is not supported.
Unicode:
Purpose: one character set shared by all the world's languages.
Bits: 16. Range: 6811 symbols, 20902 Chinese characters, 11172 Korean Hangul syllables, 6400 user-defined characters, and 20249 reserved, 65534 in total.
English and Chinese characters both take two bytes, and so do English and Chinese punctuation marks.
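The byte counts quoted above can be checked by encoding one English letter and one Chinese character with Python's built-in codecs. Here `utf-16-be` stands in for the fixed 16-bit form of Unicode described above, and UTF-8 is included only for comparison.

```python
samples = {
    "a": ["ascii", "iso8859-1", "gbk", "utf-16-be", "utf-8"],
    "中": ["gbk", "utf-16-be", "utf-8"],
}
for char, encodings in samples.items():
    for encoding in encodings:
        data = char.encode(encoding)
        print(char, encoding, data.hex(), len(data), "byte(s)")

# ISO-8859-1 cannot represent Chinese characters at all.
try:
    "中".encode("iso8859-1")
except UnicodeEncodeError:
    print("iso8859-1 cannot encode 中")

# Any byte sequence survives a round trip through ISO-8859-1, because every
# byte value from 00 to FF maps to exactly one character.
raw = "中".encode("gbk")
round_trip = raw.decode("iso8859-1").encode("iso8859-1")
print(round_trip == raw)
```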
URL encoding and decoding
- URLs use the ASCII character set, so any character outside ASCII that appears in a URL must be encoded.
- Some characters in a URL are reserved; "&", for example, is the parameter separator. To use a reserved character literally in a URL, it must be encoded.
- The "%" encoding rule
- Unreserved ASCII characters in a URL are not encoded. A reserved character is encoded by taking its ASCII code and adding the "%" prefix. A non-ASCII character is encoded by taking its Unicode code (in practice, usually its UTF-8 bytes) and adding the "%" prefix to each byte.
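These rules can be tried out with `urllib.parse.quote` and `unquote`. Note that Python's `quote` encodes non-ASCII characters as UTF-8 bytes by default, which is one common choice rather than the only possible one.

```python
from urllib.parse import quote, unquote

# Unreserved ASCII characters pass through unchanged.
print(quote("abc-_."))

# Reserved characters become "%" plus the hexadecimal ASCII code.
# safe="" tells quote() not to exempt any reserved character.
print(quote("a&b=c", safe=""))

# A non-ASCII character becomes one "%XX" group per UTF-8 byte.
encoded = quote("中")
print(encoded)

# Decoding reverses the process.
print(unquote(encoded))
```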
HTTP authentication
Common authentication methods
- BASIC authentication (basic authentication)
- DIGEST authentication (digest authentication)
- SSL client authentication
- FormBase authentication (form-based authentication)
What is BASIC authentication?
BASIC authentication is a relatively simple form of HTTP authentication. The client sends the user name and password to the server in plaintext (Base64-encoded) for verification, so it usually needs HTTPS to keep the transfer secure.
The BASIC authentication process:
Drawbacks of BASIC authentication:
- The user name and password are sent in plaintext (Base64), so HTTPS is needed to keep the transfer secure.
- Even if the password were strongly encrypted, a third party could still mount a replay attack with the encrypted user name and password.
- There is no protection against proxies and intermediate nodes.
- A fake server can easily spoof the authentication and trick the user into entering a user name and password.
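To see why Base64 counts as plaintext here, the sketch below builds the `Authorization` header a client would send and then reverses it without any key. The credentials are sample values and `basic_auth_header` is a helper written for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of the Authorization header for BASIC authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


header = basic_auth_header("Aladdin", "open sesame")  # sample credentials
print("Authorization:", header)

# Anyone who sees the header can recover the credentials: Base64 is an
# encoding, not encryption.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
print(recovered)
```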
What is DIGEST authentication?
To make up for the shortcomings of BASIC authentication, DIGEST authentication has been available since HTTP/1.1.
DIGEST authentication also uses a challenge/response scheme, but unlike BASIC authentication it does not send the password in plaintext.
The DIGEST authentication process:
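The core of the scheme is that the client proves it knows the password by hashing it together with a one-time value (the nonce) issued by the server. The sketch below follows the simplest MD5 variant, without the optional qop, nc and cnonce fields; all credential and nonce values are made up, and a real client would take realm and nonce from the server's challenge.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Compute the 'response' field of the simplest DIGEST variant."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being asked
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to this challenge


# Hypothetical values for illustration only.
first = digest_response("alice", "example", "secret", "GET", "/index.html", "nonce-1")
second = digest_response("alice", "example", "secret", "GET", "/index.html", "nonce-2")
print(first)
print(second)
```

The password never appears on the wire, and a different nonce yields a different response, so a captured response is of little use against a later challenge.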
SSL client authentication
What is SSL client authentication?
SSL client authentication uses an HTTPS client certificate to perform authentication. With client certificate authentication, the server can confirm whether the access comes from a client it has registered.
Form-based authentication
Form-based authentication is not defined in the HTTP protocol.
Each web application implements form-based authentication in its own way.
Cookies and sessions are used to keep the user's state.
HTTP long and short connections
- HTTP follows a request/response model, so once the server has responded, that HTTP request is over.
- HTTP long and short connections are, in essence, TCP long and short connections.
- In HTTP/1.0 the default is a short connection: each HTTP operation between browser and server establishes a connection and closes it when the operation ends.
- From HTTP/1.1 on, long connections are the default, so the connection is kept open.
What is a long connection?
A long connection is one that is not closed after a data transfer but stays open for a long time. If the two applications have new data to transfer, they reuse the connection directly instead of creating a new one.
The advantage is that it saves the cost of establishing and closing connections across multiple exchanges, so overall, multiple transfers take less time. The disadvantage is the extra effort needed to keep the connection usable at all times, because network jitter, server failures, or even a firewall can make it unusable. So we generally do "keep-alive" work in one of the following ways to make sure the connection is available when it is needed:
- Use TCP's own keepalive mechanism, which periodically sends probe packets to check whether the peer is reachable. The default interval is generally 2 hours. It can be adjusted at the operating-system level as needed, on both Linux and Windows.
- Have the upper-layer application send a small packet as a "heartbeat" to check whether it can be delivered to the other end. Keep-alive is mostly used by servers to check on clients: once a client is found to be unreachable, the server drops the connection to relieve its own load.
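TCP keepalive can also be switched on per socket instead of system-wide. A minimal sketch follows; the timing values are arbitrary examples, and the three tuning options are not available on every platform, which is why each is guarded with `hasattr`.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the kernel to send keepalive probes on this connection when idle.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

# Optional per-socket tuning; these names exist on Linux but not everywhere.
tuning = {
    "TCP_KEEPIDLE": 60,   # seconds of idleness before the first probe
    "TCP_KEEPINTVL": 10,  # seconds between probes
    "TCP_KEEPCNT": 3,     # failed probes before the connection is dropped
}
for name, value in tuning.items():
    if hasattr(socket, name):
        sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)

keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
print("keepalive enabled:", keepalive_on)
sock.close()
```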
One more point in advance: long connections are more troublesome in highly available distributed systems. High availability inevitably involves mechanisms such as automatic failover and fault isolation. Precisely when a failure occurs, the client needs to discover promptly which connections are no longer usable and reconnect accordingly, which includes redoing load balancing.
What is a short connection?
A short connection is closed right after each data transfer, and the next transfer opens a new one. The advantage is that every transfer uses a new connection, so as long as a connection can be established, the data can basically be delivered. Even if one transfer fails, there is no need to worry about it affecting later transfers, because those use new connections. The disadvantage is that every connection goes through the three-way handshake and the four-way close, which takes more time.
Short connections have another, potentially fatal, drawback. In socket-based development a connection is made up of five resources: source IP, source port, destination IP, destination port, and protocol, known as the "five-tuple". On one computer, a connection can be established as long as its five-tuple is not already in use. However, a computer can open at most 65535 ports. If two processes need to communicate, the server's IP and port must be fixed, so in theory a single client can hold at most 65535 socket connections to that server at the same time. After subtracting the ports used by the operating system and other processes, the number is even smaller. So with careless use, opening a large number of connections in a very short time can easily exhaust the ports. That not only stops the program itself from working properly but also affects other processes on the same computer.
HTTP intermediaries: proxies
A proxy is both a server and a client
What proxies are used for
- Packet capture
- Anonymous access
HTTP intermediaries: gateways
- A gateway can act as a kind of translator: it abstracts away the way a resource is reached. Gateways are the glue between resources and applications.
- A gateway plays the role of a "protocol converter".
As you can see, a proxy connects endpoints that use the same protocol, while a gateway connects endpoints that use different protocols.
- A web gateway speaks HTTP on one side and another protocol on the other: <client-side protocol>/<server-side protocol>
- (HTTP/*) server-side gateway: talks to the client over HTTP and to the server over another protocol.
- (*/HTTP) client-side gateway: talks to the client over another protocol and to the server over HTTP.
Common gateway types
- (HTTP/*) server-side web gateway.
When a request flows toward the origin server, the server-side web gateway converts the client's HTTP request into another protocol to reach the server. Once it has obtained the resource, it places the object in an HTTP response and sends it to the client.
- (HTTP/HTTPS) server-side security gateway.
An organization can use a gateway to encrypt all inbound web requests, providing extra privacy and security. Clients can browse web content with ordinary HTTP, and the gateway encrypts the traffic automatically.
- (HTTPS/HTTP) client-side security accelerator gateway.
HTTPS/HTTP gateways are increasingly used as security accelerators. They sit in front of the web server, usually as an invisible intercepting gateway or a reverse proxy. They receive secure HTTPS traffic, decrypt it, and send ordinary HTTP requests to the web server. These gateways usually contain dedicated decryption hardware that decrypts secure traffic far more efficiently than the origin server can, which reduces the load on the origin server. Because traffic between the gateway and the origin server is unencrypted, use them with caution and make sure the network between the gateway and the origin server is secure.
- Resource gateway.
The most common kind of gateway, the application server, combines the destination server and the gateway in a single server. The application server is a server-side gateway: it talks to the client over HTTP and connects to the server-side application. The client connects to the application server over HTTP, but instead of sending back a file, the application server passes the request through a gateway application programming interface (API) to an application running on the server.
What is the HTTP cache?
HTTP caching means that when a client requests a resource from the server, the request first reaches the browser cache. If the browser holds a copy of the requested resource, it can take the resource directly from the browser cache instead of fetching it from the origin server.
In common practice, HTTP caches only store responses to GET requests and do nothing for other kinds of requests, so the caching discussed below always refers to GET requests.
Why use the HTTP cache?
1. If the client requests the server every time, traffic is wasted.
2. The server has to look up and send the resource every time; with a large user base, this puts heavy pressure on the server.
3. The client can only render the page after each request completes, so the user experience is poor.
So we can store the files we have requested for reuse, for example with the HTTP cache.
HTTP cache header fields
Cache-Control: request/response header; the cache control field.
no-store: nothing is cached.
no-cache: the response is cached, but before using the cached copy the browser must ask the server whether it is still up to date.
max-age=x (in seconds): for x seconds after the response is cached, no new request is made.
s-maxage=x (in seconds): for x seconds after a proxy server caches the response from the origin, it does not request the origin again. It applies only to shared caches such as a CDN.
public: both the client and proxy servers (CDN) may cache the response.
private: only the client may cache the response.
Expires: response header; the expiration time of the resource, set by the server. It is an HTTP/1.0 field and has lower priority when it coexists with max-age.
Last-Modified: response header; the time the resource was last modified, sent by the server to the browser.
If-Modified-Since: request header; the last modification time of the resource, sent by the browser to the server. It pairs with Last-Modified, and the server compares the two.
ETag: response header; an identifier for the resource, sent by the server to the browser.
If-None-Match: request header; the identifier of the cached resource, sent by the browser to the server (it is in fact the ETag the server returned last time). It pairs with ETag, and the server compares the two.
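The Cache-Control directives listed above can be read programmatically. The following is a minimal sketch, not a complete implementation of the caching rules: the function names `parse_cache_control` and `freshness_lifetime` are hypothetical, and only the directives described above are handled.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, has_arg, arg = part.partition("=")
        # Directives without an argument (no-store, public, ...) map to True.
        directives[name.strip().lower()] = arg.strip().strip('"') if has_arg else True
    return directives


def freshness_lifetime(directives: dict, shared_cache: bool = False):
    """Seconds a stored response may be reused without asking the server.

    Returns None when the response must not be stored by this cache,
    or when no max-age style directive is present.
    """
    if "no-store" in directives:
        return None
    if shared_cache and "private" in directives:
        return None
    if "no-cache" in directives:
        # Stored, but must be revalidated before every reuse.
        return 0
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None
```

For the value `public, max-age=60, s-maxage=300`, a browser cache would get a lifetime of 60 seconds and a shared cache (`shared_cache=True`) 300 seconds.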
How HTTP caching works
The server and the browser agree on an expiration time for the file: Expires (in GMT time format).
The first request proceeds as usual: suppose the server returns a js file and, along with it, an Expires time.
On subsequent requests:
The browser first checks whether the current time is later than Expires, that is, whether the file has passed the agreed expiration time.
Not yet expired: no request is made and the local cache is used directly.
Expired: a request is made and the logic of the first request repeats.
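The Expires check described above can be sketched in a few lines. This is an illustration under assumptions: `is_fresh` is a hypothetical helper name, and the current time is passed in explicitly so the behavior is easy to follow.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh(expires_header: str, now: datetime) -> bool:
    """Return True while the cached copy may be used without a request."""
    expires_at = parsedate_to_datetime(expires_header)
    if expires_at.tzinfo is None:
        # Treat a date without zone information as GMT.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at
```

In real use, `now` would be `datetime.now(timezone.utc)`. When `is_fresh` returns False, the browser sends a request again.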
Problem: suppose Expires has passed and the browser requests the server again, but the js file has not changed since last time. Can this request be avoided?
For example, the agreed lifetime is one week, but the file was not changed during that week.
Solution: in addition to the agreed expiration time, let the server and the browser compare the file's last modification time, using Last-Modified and If-Modified-Since.
First request: suppose the server returns a js file together with an Expires time, plus a Last-Modified value.
Follow-up requests: once Expires has passed, the browser sends the file's last modification time in If-Modified-Since (that is, the Last-Modified value the server returned last time), and the server compares If-Modified-Since with Last-Modified.
If If-Modified-Since and Last-Modified are not equal, the server returns the latest js file together with a new Expires and a new Last-Modified.
If they are equal, the server returns status code 304: the file has not been modified, and the browser keeps using its local cache.
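The server side of this comparison can be sketched as follows. It is a simplified illustration, not a real server: `respond` is a hypothetical function, headers are plain dicts, and the two values are compared for equality exactly as described above.

```python
from email.utils import formatdate


def respond(request_headers: dict, body: bytes, mtime: float):
    """Return (status, headers, body) for a GET of a file modified at mtime."""
    # HTTP dates have one-second resolution, so the fraction is dropped.
    last_modified = formatdate(int(mtime), usegmt=True)
    headers = {"Last-Modified": last_modified}
    if request_headers.get("If-Modified-Since") == last_modified:
        # Unchanged: no body is sent and the browser reuses its cached copy.
        return 304, headers, b""
    return 200, headers, body
```

Because the timestamp is truncated to whole seconds, two changes within the same second produce the same Last-Modified value.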
Problem: Expires can be tampered with on the browser side, so it is unreliable. Last-Modified is only accurate to the second: if a file changes within 1 s, Last-Modified does not register the change, and in that case the browser never gets the latest file.
Solution: on top of the expiration time (Expires) and Last-Modified, let the server and the browser compare a unique identifier of the file content, using ETag and If-None-Match. Since Expires can be tampered with, max-age (one of the Cache-Control values) is used in its place.
First request: suppose the server returns a js file together with max-age=60, a Last-Modified value, and an ETag that uniquely identifies the file.
Follow-up requests: within 60 s, no request is made and the local cache is used directly. (max-age=60 means that no new request is made within 60 s of the response being cached. It is similar to Expires, and max-age takes priority over Expires.)
After 60 s, the browser sends a request carrying If-Modified-Since and If-None-Match (the ETag the server returned last time). The server compares If-None-Match with ETag and no longer compares If-Modified-Since with Last-Modified, because ETag takes priority over Last-Modified.
If If-None-Match and ETag are not equal, the js content has been modified, and the server returns the latest js file with a new ETag, max-age=60, Last-Modified, and Expires.
If they are equal, the js content has not changed; the server returns 304, telling the browser to keep using its local cache.
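The ETag comparison can be sketched in the same style. This is an assumption-laden illustration: `make_etag` and `handle_get` are hypothetical names, and deriving the ETag from a SHA-256 digest of the content is just one possible choice, since servers are free to generate the identifier however they like.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an identifier from the content itself (quoted, as in HTTP)."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(request_headers: dict, body: bytes):
    """Return (status, headers, body), answering 304 when the ETag matches."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if request_headers.get("If-None-Match") == etag:
        # Content is unchanged: tell the browser to keep its local copy.
        return 304, headers, b""
    return 200, headers, body
```

Because the identifier depends only on the content, a change made within the same second still yields a different ETag.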
Problem: we can now accurately compare the server's file with the locally cached file, but the schemes above share one major flaw: until max-age or Expires has passed, the browser has no way to notice that a file on the server has changed.
Cache improvement schemes
- MD5/hash cache
By not caching the html and adding an MD5 or hash identifier to the names of static files, we solve the problem that the browser cannot bypass the cache expiration time and notice file changes on its own.
The caching between browser and server described so far assumes the same directory and the same file name on every request. If the directory or file name changes, the browser requests the file again regardless of any expiration time. This can be achieved with webpack, which generates new file names each time the project is bundled.
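The idea of putting a content hash into the file name can be illustrated without any bundler. The sketch below is not how webpack does it internally; `hashed_name` is a hypothetical helper, and the 8-character MD5 prefix is an arbitrary choice.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    # "static/app.js" becomes "static/app.<hash>.js"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

When the content changes, the name changes, so the html (which is not cached) points the browser at a URL it has never cached before.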
- CDN cache
A CDN is a content distribution network built on top of the Internet. Relying on edge servers deployed in many locations, and on the load balancing, content distribution, and scheduling modules of a central platform, it lets users fetch content from a nearby node, which reduces network congestion and improves access speed and hit rate.
Suppose that many years ago our city had only one railway station. Every Spring Festival the whole city had to go to that station to buy tickets, and the crowds and the demand for tickets can be imagined. To ease the problem, ticket agencies were opened in different parts of the city so that people in each district could buy tickets nearby, which greatly reduced the pressure on the railway station.
We can think of the ticket agency in each district as a CDN node, which is the proxy server mentioned above. In short, a CDN can be seen as an intermediate stop between the browser and the server: it handles part of the browser's requests on the server's behalf, reducing the overall load on the server.
The value of a CDN can be summarized as:
- By diverting traffic, a CDN greatly reduces the access pressure on the origin site.
- Picture living in a remote district where every ticket purchase used to mean a trip to the city center, until a branch office opened nearby. In the same way, a CDN solves the problem of cross-region access and speeds up access.
For example:
CDN edge nodes cache data. When the browser makes a request, the CDN evaluates and handles it on behalf of the origin site.
First request: the server hands the file to the CDN, which caches it and returns it to the browser; the browser caches it as well.
Follow-up requests:
Case 1: the file cached by the CDN node has not expired, so the node answers the browser itself (for example with 304) and the request never reaches the origin site.
Case 2: the CDN node finds that its cached file has expired, so to be safe it sends a request to the server (the origin site), gets the latest data, and passes it to the browser.
At this point it is clear that CDN caching has the same problem as the HTTP cache discussed earlier: until the CDN cache time expires, the browser's requests are always intercepted and cannot reach the latest file.
But coming back to the nature of the HTTP caching problem: caching is meant for static files that change infrequently, and CDN caching also brings the additional benefits of traffic diversion and access acceleration.
A CDN also works like a platform: you can log in and refresh the CDN cache manually, which solves the problem that the browser cache cannot be controlled manually.
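The behavior of an edge node (serve from cache until expiry, go back to the origin afterwards, allow a manual refresh) can be modeled with a toy class. Everything here is hypothetical: `EdgeNode`, its methods, and the explicit `now` argument are illustration only and do not correspond to any real CDN API.

```python
class EdgeNode:
    """Toy model of a CDN edge node with a fixed cache lifetime."""

    def __init__(self, fetch_from_origin, ttl_seconds):
        self._fetch = fetch_from_origin   # callable: path -> bytes
        self._ttl = ttl_seconds
        self._store = {}                  # path -> (body, stored_at)

    def get(self, path, now):
        """Return (body, "HIT" or "MISS") for a request arriving at time now."""
        entry = self._store.get(path)
        if entry is not None:
            body, stored_at = entry
            if now - stored_at < self._ttl:
                # Still fresh: answer without touching the origin site.
                return body, "HIT"
        body = self._fetch(path)
        self._store[path] = (body, now)
        return body, "MISS"

    def purge(self, path):
        """Manual refresh: drop the cached copy so the next request hits the origin."""
        self._store.pop(path, None)
```

Calling `purge` plays the role of the manual cache refresh on the CDN platform: the next `get` for that path goes to the origin even if the lifetime has not run out.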
How browser actions affect HTTP caching
- Pressing Enter in the address bar, clicking a link, going forward or back, or opening a new window: Expires and max-age take effect. In other words, the browser checks the expiration time to decide whether to send a new request. Last-Modified and ETag apply as well.
- Pressing F5 or using the refresh button in the browser's toolbar: the limits set by Expires and max-age are ignored and a request is forced. Last-Modified and ETag still apply in this case.
- CTRL+F5 forces a full reload: no cached files are used and everything is downloaded again, so Expires, max-age, Last-Modified, and ETag all have no effect.
HTTP content negotiation mechanism
Content negotiation means that the client and the server negotiate over the content of the response, and the server then provides the resource best suited to the client. Content negotiation uses the language, character set, encoding, and so on of the response resource as the basis for its decision.
For example, when a browser whose default language is English or Chinese visits a Web page at the same URI, the English or Chinese version of the page is returned accordingly. This mechanism is called content negotiation.
Content negotiation methods
- Client-driven negotiation
The client sends a request, the server returns a list of options, and the client makes a choice and sends a second request.
Advantage: easier to implement.
Disadvantage: added latency. At least two requests are needed: the first to get the list of options, the second to get the chosen copy.
- Server-driven negotiation (widely used)
The server inspects the client's request headers and decides which version of the page to serve.
Advantage: faster than client-driven negotiation. HTTP provides a q-value mechanism (a weight) that lets the server make approximate matches, and a Vary header that lets the server tell downstream devices (such as proxy servers) how to evaluate a request.
Disadvantage: when the headers do not match, the server has to guess.
- Transparent negotiation
An intermediate device (usually a caching proxy) negotiates on behalf of the client.
Advantage: relieves the Web server of the negotiation cost and is faster than client-driven negotiation.
Disadvantage: HTTP has no formal specification for it.
Server-driven content negotiation: request headers
- Accept: tells the server which media types to send
- Accept-Language: tells the server which languages to send
- Accept-Charset: tells the server which character sets to send
- Accept-Encoding: tells the server which encodings to use
The content negotiation headers are sent by the client to the server to exchange preference information, so that the server can choose the most suitable version of a document from the versions available.
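The server matches the client's Accept headers against the entity headers listed below:
|Accept header|Entity header|
|---|---|
|Accept|Content-Type|
|Accept-Language|Content-Language|
|Accept-Charset|Content-Type|
|Accept-Encoding|Content-Encoding|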
Server-driven content negotiation: approximate matching
Suppose the client's Accept-Language specifies Spanish, but the server only has English and French versions, and the client would prefer English when Spanish is not available. HTTP therefore needs a mechanism to describe preferences in more detail. That mechanism is the quality value (q value).
An example:
Accept-Language: en;q=0.5, fr;q=0.0, nl;q=1.0, tr;q=0.0
This header says that the user most prefers Dutch (nl), will also accept English (en), and will not accept French (fr) or Turkish (tr).
q values range from 0.0 to 1.0 (1.0 is the highest priority).
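Server-side selection by q value can be sketched as follows. It is a simplified illustration: `parse_accept_language` and `choose_language` are hypothetical names, language tags are matched exactly, and wildcards and tag prefixes are ignored.

```python
def parse_accept_language(header: str):
    """Return a list of (language, q) pairs; q defaults to 1.0."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        if not lang:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((lang, q))
    return prefs


def choose_language(header: str, available):
    """Pick the available language with the highest q; q=0 means not acceptable."""
    best, best_q = None, 0.0
    for lang, q in parse_accept_language(header):
        if lang in available and q > best_q:
            best, best_q = lang, q
    return best
```

With the header from the example above and a server that offers only English and French, `choose_language` selects `en`, because French carries q=0.0 and is therefore excluded.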
Understanding HTTP and HTTPS
The concepts of HTTP and HTTPS
HTTP is the application-layer protocol used for communication between a client (a browser or another program) and a Web server. Web servers on the Internet store hypertext information, and the client uses HTTP to transfer the hypertext it wants to access.
HTTPS (full name: Hyper Text Transfer Protocol over Secure Socket Layer) is a secure HTTP channel; put simply, it is the secure version of HTTP.
So why do we need HTTPS to communicate?
First, let's look at the shortcomings of HTTP:
- Communication is in plaintext and not encrypted, so the content can be eavesdropped.
- The identities of the communicating parties are not verified, so a party may be impersonated.
- The integrity of a received message cannot be verified, so it may have been altered in transit.
Overview of the HTTPS protocol
HTTPS can be thought of as HTTP running over TLS (Transport Layer Security, formerly known as SSL).
To address the shortcomings of HTTP, HTTPS adds the following on top of HTTP:
- Content encryption
- Identity authentication
- Data integrity protection
When visiting a Web site that communicates over HTTPS, the address in the browser begins with https://.
First, it should be made clear that HTTPS is not a new protocol: the communication interface part of HTTP is simply replaced by the SSL protocol.
SSL is independent of HTTP and can also be used to encrypt other protocols, such as SMTP.
- Symmetric encryption
Symmetric-key encryption means that encryption and decryption use the same key. Its biggest problem is key distribution: how to send the key to the other party securely.
Why is it called symmetric encryption?
One party encrypts the information with a key and passes the ciphertext to the other party, who decrypts it with the same key to recover readable plaintext.
- Asymmetric encryption
The problem with a shared key is how to send it to the other party securely. Public-key encryption solves this problem well.
Asymmetric encryption uses a pair of keys: a public key and a private key. The public key is open to both parties and anyone can obtain it, while the private key is never disclosed. The sender encrypts the message with the public key, and the receiver decrypts it with the private key. It is very hard to recover the plaintext from the ciphertext and the public key alone.
While the public key brings security, it also raises a hidden question: how can the authenticity of the public key be guaranteed? This is where certificate authorities come in: the authenticity of a public key is guaranteed by a certificate.
HTTPS uses a hybrid encryption mechanism
Public-key encryption is comparatively complex and therefore comparatively slow, so HTTPS combines the strengths of both approaches in a hybrid encryption mechanism. The problem that shared (symmetric) keys leave unsolved is how to send the key to the other party securely; once that is solved, communication can be secure. HTTPS therefore first uses public-key encryption to protect the shared key. Once the shared key has been delivered securely, both sides encrypt messages with the shared key, which keeps transmission efficient.
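The hybrid scheme can be illustrated with a deliberately insecure toy. The sketch below uses a textbook RSA key built from tiny primes (p=61, q=53) and an XOR keystream derived with SHA-256 as a stand-in for a real symmetric cipher. All names are hypothetical, and none of this reflects the actual algorithms or message formats used by TLS.

```python
import hashlib
import secrets

# Toy RSA key pair from the textbook primes p=61, q=53. NOT secure.
N, E, D = 3233, 17, 2753   # public key: (N, E); private key: (N, D)


def rsa_encrypt(m: int) -> int:
    """Encrypt a small integer with the public key."""
    return pow(m, E, N)


def rsa_decrypt(c: int) -> int:
    """Decrypt with the private key."""
    return pow(c, D, N)


def xor_stream(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream.

    The same call both encrypts and decrypts.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{key}:{counter}".encode()).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def hybrid_exchange(message: bytes):
    """Walk through the hybrid flow and return (wrapped_key, ciphertext, plaintext)."""
    shared_key = secrets.randbelow(N - 2) + 2        # client picks a shared key
    wrapped_key = rsa_encrypt(shared_key)            # sent under the public key
    server_key = rsa_decrypt(wrapped_key)            # server recovers the shared key
    ciphertext = xor_stream(shared_key, message)     # client encrypts symmetrically
    plaintext = xor_stream(server_key, ciphertext)   # server decrypts symmetrically
    return wrapped_key, ciphertext, plaintext
```

Only the short shared key goes through the slow public-key operation; the message itself is handled by the fast symmetric step.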
How HTTP and HTTPS work
How HTTP works
HTTP consists of requests and responses and follows the standard client/server model (C/S). In HTTP, the client always initiates the request and the server sends back the response.
- Address resolution. The Domain Name System (DNS) resolves the domain name to obtain the host's IP address.
- Build the HTTP request message. The message combines the parts above with information about the local machine.
- Wrap the message in TCP segments and establish a TCP connection (the TCP three-way handshake).
- The client sends the request. Once the connection is established, the client sends a request to the server.
- The server responds. The server receives the request and returns the corresponding response.
- The server closes the TCP connection. In general, once the Web server has sent the response data to the browser, it closes the TCP connection.
- The client parses the message, parses the HTML code, and renders the page.
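The steps above can be sketched with Python's standard `socket` module. This is a bare-bones illustration rather than a usable HTTP client: the function names are hypothetical, only a plain GET is built, and chunked bodies and redirects are ignored.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request message."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80):
    """Resolve the host, connect over TCP, send the request, read until close."""
    # create_connection performs the DNS lookup and the TCP three-way handshake.
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # the server closed the connection
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

`fetch` needs network access; `build_request` and `parse_response` can be tried on their own with canned bytes.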
How HTTPS works
When communicating over HTTPS, an SSL-layer connection is set up first. The client sends its SSL version number and supported cipher suites to the server. The server replies with the SSL version and cipher suite it has selected, and sends its CA-issued certificate and public key to the client. The client validates the certificate and, once it passes, uses asymmetric encryption to negotiate the key for the data exchange. After negotiation both sides hold the same symmetric key, and the data is then encrypted with a symmetric algorithm. The underlying TCP connection behaves as it does for HTTP: a three-way handshake to open it, data exchange, and a four-way teardown to end the communication.
The process is as follows:
- The client and the server establish a connection over TCP.
- The client sends an HTTPS request to the server.
- The server responds and sends its digital certificate to the client. The certificate contains the public key, the domain name, and the organization that applied for the certificate.
- After receiving the certificate, the client verifies its validity.
- If the certificate is valid, the client generates a key for symmetric encryption (the client key) and encrypts it asymmetrically with the server's public key.
- The client sends a second request in the HTTPS exchange, carrying the encrypted client key to the server.
- After receiving the ciphertext, the server decrypts it with its private key to obtain the client key. It then encrypts the data symmetrically with the client key and sends the ciphertext.
- The client receives the ciphertext, decrypts it with the client key, and renders the Web page.
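From a program's point of view, most of these steps are handled by the TLS library. The following sketch uses Python's standard `ssl` module; the function names are hypothetical, and the exact key exchange the library negotiates may differ from the simplified description above.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Context that verifies the server certificate and its host name."""
    return ssl.create_default_context()


def fetch_https(host: str, path: str = "/", port: int = 443) -> bytes:
    """Open TCP, run the TLS handshake, then speak ordinary HTTP inside it."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket performs the handshake: certificate check and key agreement.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))   # encrypted on the wire
            chunks = []
            while True:
                data = tls.recv(4096)              # decrypted by the library
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

If certificate validation fails, `wrap_socket` raises an exception and no HTTP data is exchanged, which corresponds to the validity check in the list above.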
SSL reduces communication efficiency
- Lower communication speed
In addition to the TCP connection, the request, and the response, HTTPS also requires SSL communication, so the total amount of traffic increases.
- Encryption consumes resources
Every message has to be encrypted and decrypted, which consumes more server resources than HTTP.
- Certificate cost
To communicate over HTTPS, you need a certificate issued by a certificate authority, which often has to be purchased.
Differences between HTTP and HTTPS
Security: HTTPS is a secure hypertext protocol that adds security on top of HTTP. Put simply, HTTPS is HTTP encrypted with TLS/SSL.
Certificates: HTTPS requires applying for a certificate.
Transport: HTTP is the Hypertext Transfer Protocol and transmits in plaintext; HTTPS is a secure transport protocol encrypted with TLS/SSL.
Connection and port: an HTTP connection is simple and stateless and uses port 80; HTTPS adds the SSL protocol on top of HTTP for encrypted transmission and uses port 443.
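The default ports can be made concrete with a small helper. This is a sketch: `effective_port` and the `DEFAULT_PORTS` table are hypothetical names; only `urlsplit` comes from the standard library.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port              # an explicit port in the URL wins
    return DEFAULT_PORTS[parts.scheme]
```

A URL such as `https://example.com/` therefore maps to port 443 unless it names another port explicitly.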
Add-on protocols based on HTTP
Bottlenecks of the HTTP protocol
Some of the HTTP standards become performance bottlenecks:
- Only one request can be sent on a connection at a time.
- Requests can only be initiated by the client; the client cannot receive anything other than responses.
- Request/response headers are sent uncompressed, so the more header information there is, the greater the delay.
- Sending the same verbose headers every time is wasteful.
- Any data compression format may be chosen, but compression is not mandatory.
Solutions
Ajax: compared with earlier synchronous communication, Ajax updates only part of the page, so less data is transferred in the response. It still does not solve the problems of the HTTP protocol itself.
Comet: the server holds the response open and returns it when content is updated on the server side. As soon as the server has an update, it can be fed back to the client immediately.
SPDY: released by Google in 2010, its goal is to remove HTTP's performance bottlenecks and shorten Web page load time. SPDY does not rewrite HTTP completely; it operates as a new session layer between the application layer and the transport layer of TCP/IP. For security, SPDY requires the use of SSL.
With SPDY, HTTP gains additional features:
- Multiplexed streams: multiple HTTP requests can be handled over a single TCP connection.
- Request prioritization: priorities can be assigned to individual requests.
- HTTP header compression: reduces the number of packets generated and the number of bytes sent.
- Push: the server can actively push data to the client.
- Server hint: the server can proactively suggest resources for the client to request.
To use SPDY, the Web server has to be changed accordingly. SPDY is indeed an effective technique for removing HTTP bottlenecks, but the problems of many Web sites are not caused by HTTP bottlenecks alone.
WebSocket is built on top of HTTP, so the connection is still initiated by the client. Once a WebSocket connection is established, however, either the server or the client can send messages directly to the other.
Main features of the WebSocket protocol:
Push: the server can push data to the client.
Reduced traffic: once a WebSocket connection is established, it stays open. Compared with HTTP, the total per-connection overhead is lower, and because WebSocket headers are small, traffic is reduced as well.
To establish WebSocket communication, a handshake step has to be completed once after the HTTP connection is established.
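One concrete part of that handshake is how the server answers the client's Sec-WebSocket-Key header. The sketch below follows the computation defined in RFC 6455 (SHA-1 over the key plus a fixed GUID, then Base64); the function names are hypothetical, and the rest of the handshake (request validation, subprotocols) is left out.

```python
import base64
import hashlib

# Fixed GUID defined by the WebSocket specification (RFC 6455).
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(client_key: str) -> str:
    """Build the server's 101 response that switches the connection to WebSocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(client_key)}\r\n"
        "\r\n"
    )
```

After the client receives the 101 response with the expected accept value, both sides stop exchanging HTTP messages and use WebSocket frames on the same connection.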
WebDAV (Web-based Distributed Authoring and Versioning) is a distributed file system that lets users copy, edit, and otherwise operate on content on a Web server directly. It also provides management of file authors, locking that prevents other users from overwriting content while a file is being edited, and version control of changes to document content.