Crawlers are also known as web crawlers, so before we talk about crawlers we need to understand what the Internet is. A network is made up of nodes and the links that connect them, and the huge network formed by connecting networks to one another is called the Internet. The protocol we're going to talk about today, HTTP (HyperText Transfer Protocol), is the most widely used application protocol on the Internet. Its standards were developed by the World Wide Web Consortium (W3C) together with the Internet Engineering Task Force (IETF).
This article walks through the whole process of a single HTTP request (DNS resolution is not covered): the origin of HTTP, the TCP/IP protocol suite, establishing the TCP connection, the client request, the server response, and closing the TCP connection. At the end I also share some additional HTTP-related knowledge. The article is fairly long, so you may want to bookmark or share it and read it later!
We can enjoy the web today thanks to the vision of computer scientist Tim Berners-Lee. On August 6, 1991, working on a NeXT computer at CERN (the European Organization for Nuclear Research), Tim Berners-Lee officially launched the world's first website (http://info.cern.ch), establishing the basic concepts and technical framework of the World Wide Web and opening the curtain on the Internet information age.
Berners-Lee's proposal contained the basic concepts of the World Wide Web, and he gradually built all the necessary tools:
The HTTP protocol has five characteristics:
We often hear the saying: HTTP transfers data on top of the TCP/IP protocol suite.
How should we understand that sentence? A look at the four-layer TCP/IP model makes it clear.
From the picture above we can clearly see that the transport-layer protocol HTTP uses is TCP, and the network layer uses IP (along with many other protocols, of course). That is why we say HTTP transfers data on top of the TCP/IP protocol suite.
We can also see that ping uses ICMP. That is why sometimes, when we are connected through a VPS, we can browse the web and yet pinging Google fails: ping and HTTP travel over different protocols.
So how does the TCP/IP protocol suite work? Let's look at the picture below:
We can see that the sender encapsulates the data layer by layer, and the receiver unwraps it layer by layer, until the application layer finally gets the data.
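The layer-by-layer wrapping and unwrapping can be sketched as a toy model. This is only an illustration: the bracketed strings stand in for real protocol headers, and the names `LAYERS`, `encapsulate`, and `decapsulate` are made up for this example.

```python
# Toy model of TCP/IP encapsulation. The bracketed "headers" are placeholders,
# not real protocol headers.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link (top to bottom)


def encapsulate(app_data: bytes) -> bytes:
    """Sender side: each lower layer prepends its own header."""
    frame = app_data
    for name in LAYERS:
        frame = f"[{name}]".encode() + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip the headers in the reverse order."""
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not frame.startswith(header):
            raise ValueError(f"missing {name} header")
        frame = frame[len(header):]
    return frame


request = b"GET / HTTP/1.1\r\n\r\n"   # application-layer data (HTTP)
frame = encapsulate(request)
print(frame)                # b'[ETH][IP][TCP]GET / HTTP/1.1\r\n\r\n'
print(decapsulate(frame))   # the original HTTP bytes again
```

The outermost header belongs to the lowest layer, so the receiver has to peel the headers off in the opposite order from the one in which the sender added them.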
Now that we know how the TCP/IP protocol suite works, let's see how HTTP establishes a connection.
We said that HTTP transfers data on top of the TCP/IP protocol suite, so establishing an HTTP connection really means establishing a TCP connection. To see how TCP establishes a connection, let's first look at the structure of a TCP segment.
TCP segment = TCP header + TCP data. The TCP header contains six control bits (in the red box above), and these six flags reflect the state of the TCP connection:
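To make the six control bits concrete, here is a minimal sketch that packs a bare 20-byte TCP header and reads the flags back out of it. The flag names and bit positions (URG, ACK, PSH, RST, SYN, FIN) follow the classic TCP header layout. The helper names, ports, and sequence numbers are illustrative assumptions, and the checksum is left at zero, so this is not a segment you could actually send.

```python
import struct

# The six classic control bits, as masks on byte 13 of the TCP header.
FLAGS = [("URG", 0x20), ("ACK", 0x10), ("PSH", 0x08),
         ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01)]


def build_header(src_port, dst_port, seq, ack, flags):
    """Pack a minimal 20-byte TCP header (checksum left at 0, illustration only)."""
    offset_reserved = 5 << 4  # data offset = 5 words (20 bytes), no options
    return struct.pack("!HHIIBBHHH", src_port, dst_port, seq, ack,
                       offset_reserved, flags, 65535, 0, 0)


def decode_flags(header: bytes):
    """Return the names of the control bits that are set."""
    flag_byte = header[13]
    return [name for name, bit in FLAGS if flag_byte & bit]


syn = build_header(50000, 80, 1000, 0, 0x02)         # hypothetical SYN
syn_ack = build_header(80, 50000, 5000, 1001, 0x12)  # hypothetical SYN+ACK
print(decode_flags(syn))      # ['SYN']
print(decode_flags(syn_ack))  # ['ACK', 'SYN']
```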
Now that we understand the TCP header, we can take a proper look at the three-way handshake TCP uses to establish a connection.
The three-way handshake:
Interviewer: Why does establishing an HTTP connection take a three-way handshake, rather than two steps or four?
Answer: Three is the minimum number of steps for both sides to reliably confirm that the other can send and receive. Two steps are not reliable, and four would waste resources.
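The sequence and acknowledgement numbers exchanged in the handshake can be sketched as follows. This is a simulation under simplifying assumptions, not real networking code: the `Segment` class, the function name, and the initial sequence numbers 100 and 300 are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: str
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments exchanged (ISNs are arbitrary example values)."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment("SYN", seq=client_isn)
    # 2. Server -> client: SYN+ACK acknowledges client_isn + 1 and carries
    #    the server's own initial sequence number.
    syn_ack = Segment("SYN+ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK acknowledges server_isn + 1.
    ack = Segment("ACK", seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
```

After the second segment the client knows that both directions work, but the server only learns that its own sequence number arrived once it receives the third segment. That is one way to see why two steps fall short.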
Once the client is connected to the server, it can start requesting resources from the server, that is, it can start sending HTTP requests.
We said earlier that TCP segment = TCP header + TCP data. We've already covered the TCP header. As for the TCP data, that is our HTTP request message.
Let's look at an actual HTTP request example:
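As a rough illustration of the message format, the sketch below builds the bytes of a simple GET request and parses them back into a request line, headers, and body. The host `example.com`, the `demo-crawler/0.1` User-Agent, and the helper names are hypothetical values chosen for this example, and the parser assumes a well-formed message.

```python
def build_request(method, path, headers, body=b""):
    """Serialize request line + headers + blank line + body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines).encode("ascii") + b"\r\n\r\n" + body


def parse_request(raw: bytes):
    """Split a well-formed request into its parts (no error handling)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return method, path, version, headers, body


raw = build_request("GET", "/index.html", {
    "Host": "example.com",             # hypothetical host
    "User-Agent": "demo-crawler/0.1",  # hypothetical client name
    "Connection": "keep-alive",
})
print(raw.decode("ascii"))
method, path, version, headers, body = parse_request(raw)
print(method, path, version, headers)
```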
There are many request header fields, and Brother Pig won't explain them one by one. Here are just two that matter for basic anti-crawling checks:
After receiving the client's request, the server needs to respond and return the result to the client. The structure of an HTTP response message mirrors that of the request.
In the response message we focus on the server's response status code, which often comes up in interviews. For now Brother Pig only lists the categories, so look up the detailed status codes online.
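The status code categories follow from the first digit of the code, which a short sketch can show. The status line `HTTP/1.1 404 Not Found` and the helper names are illustrative. `http.HTTPStatus` is Python's standard-library enum of known status codes.

```python
from http import HTTPStatus

# The first digit of the status code determines its category.
CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/1.1 404 Not Found' into version, numeric code, and reason."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def category(code: int) -> str:
    return CATEGORIES.get(code // 100, "unknown")


version, code, reason = parse_status_line("HTTP/1.1 404 Not Found")
print(code, category(code), HTTPStatus(code).phrase)
```

For a crawler, the category is often enough to decide what to do next: follow a redirect on 3xx, give up or fix the request on 4xx, and retry later on 5xx.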
After the server responds, one exchange is over. Is the connection closed at this point?
That depends on the HTTP version:
Note: a long (persistent) connection means that one TCP connection may carry multiple HTTP exchanges. An HTTP exchange is always one request and one response, after which it ends, so HTTP itself has no such thing as a long connection. The long connection is a property of the underlying TCP connection.
HTTP/1.1 has been widespread since as early as 1999, so browsers now carry a header in their requests: Connection: keep-alive. It means the browser is asking the server for a long connection, and the server can also decide whether it is willing to keep one.
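Whether a connection stays open after a response can be sketched as a small decision function. It assumes the usual rule that HTTP/1.0 closes the connection unless keep-alive is requested, while HTTP/1.1 keeps it open unless `Connection: close` is sent. The function name is made up, and a real server also applies its own timeouts and limits.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the TCP connection should stay open after this exchange."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    tokens = {t.strip().lower() for t in normalized.get("connection", "").split(",")}
    if "close" in tokens:
        return False                  # either side may ask to close
    if version == "HTTP/1.1":
        return True                   # persistent by default
    return "keep-alive" in tokens     # HTTP/1.0 has to opt in


print(is_persistent("HTTP/1.0", {}))                            # False
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))  # True
print(is_persistent("HTTP/1.1", {}))                            # True
print(is_persistent("HTTP/1.1", {"Connection": "close"}))       # False
```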
For servers, long connections have both advantages and disadvantages:
So whether to enable long connections, and how long to keep them open, should be set sensibly according to the website's own situation.
PS: don't underestimate this one TCP connection. In a complete client HTTP request (DNS lookup, establishing the TCP connection, request, waiting, parsing the page, closing the TCP connection), establishing the TCP connection takes up a significant share of the time.
Establishing a TCP connection takes a three-way handshake, while closing one takes four waves (the four-way close)!
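The four segments of a normal close can be listed in a simplified sketch. It assumes the client closes first, uses made-up sequence numbers, and ignores details such as the FIN segments also carrying an ACK flag in practice. The function name is hypothetical.

```python
def four_way_close(client_seq: int, server_seq: int):
    """Segments of an active close started by the client (illustrative values).

    Each tuple is (sender, flag, seq, ack). None means the field is not
    relevant to this simplified picture.
    """
    return [
        ("client", "FIN", client_seq, None),      # client has no more data to send
        ("server", "ACK", None, client_seq + 1),  # server acknowledges the FIN
        ("server", "FIN", server_seq, None),      # server finishes sending too
        ("client", "ACK", None, server_seq + 1),  # client acknowledges the FIN
    ]


steps = four_way_close(client_seq=500, server_seq=900)
for sender, flag, seq, ack in steps:
    print(f"{sender:>6} -> {flag} seq={seq} ack={ack}")
```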
When we discussed the TCP flags earlier, we mentioned FIN, which indicates that the sender has finished sending data and wants to close the connection. **Why does closing a connection take four waves?** That's homework for you: leave your understanding in the comments and see whether you've got it right.
Interviewer: Why does establishing a connection take a three-way handshake while closing it takes four waves? Share your answer in the comments!
HTTP/1.1 has served us for 20 years. HTTP/2 was actually published in 2015 but has not yet become widespread. You can look up its new features online.
Because HTTP has drawbacks such as slow responses and bulky request headers, in the microservices era many teams use RPC for service-to-service calls. If you're interested, you can read up on RPC online.
HTTP has two other major drawbacks: data is sent in plaintext, and integrity is not guaranteed. That is why it is being replaced by HTTPS, which Brother Pig will explain in the next installment.