Crawlers are also known as web crawlers or spiders, so before we talk about them it is worth understanding what the Internet is. A network is made up of nodes and the links that connect them, and the vast network formed by connecting networks to one another is called the Internet. The topic of this article, HTTP (HyperText Transfer Protocol), is the most widely used application protocol on the Internet. Its specifications were developed jointly by the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF) and are published as RFCs.

This article walks through the whole lifecycle of a single HTTP request (DNS resolution is not covered): the origin of HTTP, the TCP/IP protocol suite, establishing a TCP connection, the client request, the server response, and closing the TCP connection. At the end I also add a few pieces of HTTP-related background. The article is fairly long, so you may want to bookmark or share it and read it later!

I. Introduction

1. Origin

We can enjoy the Web today thanks to an idea from the computer scientist Tim Berners-Lee. On August 6, 1991, working on a NeXT computer at the European particle physics laboratory (CERN), Berners-Lee publicly launched the world's first website, establishing the basic concepts and technical architecture of the World Wide Web and opening the Internet information age.
Berners-Lee's proposal contained the basic concepts of the Web, and he went on to build all the necessary tools:

  1. He proposed HTTP (Hypertext Transfer Protocol), which lets users reach resources by clicking hyperlinks;
  2. He proposed HTML (Hypertext Markup Language) as the standard for authoring web pages;
  3. He created the URL (Uniform Resource Locator) as the addressing system for websites, which is the http://www format we still use today;
  4. He created the first web browser, named WorldWideWeb, which was also a web editor;
  5. He created the first web server and the first web pages, which described the project itself.

2. Characteristics

HTTP has five main characteristics:

  1. Client/server model: the client sends requests and the server answers them.
  2. Simple and fast: to request a service, the client only needs to send the request method and the path.
  3. Flexible: HTTP can transfer any type of data object. The type being transferred is marked by Content-Type, the header that identifies the content type of an HTTP message.
  4. Connectionless: each connection handles only one request. Once the server has processed the request and the client has received the response, the connection is closed, which saves transmission time. (This describes HTTP/1.0; see the discussion of long and short connections later in the article.)
  5. Stateless: the protocol keeps no memory of previous transactions, so the server does not know the client's state. After we send an HTTP request, the server returns data according to that request and records nothing once the response has been sent. (Cookies and sessions arose to fill this gap; more on them later.)
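
To make statelessness concrete, here is a minimal sketch of how a cookie lets a server recognise a returning client. The `handle` function, the `sid` cookie name, and the counter-based session ID are all hypothetical choices made for illustration; a real server would use a framework and unguessable session IDs.

```python
from http.cookies import SimpleCookie

def handle(request_headers, sessions):
    """Toy handler: the only 'memory' of a client is the cookie it sends back."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    sid = cookie["sid"].value if "sid" in cookie else None
    if sid in sessions:
        # Known session: the cookie let us find the stored state.
        sessions[sid] += 1
        return {}, f"visit {sessions[sid]}"
    # Unknown client: create state and ask the client to remember its ID.
    new_sid = str(len(sessions) + 1)  # hypothetical ID scheme, not secure
    sessions[new_sid] = 1
    return {"Set-Cookie": f"sid={new_sid}"}, "visit 1"

sessions = {}
headers, body = handle({}, sessions)                   # first request, no cookie
print(headers, body)
headers, body = handle({"Cookie": "sid=1"}, sessions)  # cookie sent back
print(headers, body)
```

Without the `Cookie` header, every request looks like a first visit, which is exactly what "stateless" means.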

II. The TCP/IP protocol suite

We often hear that HTTP transfers data on top of the TCP/IP protocol suite.

How should we understand that sentence? The TCP/IP four-layer model makes it clear.
In the four-layer model, HTTP sits in the application layer, uses TCP as its transport-layer protocol, and relies on IP at the network layer (each layer of course contains many other protocols). That is why we say HTTP transfers data on top of the TCP/IP protocol suite.

The model also shows that ping uses ICMP. That is why, when going online through a VPS, you can sometimes browse websites but cannot ping google: the two use different protocols.

So how does the TCP/IP protocol suite actually work?
The sender encapsulates the data layer by layer, each layer adding its own header; the receiver unwraps it layer by layer, until the application layer finally gets the data.
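
The layer-by-layer wrapping can be sketched with toy headers. The `ETH|`, `IP|`, and `TCP|` prefixes below are made-up stand-ins for real binary headers, used only to show the order of encapsulation and decapsulation.

```python
def encapsulate(payload: bytes) -> bytes:
    """Sender side: wrap the application data on the way down the stack."""
    segment = b"TCP|" + payload   # transport layer adds its header
    packet = b"IP|" + segment     # network layer adds its header
    frame = b"ETH|" + packet      # link layer adds its header
    return frame

def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip headers in the reverse order on the way up."""
    data = frame
    for header in (b"ETH|", b"IP|", b"TCP|"):
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data

request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = encapsulate(request)
print(frame)
print(decapsulate(frame) == request)
```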

III. Establishing a TCP connection

Now that we know how the TCP/IP protocol suite works, let's see how HTTP establishes a connection.

1. TCP header information

As noted above, HTTP transfers data on top of the TCP/IP protocol suite, so establishing an HTTP connection really means establishing a TCP connection. To see how TCP does that, look first at the structure of a TCP packet.
TCP packet = TCP header + TCP data body. The TCP header contains six control bits, and these six flags describe the state of the TCP connection:

  1. URG: urgent; the segment carries urgent data that should be handled with priority
  2. ACK: acknowledgment; confirms receipt
  3. PSH: push; tells the receiver to hand the data in the TCP buffer to the application immediately
  4. RST: reset; aborts the connection, so it has to be re-established
  5. SYN: synchronize; requests that a connection be established
  6. FIN: finish; the sender has no more data to send and is about to close its side of the connection
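
As a small sketch of where these flags live, the code below packs a minimal 20-byte TCP header with `struct` and reads the six flags out of byte 13. The port numbers, sequence numbers, and window size are arbitrary example values, and `decode_flags` is a helper name invented for this illustration.

```python
import struct

# Bit masks of the six classic control bits within the flags byte.
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def decode_flags(header: bytes) -> list:
    """Return the names of the control bits set in a raw TCP header."""
    flags_byte = header[13]  # byte 13 of the header holds the flags
    return [name for name, bit in TCP_FLAGS.items() if flags_byte & bit]

# Fields: source port, destination port, seq, ack number,
# data offset (5 words, in the high nibble), flags, window, checksum, urgent pointer.
syn_ack = struct.pack("!HHIIBBHHH",
                      80, 50000, 7654321, 1234568, 5 << 4, 0x12, 65535, 0, 0)
print(decode_flags(syn_ack))
```

The flags value `0x12` is `SYN | ACK`, which is what the server sends in the second step of the handshake described next.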

2. Connection establishment

Now that we understand the TCP header, we can look at the three-way handshake that establishes a TCP connection.
The three-way handshake:

  1. The client sends the server a packet with SYN=1 and a randomly generated sequence number, e.g. seq=1234567. From SYN=1 the server knows the client wants to connect. (Client: I want to connect to you.)
  2. After receiving the request, the server confirms it by sending the client a packet with SYN=1, ACK=1, ack number = (client's seq + 1), and its own randomly generated sequence number, e.g. seq=7654321. (Server: OK, go ahead.)
  3. The client checks that the ack number equals the seq it first sent plus 1 and that ACK=1. If both are correct, it sends a packet with ACK=1 and ack number = (server's seq + 1). Once the server receives it and verifies the ack number and ACK=1, the connection is established. (Client: OK, here I come.)
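
The number bookkeeping in these three steps can be traced with a short simulation. This is only a sketch: segments are plain dictionaries, nothing is sent over a network, and `three_way_handshake` is a name made up for this example.

```python
import random

def three_way_handshake(client_isn=None, server_isn=None):
    """Return the three segments exchanged, as simple dicts."""
    if client_isn is None:
        client_isn = random.randrange(2**32)
    if server_isn is None:
        server_isn = random.randrange(2**32)

    # Step 1: client -> server, SYN with the client's initial sequence number.
    syn = {"SYN": 1, "seq": client_isn}

    # Step 2: server -> client, SYN+ACK acknowledging client's seq + 1.
    syn_ack = {"SYN": 1, "ACK": 1, "seq": server_isn,
               "ack": (syn["seq"] + 1) % 2**32}

    # Step 3: client verifies the acknowledgment, then acknowledges server's seq + 1.
    if syn_ack["ack"] != (client_isn + 1) % 2**32:
        raise ConnectionError("unexpected ack number from server")
    ack = {"ACK": 1, "seq": (client_isn + 1) % 2**32,
           "ack": (syn_ack["seq"] + 1) % 2**32}
    return [syn, syn_ack, ack]

for segment in three_way_handshake(1234567, 7654321):
    print(segment)
```

With the example numbers from the text, the server acknowledges 1234568 and the client acknowledges 7654322.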

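Interviewer: Why does establishing an HTTP (that is, TCP) connection take three handshakes rather than two or four?
Answer: Three is the minimum needed for both sides to confirm that the other can send and receive. Two is not reliable, because the server would never learn whether its reply reached the client; four wastes resources.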

IV. Client request

Once the client has connected to the server, it can start requesting resources, that is, sending HTTP requests.

1. HTTP request message structure

Earlier we said that TCP packet = TCP header + TCP data body. We have already covered the TCP header; the TCP data body is where our HTTP request message goes.

2. HTTP request example

Take a look at an actual HTTP request, whose parts are numbered ① to ⑤:
  1. ① is the request method. HTTP/1.1 defines eight request methods: GET, POST, PUT, DELETE, HEAD, OPTIONS, TRACE, and CONNECT; PATCH was added later in RFC 5789. The two most common are GET and POST, and RESTful interfaces typically use GET, POST, DELETE, and PUT.
  2. ② is the request URL path. Together with the Host header it forms the complete request URL.
  3. ③ is the protocol name and version number.
  4. ④ is the HTTP header section. It contains a number of fields in the form "name: value", from which the server learns about the client.
  5. ⑤ is the message body. It encodes the values of a page's form fields as key-value pairs such as param1=value1&param2=value2, so it can carry multiple request parameters. Request parameters can be passed not only in the message body but also in the request URL, for example "/chapter15/user.html?param1=value1&param2=value2".
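
The five parts above can be assembled by hand to see how they fit together. The sketch below is illustrative only: `build_request` is a hypothetical helper, `example.com` is a placeholder host, and real clients add more headers than shown here.

```python
from urllib.parse import urlencode

def build_request(method, path, host, params=None, headers=None):
    """Assemble a raw HTTP/1.1 request message as a string."""
    target, body = path, ""
    if params and method == "GET":
        target = f"{path}?{urlencode(params)}"   # parameters in the URL
    elif params:
        body = urlencode(params)                 # parameters in the message body

    lines = [f"{method} {target} HTTP/1.1",      # method, URL, protocol version
             f"Host: {host}"]                    # Host completes the request URL
    for name, value in (headers or {}).items():  # header fields as "name: value"
        lines.append(f"{name}: {value}")
    if body:
        lines.append("Content-Type: application/x-www-form-urlencoded")
        lines.append(f"Content-Length: {len(body.encode())}")
    # A blank line separates the headers from the message body.
    return "\r\n".join(lines) + "\r\n\r\n" + body

params = {"param1": "value1", "param2": "value2"}
print(build_request("POST", "/chapter15/user.html", "example.com", params))
print(build_request("GET", "/chapter15/user.html", "example.com", params))
```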

There are many request header fields, and Brother Pig won't explain them one by one. Here are just the two that basic anti-crawling checks rely on:

  1. User-Agent: the name and version of the client's operating system and browser. Some sites restrict which browsers may make requests.
  2. Referer: the address of the previous page, indicating where the request came from. Some sites restrict the source of requests.
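
As a rough illustration, both headers can be set on a request object with Python's standard `urllib.request`. The URLs and the User-Agent string below are placeholder values, `make_browser_like_request` is a helper name invented here, and the request is only constructed, not sent.

```python
import urllib.request

def make_browser_like_request(url, user_agent, referer):
    """Build (but do not send) a request carrying User-Agent and Referer."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Referer": referer},
    )

req = make_browser_like_request(
    "http://example.com/page2.html",        # placeholder target page
    "Mozilla/5.0 (X11; Linux x86_64)",      # placeholder browser identity
    "http://example.com/page1.html",        # placeholder previous page
)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.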

V. Server response

After receiving the client's request, the server must produce a response and return it to the client. The HTTP response message has the same overall structure as the request message.

1. HTTP response message structure

2. HTTP response example

3. Response status codes

In the response message, the thing to focus on is the server's response status code, which comes up often in interviews. Here Brother Pig lists only the categories (1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error); look up the detailed status codes online.
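
The category of a status code is determined by its first digit, which the following sketch uses together with the standard library's `http.HTTPStatus` to look up the reason phrase. The `describe` helper and the category labels are illustrative naming choices.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe(code: int) -> str:
    """Return the code, its standard phrase if known, and its category."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "(no standard phrase)"  # code is not in the standard registry
    return f"{code} {phrase}: {category}"

for code in (200, 302, 404, 500):
    print(describe(code))
```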

VI. Closing the connection

After the server responds, one exchange is over. Is the connection closed at that point?

1. Long and short connections

It depends on the HTTP version:

  • In HTTP/1.0, once the client and server complete one request/response, the TCP connection is closed, and the next request has to establish a new one. This is called a short connection.
  • In January 1997, less than a year after HTTP/1.0 was published, HTTP/1.1 was released with a new feature: after a request/response completes, the TCP connection may stay open, so the next request can reuse it without a new handshake. This is called a long (persistent) connection.

Note: a long connection means that one TCP connection carries multiple HTTP exchanges. HTTP itself is always one request, one response, end of exchange; the "long" connection is a property of the underlying TCP connection, not of HTTP.

HTTP/1.1 became widespread as early as 1999, so browsers today include the header Connection: keep-alive in their requests, which tells the server that the browser wants a long connection. The server can also configure whether it is willing to keep connections open.
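
The decision a server makes from the version and the Connection header can be sketched as a small function. This follows the usual rule that HTTP/1.1 connections stay open unless `Connection: close` is sent, while HTTP/1.0 connections stay open only with an explicit `keep-alive`. The `is_persistent` name and the plain-dict header representation are assumptions of this example.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the TCP connection should stay open after the response."""
    # The Connection header may hold several comma-separated tokens.
    tokens = {t.strip().lower()
              for t in headers.get("Connection", "").split(",") if t.strip()}
    if "close" in tokens:
        return False               # either side can ask to close
    if version == "HTTP/1.1":
        return True                # persistent by default in HTTP/1.1
    return "keep-alive" in tokens  # HTTP/1.0 needs an explicit opt-in

print(is_persistent("HTTP/1.0", {}))
print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))
print(is_persistent("HTTP/1.1", {"Connection": "close"}))
```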

2. Advantages and disadvantages of long connections

For a server, long connections have both advantages and disadvantages:

  • Advantage: when a site has many static resources (images, CSS, JS, and so on), enabling long connections lets several of them be sent over a single TCP connection.
  • Disadvantage: if the client makes one request and then stops, the server still keeps the connection open and its resources occupied, which is a serious waste.

So whether to enable long connections, and how long to keep them open, should be set according to the needs of the site itself.

P.S. Don't underestimate this one TCP connection. In a complete client HTTP request (DNS lookup, establishing the TCP connection, request, waiting, parsing the page, closing the TCP connection), establishing the TCP connection takes up a large share of the time.

3. Disconnection process

Establishing a TCP connection takes a three-way handshake, while closing one takes a four-way wave!
When we covered the TCP header, we said that the FIN flag means the sender is about to close its side of the connection. **Why does closing a connection take four waves?** That is homework for you: leave your explanation in the comments and see whether you got it right.
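
As a starting point for the homework, the four segments can be listed in order. This sketch only traces which side sends what; it assumes the client closes first, represents segments as dictionaries, and uses the made-up name `four_way_close`. The reasoning behind the four steps is still left to you.

```python
def four_way_close(client_seq: int, server_seq: int) -> list:
    """Return the four segments of a TCP close started by the client."""
    return [
        # 1. Client has nothing more to send.
        {"from": "client", "FIN": 1, "seq": client_seq},
        # 2. Server acknowledges the client's FIN.
        {"from": "server", "ACK": 1, "ack": client_seq + 1},
        # 3. Server sends its own FIN once it has finished sending.
        {"from": "server", "FIN": 1, "seq": server_seq},
        # 4. Client acknowledges the server's FIN.
        {"from": "client", "ACK": 1, "ack": server_seq + 1},
    ]

for segment in four_way_close(1000, 5000):
    print(segment)
```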

VII. Digression

1. A must-know interview question: the three-way handshake and the four-way wave

Interviewer: Why does establishing a connection take three handshakes, while closing it takes four waves? This is the homework from above; leave your answer in the comments!


HTTP/1.1 has served us for 20 years. HTTP/2 was actually published in 2015 but is not yet universally adopted; you can look up its new features online.


Because HTTP has drawbacks such as slow responses and bulky request headers, many teams in the microservice era call services through RPC instead. If you are interested, look up the related concepts online.


HTTP has two other major drawbacks: it transmits in plaintext, and it cannot guarantee message integrity. That is why it is being replaced by HTTPS, which Brother Pig will explain in the next installment.
