HTTP protocol – Useful blog

What is the HTTP protocol

HTTP stands for Hypertext Transfer Protocol, a protocol used to transfer hypertext from World Wide Web servers to local browsers.

HTTP runs on top of TCP/IP and is used to transmit data (HTML files, image files, query results, etc.).

HTTP is an object-oriented, application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been refined and extended continuously since then. HTTP/1.0 was published in 1996 and HTTP/1.1 was standardized shortly afterwards; HTTP/1.1 is still widely used on the WWW, alongside the later HTTP/2 and HTTP/3. HTTP-NG (Next Generation of HTTP) was also proposed, but it was never standardized.

HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends each request to the HTTP server (the web server) by way of a URL. The web server returns a response to the client based on the request it received. Every request gets a response.
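A minimal sketch of this request/response cycle, using only Python's standard library. The names HelloHandler and fetch_once, the path /index.html and the page body are hypothetical, chosen for illustration; the server is a throwaway one bound to 127.0.0.1 rather than a real web server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with the same small HTML page."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path="/index.html"):
    """Start a local server, send one GET request, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)       # the client sends the request
        response = conn.getresponse()   # the server answers with a response
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once()
print(status, body)
```

The client side is the part that plays the role of the browser here: it names a method and a path, and it always gets a status line and a body back.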

The Five Characteristics of HTTP

Client/server model: HTTP supports the client/server model.

Simple and fast: When a client requests a service from a server, it only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST; each method defines a different kind of interaction between client and server. Because the protocol is simple, an HTTP server program can be small, which makes communication fast.

Flexible: HTTP allows any type of data object to be transmitted. The type of the data being transmitted is indicated by the Content-Type header.

Connectionless: Connectionless here means that each connection handles only one request. Once the server has processed the client's request and received the client's acknowledgement, it closes the connection. This approach saves transmission time. It was adopted early on because few resources were requested and speed was the priority. Later, persistent (long-lived) connections were introduced through the Connection: Keep-Alive header.

Stateless: HTTP is a stateless protocol, meaning the protocol keeps no memory of earlier transactions. If later processing needs information from an earlier request, that information must be sent again, which can increase the amount of data transferred per request. On the other hand, when the server does not need earlier information, it can respond faster.
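Several of the characteristics above can be observed in a small local experiment: the request carries only a method and a path, the response labels its data with Content-Type, and a persistent connection serves more than one request. The sketch below is written under stated assumptions: the RESOURCES table, the DemoHandler class and the paths /page and /data are hypothetical, and comparing the client address seen by the server is just one simple way to show that a connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical resources: path -> (Content-Type, body)
RESOURCES = {
    "/page": ("text/html", b"<p>hi</p>"),
    "/data": ("application/json", b'{"ok": true}'),
}
seen_clients = []  # (ip, port) of the client for each request, recorded by the server


class DemoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        seen_clients.append(self.client_address)
        found = self.path in RESOURCES
        content_type, body = RESOURCES.get(self.path, ("text/plain", b"not found"))
        self.send_response(200 if found else 404)
        self.send_header("Content-Type", content_type)      # marks the data type
        self.send_header("Content-Length", str(len(body)))  # needed for connection reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def run_demo():
    """Send two GET requests over one connection; return the Content-Types seen."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
    types = []
    try:
        for path in ("/page", "/data"):
            conn.request("GET", path)  # method and path are all the client must name
            response = conn.getresponse()
            response.read()            # drain the body before reusing the connection
            types.append(response.getheader("Content-Type"))
    finally:
        conn.close()                   # close the client side first, then stop the server
        server.shutdown()
        server.server_close()
    return types


content_types = run_demo()
print(content_types)
print(seen_clients[0] == seen_clients[1])
```

The server in this sketch keeps nothing about the first request when it handles the second; each request is self-contained, which is the stateless behaviour described above.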

HTTP URL

HTTP uses Uniform Resource Identifiers (URIs) to transmit data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

URL stands for Uniform Resource Locator; it is an address used on the Internet to identify a resource. Taking the following URL as an example, let's introduce the components of a typical URL:

http://www.aspxfans.com/news/index.asp?boardID=5&ID=24618&page=1#name

From the above URL, it can be seen that a complete URL includes the following parts:

Protocol section: The protocol section of this URL is “http:”, which indicates that the page is served over HTTP. Several protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The “//” after “http:” is a delimiter.

Domain name section: The domain name section of this URL is “www.aspxfans.com”. In a URL, an IP address can also be used in place of the domain name.

Port section: The port follows the domain name, with “:” as the separator between the two. The port is not a required part of a URL. If it is omitted, the default port is used (80 for HTTP).

Virtual directory section: The virtual directory section runs from the first “/” after the domain name to the last “/”. It is also not a required part of a URL. The virtual directory in this example is “/news/”.

File name section: The file name section runs from the last “/” after the domain name to the “?”. If there is no “?”, it runs from the last “/” to the “#”. If there is neither “?” nor “#”, it runs from the last “/” to the end of the URL. The file name in this example is “index.asp”. The file name is also not a required part of a URL. If it is omitted, the default file name is used.

Anchor section: The anchor section runs from “#” to the end of the URL. The anchor in this example is “name”. The anchor is also not a required part of a URL.

Parameter section: The section between “?” and “#” is the parameter section, also known as the search section or query section. The parameter section in this example is “boardID=5&ID=24618&page=1”. Multiple parameters are allowed, with “&” as the separator between them.
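The components described above can be pulled apart with Python's standard urllib.parse module. This is only a sketch: the URL is assembled from the pieces named in the text, and the port 8080 is an assumed value added to show where the port goes, since the text does not state one. The dictionary keys are illustrative labels matching the section names above, not library terms.

```python
from urllib.parse import parse_qs, urlsplit

# Assembled from the components named in the text; the port 8080 is an
# assumption for illustration, because the text gives no port value.
url = "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"

parts = urlsplit(url)

# Split the path at its last "/" into virtual directory and file name.
directory, _, file_name = parts.path.rpartition("/")

components = {
    "protocol": parts.scheme,
    "domain": parts.hostname,
    "port": parts.port,
    "virtual_directory": directory + "/",
    "file_name": file_name,
    "parameters": parse_qs(parts.query),  # each value is a list of strings
    "anchor": parts.fragment,
}

for name, value in components.items():
    print(f"{name}: {value}")
```

If the port is left out of the URL, parts.port is None and the client falls back to the default port for the scheme.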

The difference between URL and URI

A URI (uniform resource identifier) is used to uniquely identify a resource.

Every resource available on the web, such as HTML documents, images, video clips, and programs, is identified by a URI. A URI generally consists of three parts: ① the naming mechanism used to access the resource, ② the name of the host that stores the resource, ③ the name of the resource itself, expressed as a path. The emphasis is on the resource.

A URL is a uniform resource locator, which is a specific URI that can be used to identify a resource and also indicates how to locate it.

A URL is a string used on the Internet to describe information resources. It is used mainly in WWW client and server programs, notably the well-known Mosaic browser. URLs describe many kinds of information resources in a unified format, including files, server addresses, and directories.

A URL generally consists of three parts: ① the protocol (or service method), ② the IP address of the host that holds the resource (sometimes including a port number), ③ the specific address of the resource on the host, such as directories and file names.

A URN (uniform resource name) identifies a resource by name rather than by location, for example mailto:java-net@java.mjj.com.

URI is the abstract, high-level concept of a uniform resource identifier, while URL and URN are specific ways of identifying a resource. URL and URN are both kinds of URI. Every URL is a URI, but not every URI is a URL, because URI also includes the subcategory Uniform Resource Name (URN), which names resources but does not specify how to locate them. The mailto URI above, like news and isbn URIs, is treated as an example of a URN in this classification. (Strictly speaking, current standards reserve the term URN for identifiers in the urn: scheme, so calling mailto a URN follows the older, looser usage.)
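A small sketch of the difference, using Python's urllib.parse. It parses three URIs and checks which of them carry a network location (a host) that says where to find the resource. Treating "has a host" as the mark of a locator is a simplifying assumption made for this example, and urn:isbn:0451450523 is an illustrative identifier that does not come from the text.

```python
from urllib.parse import urlsplit

uris = [
    "http://www.aspxfans.com/news/index.asp",  # a URL: scheme, host and path
    "mailto:java-net@java.mjj.com",            # the name-style example from the text
    "urn:isbn:0451450523",                     # illustrative urn: identifier (assumption)
]

results = {}
for uri in uris:
    parts = urlsplit(uri)
    # Every one of these is a valid URI; only some carry a host to contact.
    results[uri] = (parts.scheme, parts.netloc, parts.path)
    kind = "has a host" if parts.netloc else "no host, only a name"
    print(f"{uri} -> scheme={parts.scheme!r}, {kind}")
```

All three strings parse as URIs, but only the first tells a client which host to contact.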

In Java, a URI instance can represent either an absolute or a relative reference, as long as it conforms to URI syntax. The URL class, by contrast, must not only be syntactically valid but also contain the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class has no methods for accessing the resource; its job is parsing. The URL class, on the other hand, can open a stream to the resource.
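The same split between parsing and fetching can be illustrated with a Python analogue (this is not Java code, and it does not use the Java classes described above). urllib.parse only parses and resolves references, much like the parsing role described for URI, while urllib.request.urlopen is the call that would open a stream; it is not called here so that the example needs no network access. The relative reference ../images/logo.png is hypothetical.

```python
from urllib.parse import urljoin, urlsplit

base = "http://www.aspxfans.com/news/index.asp"  # absolute: has a scheme and a host
relative = "../images/logo.png"                  # hypothetical relative reference

# A relative reference is still parseable, but it has no scheme and no host,
# so on its own it does not say where the resource lives.
rel_parts = urlsplit(relative)
is_relative = rel_parts.scheme == "" and rel_parts.netloc == ""

# Resolving it against an absolute base yields something that can be located.
resolved = urljoin(base, relative)

print(is_relative)
print(resolved)
```

Only the resolved, absolute form contains enough information to open a connection to the resource.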