In the late 1980s, high energy physicists working at CERN_ had to efficiently exchange documents about their ongoing and planned experiments. `Tim Berners-Lee`_ evaluated several of the document sharing techniques that were available at that time [B1989]_. As none of the existing solutions met CERN's requirements, they chose to develop a completely new document sharing system. This system was initially called the `mesh`. It was quickly renamed the `world wide web`. The starting point for the `world wide web` is the hypertext document. A hypertext document is a document that contains references (hyperlinks) to other documents that the reader can immediately access. Hypertext was not invented for the world wide web. The idea of hypertext documents was proposed in 1945 [Bush1945]_ and the first experiments were done during the 1960s [Nelson1965]_ [Myers1998]_. Compared to the hypertext documents that were used in the late 1980s, the main innovation introduced by the `world wide web` was to allow hyperlinks to reference documents stored on different remote machines.
A document sharing system such as the `world wide web` is composed of three important parts.
A standardized addressing scheme that unambiguously identifies documents
A standard document format : the `HyperText Markup Language <>`_
A standardized protocol to efficiently retrieve the documents stored on a server
Open standards and open implementations
Open standards play a key role in the success of the `world wide web` as we know it today. Without open standards, the world wide web would never have reached its current size. In addition to open standards, another important factor in the success of the web was the availability of open and efficient implementations of these standards. When CERN started to work on the `web`, their objective was to build a running system that could be used by physicists. They developed open-source implementations of the `first web servers <>`_ and `web clients <>`_. These open-source implementations were powerful and could be used as is by institutions willing to share information. They were also extended by other developers who contributed new features. For example, the NCSA_ added support for images in their `Mosaic browser <>`_ that was eventually used to create `Netscape Communications <>`_ and the first commercial browsers and servers.
The first components of the `world wide web` are the Uniform Resource Identifiers (URI), defined in :rfc:`3986`. A URI is a character string that unambiguously identifies a resource on the world wide web. Here is a subset of the BNF for URIs ::
The first component of a URI is its `scheme`. A `scheme` can be seen as a selector, indicating the meaning of the fields after it. In practice, the scheme often identifies the application-layer protocol that must be used by the client to retrieve the document, but this is not always the case. Some schemes do not imply a protocol at all and some do not indicate a retrievable document [#furiretrieve]_. The most frequent schemes are `http` and `https`. We focus on `http` in this section. A URI scheme can be defined for almost any application layer protocol [#furilist]_. The character `:` follows the `scheme` of any URI, and the characters `//` follow it when the URI contains an `authority`.
The second part of the URI is the `authority`. With retrievable URIs, this includes the DNS name or the IP address of the server where the document can be retrieved using the protocol specified via the `scheme`. This name can be preceded by some information about the user (e.g. a user name) who is requesting the information. Earlier definitions of the URI allowed the specification of a user name and a password before the `@` character (:rfc:`1738`), but this is now deprecated as placing a password inside a URI is insecure. The host name can be followed by the colon character and a port number. A default port number is defined for some protocols and the port number should only be included in the URI if a non-default port number is used (for other protocols, techniques like service DNS records can be used).
The third part of the URI is the path to the document. This path is structured as filenames on a Unix host (but it does not imply that the files are indeed stored this way on the server). If the path is not specified, the server will return a default document. The last two optional parts of the URI are used to provide a query parameter and indicate a specific part (e.g. a section in an article) of the requested document. Sample URIs are shown below.
The first URI corresponds to a document named `rfc3986.html` that is stored on the server named `` and can be accessed by using the `http` protocol on its default port. The second URI corresponds to an email message, with subject `current-issue`, that will be sent to user `infobot` in domain ``. The `mailto:` URI scheme is defined in :rfc:`2368`. The third URI references the portion `BaseHTTPServer.BaseHTTPRequestHandler` of the document `basehttpserver.html` that is stored in the `library` directory on the `` server. This document can be retrieved by using the `http` protocol. The query parameter `highlight=http` is associated with this URI. The fourth example is a server that operates the telnet_ protocol, uses IPv6 address `2001:db8:3080:3::2` and is reachable on port 2323. The last URI is somewhat special. Most users will assume that it corresponds to a document stored on the `` server. However, to parse this URI, it is important to remember that the `@` character is used to separate the user name from the host name in the authority part of a URI. This implies that the URI points to a document named `top_story.htm` on the host having IPv4 address ``. The document will be retrieved by using the `ftp` protocol with the user name set to ``.
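The decomposition described above can be checked with a URI parser. The following is a minimal sketch using `urlsplit` from the Python standard library. The URIs in `SAMPLES` are hypothetical and chosen only for illustration (they are not the samples discussed in the text), and the helper `describe` is an assumption of this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical URIs written for this sketch (not the samples from the text).
SAMPLES = [
    "http://www.example.org/library/doc.html?highlight=http#section-2",
    "telnet://[2001:db8:3080:3::2]:2323/",
    "ftp://www.example.com@192.0.2.7/top_story.htm",
]


def describe(uri):
    """Split a URI into the parts discussed above and return them as a dict."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # text before the first ':'
        "user": parts.username,      # text before '@' in the authority, if any
        "host": parts.hostname,      # DNS name or IP address (brackets removed)
        "port": parts.port,          # None when the default port is implied
        "path": parts.path,
        "query": parts.query,        # text after '?'
        "fragment": parts.fragment,  # text after '#'
    }


for uri in SAMPLES:
    print(uri)
    for name, value in describe(uri).items():
        print(f"  {name}: {value!r}")
```

In the last hypothetical URI, the text before the `@` character is reported as the user name and the host is the IPv4 address, which is the same pitfall as the one discussed for the last sample URI.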
The second component of the `world wide web` is the HyperText Markup Language (HTML). HTML defines the format of the documents that are exchanged on the `web`. The `first version of HTML <>`_ was derived from the Standard Generalized Markup Language (SGML) that was standardized in 1986 by :term:`ISO`. SGML_ was designed to support large documents maintained by governments, law firms or aerospace companies that must be shared efficiently in a machine-readable manner. These industries require documents to remain readable and editable for tens of years and insisted on a standardized format supported by multiple vendors. Today, SGML_ is no longer widely used beyond specific applications, but its descendants including :term:`HTML` and :term:`XML` are now widespread.
A markup language is a structured way of adding annotations about the formatting of the document within the document itself. Example markup languages include troff_, which is used to write the Unix man pages, and Latex_. HTML uses markers to annotate text and a document is composed of `HTML elements`. Each element is usually composed of three parts: a start tag that potentially includes some specific attributes, some text (often including other elements), and an end tag. An HTML tag is a keyword enclosed in angle brackets. The generic form of an HTML element is ::
More complex HTML elements can also include optional attributes in the start tag ::
The HTML document shown below is composed of two parts: a header, delineated by the `<head>` and `</head>` markers, and a body (between the `<body>` and `</body>` markers). In the example below, the header only contains a title, but other types of information can be included in the header. The body contains an image, some text and a list with three hyperlinks. The image is included in the web page by indicating its URI as the value of the `src` attribute in the `<img src="...">` marker. It is important to note that the image can reside on any server. The client will automatically download it when rendering the web page. The `<h1>...</h1>` marker is used to specify the first level of headings. The `<ul>` marker indicates an unnumbered list while the `<li>` marker indicates a list item. The `<a href="URI">text</a>` marker indicates a hyperlink. The `text` will be underlined in the rendered web page and the client will fetch the specified URI when the user clicks on the link.
A simple HTML page
Over the years, various extensions to HTML have been proposed and implemented. These include the specification of style sheets that adjust the layout of the document and the possibility of adding or referencing JavaScript code. Additional details about the various extensions to HTML may be found in the `official specifications <>`_ maintained by W3C_.
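To illustrate how a client finds the URIs that it must fetch when rendering a page, the sketch below uses the `html.parser` module of the Python standard library to collect the `src` attribute of `<img>` markers and the `href` attribute of `<a>` markers. The page in `PAGE` and the class name `ReferenceCollector` are hypothetical and written for this sketch. A real browser does much more than this.

```python
from html.parser import HTMLParser

# Hypothetical page written for this sketch: a title, an image, a heading
# and a list with three hyperlinks.
PAGE = """<html>
<head><title>HTML test page</title></head>
<body>
<img src="http://www.example.org/logo.png">
<h1>Some web servers</h1>
<ul>
<li><a href="http://www.example.org/">Example one</a></li>
<li><a href="http://www.example.net/">Example two</a></li>
<li><a href="http://www.example.com/">Example three</a></li>
</ul>
</body>
</html>"""


class ReferenceCollector(HTMLParser):
    """Record the URIs referenced by <img> and <a> start tags."""

    def __init__(self):
        super().__init__()
        self.images = []  # URIs that a client downloads while rendering
        self.links = []   # URIs that are fetched when the user clicks

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the start tag.
        attributes = dict(attrs)
        if tag == "img" and "src" in attributes:
            self.images.append(attributes["src"])
        elif tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])


collector = ReferenceCollector()
collector.feed(PAGE)
print("images:", collector.images)
print("links:", collector.links)
```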
The third component of the `world wide web` is the HyperText Transfer Protocol (HTTP). HTTP is a text-based protocol like SMTP. The client sends a request and the server returns a response. HTTP runs above the bytestream service and HTTP servers listen by default on port `80`. The design of HTTP has largely been inspired by the Internet email protocols. Each HTTP request contains three parts :
a `method`, that indicates the type of request, a URI, and the version of the HTTP protocol used by the client
a `header`, that is used by the client to specify optional parameters for the request. An empty line is used to mark the end of the header
an optional MIME document attached to the request
The response sent by the server also contains three parts :
a `status line` , that indicates whether the request was successful or not
a `header`, that contains additional information about the response. The response header ends with an empty line.
a MIME document
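The three parts of a response can be separated with a few lines of code. The following is a minimal sketch, assuming that the whole response is already available as a byte string and that the status line contains a version, a numeric code and a reason phrase. The function name `parse_response` and the response in `RAW` are hypothetical.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers and body."""
    # The first empty line (CR LF CR LF) marks the end of the header.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP version, numeric status code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)

    # Each remaining line is one header of the form "Name: value".
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return version, int(status), reason, headers, body


# Hypothetical response written for this sketch.
RAW = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

version, status, reason, headers, body = parse_response(RAW)
print(version, status, reason)
print(headers)
print(body)
```

This sketch ignores details such as headers that appear several times. It only shows where the boundaries between the status line, the header and the attached document lie.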
Several types of method can be used in HTTP requests. The three most important ones are :
the `GET` method is the most popular one. It is used to retrieve a document from a server. The `GET` method is encoded as `GET` followed by the path of the URI of the requested document and the version of HTTP used by the client. For example, to retrieve the URI, a client must open a TCP connection on port `80` with host `` and send an HTTP request containing the following line:
the `HEAD` method is a variant of the `GET` method that allows the retrieval of the header lines for a given URI without retrieving the entire document. It can be used by a client to verify if a document exists, for instance.
the `POST` method can be used by a client to send a document to a server. The document is attached to the HTTP request as a MIME document.
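The encoding of these three methods can be sketched as follows. The helper `build_request` is hypothetical, uses HTTP/1.0 as the version string, and assumes that the path and the header values are plain ASCII. It builds the bytes of a request but does not send them.

```python
def build_request(method, path, headers=None, body=b""):
    """Return the bytes of an HTTP request: request line, header, document."""
    lines = [f"{method} {path} HTTP/1.0"]
    headers = dict(headers or {})
    if body:
        # Tell the server how many bytes of document follow the header.
        headers.setdefault("Content-Length", str(len(body)))
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CR LF and an empty line closes the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


# Hypothetical paths and form data, for illustration only.
get_request = build_request("GET", "/index.html")
head_request = build_request("HEAD", "/index.html")
post_request = build_request(
    "POST",
    "/form",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    body=b"name=value",
)

for request in (get_request, head_request, post_request):
    print(request)
```

The `GET` and `HEAD` requests differ only in their first word, while the `POST` request carries a document after the empty line.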
HTTP clients and servers can include different HTTP headers in HTTP requests and responses. Each HTTP header is encoded as a single ASCII line terminated by `CR` and `LF`. Several of these headers are briefly described below. A detailed discussion of the standard headers may be found in :rfc:`1945`. The MIME headers can appear in both HTTP requests and HTTP responses.

