Another Look at Web Publishing. Part I: Host Headers without SSL

The purpose of this series of blog entries is to describe some details of Web publishing with the hope that by getting back to basics and taking another look at how Web publishing works, you’ll be able to create more efficient, error-free Web publishing rules with greater ease. In this post, I’ll try to explain how Host headers are used in ISA Server Web publishing in the simple HTTP scenario without SSL, and then in Part II, I’ll try to extend the picture described here to scenarios with SSL.

URLs: The Source of the Host Name

When an external user wants to retrieve content from the Internet in a Web browser, the user must provide a URL, which identifies the resource to access. The URL can be provided in several ways, such as by typing the URL in a text box or by clicking a link containing the applicable URL. The Web browser then uses the information in the URL to connect to the Web server and send a request message to the Web site for the resource requested.

Let’s examine some details of this process. A URL can include several components, some of which are optional. For example, the URL http://www.fabrikam.com:80/products/catalog.aspx?category=shoes&model=123 includes the components listed in the following table.

Component    Value
Scheme       http
Authority    www.fabrikam.com:80
Path         /products/catalog.aspx
Query        category=shoes&model=123

The first component of a URL is the scheme, which is followed by a colon; in an HTTP URL, two slashes then introduce the authority component. The scheme determines the required syntax for the rest of the URL and indicates how it should be interpreted. In this blog entry, we will confine ourselves to the syntax for HTTP URLs, that is, the syntax of the sample URL just given.

The authority component includes the host subcomponent, which is the most important part of the URL for this discussion, because it is copied into the Host header in an HTTP request message. In this example, the authority component consists of a host subcomponent (the host name www.fabrikam.com) followed by a port (80). In this case, the port can be omitted without altering the meaning of the URL because the default port for HTTP is 80. Note also that the query string included in the sample URL is optional and is not important for this discussion.
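As a rough illustration of this breakdown, the following Python sketch splits the sample URL with the standard library's urllib.parse module. The helper name describe_url and the DEFAULT_PORTS table are made up for this example; note that urlsplit reports the path with its leading slash.

```python
from urllib.parse import urlsplit

# Default port assumed when the URL does not carry one (HTTP only, for this post).
DEFAULT_PORTS = {"http": 80}

def describe_url(url):
    """Split an HTTP URL into the components discussed above (illustrative helper)."""
    parts = urlsplit(url)
    # parts.port is None when the authority has no explicit port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parts.query,
    }

info = describe_url(
    "http://www.fabrikam.com:80/products/catalog.aspx?category=shoes&model=123"
)
for name, value in info.items():
    print(name, "=", value)
```

Dropping ":80" from the URL leaves the host and the effective port unchanged, which is the sense in which the port is optional here.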

TCP Connections for Sending HTTP GET Requests

A Web browser, such as Internet Explorer, uses the host name in a URL both as the name of the server with which it must establish a TCP connection in order to send request messages to the Web site and as the name of the Web site itself. Before the Web browser can establish a TCP connection with the server, it must obtain the IP address of the server, for example, by querying DNS (unless, of course, the host name is provided in the form of an IP address). The Web browser then sends a TCP SYN packet to the IP address obtained and the port specified in the URL (or to port 80, the default port for HTTP) to start the three-way handshake for establishing a TCP connection. For this process to succeed, the server must be listening on the applicable port.
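The two steps just described, name resolution followed by a TCP connection attempt, might be sketched in Python roughly as follows. This is only a simplified model of what a browser does; the function name connect_to_site and the 5-second timeout are assumptions chosen for the example.

```python
import socket

def connect_to_site(host, port=80, timeout=5.0):
    """Resolve host, then open a TCP connection to the first address that accepts."""
    last_error = None
    # getaddrinfo performs the name lookup; an IP address literal is returned as-is.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            # connect() triggers the three-way handshake (SYN, SYN-ACK, ACK).
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Nobody listening, or no answer within the timeout: try the next address.
            last_error = exc
            sock.close()
    raise last_error if last_error else OSError("no addresses found for " + host)

# Example call (needs network access, so it is left commented out):
# sock = connect_to_site("www.fabrikam.com", 80)
```

If no address accepts the connection, the last error is raised to the caller, which is where a connection timeout would surface.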

If the Web site is published by ISA Server, the host name must resolve to an IP address of an ISA Server computer that is specified in the properties of a Web listener associated with a Web publishing rule (or to the IP address of a device that will direct requests to an ISA Server computer), and the TCP connection must be established with the ISA Server computer.

Note. This process may fail with the Winsock 10060 error (WSAETIMEDOUT, Connection Timed Out). This error means that there was no response to the TCP connection attempt and usually indicates some sort of network problem.

After the TCP connection is established, the Web browser sends an HTTP GET request with the information specified in the URL to the ISA Server computer. In GET requests sent directly from a Web browser to a Web server, the path is included in the first line of the request, and the host name is specified in the Host header. The request line and Host header in a GET request for our URL would appear as follows:

GET /products/catalog.aspx HTTP/1.1
Host: www.fabrikam.com
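To make the mapping from URL to request concrete, here is a minimal Python sketch that builds the request line and Host header for a request sent directly to a Web server. The function name build_origin_request is hypothetical. The sketch relies on two HTTP/1.1 rules: the request target begins with a slash, and each header line ends with CRLF.

```python
from urllib.parse import urlsplit

def build_origin_request(url):
    """Build a bare GET request as sent directly to a Web server (sketch only)."""
    parts = urlsplit(url)
    # The request target is the path (at least "/") plus any query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries the host name; port 80 is the HTTP default and is omitted.
    host = parts.hostname
    if parts.port is not None and parts.port != 80:
        host += ":" + str(parts.port)
    return "GET " + target + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n"

request = build_origin_request("http://www.fabrikam.com:80/products/catalog.aspx")
print(request)
```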

Since ISA Server acts as a proxy server for Web proxy clients that send outgoing Web requests to the Internet, I should mention that when a GET request is sent to a proxy server, the request line contains the complete URL in its original form (the absolute URL), so that the proxy has all the information the original client had and can process the URL in the same way. A GET request sent to a proxy server for the URL http://www.fabrikam.com:8080/products/catalog.aspx would appear as follows:

GET http://www.fabrikam.com:8080/products/catalog.aspx HTTP/1.1
Host: www.fabrikam.com:8080
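For comparison, a proxy-style request could be sketched as below. The function name build_proxy_request is made up for this example, and the sketch assumes the URL has no user information in its authority. It follows the HTTP/1.1 rule that the Host header repeats the authority of the URL, including a port other than the default.

```python
from urllib.parse import urlsplit

def build_proxy_request(url):
    """Build a bare GET request as sent to a proxy server (sketch only)."""
    parts = urlsplit(url)
    # The request line carries the absolute URL unchanged.
    # The Host header repeats the authority (host name plus explicit port, if any).
    return "GET " + url + " HTTP/1.1\r\nHost: " + parts.netloc + "\r\n\r\n"

proxy_request = build_proxy_request(
    "http://www.fabrikam.com:8080/products/catalog.aspx"
)
print(proxy_request)
```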

Note. By default, ISA Server listens for outbound proxy requests from Web proxy clients in the Internal network on port 8080. If a client sends a CERN-compliant proxy request to the IP address and port on which the Web listener of a Web publishing rule listens for inbound Web requests, ISA Server will respond with HTTP error 400 (Bad Request).

When an incoming GET request for a published Web site arrives at an ISA Server computer, ISA Server determines if the Host header in the request corresponds to a public name allowed by a Web publishing rule associated with the Web listener that allowed the TCP connection. The options on the Public Name tab of each Web publishing rule specify whether the rule applies to any host name (or IP address) that may appear in the Host header of a GET request or only to the one or several host names (or IP addresses) that are listed on the Public Name tab. ISA Server forwards only GET requests with a Host header containing a public name that is accepted by an applicable Web publishing rule.
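The public name check can be pictured with a small Python model. This is not ISA Server's implementation; the function rule_accepts and the use of None to stand for the "any host name" option are assumptions made for illustration. The only fact relied on is that host names compare case-insensitively.

```python
def rule_accepts(host_header, public_names=None):
    """Return True if a rule's public names allow this Host header (simplified model).

    public_names=None stands for a rule that applies to any host name;
    otherwise it is the list of names on the rule's Public Name tab.
    """
    if public_names is None:
        return True
    # Host names are case-insensitive, so compare in lower case.
    allowed = {name.strip().lower() for name in public_names}
    return host_header.strip().lower() in allowed

print(rule_accepts("www.fabrikam.com", ["www.fabrikam.com"]))
print(rule_accepts("www.contoso.com", ["www.fabrikam.com"]))
print(rule_accepts("www.contoso.com"))
```

In this model, a request whose Host header matches no rule is simply not forwarded.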

The Host header in a GET request arriving at an ISA Server computer that publishes one or more Web sites may contain one of several fully qualified domain names (FQDNs) that resolve to the same IP address. In this case, the Host header distinguishes between different FQDNs that share a single IP address and can be used to route requests to different Web sites on the same server.

From the point of view of the client Web browser, the ISA Server computer is the Web server specified in the Host header. However, from the point of view of the ISA Server computer, the published Web server is an internal server. If the request is allowed by a Web publishing rule, the ISA Server computer must still connect to the published Web server and forward the GET request to it.

In the properties of a Web publishing rule, there are options that determine the IP address with which the ISA Server computer will establish the TCP connection for forwarding a GET request to the published Web server, as well as the Host header and path that will be used in the GET request. On the To tab of a rule that publishes a single Web server or load balancer, you must provide a resolvable name of your internal Web server (ISA Server 2004) or the internal name of your Web site (ISA Server 2006).

ISA Server 2004 always uses the IP address obtained by resolving the server name specified on the To tab to establish the TCP connection for forwarding a GET request to the published server. ISA Server 2006 offers another possibility in Web publishing rules that publish a single Web server or load balancer. In the optional Computer name or IP address field, you can type a resolvable name or IP address of the published Web server. If you do, this IP address will be used to establish the TCP connection with the published Web server, and the internal name of the Web site does not need to be a resolvable name of the internal Web server. If you leave this field blank, the internal site name must be resolvable, and the IP address obtained by resolving it will be used to establish the TCP connection with the internal Web server.
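The choice of connection address in ISA Server 2006 can be summarized in a few lines of Python. Again, this is a sketch of the behavior described above, not product code; the function name, its parameters, and the sample host names and addresses are hypothetical.

```python
import socket

def published_server_address(internal_site_name, computer_name_or_ip=None,
                             resolve=socket.gethostbyname):
    """Pick the IP address used for the TCP connection to the published server.

    If the optional computer name or IP address is filled in, it is used;
    otherwise the internal site name itself must be resolvable.
    """
    target = computer_name_or_ip or internal_site_name
    # Resolving an IPv4 address literal returns it unchanged.
    return resolve(target)

# A stand-in resolver with made-up names and addresses, so no DNS is needed.
fake_dns = {"www.fabrikam.com": "10.0.0.10", "web01": "10.0.0.5"}.__getitem__

print(published_server_address("www.fabrikam.com", None, fake_dns))
print(published_server_address("www.fabrikam.com", "web01", fake_dns))
```

The ISA Server 2004 behavior corresponds to always leaving the optional argument empty.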

After the TCP connection between the ISA Server computer and the published Web server is established, ISA Server acts as a Web client and forwards the HTTP GET request to the published Web server. By default, except in the case of certain wizard-configured rules, ISA Server changes the original Host header to a Host header that contains the name for the published Web server (ISA Server 2004) or the internal name of the Web site (ISA Server 2006). However, if the Forward the original Host header instead of the actual one option is selected on the To tab, ISA Server will establish the TCP connection with the published Web server as described, but will include the Host header received from the original client in the HTTP GET request that it sends to the published Web server.
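The effect of this option on the forwarded request might be modeled as follows. The function name forwarded_request and the internal name web01.corp.fabrikam.com are assumptions for this example, and the sketch ignores the wizard-configured exceptions mentioned above.

```python
def forwarded_request(path, original_host, internal_name, forward_original=False):
    """Build the GET request sent to the published server (simplified model).

    forward_original mirrors the "Forward the original Host header instead of
    the actual one" option on the To tab.
    """
    host = original_host if forward_original else internal_name
    return "GET " + path + " HTTP/1.1\r\nHost: " + host + "\r\n\r\n"

# Default behavior: the Host header is replaced with the internal name.
print(forwarded_request("/products/catalog.aspx",
                        "www.fabrikam.com", "web01.corp.fabrikam.com"))
# Option selected: the client's Host header is passed through unchanged.
print(forwarded_request("/products/catalog.aspx",
                        "www.fabrikam.com", "web01.corp.fabrikam.com",
                        forward_original=True))
```

In both cases the TCP connection goes to the same published server; only the Host header inside the request differs.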

Note. In the Web proxy log, the field for the URL requested will always contain the name of the server (ISA Server 2004) or the internal site name (ISA Server 2006) specified on the To tab regardless of how you configure the Forward the original Host header instead of the actual one option.

In ISA Server 2006, you can replace a single internal Web server by a load-balanced Web farm. When ISA Server needs to forward an HTTP GET request to a Web farm, ISA Server selects an IP address from the set of IP addresses of the farm members specified in the Web farm properties and establishes a TCP connection with the applicable farm member for forwarding the GET request to it. In this case, the internal site name and the option for forwarding the original Host header are specified on the Web Farm tab.

Note. Web publishing fails if the internal site name specified on the Web Farm tab cannot be resolved to an IP address. We recommend that you set the internal site name to the FQDN of one of the members of the server farm.

Forwarding the original Host header is useful, for example, when you want to publish multiple Web sites on the same Web server using a single Web publishing rule. In this case, the original Host header is now used only to identify the Web site on the published server, and not the published server itself.

In Part II, I’ll try to explain how the picture described here changes when one or both of the TCP connections must be secured with SSL for encrypted communication.

Gabriel Koren
Forefront TMG (ISA Server) Team