Fetching A Web Page, Part 2 - Internet Names

In the last article, we talked a little bit about the internet itself. I tried to provide a little background on the whole thing, and I hope it wasn't too complex. Some of this will be clarified as we go on.

We ended the last article just as you typed "www.cnn.com" into your favorite browser. I use Firefox, but that's not important right now.

Let us now imagine that you press ENTER. What happens?

Well, the first thing the computer does is look at what you typed in, and it pretty darn quickly realizes (typically within a few microseconds) that you want to connect to a certain web site called "www.cnn.com". And there, the computer immediately runs into a problem: it has no idea what "www.cnn.com" is, nor how to connect to it. The only things the computer understands are numeric addresses, like "200.47.96.5".
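To make the idea of an "address" a bit more concrete, here is a small sketch in Python. It assumes the old-style IPv4 addresses used in this article, and the two helper names are my own invention for illustration. The point is simply that the four dotted numbers are a human-friendly way of writing one 32-bit number.

```python
import ipaddress


def address_to_number(dotted: str) -> int:
    """Return the single 32-bit number that a dotted IPv4 address stands for."""
    return int(ipaddress.IPv4Address(dotted))


def number_to_address(number: int) -> str:
    """Turn a 32-bit number back into the familiar dotted form."""
    return str(ipaddress.IPv4Address(number))


if __name__ == "__main__":
    number = address_to_number("200.47.96.5")
    # Each of the four parts contributes one byte of the number.
    print(number)
    print(number_to_address(number))
```

The dotted form exists for our benefit; as far as the computer is concerned, the address is just the number.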

The first step, therefore, is to attempt to translate "www.cnn.com" into some kind of an internet address. How does this happen? Well, first, we've got to ask ourselves the age-old question that Shakespeare first asked, "what's in a name?"

    What's in a name? That which we call a rose
    By any other word would smell as sweet...

What's in a name? The name "www.cnn.com", for instance, has three different parts. Names are always read right-to-left, so the first part is "com". com is a top-level internet domain, originally intended for commercial organizations (the name literally means "commercial"), although in practice anyone can register a name there. CNN does indeed seem to qualify as commercial (and global). There are other top-level domains, like "gov" for the U.S. Government, "se" for Sweden, and "uk" for the United Kingdom.

The next part is "cnn", which usually identifies the company, organization, or other body we're interested in, and which exists within the "com" domain. The third part, "www", is usually thought of as a resource of some kind, but actually it indicates a specific computer within the domain "cnn.com". "www" is typically the main computer (or in this case, computers) that handles web requests. Making sure that all web-handling computers are called "www" makes it nice and easy to address them. In theory, it would be possible to call the web computer "frank". The downside of that is that very few internet users would think of typing in "frank.cnn.com" when they wanted to access the latest headline news. As an interesting side note, the main web computer for MIT was actually called "web.mit.edu", which led to endless confusion, but I think they've changed that by now.
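If you like to see things spelled out, the right-to-left reading can be sketched in a few lines of Python. The function names here are made up for this example; the only real rule being shown is that the dots separate the parts and the most general part comes last.

```python
def parts_right_to_left(name: str) -> list[str]:
    """Split a name on its dots and list the parts from most general to most specific."""
    labels = name.rstrip(".").lower().split(".")
    labels.reverse()
    return labels


def enclosing_domains(name: str) -> list[str]:
    """List the ever more specific domains that a name lives in, ending with the name itself."""
    labels = name.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


if __name__ == "__main__":
    print(parts_right_to_left("www.cnn.com"))   # ['com', 'cnn', 'www']
    print(enclosing_domains("www.cnn.com"))     # ['com', 'cnn.com', 'www.cnn.com']
```

The second list is worth remembering: "com", then "cnn.com", then "www.cnn.com" is exactly the order in which the name gets looked up.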

How do we translate "www.cnn.com" into an internet address, then? For that, we turn our attention to the complex beast known as the Domain Name System, or DNS.

The DNS is a service available online which keeps track of all names on the internet. This is not easy, so hundreds of thousands of computers all over the world share the job. Typically, every Internet Service Provider has at least one computer assigned to this duty; many have two or more, in case the first one breaks down. At the top of the whole system sit 13 named root servers (many of which are really groups of machines), and these are the definitive starting point for all name-based information on the internet.

The address of the DNS server used to be something you had to type into the computer yourself, during the good old days when your internet service provider just gave you a paper with a list of complex digits on it. These days, with the remarkable invention of DHCP, this is no longer necessary. But deep down in your computer's configuration, in parts you never knew about (and hopefully will never need to touch), there exists a tiny little record holding the address of your specific DNS server, to which your computer always sends its questions for name-based information.
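For the curious: on Unix-like systems, that tiny record traditionally lives in a text file called /etc/resolv.conf, as lines beginning with the word "nameserver". Other systems, Windows included, keep it elsewhere. Here is a rough Python sketch that picks the server addresses out of text in that format. The sample contents and the function name are invented for illustration, and the addresses come from a range reserved for documentation, so they are not real DNS servers.

```python
# Made-up example contents in the resolv.conf format.
# 192.0.2.x addresses are reserved for documentation and point nowhere.
SAMPLE_RESOLV_CONF = """\
# written automatically when the machine joined the network
search example.net
nameserver 192.0.2.53
nameserver 192.0.2.54
"""


def find_nameservers(text: str) -> list[str]:
    """Return the DNS server addresses listed in resolv.conf-style text, in order."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    print(find_nameservers(SAMPLE_RESOLV_CONF))
```

The order matters: the first server listed is the one your computer asks first, and the others are there in case it doesn't answer.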

So, in order to turn the name "www.cnn.com" into something a bit more useful, your computer now sends out a request to your DNS server. For a moment we'll just relax and kick back, and let the computer handle all the stuff for us. In short, this is what happens:

  • Your computer asks the DNS server "where is www.cnn.com?"

  • The DNS server has no idea, so it quickly goes out to ask the root server, "where is www.cnn.com?" The root server replies, "I don't know, but I do know who is responsible for the 'com' domain."

  • The DNS server now puts the same question to the 'com' domain server in turn. This server replies, "I don't know, but I do know who knows all about the 'cnn.com' domain".

  • Finally, the DNS server goes on to ask the 'cnn.com' server "where is www.cnn.com", and this last server replies "oh, I have that information. The address for 'www.cnn.com' is 64.236.16.11".

  • Happy and satisfied, the DNS server now returns the answer to your computer.

Your computer now knows which address to use. We're all set to go, ready to send the first packet out over the internet.
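The question-and-referral dance above can be sketched as a toy simulation in Python. Nothing here touches the network: the three "servers" are just dictionaries, their names (root, com-server, cnn-server) are invented for this example, and the only real data is the address quoted above. A real DNS server does considerably more, including remembering answers so it doesn't have to ask again.

```python
# Toy stand-ins for the servers in the story. Each one either knows an
# address outright or knows which server to send you to next.
SERVERS = {
    "root": {"referrals": {"com": "com-server"}, "addresses": {}},
    "com-server": {"referrals": {"cnn.com": "cnn-server"}, "addresses": {}},
    "cnn-server": {"referrals": {}, "addresses": {"www.cnn.com": "64.236.16.11"}},
}


def ask(server: str, name: str) -> tuple[str, str]:
    """Ask one server about a name; get back an answer or a referral."""
    data = SERVERS[server]
    if name in data["addresses"]:
        return ("answer", data["addresses"][name])
    for domain, next_server in data["referrals"].items():
        if name == domain or name.endswith("." + domain):
            return ("referral", next_server)
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name: str, max_steps: int = 10) -> tuple[str, list[str]]:
    """Start at the root and follow referrals until some server answers."""
    server = "root"
    asked = []
    for _ in range(max_steps):
        asked.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, asked
        server = value  # a referral: go and ask the next server
    raise LookupError(f"gave up on {name} after {max_steps} referrals")


if __name__ == "__main__":
    address, asked = resolve("www.cnn.com")
    print(address, "found by asking:", " -> ".join(asked))
```

Notice that no single server needs to know everything. Each one only has to know who to ask next, which is what lets the whole system spread across so many computers.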

Next time, we'll look at how your computer makes first contact with the web server. It's a tricky process.


Blog contents copyright © 2005 Mats Gefvert. All rights reserved.