This blog has moved. Please update your bookmarks.

Fetching A Web Page - Interlude

There's a rather funny experiment you can do, if you've never done it before. It uses a little utility called "tracert" (on Windows at least; on Unix it's usually called "traceroute"). It will display the exact path that your tiny little packets travel over the internet.

Running the traceroute utility, someone said, is like sending out a large number of pirates with parrots. (Don't ask me where he got his mental pictures from.) Each pirate is instructed to go a specific number of steps, and then spontaneously die, commit suicide, dig himself into a hole or perform any other action that will result in his immediate death, thereby causing his parrot to fly home to you and report the last known position.

traceroute, as the legend thus has it, sends out a large number of pirates. The first pirate takes one step, then dies, and his parrot flies immediately home to you and reports the first position. The second pirate takes two steps, dies immediately, and his parrot in turn flies home and reports the second position.

In TCP/IP, this is accomplished through a special little field in the header of each packet sent, called TTL, or Time To Live. Each time a packet passes a router, the TTL value is decreased by one, and when it reaches zero, the packet is discarded and a notice (an ICMP "Time Exceeded" message) is sent "home" to inform the sender of the action.
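The pirate-and-parrot scheme can be sketched as a small simulation in Python. This is only an illustration of the TTL idea, not how the real utility is implemented: the router names are made up, and no packets are actually sent anywhere.

```python
# Simulated traceroute: no real packets are sent, and the router
# names below are invented for this illustration.
PATH = ["home-gateway", "isp-core", "transit-1", "transit-2", "destination"]


def send_probe(path, ttl):
    """Send one simulated packet along `path` with the given TTL.

    Each router decreases the TTL by one. The router that brings it to
    zero discards the packet and reports back (the parrot flying home).
    Returns (reporting_node, reached_destination).
    """
    for index, node in enumerate(path):
        is_destination = index == len(path) - 1
        if is_destination:
            # The final host answers for itself; it does not forward.
            return node, True
        ttl -= 1
        if ttl == 0:
            return node, False
    raise ValueError("path must not be empty")


def traceroute(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... and collect who answered."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        node, reached = send_probe(path, ttl)
        hops.append(node)
        if reached:
            break
    return hops


for hop_number, node in enumerate(traceroute(PATH), start=1):
    print(hop_number, node)
```

Each probe gets one step further than the previous one, so the list of answers is the path itself, one hop per line.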

So traceroute sends out packets, each one with an increasing TTL value (actually it usually sends three packets for each TTL, for better statistics) and records the answers it gets back. Let's try it:

Tracing route to []

1 1 ms 1 ms 1 ms [hidden]
2 2 ms 2 ms 2 ms [hidden]
3 2 ms 2 ms 2 ms
4 3 ms 2 ms 2 ms
5 2 ms 2 ms 2 ms
6 4 ms 4 ms 4 ms
7 12 ms 12 ms 12 ms
8 12 ms 12 ms 14 ms
9 28 ms 28 ms 28 ms
10 29 ms 29 ms 28 ms
11 29 ms 28 ms 28 ms
12 37 ms 37 ms 37 ms
13 68 ms 39 ms 68 ms
14 37 ms 37 ms 75 ms
15 111 ms 149 ms 147 ms
16 111 ms 146 ms 111 ms
17 123 ms 161 ms 123 ms
18 159 ms 159 ms 122 ms
19 * * * Request timed out.
20 * ^C
(I hid the first two entries for security reasons.)

What happened here? Well, for each TTL level (on the left hand side), three packets were sent out. traceroute measured the time it took for each packet to "come back home". The address was recorded as well, showing us that we first leave "" (that's my ISP for work), then go through various instances of "", "" and "".

Usually, some of these can be looked up. ATDN apparently means AOL Transit Data Network. Feel free to look up the others by typing in "www." and the domain name. (, for instance).

After hop number 18, our packets mysteriously disappear. Why is that? Usually that means that between router 18 and router 19, there sits a grim, menacing device called a firewall. A firewall is a security device that makes sure that only certain types of traffic pass through, according to very strict rules. Our parrot-carrying pirates typically belong to those types of traffic that get stopped. Beyond the firewall, our pirates disappear, and the parrots are efficiently shot down before they can make it back again. Alas, such is the world of Internet security.

Firewalls are usually put there to protect the network against malicious intrusion, which is part of the reason why our parrots are shot down without mercy. However, it could also have meant that the routers beyond number 18 don't work. is such a big company, though, so that is unlikely; had all traffic stopped at router number 5 instead (still within our ISP), you could safely bet your horses on a temporary network problem.

All in all, before we continue, traceroute is a remarkable little utility that allows you to track how traffic flows on the internet. In the right hands, it can be used to diagnose router errors, link failures and other types of common problems. Or just for playing with.

Too Much Tea

I have too much tea in my cupboards.

Since I haven't done any major cooking during the last few years or so (*shame*), I threw out some stuff that's gone stale. Macaroni from 2002... A package of instant soup from 2003... :)

But I discovered I have far too much tea. I'm in the habit of buying new, interesting tea, thereby pushing the older tea back, so I don't find it.

The teas I have stacked in there now include:

  • Twinings Earl Grey (teabags and loose)
  • Twinings English Breakfast (teabags and loose)
  • Twinings Pure Rooibos (teabags)
  • Twinings Apple, Cinnamon and Raisin Tea (teabags)
  • Twinings Lapsang Souchong (loose - and oh! so wonderful)
  • Twinings Darjeeling (loose - and a favorite)
  • Lipton Yellow Label (teabags)
  • Friggs Chamomile (teabags)
  • Friggs Rosehip Tea (teabags)
  • TGFOP Assam Mokalbari (loose)
  • TGFOP Darjeeling (loose)
  • Ceylon Pekoe (loose)
  • Colonial Tea Company Green Tea Lime (loose)
...and some other assorted teas I haven't dared to touch.

Incidentally, TGFOP stands for Tippy Golden Flowery Orange Pekoe, if you didn't know.

And speaking of teas, I have a theory. I think it's entirely possible to rate a tea producer by their Earl Grey alone. All you need to do is sample it. If it's good, feel free to go on with the rest. If it's bad, chances are their other tea is just as bad.

For instance, be sure to check out Julie Catanzaro's site Tending Toward Tea, which is all about Earl Grey teas. Hundreds of them.

Fetching A Web Page, Part 3 - First Contact

When we left off last time, our computer had just been given the IP address of "", as the first part in the process of connecting to it to fetch the main web page. It turned out that the IP address was, and the computer is now ready to connect to it.

But in order to connect to it, we must first know what service we wish to use.

On any given server, there are (usually) a number of services, or programs, running. These programs listen to incoming connections from the internet, and process commands - usually handing out information, such as web pages. One of these programs is the web server. Another program may handle FTP, or file transfers; a third one may provide yet other services. Which of these programs should we connect to? Only one of them understands how to handle web pages. How do we tell the remote server which program we want to use?

It turns out that there is a huge list of ports that are standardized. A port is sort of a door which we can go through. Depending on which door we choose, we get to see different programs. All of these doors have numbers, and as it turns out, door number 80 (port 80) is the standardized number for fetching web pages. A web server that wishes to serve web pages out to the global internet stands idly by door number 80 and sees if there are any incoming requests. When the request bursts through the door, it says "hello, dear friend! What can I do for you?" But if we had connected to port 79 instead, the door might have been locked; or worse still, there might have been a program on the other side that had no idea what a web page is.

These port numbers are not set in stone. Some companies choose to put their web server at a different port number. This may happen because they may want to try to "hide" their server, or at least not make it so dashedly obvious that it's there. This is why, sometimes, you have to type in really complicated addresses like "". This URL tells the system to avoid port 80, and go looking for port 8080 instead.
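How the port ends up being chosen from an address can be sketched in a few lines of Python. The host name example.com is a placeholder, not an address from this article, and the helper name `port_for` is made up for the illustration.

```python
from urllib.parse import urlsplit

# Default ports for the two common web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_for(url):
    """Return the port a browser would connect to for `url`."""
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly.
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


# example.com is a placeholder host, not an address from the article.
print(port_for("http://www.example.com/"))        # 80
print(port_for("http://www.example.com:8080/"))   # 8080
```

If no port is written out, the standard door for the scheme is used; a ":8080" after the host name overrides it.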

Far from all ports are answered by your average server, though. In fact, the fewer ports the server answers on, the less chance there is of having a security hole somewhere. Ideally, a web server should only answer port 80 and no other port at all! Hackers usually scan servers, meaning that they go through all possible doors (ports) to see if any are open, and what may be behind the door.

But onwards we go! We have packets to send!

First of all, we need to connect to the web server at, port 80. This is done by asking the computer to send a very special packet, called a SYN packet. The SYN packet is very short, and typically consists of little more than our IP address (so the server knows where the packet came from), the destination IP address (so the routers on the way know where to send it) and the destination port. SYN means, effectively, "knock on the door to see if anyone is there". So the SYN packet is added to the queue of outgoing packets in our machine.

At a lower level (remember the OSI model?), the network subsystem of the computer looks into its queue. It finds an outgoing packet, queued for immediate delivery. "Oooh", it thinks, "a packet!" And it proceeds to send it immediately to your Internet Service Provider, since there is little use in trying to send it somewhere else.

Now a long, long chain of routers starts looking at the packet that was sent. The first router comes along, and looks at the destination IP address of the packet, which is This first router is probably responsible for handling a reasonable segment of your ISP's customers, and looks through its list of known customers to see if the packet is going to any of them. "Nope", it says, "I can't find this address in my list. I must send it upwards." It proceeds to forward the packet to its uplink, which handles a much larger part of the network. The process is similar to a lieutenant in the army, who doesn't know what to do with a specific situation, and sends it up to his captain. The captain doesn't know what to do either, and sends it up to the colonel; and so it goes until it ends up all the way up at the Supreme Army Headquarters.
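The first router's decision can be sketched like this. It is a simplified picture under stated assumptions: the networks and interface names are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress

# Hypothetical customer networks this router knows about, with the
# interface each one is reached through. The addresses come from the
# ranges reserved for documentation.
KNOWN_NETWORKS = {
    ipaddress.ip_network("192.0.2.0/24"): "customer-port-1",
    ipaddress.ip_network("198.51.100.0/24"): "customer-port-2",
}
UPLINK = "uplink"


def next_hop(destination):
    """Pick where to forward a packet addressed to `destination`."""
    address = ipaddress.ip_address(destination)
    for network, interface in KNOWN_NETWORKS.items():
        if address in network:
            return interface
    # "I can't find this address in my list. I must send it upwards."
    return UPLINK


print(next_hop("192.0.2.17"))    # customer-port-1
print(next_hop("203.0.113.5"))   # uplink
```

Real routers keep far larger tables and pick the most specific matching network, but the "known customer, or else upwards" logic is the same idea.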

Army Headquarters, in this case, isn't a special router, though. It's the World Wide Internet. Up until now, we've only traveled within "our" Internet Service Provider, like Telia or Com Hem. But now, we leave them for a journey on the high seas.

The high seas of the Internet are ruled by a rather complicated protocol called BGP, or Border Gateway Protocol. It attempts to define routes between different networks, based on link connections, policies, and a multitude of other parameters. BGP is the protocol that controls all of the routes between all the myriads of networks, so you can find just the right one you need. It can be compared to the little GPS device you may have in the car, which finds a route all the way from Podgorica to the Ice Hotel in Jukkasjärvi. And just as your GPS device has a CD (or DVD) with all roads in Europe, the routers on the Internet have a humongous list of all networks on the Internet. Trust me, you really don't want to know more about this. But it's not uncommon for a long-distance packet to travel through 20 or even 30 different routers on the way - and it all takes less than a tenth of a second.

Ultimately, your packet travels through networks with strange names, such as "", "" and so forth until it reaches the target network. The routers on that side forward your SYN packet to the right sub-net, and finally to the right computer. The SYN packet knocks on door number 80, and a friendly program opens and says "hello there, what can I do for you?"

Well, not quite. But the SYN packet informs CNN's web server that we wish to start communicating, and it responds by transmitting a SYN/ACK packet (ACK stands for acknowledge). This SYN/ACK packet retraces the steps all over the Internet, and finally makes it home to your computer again. Your computer responds by sending an ACK packet (no SYN this time), and in doing so, both your computer and CNN's web server consider a connection to have been opened and agreed upon.
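The SYN, SYN/ACK, ACK exchange can be written down as a tiny model. Nothing is sent over a network here. The sequence numbers are an extra detail of TCP not covered in the text above, and the `Segment` class and the numbers 1000 and 5000 are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flags: frozenset      # {"SYN"}, {"SYN", "ACK"} or {"ACK"}
    seq: int              # sender's sequence number
    ack: int = 0          # next sequence number expected from the peer


def three_way_handshake(client_isn, server_isn):
    """Return the three segments exchanged to open a connection.

    `client_isn` and `server_isn` are the initial sequence numbers each
    side picks; real stacks choose them unpredictably.
    """
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # The server acknowledges the SYN by asking for client_isn + 1.
    syn_ack = Segment(frozenset({"SYN", "ACK"}), seq=server_isn, ack=syn.seq + 1)
    # The client acknowledges the server's SYN in the same way.
    ack = Segment(frozenset({"ACK"}), seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(sorted(segment.flags), segment.seq, segment.ack)
```

Once the third segment has arrived, both sides have seen their own starting number acknowledged, which is what "agreed upon" amounts to.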

We are now ready to send our initial request for the main web page. We will cover this next week.

Cats In Sinks

Cats. In sinks.

Fetching A Web Page, Part 2 - Internet Names

In the last article, we talked a little bit about the internet itself. I tried to provide a little background on the whole thing, and I hope it wasn't too complex. Some of this will be clarified as we go on.

We ended the last article when you typed "" into your favorite browser. I use Firefox, but that's not important right now.

Let us now imagine that you press ENTER. What happens?

Well, the first thing the computer does is look at what you typed in, and it pretty darn quickly realizes (typically within a few microseconds) that you want to connect to a certain web site called "". And there, the computer immediately runs into problems. This particular problem is that the computer has no idea what "" is, nor how to connect to it. The only thing the computer can understand is addresses, like "".

The first step, therefore, is to attempt to translate "" into some kind of an internet address. How does this happen? Well, first, we've got to ask ourselves the age-old question that Shakespeare first asked, "what's in a name?"

What's in a name? That which we call a rose
By any other word would smell as sweet...

What's in a name? The name "www.cnn.com", for instance, has three different parts. The name is always read right-to-left, so the first part is "com". com is a top-level internet domain, which in this case is reserved to be used for mainly American (or global) businesses. It means, literally, commercial. CNN does indeed seem to qualify for being commercial (and global). There are other top-level domains, like "gov" for the U.S. Government, "se" for Sweden, and "uk" for the United Kingdom.

The next part is "cnn", and usually identifies the company, organization, or other body we're interested in, which exists within the "com" domain. The third part, "www", is usually thought of as a resource of some kind, but actually it indicates a specific computer within the domain "cnn.com". "www" is typically the main computer (or in these cases, computers) that handles web requests. Making sure that all web-handling computers are called "www" makes it nice and easy to address them. In theory, it would be possible to call the web computer "frank". The downside of that is that very few internet users would think of typing in "frank.cnn.com" when they wanted to access the latest headline news. As an interesting side note, the main web computer for MIT was actually called "", which led to endless confusion, but I think they've changed that by now.

How do we translate "" into an internet address, then? For that, we turn our attention to the complex beast known as the Domain Name System, or DNS.

The DNS is a service available online which keeps track of all names on the internet. This is not easy, and therefore there are hundreds of thousands of computers everywhere doing this busy task. Typically, all Internet Service Providers have at least one computer assigned to perform this duty; many have two or more, in case the first one breaks down. These are all connected to 13 mammoth root servers, which are the definitive authorities for all name-based information on the internet.

The address for the DNS server used to be something you had to type into the computer yourself, during the good old days when your internet service provider just gave you a paper with a list of complex digits on it. These days, with the remarkable invention of DHCP, this is no longer necessary; but deep down in your computer's configuration, in parts you never knew about (and hopefully will never need to touch), there exists a tiny little record of information about the address of your specific DNS server, to which your computer always sends its questions for name-based information.

So, in order to turn the name "" into something a bit more useful, your computer now sends out a request to your DNS server. For a moment we'll just relax and kick back, and let the computer handle all the stuff for us. In short, this is what happens:

  • Your computer asks the DNS server "where is"

  • The DNS server has no idea, so it quickly goes out to ask the root server, "where is" The root server replies, "I don't know, but I do know who is responsible for the 'com' domain."

  • The DNS server now asks the same question to the 'com'-domain server in turn. This server replies, "I don't know, but I do know who knows all about the '' domain".

  • Finally, the DNS server goes on to ask the '' server "where is", and this last server replies "oh, I have that information. The address for '' is".

  • Happy and satisfied, the DNS server now returns the answer to your computer.

Your computer now knows which address to use. We're all set to go, ready to send the first packet over the world-wide web.
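The chain of questions above can be mimicked with a toy model. This is a sketch under loud assumptions: every server name and the final address are invented, the name www.example.com stands in for the real one, and real DNS servers exchange network messages rather than sharing a Python dictionary.

```python
# Toy model of the lookup chain. Every server name and the final
# address are invented; 192.0.2.10 is from a documentation-only range.
SERVERS = {
    "root": {"referrals": {"com": "com-server"}, "records": {}},
    "com-server": {"referrals": {"example.com": "example-server"}, "records": {}},
    "example-server": {"referrals": {}, "records": {"www.example.com": "192.0.2.10"}},
}


def resolve(name):
    """Follow referrals from the root until some server knows `name`.

    Returns (address, list_of_servers_asked).
    """
    server = "root"
    asked = []
    while True:
        asked.append(server)
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], asked
        # "I don't know, but I do know who is responsible for ..."
        for domain, next_server in zone["referrals"].items():
            if name == domain or name.endswith("." + domain):
                server = next_server
                break
        else:
            raise LookupError(f"no server knows about {name}")


address, asked = resolve("www.example.com")
print(address)
print(" -> ".join(asked))
```

The list of servers asked mirrors the bullets: the root server first, then the one for 'com', then the one that actually holds the answer.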

Next time, we'll look at how your computer makes first contact with the web server. It's a tricky process.

Fetching A Web Page, Part 1 - Background

Few people realize just how difficult it is, and how many separate processes are involved, in the "simple" request of fetching a web page. It may seem like the Internet is just a big, gigantic "thing" that works. People may have heard about IP addresses, routers and so forth, but maybe not really understood what it's all about or how it all fits together.

The web is an amazing tool. It has transformed our lives to the extent that it defines the computer experience today. So I thought it could be interesting to take a closer look at some of the technologies that drive it. For instance, the process of fetching a single web page.

I intend to write this as a series, where we will be looking at each step in fetching a web page over the internet in sequence. I'm probably going to sprinkle the text with references to Wikipedia, where you can read more in detail about each step.

But to understand it all, some introductory knowledge may be required. The history goes something like this:

The internet began as a U.S. Department of Defense science project, more or less. The idea was to build a network of computers that was very loosely bound together, in order to be able to withstand, among other things, a nuclear attack. The idea went that if a network was organized extremely hierarchically, then one single missile hit could take out some very important central part of a nation-wide network. But if it was loosely bound together, there would be no single point of attack that would take out a disproportionately large part of the network. Thus the first network, called ARPANET, was based on the idea of packet switching, that is, all communication that takes place, takes place through a series of individual packets. These packets are sent out with source and destination addresses attached to them, and several routers on the way choose the best route for each packet, using the available channels, until it is delivered to the correct destination. If one part of the network went down, the packets would ideally just find a different way through the internet, and everything would work anyway (although maybe a bit slower).

ARPANET first went online in 1969, and then evolved over the years, adding a bit of technology here, growing by a couple of million users there, until the internet sort of opened up for commercial purposes in the early 90's. And the rest, as they say, is history.

To further expand upon the idea of the network, it should be clarified that there are different types of networks. The two primary talk-abouts these days are Local Area Networks and Wide Area Networks (or LAN and WAN, for short). The internet is an example of a wide area network; a network of computers that spans a large area, such as a city, a country, or in this case, the entire world. Local area networks are much more ... well, local. :) They typically encompass a house, a company, or a smaller geographically limited, contiguous area. The internet (a WAN) is actually built out of millions of local networks (LANs), all connected together.

What, then, defines a local network? From the internet's point of view, a network is a group of internet addresses, all bunched together. An internet address, as you may have seen, consists of four groups of digits, like 200.47.96.1. This is, so to say, the "phone number" of a single computer on the internet. The maximum number of unique addresses is about four billion, which may seem like a lot, but not all of these numbers are used; and it actually is getting kind of crowded out there, so it's not as much as people thought initially. Each computer, then, has a different phone number, which uniquely identifies this machine on the internet. However, much like ordinary phone numbers have area codes and prefixes, internet addresses also have area codes. They are called, precisely, networks. Usually, the way it goes these days is that the first three groups of digits define the network, and the last digit group defines the computer on that individual network. Of the address 200.47.96.1, the network is called 200.47.96, and the .1 tells the system that it's computer number one on that network. This makes it all easier, as we shall see later on, to know where individual packets should go.
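The split between network and computer can be tried out with Python's standard `ipaddress` module. One assumption is added here: the "first three groups" rule is written in prefix notation as /24, meaning that 24 of the address's 32 bits name the network.

```python
import ipaddress

# The article's example address, with the "first three groups" rule
# written as a /24 prefix (24 of the 32 bits name the network).
interface = ipaddress.ip_interface("200.47.96.1/24")

network = interface.network                 # 200.47.96.0/24
host_number = int(interface.ip) - int(network.network_address)

print(network)        # 200.47.96.0/24
print(host_number)    # 1
print(2 ** 32)        # 4294967296 possible addresses, about four billion
```

The last line is where the "four billion" figure comes from: an address is 32 bits long, so there are 2 to the power of 32 possible combinations.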

When computers talk to each other over the network, there are many different layers of communication going on. The fancy name for this is the OSI model, an ISO standard that defines seven different levels of communication. The first basic level of computer-to-computer talk is to imagine the scenario with two computers linked together by a physical cable, where all they can do is to say "yes" or "no" to each other (typically, high voltage means "yes" and low voltage means "no").

The next level of complexity in this model is when we attach many computers to the same cable. We then need some sort of communication protocol to make sure that no two computers are saying "yes" or "no" at the same time, thereby garbling the electrical signals over the wire, and that they instead take turns speaking. Fancy names for these protocols include, for instance, IEEE 802.3.

On levels above this, we start working with addresses and many, many computers, connected through different cables; identifying computers, making sure that individual pieces of transmission arrive safely (and not garbled in any way); and the ultimate level of communication, which is known as the application level, where we don't concern ourselves with anything more technical than the idea of fetching web pages ("fetch a web page over this big internet thing, I say, and work out all the technical details while you're at it"). The purpose of these layers is to split the problem of communication up into many different isolated problems, and solve each one at a time. Once one layer of abstraction is built, we simply start building the next communication layer on top of the last.

TCP/IP is the main protocol that drives the internet. TCP/IP actually embodies two different layers in this OSI communication stack: IP means Internet Protocol, and it is the main part making sure that an individual packet sent anywhere, anytime, reaches its destination. TCP builds on top of this, and means Transmission Control Protocol; it uses IP packets to start conversations between computers, making sure that no part of this conversation is lost or garbled in transmission, and that the "sentences" don't arrive in the wrong order. TCP is also good for saying "hello" and "goodbye", something very important in computer conversations, and which IP doesn't care at all about.
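The layering can be pictured as envelopes inside envelopes. The sketch below is only a mental model: the class names, port 49152, the hardware addresses and the IP addresses are all invented for the illustration, and real headers carry many more fields.

```python
from dataclasses import dataclass


@dataclass
class TcpSegment:
    src_port: int
    dst_port: int
    payload: bytes        # application data, e.g. a web request


@dataclass
class IpPacket:
    src_addr: str
    dst_addr: str
    payload: TcpSegment   # IP does not look inside this


@dataclass
class EthernetFrame:
    src_mac: str
    dst_mac: str
    payload: IpPacket     # the cable layer does not look inside this


def wrap(data, src, dst):
    """Wrap application data layer by layer, top to bottom."""
    segment = TcpSegment(src_port=49152, dst_port=80, payload=data)
    packet = IpPacket(src_addr=src, dst_addr=dst, payload=segment)
    return EthernetFrame("02:00:00:00:00:01", "02:00:00:00:00:02", packet)


frame = wrap(b"GET / HTTP/1.0\r\n\r\n", "192.0.2.1", "198.51.100.7")
print(frame.payload.dst_addr)            # 198.51.100.7
print(frame.payload.payload.dst_port)    # 80
```

Each layer only reads its own envelope, which is why the problem can be split up and solved one layer at a time.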

Okay, having gotten this far in our story, I hope you haven't lost me in all the technical details. For now, it's important to know that there are a lot of computers out there, all talking to each other, and that they form networks. Now, let's use this network. Let's imagine that you type in the address "" into your favorite browser, and next time, we'll look at what happens when you press ENTER.

Golden Dialogs

As I've mentioned before, UI design is hard.

Take a look at this dialog, for instance.

The text says "Do you want to move or copy files from this zone?" and gives you the buttons "Yes" or "No" to click on.

Of course, the intention was to ask if the user wanted to proceed with the operation of transferring the files from the SharePoint zone to the desktop (whether that operation included moving or copying files), but it comes across as a question of whether the user wanted to 1) copy, or 2) move the files. Which is why the buttons are totally confusing for most users, unless the intent behind it can be decoded.

Tim O'Reilly: What Is Web 2.0?

Tim O'Reilly has written a very interesting article, called "What Is Web 2.0?"

Trying to describe the frontier of the new business movement on the internet, he compares companies like Microsoft and Google and explains why Microsoft will never beat the latter. He comments on the trend of data ownership of companies like Wikipedia and Amazon.Com and explains how these types of applications are the very programs that drive the Web 2.0.

He describes a society that is loosely bound together, through blogs, RSS feeds, easily deployable and easily usable web services (Amazon.Com reports that 95% of all customers use their REST web services, not SOAP), and the challenges faced by companies in this crazy arena. For instance, while Microsoft produces new releases of their software once every year, at most, companies like Flickr release new builds every thirty minutes.

Of course, the trick is purely sociological; if you can harness the power - and competency - of the internet users, then they will participate in your service, adding data and value to it themselves. Companies that harness this capability will be very successful in the Web 2.0; companies that are clinging to old-style business models will fade away. Like IBM.

Exit Grasshopper

Grasshopper left the pad today at 0300 hours; next stop Ladybug.

Funny, the place seems a lot emptier without him.

Mozilla Thunderbird And UI Design

I recently installed Mozilla Thunderbird as a replacement for Outlook Express. While I think Thunderbird probably is a better program, it is evident that some things have not been totally thought through - for me, mostly in the area of UI design and visual feedback.

One design mistake is to make the caption for the email account bold. In email programs, bold text always signifies that you have new mail. Making all captions bold makes the view puzzling ... in the view to the right, it's not immediately clear which email accounts have new mail, and which have not.

I also find it a bit distressing that there is very little visual feedback about what is going on. I never quite know when Thunderbird is finished checking for email (and blogs), for instance. Some kind of visual indicator would be nice. A moving progress bar would suffice, one that lets you know by disappearing that all actions have been completed.

I'm a little bit afraid to find that it's all possible to add already by some clever hack, because I've seen the configuration file for Thunderbird. It's in JavaScript!!!

My Town Is Growing Up

The big news in town is that the new road opened. Because of the deteriorating traffic situation through Skultorp, it was decided that a new road should be built. Since the road would go right through military training fields, they had to build a 300-meter tunnel, where Leopard II armored tanks can drive over the road without bothering traffic.

The grand opening was yesterday, the 1st of October. I test-drove it today; and while I refuse to be overly excited about a chunk of road and a tunnel (I mean, hey, Stockholm has tunnels, man), it's still a little inspiring to see things change around here.

I don't think it was planned to coincide with the fact that Skövde passed the 50,000-population mark just a few weeks ago, but it did.

My town is growing up. :)

Windows: We've Come A Long Way

Once upon a time, Operating Systems used to be the 100,000 lines of code that kept your computer running, your files available on the hard disk, and which loaded and executed programs at your will (or, sometimes not). Depending on the size of the computer system they were running on, they may also have implemented multiple user sessions, timesharing, quota management and a number of different features, suited for making the end user believe that he was the only one using the computer and not just one out of one hundred.

Nevertheless, the system required users to know massive amounts of information about the system. It required the user, for instance, to know the difference between "mkdir" and "mkfs". The first command creates a new folder in your file system, the other command wipes out your file system along with everything in it, and doesn't even stop to ask if that's what you wanted to do. While the catchy slogans today declare "What You See Is What You Get", these systems were "You Asked For It, You Got It".

How far we have come. Today we have computer systems that aren't run by engineers in white coats, but actually used by elderly people whose previous experience with technology narrows down to repairing engines. Not without due questions, though, but they do use it.

I would like to think that Microsoft Windows, as of today, has matured to the point where it can almost be used by ordinary people. And, for most computer systems, this is a very good grade indeed. Linux, to name a random competitor, has not.

When you poke around beneath the surface of these two huge operating systems, you inevitably find odd things - strange quirks, inherited restraints, designs locked into paradigms of user interface thinking long since gone - but I do have to say that with Windows XP, some of these things are actually improving. I've seen things change under the hood in the past few years that have made me appreciative of Windows, and this comes from a guy who once ran a BBS called Organized Programmers against Object-Oriented Programming.

Still, we have a long way to go.

Sometimes, I like to ask myself questions - radical questions. Like, "why does the operating system need to be visible at all". Why should I be able to access C:\WINDOWS at all? My friends are quick to point out "because you want to see what is going on", or "because you want to replace some things" or similar answers. Of course, I buy those arguments, because the way Windows is designed these days, makes it necessary that you see what is going on.

But let's compare it to a dish-washer. I have a dish-washer at home, which I frequently use. It's great. It has six little buttons on the top panel, with which I can select washing programs. It has three little status indicators, to show me when things go wrong (like, if it needs extra salt). Beyond that, I don't need to know anything about it. It just does its job.

Why should my computer be more difficult than that? My computer should be what it's set out to be: A personal computational tool; an organizer of documents; a play-station. It's an order of magnitude or two more complex, I admit, than a dish-washer; but the inherent functionality is the same: It is designed to carry out a specific set of tasks, and if it does that, then all is fine. Beyond that, everything else is unnecessary. An end-user should be able to write documents, view and organize holiday pictures, and even write complex software, with the same ease as we load the dish-washer and turn it on. The fundamentals - the operating system, or the control program that drives the dish-washer - should never be seen; nor should they need to be seen.

This is where everybody (especially hackers) starts yelling at me. "Are you crazy", they say. "It'll never work - it's too complex", "I want control over my computer". Excuse my profanity when I say: Bullshit. All of that is simply because someone didn't THINK long enough before building it.

I think it's time to get radical - not only with UI design, but with stability, functionality, and powerful easeability; until we one day build a computer that does simply what it's supposed to do.

Windows was a good first step - an insufficient, badly constructed beta, mind you - but still a good first step. Now let's take the next one.

Trying Out Thunderbird

I'm trying out the Thunderbird RSS capabilities right now. Let's see if it's better than SharpReader.


Blog contents copyright © 2005 Mats Gefvert. All rights reserved.