Dissecting the URL 

Marketing clients often have a deep understanding of their own businesses, routes to market and campaign targets, but aren’t necessarily experts in digital. Some of the staple elements of the online world need explaining before reports and campaign documentation can be interpreted properly. This article explains the URL and the names of its various components.

Taking the example URL one part at a time: http (Hypertext Transfer Protocol) is the standard protocol used by most web pages. The other common protocol is https, which is used for secure connections. The colon acts as a separator, and the double slash // signals that the name of the server to connect to comes next.

The www.example.com part is the domain name; it may also be called the hostname when it is associated with an IP address. The www is an optional subdomain, while .com is the TLD (top-level domain). Note that example.com (without the www) is also a domain name and, if an IP address is associated with it, it too can be a hostname.
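As a rough illustration of the naming above, the sketch below splits a hostname into its subdomain, domain and TLD. The function name split_hostname is made up for this example, and the split is deliberately naive: it assumes a single-label TLD such as .com, so it would give the wrong answer for multi-part suffixes such as .co.uk.

```python
def split_hostname(hostname):
    """Naively split a hostname into (subdomain, domain, tld).

    Assumes a single-label TLD such as .com; multi-part suffixes
    like .co.uk are not handled by this sketch.
    """
    labels = hostname.lower().split(".")
    tld = labels[-1]
    domain = ".".join(labels[-2:])
    subdomain = ".".join(labels[:-2])  # empty string when there is none
    return subdomain, domain, tld


with_www = split_hostname("www.example.com")
without_www = split_hostname("example.com")
print(with_www)     # ('www', 'example.com', 'com')
print(without_www)  # ('', 'example.com', 'com')
```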

The part after the next colon is the port number (80 in the example). Port 80 is the default port for http, so it is rarely seen: it can be left out of the URL, and most browsers don’t display it.
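The protocol, hostname and port can be pulled apart with Python's standard urllib.parse module. This is only a sketch: the full URL below is an assumption, assembled from the parts this article describes, and effective_port and DEFAULT_PORTS are illustrative names rather than anything built in.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two protocols discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url):
    """Return the explicit port if the URL has one, else the protocol default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


# Assumed example URL, built from the components named in the article.
example = "http://www.example.com:80/media/?id=647386768"
parts = urlsplit(example)

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 80

# With the port left out, the default for the protocol applies.
print(effective_port("http://www.example.com/media/"))  # 80
print(effective_port("https://www.example.com/"))       # 443
```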

In the example, /media represents the path. Where there is no path, a lone slash indicates the root of the domain. In many URLs the path ends with a further / followed by a file name; index.html, index.htm, default.html and index.php are four common examples.

A URL can also include subdirectories: in the example, /media/ is the subdirectory.
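To make the difference between subdirectory and file name concrete, here is a minimal sketch that separates the two. The URLs are assumed examples in the style of this article, and split_path is a hypothetical helper name.

```python
import posixpath
from urllib.parse import urlsplit


def split_path(url):
    """Return (directory, file_name) for the path part of a URL."""
    # An empty path is treated as the root of the domain.
    path = urlsplit(url).path or "/"
    directory, file_name = posixpath.split(path)
    return directory, file_name


print(split_path("http://www.example.com/media/index.html"))  # ('/media', 'index.html')
print(split_path("http://www.example.com/media/"))            # ('/media', '')
print(split_path("http://www.example.com"))                   # ('/', '')
```

An empty file name means the URL points at a directory, in which case the server decides which file (index.html, for instance) to send back.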

The example URL contains one last section, ?id=647386768. This is a URL parameter, and it may well mean that the URL is dynamic, that is to say, generated by code (often by a content management system). Dynamic URLs can be problematic from an SEO point of view. The parameter here is also named id. Using id= (or sid=) as a parameter name is not recommended, as some search engines can construe it as a session ID and may not fully spider the URL. On the subject of dynamic pages, Google has offered the following advice:

If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
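A quick way to audit a URL against this advice is to list its parameters and flag any whose names could be mistaken for a session ID. The sketch below is an assumption-laden illustration: the URL is assembled from the parts described in this article, and inspect_parameters and SESSION_LIKE_NAMES are made-up names covering only the two parameter names mentioned above.

```python
from urllib.parse import parse_qs, urlsplit

# Parameter names the article warns may be read as session IDs.
SESSION_LIKE_NAMES = {"id", "sid"}


def inspect_parameters(url):
    """Return (parameters, flagged_names) for a URL's query string."""
    query = urlsplit(url).query
    params = parse_qs(query)  # maps each name to a list of values
    flagged = sorted(name for name in params if name.lower() in SESSION_LIKE_NAMES)
    return params, flagged


params, flagged = inspect_parameters("http://www.example.com/media/?id=647386768")
print(params)   # {'id': ['647386768']}
print(flagged)  # ['id']

# A static URL has no "?" section, so there is nothing to report.
print(inspect_parameters("http://www.example.com/media/index.html"))  # ({}, [])
```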

You will, on occasions, see URLs like the following:

http://www.example.com/reptiles.html#terrapins

Here, the # denotes a “named anchor”, which is in effect a placeholder on a page. Named anchors are especially useful in long HTML pages when you want to link to a specific part of the page. Search engines do not follow them. Because AJAX pages sometimes use the # as part of their URL structure, this can render them uncrawlable. However, Google has mentioned that it is working on crawling AJAX pages and has proposed a technique for creating search-friendly AJAX pages.
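The split between the page and its named anchor can be seen with urldefrag from Python's standard library, which separates everything before the # from what follows it. The variable names here are only illustrative.

```python
from urllib.parse import urldefrag

# urldefrag returns the URL without its fragment, plus the fragment itself.
page, anchor = urldefrag("http://www.example.com/reptiles.html#terrapins")

print(page)    # http://www.example.com/reptiles.html
print(anchor)  # terrapins
```

Two URLs that differ only after the # therefore point at the same page, which is one way to see why a crawler has nothing new to fetch for them.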


  1. Ozman says:

    May 7, 2010

    Thanks for the article. This is really useful for understanding the components of a URL.
