
Identify webpages using slugs

Written by Peter R. Bloomfield | Monday, 21 December 2009 14:21
'Slugs' are becoming quite popular for constructing URLs these days. A slug is a simple text identifier for a resource, such as a blog post. Slugs are popular largely because they make URLs look nicer, and because they can help search engines crawl sites more effectively.

For example, you could have a URL looking like this:

http://www.example.com/index.php?page=15

Not very nice to look at, and the 'GET' parameters (everything from the question mark (?) onwards) tell neither visitors nor search engines much about the page. Instead, using a slug, you could make it look like this:

http://www.example.com/company-history

That's much nicer. The "company-history" bit is the 'slug' for the page. It tells the user something about the page content, and it allows search engines to crawl your site easily, as if it were a real file system. Under the surface, your website could be doing anything it likes, such as extracting that page from a database. There doesn't actually have to be a real file or folder called "company-history".

Using Path Info

There are a couple of common approaches to implementing slugs. The first is easier, but it often creates slightly odd-looking URLs. If you are running Apache and PHP, then you can hopefully use something called "path info". First, it needs to be enabled in Apache. You can usually do this either in your Apache configuration file, or in a .htaccess file. You can find out more information in the Apache documentation about AcceptPathInfo.

Next, create a script called "index.php", and put it on your server (could be a local or remote server). You will eventually be accessing the script using a URL like this:

http://www.example.com/index.php/company-history

As you can see, it isn't perfect. We still have the "index.php" in there. There are ways to get rid of that, but they are not always reliable. Nonetheless, search engines are quite happy with it, so if that's your main goal then it should be OK.

Ordinarily, everything after the "index.php" would be ignored or it would cause an error. However, with AcceptPathInfo enabled, that information is placed into a predefined PHP variable:

$_SERVER['PATH_INFO']

Edit your index.php script to add the following:

echo $_SERVER['PATH_INFO'];

When you access the script, add "/blahblahblah" after the script name, and you will see that information displayed in your browser. It is common practice to have multiple slashes to separate different bits of information. For example, path info like "/2009/12/15/foobar" might identify a blog post called "foobar" written on 15th December 2009. Simple use of the PHP function "explode" would let you split the path info up at each slash.
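As a rough sketch of that splitting step, something like the following could work. The function name "splitPathInfo" is my own invention for illustration, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper: split path info such as "/2009/12/15/foobar" into its parts.
function splitPathInfo(string $pathInfo): array
{
    // Trim leading and trailing slashes so explode() yields no empty edge entries.
    $trimmed = trim($pathInfo, '/');
    if ($trimmed === '') {
        return [];
    }
    return explode('/', $trimmed);
}

// Example: a dated blog-post path.
list($year, $month, $day, $slug) = splitPathInfo('/2009/12/15/foobar');
echo "$year-$month-$day: $slug\n";
```

In a real script you would pass in the value of $_SERVER['PATH_INFO'] rather than a fixed string.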

One word of caution about this method: if no path info is specified, then $_SERVER['PATH_INFO'] will not be defined, so check that it exists before you read it.
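One possible way to guard against the missing variable is sketched below. The helper name "getPathInfo" and the choice of "/" as the fallback are assumptions made for this example, not anything PHP or Apache requires:

```php
<?php
// Hypothetical helper: read path info from a $_SERVER-style array,
// falling back to "/" (an arbitrary default) when PATH_INFO is not defined.
function getPathInfo(array $server): string
{
    return isset($server['PATH_INFO']) ? $server['PATH_INFO'] : '/';
}

// Safe even when the URL carries no path info at all.
echo getPathInfo($_SERVER);
```

Passing the array in as a parameter, rather than reading $_SERVER inside the function, makes the helper easy to try out with made-up values.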

URL rewriting

This is a much more complex method, which also relies on Apache. In this case, you define a rewrite rule which will let you remove the "index.php" from your URL, making it a bit neater. This method doesn't rely on the AcceptPathInfo setting.

The first step is to define your rewrite rule, which can be quite difficult for a beginner to understand. For full details, see the Apache documentation for "mod_rewrite".

Once again, you can define the rewrite rule in a configuration file, or in .htaccess. I favour the latter approach, because many people don't have access to their configuration files. The simplest rule looks something like this:

[code]

This is a very simple solution, but it should work. The rest of the work happens in PHP (although I imagine something similar is possible in other server-side languages). The key to this is the fact that Apache communicates the complete URL (slugs and all) to the PHP script. This allows you to strip off the name of the server, and then process the 'path info' (it is commonly still called path info, even though it does not use that particular Apache feature).

The following code will create an array called "$pathinfo" which contains all the parts of the path info, assuming each part is separated by a forward slash:

[code]
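A minimal sketch of one way to build that array follows. It assumes the rewrite rule hands every request to index.php, and that the requested URL is read from $_SERVER['REQUEST_URI'], which normally holds the path and query string without the server name. The function name "buildPathInfo" is hypothetical:

```php
<?php
// Hypothetical helper: build the $pathinfo array from a request URI.
// Assumes the rewrite rule sends every request to this script and that
// the original URI is available in $_SERVER['REQUEST_URI'].
function buildPathInfo(string $requestUri): array
{
    // Keep only the path, dropping any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return [];
    }
    // Split on "/" and discard empty parts left by leading, trailing or doubled slashes.
    $parts = array_filter(explode('/', $path), function ($part) {
        return $part !== '';
    });
    // Re-index from 0 and decode any percent-encoded characters in each part.
    return array_map('rawurldecode', array_values($parts));
}

// Fall back to "/" when the script is not run through a web server.
$requestUri = isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '/';
$pathinfo = buildPathInfo($requestUri);
```

With this sketch, a request for "/2009/12/15/foobar?x=1" would give four parts, and a request for the site root would give an empty array.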

You can then process each part to determine how to render the page.
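To give a flavour of that processing, here is a small sketch that picks a page title from the first part of the path. The "$pages" table and the "resolvePage" function are made up for this example; a real site might look the slug up in a database instead:

```php
<?php
// Hypothetical lookup table mapping slugs to page titles.
$pages = [
    'company-history' => 'Company History',
    'contact'         => 'Contact Us',
];

// Hypothetical helper: choose a page title from the first part of the path.
function resolvePage(array $pathinfo, array $pages): string
{
    // No parts at all means the site root was requested.
    if (count($pathinfo) === 0) {
        return 'Home';
    }
    $slug = $pathinfo[0];
    // Unknown slugs fall through to a "not found" title.
    return isset($pages[$slug]) ? $pages[$slug] : 'Page not found';
}

echo resolvePage(['company-history'], $pages);
```

For an unknown slug you would probably also want to send a 404 status code, which this sketch leaves out.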

This has been a very brief overview of using slugs in webpages. It is not the only way to use them. My best advice is to play around with the techniques and learn.
