Joined: Oct 2003
Posts: 2,305
Old Hand
Okay, this subject comes up a lot..

First of all, what it is..

[] Search Engine-Friendly URLs
By Chris Beasley
August 10th 2001
Reader Rating: 8.8

On today’s Internet, database driven or dynamic sites are very popular. Unfortunately the easiest way to pass information between your pages is with a query string. In case you don’t know what a query string is, it's a string of information tacked onto the end of a URL after a question mark.

So, what’s the problem with that? Well, most search engines (with a few exceptions - namely Google) will not index any pages that have a question mark or other character (like an ampersand or equals sign) in the URL. So all of those popular dynamic sites out there aren’t being indexed - and what good is a site if no one can find it?

The solution? Search engine friendly URLs. There are a few popular ways to pass information to your pages without the use of a query string, so that search engines will still index those individual pages. I'll cover 3 of these techniques in this article. All 3 work in PHP with Apache on Linux (and while they may work in other scenarios, I cannot confirm that they do).

Method 1: PATH_INFO

Implementation:

If you look above this article on the address bar, you’ll see a URL like this: http://www.webmasterbase.com/article.php/999/12. SitePoint actually uses the PATH_INFO method to create their dynamic pages.

Apache has a "look back" feature that scans backwards along the URL if it doesn't find what it's looking for. In this case there is no directory or file called "12", so it looks for "999". But it finds that there's no directory or file called "999" either, so Apache continues to look back along the URL and sees "article.php". This file does exist, so Apache calls up that script. Apache also sets a PATH_INFO variable on every HTTP request (in PHP it's available as $PATH_INFO with register_globals, or as $_SERVER['PATH_INFO']). Per the CGI spec, it contains everything in the URL to the right of the script name. So in the example we've been using, $PATH_INFO will contain /999/12.

So, you wonder, how do I query my database using /999/12? First you have to split this into variables you can use. And you can do that using PHP's explode function:

$var_array = explode("/",$PATH_INFO);

Once you do that, you'll have the following information:

$var_array[0] = "" (empty, from the leading slash)

$var_array[1] = 999

$var_array[2] = 12

So you can rename $var_array[1] as $article and $var_array[2] as $page_num and query your database.
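Putting method 1 together, a minimal article.php might look like the sketch below. It assumes a setup where the trailing path arrives in $_SERVER['PATH_INFO']; the function and variable names are just illustrative, not part of the original article:

```php
<?php
// Minimal sketch of article.php for the PATH_INFO method.
// Assumes $_SERVER['PATH_INFO'] holds the trailing path, e.g. "/999/12".

// Split "/999/12" into an article id and page number.
function parse_article_path($path_info) {
    $var_array = explode("/", $path_info);
    // Element 0 is the empty string before the leading slash.
    $article  = isset($var_array[1]) ? (int) $var_array[1] : 0;
    $page_num = isset($var_array[2]) ? (int) $var_array[2] : 1;
    return array($article, $page_num);
}

$path_info = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
list($article, $page_num) = parse_article_path($path_info);

// $article and $page_num (cast to int, so safe to interpolate)
// can now drive whatever database query builds the page.
```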

Drawback:

There was previously one major drawback to this method: Google, and perhaps other search engines, would not index pages set up in this manner, as they interpreted the URL as being malformed. I contacted a software developer at Google, made them aware of the problem, and I'm happy to report that it has since been fixed.

There is the potential that other search engines may ignore pages set up in this manner. While I don't know of any, I can't be certain that none do. If you do decide to use this method, be sure to monitor your server logs for spiders to ensure that your site is being indexed as it should.


Method 2: .htaccess Error Pages

Implementation:

The second method involves the .htaccess file. If you're new to it, .htaccess is a file used to administer Apache access options for whichever directory you place it in. The server administrator has a better way of doing this via the server configuration files, but since most of us don't own our own server, we don't have control over that. The server admin can also restrict what users may do in their .htaccess files, so this approach may not work on your particular server; in most cases, though, it will. If it doesn't, contact your server administrator.

This method takes advantage of .htaccess’ ability to do error handling. In the .htaccess file in whichever directory you wish to apply this method to, simply insert the following line:

ErrorDocument 404 /processor.php

Now make a script called processor.php, put it in that same directory, and you're done! Let's say you have the following URL: http://www.domain.com/directory/999/12/. Again, in this example "999" and "12" do not exist, and since you don't specify a script anywhere in the directory path, Apache will generate a 404 error. But instead of sending a generic 404 header back to the browser, Apache sees the ErrorDocument command in the .htaccess file and calls up processor.php.

Now, in the first example we used the $PATH_INFO variable, but that won’t work this time. Instead we need to use the $REQUEST_URI variable, which contains everything in the URL after the domain. So in this case, it contains: /directory/999/12/.

The first thing you need to do in processor.php is send a new HTTP header. Remember, Apache thought this was a 404 error, so it wants to tell the browser that it couldn’t find a page.

So, put the following line in your processor.php:

header("HTTP/1.1 200 OK");

At this point I need to mention an important fact. In the first example you could specify which script processed your URL. In this example all URLs must be processed by the same script, processor.php, which makes things a little different. Instead of creating different URLs based on what you want to do, such as article.php/999/12 or printarticle.php/999/12, you have one script that must do both.

So you must decide what to do based on the information processor.php receives - more specifically, by counting how many parameters are passed. For instance on my site, I use this method to generate my pages: I know that if there's just one parameter, such as

http://www.online-literature.com/shakespeare/, that I need to load an author information page; if there are 2 parameters, such as http://www.online-literature.com/shakespeare/hamlet/, I know that I need to load a book information page; and finally if there are 3 parameters, such as http://www.online-literature.com/shakespeare/hamlet/3/, I know I need to load a chapter viewing page. Alternatively, you can simply use the first parameter to indicate the type of page to display, and then process the remaining parameters based on that.

There are 2 ways you can accomplish this task of counting parameters. First you need to use PHP’s explode function to divide up the $REQUEST_URI variable. So if $REQUEST_URI = /shakespeare/hamlet/3/:

$var_array = explode("/",$REQUEST_URI);

Now note that, because of the positioning of the slashes, there are actually 5 elements in this array. The first element, element 0, is blank, because it contains the information before the first slash. The fifth element, element 4, is also blank, because it contains the information after the last slash.

So now we need to count the elements in our $var_array. PHP has two functions that let us do this. We can use the sizeof() function as in this example:

$num = sizeof($var_array); // 5

You’ll notice that the sizeof() function counts every item in the array regardless of whether it's empty. The other function is count(), which is an alias for the sizeof() function.

Some search engines, like AOL, will automatically remove the trailing / from your URL, and this can cause problems if you’re using these functions to count your array. For instance http://www.online-literature.com/shakespeare/hamlet/ becomes http://www.online-literature.com/shakespeare/hamlet, and as there are 3 total elements in that array our processor.php would load an author page instead of a book page.

The solution is to create a function that will count only the elements in an array that actually hold data. This will allow you to leave off the ending slash, or allow any links from AOL's search engine to point to the proper place. An example of such a function is:

function count_all($arg)
{
    // empty values (like the blank strings explode leaves around slashes) count as 0
    if (!$arg) {
        return 0;
    }
    // not an array: counts as one element (base case)
    if (!is_array($arg)) {
        return 1;
    }
    // else call recursively for all elements of $arg
    $count = 0;
    foreach ($arg as $val) {
        $count += count_all($val);
    }
    return $count;
}
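If all you need is to skip the empty segments (rather than count nested arrays), PHP's built-in array_filter() can do the same job; this is a simpler alternative not in the original article. Using 'strlen' as the callback drops "" while still keeping a legitimate "0" segment:

```php
<?php
// Sketch: count only the non-empty URL segments, so a URL works
// the same with or without its trailing slash.
$request_uri = "/shakespeare/hamlet/3/";
$var_array   = explode("/", $request_uri);

// array_filter with 'strlen' removes "" but keeps "0";
// array_values re-indexes the result from 0.
$segments = array_values(array_filter($var_array, 'strlen'));
$num      = count($segments);  // 3, with or without the trailing slash
```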

To get your count, access the function like this:

$num = count_all($var_array);

Once you know how many parameters you need, you can define them like this:

$author=$var_array[1];
$book=$var_array[2];
$chapter=$var_array[3];

Then you can use includes to call up the appropriate script, which will query your database and set up your page. Also if you get a result you’re not expecting, you can simply create your own error page for display to the browser.
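Put together, a minimal processor.php for this method might look like the sketch below. The page types mirror the online-literature example above; the included script names are hypothetical:

```php
<?php
// Minimal sketch of processor.php for the ErrorDocument method.
// Apache thinks this request is a 404, so first claim it succeeded.
header("HTTP/1.1 200 OK");

// Decide which page type a request URI maps to, counting only the
// non-empty path segments (so a missing trailing slash is harmless).
function choose_page($request_uri) {
    $params = array_values(array_filter(explode("/", $request_uri), 'strlen'));
    switch (count($params)) {
        case 1:  return 'author';   // e.g. /shakespeare/
        case 2:  return 'book';     // e.g. /shakespeare/hamlet/
        case 3:  return 'chapter';  // e.g. /shakespeare/hamlet/3/
        default: return 'error';    // anything unexpected
    }
}

$page = choose_page(isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '');

// processor.php would then pull in the matching script, e.g.:
// include $page . '_page.php';   // hypothetical file names
```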

Drawback:

The drawback of this method is that every page that's hit is seen by Apache as an error. Thus every hit creates another entry in your server error logs, which effectively destroys their usefulness. So if you use this method you sacrifice your error logs.

Method 3: The ForceType Directive

Implementation:

You'll recall that the thing that trips up Google, and maybe other search engines, when using the PATH_INFO method is the period in the middle of the URL. So what if there was a way to use that method without the period? Guess what? There is! It's achieved using Apache's ForceType directive.

The ForceType directive allows you to override any default MIME types you have set up. Usually it may be used to parse an HTML page as PHP or something similar, but in this case we will use it to parse a file with no extension as PHP.

So instead of using article.php, as we did in method 1, rename that file to just "article". You will then be able to access it like this: http://www.domain.com/article/999/12/, utilizing Apache's look back feature and the PATH_INFO variable as described in method 1. But now Apache doesn't know that "article" needs to be parsed as PHP. To tell it that, you must add the following to your .htaccess file.

<Files article>
ForceType application/x-httpd-php
</Files>

This is known as a "container". Instead of applying directives to all files, Apache allows you to limit them by filename, location, or directory. You need to create a container as above and place the directives inside it. In this case we use a file container, we identify “article” as the file we're concerned with, and then we list the directives we want applied to this file before closing off the container.

By placing the directive inside the container, we tell Apache to parse "article" as a PHP script even though it has no file extension. This allows us to get rid of the period in the URL that causes the problems, and yet still use the PATH_INFO method to manage our site.
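If you end up with several such extensionless scripts, Apache's standard FilesMatch container (the regex variant of Files) lets one block cover them all. This is a sketch; the extra script names here are hypothetical, not from the article:

```apache
<FilesMatch "^(article|printarticle|author)$">
ForceType application/x-httpd-php
</FilesMatch>
```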

Drawback:

The only drawback to this method as compared with method 2 is that your URLs will be slightly longer. For instance, if I were to use this method on my site, I'd have to use URLs like this: http://www.online-literature.com/ol/homer/odyssey/ instead of http://www.online-literature.com/homer/odyssey/. However if you had a site like SitePoint and used this method it wouldn't be such a problem, as the URL (http://www.SitePoint.com/article/755/12/) would make more sense.

Conclusion

I have outlined 3 methods of making search engine friendly URLs - along with their drawbacks. Obviously, you should evaluate these drawbacks before deciding which method to implement. And if you have any questions about the implementation of these techniques, they are oft-discussed topics on the SitePoint Forums so just stop in and make a post.

[/]

Now then, this only works in the Linux/Unix world using Apache. What about IIS/Windows? The above won't work, so what are we left to do? Well, one alternative is IIS_Rewrite.


[]IIS Rewrite

Product Summary
IISRewrite is a rule-based rewriting engine that allows a webmaster to manipulate URLs on the fly in IIS.

URLs are rewritten before IIS has handed over the request to be processed, so requests for HTML files, graphics, program files, and even entire directory structures can be rewritten before they are passed to ASP scripts for processing.

IISRewrite was written to solve some practical problems that are nearly impossible to solve with IIS and ASP. It solves the compatibility issues when doing dynamic downloads with ASP, it allows portions of dynamic sites to be indexed by search engines as if they were static HTML files, and can provide a way to customize web sites based on the client's browser type without the use of Javascript.

IISRewrite is a stripped down implementation of Apache's mod_rewrite modules for IIS. Webmasters who have used Apache's mod_rewrite in the past will find that much of the configuration and functionality is the same.

IISRewrite is compatible with Microsoft's ISAPI specification and has been tested on Windows NT Server 4.0 running IIS 4 and Windows 2000 Server running IIS 5.

IISRewrite was featured in the February 2002 edition of Microsoft's MSDN Magazine. [/]

More Information can be found at IIS_Rewrite

It's costly though, at $199.00 per server.

Another alternative is ISAPI_Rewrite

[]

Product overview

ISAPI_Rewrite is a powerful URL manipulation engine based on regular expressions. It acts mostly like Apache's mod_Rewrite, but is designed specifically for Microsoft's Internet Information Server (IIS). ISAPI_Rewrite is an ISAPI filter written in pure C/C++ so it is extremely fast. ISAPI_Rewrite gives you the freedom to go beyond the standard URL schemes and develop your own scheme.

What you can do with ISAPI_Rewrite:

* Optimize your dynamic content, like forums or e-stores, to be indexed by popular search engines.
* Block hot linking of your data files by other sites.
* Develop a custom authorization scheme and manage access to static files using custom scripts and a database.
* Proxy content of one site into a directory on another site.
* Make your intranet servers accessible from the Internet using only one Internet-facing server, with very flexible permissions and security options.
* Create dynamic host-header-based sites using a single physical site.
* Create a virtual directory structure for the site, hiding physical files and extensions. This also helps when moving from one technology to another.
* Return browser-dependent content even for static files.

And many other problems can be solved with the power of the regular expression engine built into ISAPI_Rewrite. [/]

The drawback is that it costs per license..

For 1 license it's $69.00
For 2-5 it's $58.65 each
For 6-10 it's $49.85 each
and so on

you can find more info at

ISAPI_Rewrite

Hope this helps to clear it up for you

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
Interesting, thank you

I've found for Win2K/IIS5 in the php.ini file there's a paragraph like so:


[]; cgi.fix_pathinfo provides *real* PATH_INFO/PATH_TRANSLATED support for CGI. PHP's
; previous behaviour was to set PATH_TRANSLATED to SCRIPT_FILENAME, and to not grok
; what PATH_INFO is. For more information on PATH_INFO, see the cgi specs. Setting
; this to 1 will cause PHP CGI to fix it's paths to conform to the spec. A setting
; of zero causes PHP to behave as before. Default is zero. You should fix your scripts
; to use SCRIPT_FILENAME rather than PATH_TRANSLATED.
; cgi.fix_pathinfo=0
[/]

If you change the last line to:

cgi.fix_pathinfo=1

then SEF URLs work. Images within threads are spotty tho; some work and some don't. The ones that don't will have URLs like:

http://www.website.com/ubbthreads/showflat.php/Cat/0/Number/98389/an/0/page/images/star.gif

These images (and the other broken images) were given {$config['images']}/star.gif (etc.), so the full SEF URL for the page got prepended to the relative image path. They would need to be changed to $config['imageurl'] instead to fix the broken image links. I haven't checked everything yet (prelim results are good tho), but that would be a simple fix to achieve SEF URLs.


- Allen wavey
- What Drives You?
Joined: Oct 2003
Posts: 2,305
Old Hand
cool

Joined: Nov 2002
Posts: 554
Code Monkey
Offline

Joined: May 2001
Posts: 550
Code Monkey
Offline
Isn't this kinda obsolete for UBBThreads 6.5 since they already have spider friendly urls?

Joined: Oct 2003
Posts: 2,305
Old Hand
No. Just 'cause ya have it turned on doesn't mean it's working.

Joined: May 2001
Posts: 550
Code Monkey
Offline
> Since just cause ya have it turned on doesn't mean its working

It worked on this site even back when it wasn't included in the Infopop package.
Why shouldn't it work now?

Joined: Oct 2003
Posts: 2,305
Old Hand
It worked on this site prior to 6.5 because Josh built his own spider-friendly modification. He also uses a Linux server and has a good setup. Now, in most Windows environments, unless ya know what ya doing and own the server, it doesn't work out of the box. On some Linux setups it fails too, probably because of the way the server is set up.

Joined: Nov 2001
Posts: 10,369
I type Like navaho
Yeah, my experience was that it only worked on 75% of the servers without playing with the server setup.

Joined: Sep 2003
Posts: 803
Coder
Offline
*sniff-sniff* im off to go open my, "save for spider friendly mods" account

Joined: Aug 2004
Posts: 173
Member
Offline
I must have done something wrong

I edited my .htaccess file using method 2. 15 mins after I uploaded I had a visit from a bot and it killed my site. It would not load any pictures or my stylesheet. Even the infopop graphic at the bottom of the page would not load.

Any ideas what's wrong?

Joined: Oct 2003
Posts: 2,305
Old Hand
[]
Drawback:

The drawback of this method is that every page that's hit is seen by Apache as an error. Thus every hit creates another entry in your server error logs, which effectively destroys their usefulness. So if you use this method you sacrifice your error logs.[/]


It could be the way your Apache config handles errors. Not every method will work on every configuration. As far as I've seen, there's actually no one right way; it's server dependent.

Joined: Aug 2004
Posts: 173
Member
Offline
[]scroungr said:it could be the way your apache config handles errors, Not every method will work on every configuration. As far as I have seen it there is actually no right way and is server dependent. [/]

I see now, oh well, nothing gained, nothing lost

Joined: Dec 2004
Posts: 1
Lurker
Offline
I want to try "Method 1: PATH_INFO" to give my PHP/MySQL based site SEO-friendly URLs. I'm not very good at coding. Can someone help me with the code I need to add to my PHP scripts to enable this method?

My URLs carry about 3 variables. Example:
http://www.domain.com/user/details.php?VehicleId=303&currency_id=1&userid=2543

Should look like:
http://www.domain.com/user/details/VehicleId/303/currency_id/1/userid/2543.html

Thank you,

Joined: Nov 2001
Posts: 10,369
I type Like navaho
I thought I'd post this here for reference.

I ran up against a server which was running PHP as a CGI and which wouldn't accept .php/Cat/0 etc. (the slash after the .php caused an error). I think you can recompile PHP with different options, but I didn't have that ability. So essentially the PATH_INFO method, which Threads 6.5 uses, was ruled out. I was able to make some quick modifications to the stock 6.5 code and use REQUEST_URI instead.

Here's how:

In ubbt.inc.php find this:
Code
function explode_data() {
    global $HTTP_GET_VARS,$PATH_INFO;

    if (isset($_SERVER['PATH_INFO']) && !$PATH_INFO) {
        $PATH_INFO = $_SERVER['PATH_INFO'];
    }


Change to this:
Code
function explode_data() {
    global $HTTP_GET_VARS,$PATH_INFO;

    if (isset($_SERVER['REQUEST_URI'])) {
        $PATH_INFO = $_SERVER['REQUEST_URI'];
        $PATH_INFO = str_replace($_SERVER['PATH_INFO'],"",$PATH_INFO);
        $PATH_INFO = str_replace("?","/",$PATH_INFO);
    }

    if (isset($_SERVER['PATH_INFO']) && !$PATH_INFO) {
        $PATH_INFO = $_SERVER['PATH_INFO'];
    }


Then I changed this:
Code
if ($config['search_urls']) {
    $var_start = "/";


to this:
Code
if ($config['search_urls']) {
    $var_start = "?";



This still puts the ? in the URL, but makes the request appear to have only one variable. Then we make it look like the path info method and threads can split the variables out and go from there.
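To see what those changes buy you, here's a sketch of the resulting transformation on a hypothetical thread URL (with PHP as a CGI there's no extra path info to strip, so the only change is swapping the "?" for a "/"):

```php
<?php
// Sketch of the REQUEST_URI rewrite above, using a hypothetical URL.
// The "?" keeps the request looking like one variable to the server,
// then we convert it back into the shape the PATH_INFO parser expects.
$request_uri = "/ubbthreads/showflat.php?Cat/0/Number/98389";
$path_info   = str_replace("?", "/", $request_uri);

// $path_info is now /ubbthreads/showflat.php/Cat/0/Number/98389,
// which Threads can split into variables just like the PATH_INFO method.
```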

YMMV (Your mileage may vary) as all servers are different.

Joined: Jan 2003
Posts: 3
Lurker
Offline
Hi,

I'm from the Netherlands and my English is not so good, but I'll try. I'm now using the code above, because the PATH_INFO variable wasn't available. Everything works okay, but with the "view=expand" button, for example, I get the old URL: ?var=var&var=var etc. So some options don't work with the method above?

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
Spider-friendly URLs are in Threads 6.5+. There are security exploits in the older stuff; you might want to upgrade.


- Allen wavey
- What Drives You?
Joined: Jan 2003
Posts: 3
Lurker
Offline
I'm running on version 6.5.2 ...

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
I see (sorry for not reading your whole post). Josh's mod probably wasn't designed to work with all possible links; it looks like his example was just for searches, though it can probably be adapted for the other URLs. Which ones are you especially interested in?


- Allen wavey
- What Drives You?
Joined: Jan 2003
Posts: 3
Lurker
Offline
Ah okay, if Google doesn't need those links, or it doesn't matter for how well Google finds you, it's no problem.

Last edited by Jongerenpraat; 04/15/2006 7:39 AM.
