Joined: Oct 2003
Posts: 2,305
Old Hand
Okay, this subject comes up a lot..

First of all, what it is..

[] Search Engine-Friendly URLs
By Chris Beasley
August 10th 2001
Reader Rating: 8.8

On today’s Internet, database driven or dynamic sites are very popular. Unfortunately the easiest way to pass information between your pages is with a query string. In case you don’t know what a query string is, it's a string of information tacked onto the end of a URL after a question mark.

So, what’s the problem with that? Well, most search engines (with a few exceptions - namely Google) will not index any pages that have a question mark or other character (like an ampersand or equals sign) in the URL. So all of those popular dynamic sites out there aren’t being indexed - and what good is a site if no one can find it?

The solution? Search engine friendly URLs. There are a few popular ways to pass information to your pages without the use of a query string, so that search engines will still index those individual pages. I'll cover 3 of these techniques in this article. All 3 work in PHP with Apache on Linux (and while they may work in other scenarios, I cannot confirm that they do).

Method 1: PATH_INFO

Implementation:

If you look above this article on the address bar, you’ll see a URL like this: http://www.webmasterbase.com/article.php/999/12. SitePoint actually uses the PATH_INFO method to create their dynamic pages.

Apache has a "look back" feature that scans backwards along the URL if it doesn't find what it's looking for. In this case there is no directory or file called "12", so it looks for "999". But it finds that there's no directory or file called "999" either, so Apache continues to look back along the URL and sees "article.php". This file does exist, so Apache calls up that script. Apache also sets a PATH_INFO variable on every HTTP request (in PHP it's available as $PATH_INFO with register_globals, or as $_SERVER['PATH_INFO']). Per the CGI spec, it contains everything in the URL to the right of the script name. So in the example we've been using, $PATH_INFO will contain /999/12.

So, you wonder, how do I query my database using /999/12? First you have to split this into variables you can use. And you can do that using PHP's explode function:

$var_array = explode("/",$PATH_INFO);

Once you do that, you'll have the following information:

$var_array[0] = "" (empty, from the leading slash)

$var_array[1] = 999

$var_array[2] = 12

So you can rename $var_array[1] as $article and $var_array[2] as $page_num and query your database.
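Putting method 1 together, a minimal article.php might look like the sketch below. It assumes a setup where the trailing path arrives in $_SERVER['PATH_INFO']; the function and variable names are just illustrative, not part of the original article:

```php
<?php
// Minimal sketch of article.php for the PATH_INFO method.
// Assumes $_SERVER['PATH_INFO'] holds the trailing path, e.g. "/999/12".

// Split "/999/12" into an article id and page number.
function parse_article_path($path_info) {
    $var_array = explode("/", $path_info);
    // Element 0 is the empty string before the leading slash.
    $article  = isset($var_array[1]) ? (int) $var_array[1] : 0;
    $page_num = isset($var_array[2]) ? (int) $var_array[2] : 1;
    return array($article, $page_num);
}

$path_info = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
list($article, $page_num) = parse_article_path($path_info);

// $article and $page_num (cast to int, so safe to interpolate)
// can now drive whatever database query builds the page.
```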

Drawback:

There was previously one major drawback to this method: Google, and perhaps other search engines, would not index pages set up in this manner, as they interpreted the URL as being malformed. I contacted a software developer at Google, made them aware of the problem, and I'm happy to report that it has since been fixed.

There is the potential that other search engines may ignore pages set up in this manner. While I don't know of any, I can't be certain that none do. If you do decide to use this method, be sure to monitor your server logs for spiders to ensure that your site is being indexed as it should.


Method 2: .htaccess Error Pages

Implementation:

The second method involves the .htaccess file. If you're new to it, .htaccess is a file used to administer Apache access options for whichever directory you place it in. The server administrator has a better way of doing this via the server configuration files, but since most of us don't own our own server, we don't have control over that. The server admin can also restrict what users may do in their .htaccess files, so this approach may not work on your particular server; in most cases, though, it will. If it doesn't, contact your server administrator.

This method takes advantage of .htaccess’ ability to do error handling. In the .htaccess file in whichever directory you wish to apply this method to, simply insert the following line:

ErrorDocument 404 /processor.php

Now make a script called processor.php, put it in that same directory, and you're done! Let's say you have the following URL: http://www.domain.com/directory/999/12/. Again, in this example "999" and "12" do not exist, and since you don't specify a script anywhere in the directory path, Apache will generate a 404 error. But instead of sending a generic 404 header back to the browser, Apache sees the ErrorDocument command in the .htaccess file and calls up processor.php.

Now, in the first example we used the $PATH_INFO variable, but that won’t work this time. Instead we need to use the $REQUEST_URI variable, which contains everything in the URL after the domain. So in this case, it contains: /directory/999/12/.

The first thing you need to do in processor.php is send a new HTTP header. Remember, Apache thought this was a 404 error, so it wants to tell the browser that it couldn’t find a page.

So, put the following line in your processor.php:

header("HTTP/1.1 200 OK");

At this point I need to mention an important fact. In the first example you could specify which script processed your URL. In this example all URLs must be processed by the same script, processor.php, which makes things a little different. Instead of creating different URLs based on what you want to do, such as article.php/999/12 or printarticle.php/999/12, you have one script that must do both.

So you must decide what to do based on the information processor.php receives - more specifically, by counting how many parameters are passed. For instance on my site, I use this method to generate my pages: I know that if there's just one parameter, such as

http://www.online-literature.com/shakespeare/, that I need to load an author information page; if there are 2 parameters, such as http://www.online-literature.com/shakespeare/hamlet/, I know that I need to load a book information page; and finally if there are 3 parameters, such as http://www.online-literature.com/shakespeare/hamlet/3/, I know I need to load a chapter viewing page. Alternatively, you can simply use the first parameter to indicate the type of page to display, and then process the remaining parameters based on that.

There are 2 ways you can accomplish this task of counting parameters. First you need to use PHP’s explode function to divide up the $REQUEST_URI variable. So if $REQUEST_URI = /shakespeare/hamlet/3/:

$var_array = explode("/",$REQUEST_URI);

Now note that, because of the positioning of the slashes, there are actually 5 elements in this array. The first element, element 0, is blank, because it contains the information before the first slash. The fifth element, element 4, is also blank, because it contains the information after the last slash.

So now we need to count the elements in our $var_array. PHP has two functions that let us do this. We can use the sizeof() function as in this example:

$num = sizeof($var_array); // 5

You’ll notice that the sizeof() function counts every item in the array regardless of whether it's empty. The other function is count(), which is an alias for the sizeof() function.

Some search engines, like AOL, will automatically remove the trailing / from your URL, and this can cause problems if you’re using these functions to count your array. For instance http://www.online-literature.com/shakespeare/hamlet/ becomes http://www.online-literature.com/shakespeare/hamlet, and as there are 3 total elements in that array our processor.php would load an author page instead of a book page.

The solution is to create a function that will count only the elements in an array that actually hold data. This will allow you to leave off the ending slash, or allow any links from AOL's search engine to point to the proper place. An example of such a function is:

function count_all($arg)
{
    // empty values (like the blank strings explode leaves around slashes) count as 0
    if (!$arg) {
        return 0;
    }
    // not an array: counts as one element (base case)
    if (!is_array($arg)) {
        return 1;
    }
    // else call recursively for all elements of $arg
    $count = 0;
    foreach ($arg as $val) {
        $count += count_all($val);
    }
    return $count;
}
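If all you need is to skip the empty segments (rather than count nested arrays), PHP's built-in array_filter() can do the same job; this is a simpler alternative not in the original article. Using 'strlen' as the callback drops "" while still keeping a legitimate "0" segment:

```php
<?php
// Sketch: count only the non-empty URL segments, so a URL works
// the same with or without its trailing slash.
$request_uri = "/shakespeare/hamlet/3/";
$var_array   = explode("/", $request_uri);

// array_filter with 'strlen' removes "" but keeps "0";
// array_values re-indexes the result from 0.
$segments = array_values(array_filter($var_array, 'strlen'));
$num      = count($segments);  // 3, with or without the trailing slash
```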

To get your count, access the function like this:

$num = count_all($var_array);

Once you know how many parameters you need, you can define them like this:

$author=$var_array[1];
$book=$var_array[2];
$chapter=$var_array[3];

Then you can use includes to call up the appropriate script, which will query your database and set up your page. Also if you get a result you’re not expecting, you can simply create your own error page for display to the browser.
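Put together, a minimal processor.php for this method might look like the sketch below. The page types mirror the online-literature example above; the included script names are hypothetical:

```php
<?php
// Minimal sketch of processor.php for the ErrorDocument method.
// Apache thinks this request is a 404, so first claim it succeeded.
header("HTTP/1.1 200 OK");

// Decide which page type a request URI maps to, counting only the
// non-empty path segments (so a missing trailing slash is harmless).
function choose_page($request_uri) {
    $params = array_values(array_filter(explode("/", $request_uri), 'strlen'));
    switch (count($params)) {
        case 1:  return 'author';   // e.g. /shakespeare/
        case 2:  return 'book';     // e.g. /shakespeare/hamlet/
        case 3:  return 'chapter';  // e.g. /shakespeare/hamlet/3/
        default: return 'error';    // anything unexpected
    }
}

$page = choose_page(isset($_SERVER['REQUEST_URI']) ? $_SERVER['REQUEST_URI'] : '');

// processor.php would then pull in the matching script, e.g.:
// include $page . '_page.php';   // hypothetical file names
```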

Drawback:

The drawback of this method is that every page that's hit is seen by Apache as an error. Thus every hit creates another entry in your server error logs, which effectively destroys their usefulness. So if you use this method you sacrifice your error logs.

Method 3: The ForceType Directive

Implementation:

You'll recall that the thing that trips up Google, and maybe other search engines, when using the PATH_INFO method is the period in the middle of the URL. So what if there was a way to use that method without the period? Guess what? There is! It's achieved using Apache's ForceType directive.

The ForceType directive allows you to override any default MIME types you have set up. Usually it may be used to parse an HTML page as PHP or something similar, but in this case we will use it to parse a file with no extension as PHP.

So instead of using article.php, as we did in method 1, rename that file to just "article". You will then be able to access it like this: http://www.domain.com/article/999/12/, utilizing Apache's look back feature and the PATH_INFO variable as described in method 1. But now Apache doesn't know that "article" needs to be parsed as PHP. To tell it that, you must add the following to your .htaccess file.

<Files article>
ForceType application/x-httpd-php
</Files>

This is known as a "container". Instead of applying directives to all files, Apache allows you to limit them by filename, location, or directory. You need to create a container as above and place the directives inside it. In this case we use a file container, we identify “article” as the file we're concerned with, and then we list the directives we want applied to this file before closing off the container.

By placing the directive inside the container, we tell Apache to parse "article" as a PHP script even though it has no file extension. This allows us to get rid of the period in the URL that causes the problems, and yet still use the PATH_INFO method to manage our site.
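If you end up with several such extensionless scripts, Apache's standard FilesMatch container (the regex variant of Files) lets one block cover them all. This is a sketch; the extra script names here are hypothetical, not from the article:

```apache
<FilesMatch "^(article|printarticle|author)$">
ForceType application/x-httpd-php
</FilesMatch>
```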

Drawback:

The only drawback to this method as compared with method 2 is that your URLs will be slightly longer. For instance, if I were to use this method on my site, I'd have to use URLs like this: http://www.online-literature.com/ol/homer/odyssey/ instead of http://www.online-literature.com/homer/odyssey/. However if you had a site like SitePoint and used this method it wouldn't be such a problem, as the URL (http://www.SitePoint.com/article/755/12/) would make more sense.

Conclusion

I have outlined 3 methods of making search engine friendly URLs - along with their drawbacks. Obviously, you should evaluate these drawbacks before deciding which method to implement. And if you have any questions about the implementation of these techniques, they are oft-discussed topics on the SitePoint Forums so just stop in and make a post.

[/]

Now then, this only works in the Linux/Unix world using Apache. What about IIS/Windows? The above won't work, so what are we left to do? Well, one alternative is IIS_Rewrite.


[]IIS Rewrite

Product Summary
IISRewrite is a rule-based rewriting engine that allows a webmaster to manipulate URLs on the fly in IIS.

URLs are rewritten before IIS has handed over the request to be processed, so requests for HTML files, graphics, program files, and even entire directory structures can be rewritten before they are passed to ASP scripts for processing.

IISRewrite was written to solve some practical problems that are nearly impossible to solve with IIS and ASP. It solves the compatibility issues when doing dynamic downloads with ASP, it allows portions of dynamic sites to be indexed by search engines as if they were static HTML files, and can provide a way to customize web sites based on the client's browser type without the use of Javascript.

IISRewrite is a stripped down implementation of Apache's mod_rewrite modules for IIS. Webmasters who have used Apache's mod_rewrite in the past will find that much of the configuration and functionality is the same.

IISRewrite is compatible with Microsoft's ISAPI specification and has been tested on Windows NT Server 4.0 running IIS 4 and Windows 2000 Server running IIS 5.

IISRewrite was featured in the February 2002 edition of Microsoft's MSDN Magazine. [/]

More Information can be found at IIS_Rewrite

It's costly though, at $199.00 per server.

Another alternative is ISAPI_Rewrite

[]

Product overview

ISAPI_Rewrite is a powerful URL manipulation engine based on regular expressions. It acts mostly like Apache's mod_Rewrite, but is designed specifically for Microsoft's Internet Information Server (IIS). ISAPI_Rewrite is an ISAPI filter written in pure C/C++ so it is extremely fast. ISAPI_Rewrite gives you the freedom to go beyond the standard URL schemes and develop your own scheme.

What you can do with ISAPI_Rewrite:

* Optimize your dynamic content, like forums or e-stores, to be indexed by popular search engines.
* Block hot linking of your data files by other sites.
* Develop a custom authorization scheme and manage access to static files using custom scripts and a database.
* Proxy content of one site into a directory on another site.
* Make your intranet servers accessible from the Internet using only one Internet-facing server, with very flexible permissions and security options.
* Create dynamic host-header-based sites using a single physical site.
* Create a virtual directory structure for the site, hiding physical files and extensions. This also helps when moving from one technology to another.
* Return browser-dependent content even for static files.

And many other problems can be solved with the power of the regular expression engine built into ISAPI_Rewrite. [/]

The drawback is that it costs per license..

For 1 license it's $69.00
For 2-5 it's $58.65 each
For 6-10 it's $49.85 each
and so on

you can find more info at

ISAPI_Rewrite

Hope this helps to clear it up for you

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
Interesting, thank you

I've found for Win2K/IIS5 in the php.ini file there's a paragraph like so:


[]; cgi.fix_pathinfo provides *real* PATH_INFO/PATH_TRANSLATED support for CGI. PHP's
; previous behaviour was to set PATH_TRANSLATED to SCRIPT_FILENAME, and to not grok
; what PATH_INFO is. For more information on PATH_INFO, see the cgi specs. Setting
; this to 1 will cause PHP CGI to fix it's paths to conform to the spec. A setting
; of zero causes PHP to behave as before. Default is zero. You should fix your scripts
; to use SCRIPT_FILENAME rather than PATH_TRANSLATED.
; cgi.fix_pathinfo=0
[/]

If you change the last line to:

cgi.fix_pathinfo=1

then SEF URLs work. Images within threads are spotty tho; some work and some don't. The ones that don't will have URLs like:

http://www.website.com/ubbthreads/showflat.php/Cat/0/Number/98389/an/0/page/images/star.gif

These images (and the other broken images) were given {$config['images']}/star.gif (etc.), so the full SEF URL for the page got prepended to the relative image path. They would need to be changed to $config['imageurl'] instead to fix the broken image links. I haven't checked everything yet (prelim results are good tho), but that would be a simple fix to achieve SEF URLs.


- Allen wavey
- What Drives You?
Joined: Oct 2003
Posts: 2,305
Old Hand
cool

Joined: Nov 2002
Posts: 554
Code Monkey
Offline

Joined: May 2001
Posts: 550
Code Monkey
Offline
Isn't this kinda obsolete for UBBThreads 6.5 since they already have spider friendly urls?

Joined: Oct 2003
Posts: 2,305
Old Hand
No. Just 'cause ya have it turned on doesn't mean it's working.

Joined: May 2001
Posts: 550
Code Monkey
Offline
> Since just cause ya have it turned on doesn't mean its working

It worked on this site even back when it wasn't included in the Infopop package.
Why shouldn't it work now?

Joined: Oct 2003
Posts: 2,305
Old Hand
It worked on this site prior to 6.5 because Josh built his own spider-friendly modification. He also uses a Linux server and has a good setup. Now, in most Windows environments, unless ya know what ya doing and own the server, it doesn't work out of the box. On some Linux setups it fails too, probably because of the way the server is set up.

Joined: Nov 2001
Posts: 10,369
I type Like navaho
Yeah, my experience was that it only worked on 75% of the servers without playing with the server setup.

Joined: Sep 2003
Posts: 803
Coder
Offline
*sniff-sniff* im off to go open my, "save for spider friendly mods" account

Joined: Aug 2004
Posts: 173
Member
Offline
I must have done something wrong

I edited my .htaccess file using method 2. 15 mins after I uploaded I had a visit from a bot and it killed my site. It would not load any pictures or my stylesheet. Even the infopop graphic at the bottom of the page would not load.

Any ideas what's wrong?

Joined: Oct 2003
Posts: 2,305
Old Hand
[]
Drawback:

The drawback of this method is that every page that's hit is seen by Apache as an error. Thus every hit creates another entry in your server error logs, which effectively destroys their usefulness. So if you use this method you sacrifice your error logs.[/]


It could be the way your Apache config handles errors. Not every method will work on every configuration. As far as I've seen, there's actually no one right way; it's server dependent.

Joined: Aug 2004
Posts: 173
Member
Offline
[]scroungr said:it could be the way your apache config handles errors, Not every method will work on every configuration. As far as I have seen it there is actually no right way and is server dependent. [/]

I see now, oh well, nothing gained, nothing lost

Joined: Dec 2004
Posts: 1
Lurker
Offline
I want to try "Method 1: PATH_INFO" to give my PHP/MySQL based site SEO-friendly URLs. I'm not very good at coding. Can someone help me with the code I need to add to my PHP scripts to enable this method?

My URLs carry about 3 variables. Example:
http://www.domain.com/user/details.php?VehicleId=303&currency_id=1&userid=2543

Should look like:
http://www.domain.com/user/details/VehicleId/303/currency_id/1/userid/2543.html

Thank you,

Joined: Nov 2001
Posts: 10,369
I type Like navaho
I thought I'd post this here for reference.

I ran up against a server which was running PHP as a CGI and which wouldn't accept .php/Cat/0 etc. (the slash after the .php caused an error). I think you can recompile PHP with different options, but I didn't have that ability. So essentially the PATH_INFO method, which Threads 6.5 uses, was ruled out. I was able to make some quick modifications to the stock 6.5 code and use REQUEST_URI instead.

Here's how:

In ubbt.inc.php find this:
Code
function explode_data() {
    global $HTTP_GET_VARS,$PATH_INFO;

    if (isset($_SERVER['PATH_INFO']) && !$PATH_INFO) {
        $PATH_INFO = $_SERVER['PATH_INFO'];
    }


Change to this:
Code
function explode_data() {
    global $HTTP_GET_VARS,$PATH_INFO;

    if (isset($_SERVER['REQUEST_URI'])) {
        $PATH_INFO = $_SERVER['REQUEST_URI'];
        $PATH_INFO = str_replace($_SERVER['PATH_INFO'],"",$PATH_INFO);
        $PATH_INFO = str_replace("?","/",$PATH_INFO);
    }

    if (isset($_SERVER['PATH_INFO']) && !$PATH_INFO) {
        $PATH_INFO = $_SERVER['PATH_INFO'];
    }


Then I changed this:
Code
if ($config['search_urls']) {
    $var_start = "/";


to this:
Code
if ($config['search_urls']) {
    $var_start = "?";



This still puts the ? in the URL, but makes the request appear to have only one variable. Then we make it look like the path info method and threads can split the variables out and go from there.
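To see what those changes buy you, here's a sketch of the resulting transformation on a hypothetical thread URL (with PHP as a CGI there's no extra path info to strip, so the only change is swapping the "?" for a "/"):

```php
<?php
// Sketch of the REQUEST_URI rewrite above, using a hypothetical URL.
// The "?" keeps the request looking like one variable to the server,
// then we convert it back into the shape the PATH_INFO parser expects.
$request_uri = "/ubbthreads/showflat.php?Cat/0/Number/98389";
$path_info   = str_replace("?", "/", $request_uri);

// $path_info is now /ubbthreads/showflat.php/Cat/0/Number/98389,
// which Threads can split into variables just like the PATH_INFO method.
```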

YMMV (Your mileage may vary) as all servers are different.

Joined: Jan 2003
Posts: 3
Lurker
Offline
Hi,

I'm from the Netherlands and my English is not so good, but I'll try. I'm now using the code above, because the PATH_INFO variable wasn't available. Everything works okay, but with the "view=expand" button, for example, I get the old URL: ?var=var&var=var etc. So some options don't work with the method above?

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
Spider-friendly URLs are in Threads 6.5+. There are security exploits in the older stuff; you might want to upgrade.


- Allen wavey
- What Drives You?
Joined: Jan 2003
Posts: 3
Lurker
Offline
I'm running on version 6.5.2 ...

Joined: Mar 2000
Posts: 21,079
Likes: 3
I type Like navaho
I see (sorry for not reading your whole post). Josh's mod probably wasn't designed to work with all possible links; it looks like his example was just for searches, though it can probably be adapted for the other URLs. Which ones are you especially interested in?


- Allen wavey
- What Drives You?
Joined: Jan 2003
Posts: 3
Lurker
Offline
Ah okay, if Google doesn't need those links, or it doesn't matter for how well Google finds you, it's no problem.

Last edited by Jongerenpraat; 04/15/2006 7:39 AM.
