UBB.Dev

[7.5.8] Better URL Sanitization for SEO

Posted By: isaac

[7.5.8] Better URL Sanitization for SEO - 03/18/2014 9:20 AM

Requirements:
1. Valid UBB.Threads 7.5.8 install and license.
2. The PATH_INFO environment variable must be available for this feature to function properly.
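
A quick way to check the second requirement is sketched below. This is only an illustration and not part of UBB.Threads: the helper name path_info_from() and the test file name are made up for this example.

```php
<?php
// Hypothetical helper (not part of UBB.Threads): returns the PATH_INFO value
// from a $_SERVER-style array, or null when the server did not provide one.
function path_info_from(array $server): ?string
{
    if (isset($server['PATH_INFO']) && $server['PATH_INFO'] !== '') {
        return $server['PATH_INFO'];
    }
    return null;
}

// Report what the current request received.
$pathInfo = path_info_from($_SERVER);
echo $pathInfo === null
    ? "PATH_INFO is not available\n"
    : "PATH_INFO is available: " . $pathInfo . "\n";
```

Saved as, say, pathinfo-test.php and requested as pathinfo-test.php/topics/45/test.html, it should report /topics/45/test.html if the server passes PATH_INFO through to PHP.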

About:
This mod converts UBBT's Spider-Friendly URL string to lower-case and strips it of HTML tags. It then uses PHP's preg_replace() to replace every run of characters that is not a letter or a number (spaces included) with a dash. Next, it collapses repeated dashes into a single dash (a topic title such as "nom – nom nom" previously produced four dashes; now it produces just two), and finally it trims any extra dashes from the beginning and end of the string.

Example 1 -
BEFORE:
ubbthreads.php/topics/45/I_Like...._TURTLES!!!!.html
AFTER:
ubbthreads.php/topics/45/i-like-turtles.html

Example 2 -
BEFORE:
ubbthreads.php/topics/44/LoL,_?,_<,_>,_",_&,_,,_+,_|,_!,__,_#,_\,_^,_{,_},_=,_:,.html
AFTER:
ubbthreads.php/topics/44/lol.html

Warning:
If your forum's language uses any non-ASCII characters (anything other than A-Z, a-z and 0-9), such as the Swedish å, ä and ö, they will be stripped from the URL, just like the punctuation in Example 2 above.

Notes:
Using dashes/hyphens (-) rather than underscores (_) in spider-friendly URLs is the recommended format and the current standard, replacing the older underscores-for-spaces style that UBB.Threads uses. Some further reading on hyphens (-) vs underscores (_):
http://www.ecreativeim.com/blog/2011/03/seo-basics-hyphen-or-underscore-for-seo-urls/

Quote
The short answer is that you should use a hyphen for your SEO URLs. Google treats a hyphen as a word separator, but does not treat an underscore that way. Google treats an underscore as a word joiner, so red_sneakers is the same as redsneakers to Google. This has been confirmed directly by Google themselves, including the fact that using dashes over underscores will have a (minor) ranking benefit.

Again, SEO URLs should use hyphens to separate words. Do not use underscores, do not try to use spaces, and do not smash all the words together intoonebigword. As of 2012, dashes are still the best way to optimize your SEO URLs.


A video answering the hyphen vs underscore SEO URL question by Matt Cutts.
Matthew "Matt" Cutts leads the Webspam team at Google, and works with the search quality team on search engine optimization issues.
YouTube Video: https://www.youtube.com/watch?v=AQcSFsQyct8


How-To:
Confirm that Spider-friendly URLs are turned on:
Control Panel > Primary Settings > Advanced Options > Enable Spider-friendly URLs TICK-BOX
Optional: also tick the "Enable HTML Extension" box.


FIND IN libs/ubbthreads.inc.php:
Code
	$title = ubbchars($title);
	$title = str_replace(' ', '_', trim($title));
	$title = str_replace( '%', '_', $title );
	$title = substr($title, 0, 30);


REPLACE WITH:
Code
	// SEO-friendly URL string converter
	// ex) this is an example -> this-is-an-example
	$title = str_replace(array("&amp;","&nbsp;"), " ", $title); // replace ampersand and non-breaking-space markup with a space
	$title = str_replace(array("&quot;","'"), "", $title); // remove quote markup and apostrophes
	$title = mb_convert_case($title, MB_CASE_LOWER, "UTF-8"); // convert to lowercase
	$title = preg_replace("#[^A-Za-z0-9]+#", "-", $title); // replace every run of non-alphanumeric characters with a dash
	$title = preg_replace("#(-){2,}#", "$1", $title); // replace multiple dashes with one
	$title = trim($title, "-"); // trim dashes from beginning and end of string, if any
	$title = substr($title, 0, 70); // keep at most 70 characters
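
To try the replacement lines outside of UBB.Threads, they can be wrapped in a small standalone function. This is only a sketch: the function name seo_slug(), its $maxLength parameter and the strtolower() fallback (for PHP builds without the mbstring extension) are assumptions added for this example and are not part of the mod.

```php
<?php
// Hypothetical wrapper around the replacement lines, for testing only.
function seo_slug(string $title, int $maxLength = 70): string
{
    // Replace ampersand and non-breaking-space markup with a space.
    $title = str_replace(array("&amp;", "&nbsp;"), " ", $title);
    // Remove quote markup and apostrophes.
    $title = str_replace(array("&quot;", "'"), "", $title);
    // Lower-case the title; fall back to strtolower() without mbstring (assumption).
    $title = function_exists('mb_convert_case')
        ? mb_convert_case($title, MB_CASE_LOWER, "UTF-8")
        : strtolower($title);
    // Replace every run of non-alphanumeric characters with a dash.
    $title = preg_replace("#[^A-Za-z0-9]+#", "-", $title);
    // Collapse any repeated dashes into one.
    $title = preg_replace("#(-){2,}#", "$1", $title);
    // Trim dashes from both ends, then limit the length.
    $title = trim($title, "-");
    return substr($title, 0, $maxLength);
}

echo seo_slug("I Like.... TURTLES!!!!"), "\n"; // i-like-turtles
echo seo_slug("nom - nom nom"), "\n";          // nom-nom-nom
```

The first call reproduces Example 1 above.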


done.

---

Further reading on why this simple modification is necessary in UBB.threads 7.5.8:
"Six show-stopping problems with UBBT 7.5.8's new Spider-Friendly SEO URLs"
http://www.ubbcentral.com/forums/ubbthreads.php/topics/255500
Posted By: Gizmo

Re: [7.5.8] Better URL Sanitization for SEO - 03/18/2014 10:30 AM

Great hack. Anyone whose charset uses characters outside the basic Latin alphabet could convert those to Latin-based characters by adding another replace after this line:
Code
    $title = str_replace("&nbsp;", " ", $title);


Do something like:
Code
    $title = str_replace(array("å","ä","ö"), array("a","a","o"), $title);
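
Put together, the idea might look like the sketch below. The function name, the character map and the sample titles are assumptions for illustration; extend the map to suit your language, and note that the file must be saved as UTF-8 for the literal characters to match.

```php
<?php
// Hypothetical sketch: transliterate selected characters first, then run
// the same kind of sanitization steps as in the first post.
function slug_with_transliteration(string $title): string
{
    // Example map only. Upper-case forms are included because this
    // replacement runs before the lower-case conversion.
    $map = array(
        "å" => "a", "ä" => "a", "ö" => "o",
        "Å" => "A", "Ä" => "A", "Ö" => "O",
    );
    $title = strtr($title, $map);

    $title = str_replace(array("&amp;", "&nbsp;"), " ", $title);
    $title = str_replace(array("&quot;", "'"), "", $title);
    // Plain strtolower() is used here to avoid depending on mbstring;
    // the mod itself uses mb_convert_case().
    $title = strtolower($title);
    $title = preg_replace("#[^A-Za-z0-9]+#", "-", $title);
    $title = preg_replace("#(-){2,}#", "$1", $title);
    $title = trim($title, "-");
    return substr($title, 0, 70);
}

echo slug_with_transliteration("Smörgåsbord på Söder"), "\n"; // smorgasbord-pa-soder
echo slug_with_transliteration("Älg och Öl"), "\n";           // alg-och-ol
```

Without the map, the first title would come out as sm-rg-sbord-p-s-der, because each accented character is replaced by a dash.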
Posted By: isaac

Re: [7.5.8] Better URL Sanitization for SEO - 07/27/2014 10:22 PM

I just noticed that quotes are stored in ubbt_TOPICS/TOPIC_SUBJECT as markup ("&quot;") rather than as literal characters. This is fine in most places throughout UBBT, but in this one location it does not produce the identical URL we're looking for. It creates a new link for identical content, i.e., it raises "duplicate content" flags with "dumb" spiders/crawlers.

I've updated the OP by adding a new line to convert those quote markups.
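
A rough illustration of the difference follows. The function name, the $stripQuoteMarkup switch and the sample subject are made up for this sketch, and it assumes the quote line is the only thing that changed in the OP.

```php
<?php
// Hypothetical sketch of the sanitization steps, with the quote-markup
// line made optional so both behaviours can be compared.
function slug_for_subject(string $title, bool $stripQuoteMarkup = true): string
{
    $title = str_replace(array("&amp;", "&nbsp;"), " ", $title);
    if ($stripQuoteMarkup) {
        // The line added to the OP: remove quote markup and apostrophes.
        $title = str_replace(array("&quot;", "'"), "", $title);
    }
    // strtolower() is used here to keep the sketch free of mbstring.
    $title = strtolower($title);
    $title = preg_replace("#[^A-Za-z0-9]+#", "-", $title);
    $title = preg_replace("#(-){2,}#", "$1", $title);
    return substr(trim($title, "-"), 0, 70);
}

$stored  = 'Say &quot;Hello&quot; World'; // subject stored with markup
$literal = 'Say "Hello" World';           // same subject with literal quotes

echo slug_for_subject($stored, false), "\n";  // say-quot-hello-quot-world
echo slug_for_subject($literal, false), "\n"; // say-hello-world
echo slug_for_subject($stored), "\n";         // say-hello-world
```

With the extra line in place, both forms of the subject end up with the same URL string.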

Sometime in the future, I'll look into why the "Link to this individual post" link (the post icon at the top-left of each post/reply) pulls from ubbt_TOPICS/TOPIC_SUBJECT rather than ubbt_POSTS/POST_SUBJECT, and also why markup is stored in ubbt_TOPICS/TOPIC_SUBJECT rather than literal characters. It may be groundwork for a future feature "SD" mentioned on 05/10/2014, relating to getting away from being able to change the topic mid-discussion... or it may just be a UBBT bug?

EDIT: The post link pulls from ubbt_TOPICS/TOPIC_SUBJECT rather than ubbt_POSTS/POST_SUBJECT so as to avoid creating new URLs for the same page content.
Posted By: Gizmo

Re: [7.5.8] Better URL Sanitization for SEO - 07/27/2014 11:29 PM

Originally Posted by id242
relating to getting away from being able to change the topic mid-discussion
I've always been against changing the topic mid-discussion; in my opinion, if the topic needs to be changed in a thread, it should probably be its own thread too (in most cases). I believe that's why Rick, at one point, added the ability to rename the entire topic.
Posted By: isaac

Re: [7.5.8] Better URL Sanitization for SEO - 07/28/2014 12:23 AM

UPDATED, once again:

Tightened up the code a bit.

I've also adjusted the $title string length so that the URL shows the full 50-character topic title, instead of just the first 30 characters.

The default subject title is 50 chars. My setting of "70" should more than cover that.
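
A small sketch of what the two limits do to a subject that uses all 50 characters. The sample slug is made up for illustration and is assumed to be already sanitized.

```php
<?php
// A 50-character subject, already sanitized (hypothetical example).
$slug = "how-do-i-replace-the-battery-in-a-2009-macbook-pro";

echo strlen($slug), "\n";        // 50
echo substr($slug, 0, 30), "\n"; // how-do-i-replace-the-battery-i
echo substr($slug, 0, 70), "\n"; // how-do-i-replace-the-battery-in-a-2009-macbook-pro
```

The old limit of 30 cuts the title off mid-word, while 70 leaves it intact.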

The main intentions of this modification are to 1) better sanitize the URL, and 2) improve the URL for the user to know where he's about to arrive before clicking the link... and SEO.
Posted By: Mark_S

Re: [7.5.8] Better URL Sanitization for SEO - 09/05/2015 4:25 AM

Is this in the newer versions by default, or is it the user's choice?
I'm on 7.5.8, or do I wait for v6? My character set is currently set to iso-8859-1 in the UBB language file.

My conversion didn't go as expected on my dev board, so I've not changed it on my live board.
Posted By: Gizmo

Re: [7.5.8] Better URL Sanitization for SEO - 09/05/2015 1:33 PM

This is the latest build of the URLs, the same one that's included in 7.6.0 (the latest snapshot build is Fri Aug 28 2015).

As for your issue with converting to UTF-8, aren't some of the characters used on your forum multibyte? If so, you can't just move over to UTF-8 as it doesn't support those characters. We've written a Wiki article regarding this issue at UTF-8 vs Latin-1 (ISO-8859-1), which also has links to several character set related issues.

I replied to your thread at Central for the second issue, so as not to derail this thread.
Posted By: Mark_S

Re: [7.5.8] Better URL Sanitization for SEO - 09/06/2015 4:56 PM

Thanks for the feedback Gizmo, appreciated.
I'm going to implement the code above, as I think it's hurting my search results, and I'm not sure when v6 will be installed on my live forum.
As always, it's good to have you guys as guidance.
Posted By: Gizmo

Re: [7.5.8] Better URL Sanitization for SEO - 09/06/2015 10:26 PM

The 7.6.0 snapshots should be stable enough to run on your forums; it's what we've been running here for some time now, if you wanted to test it with a live group.
Posted By: Mark_S

Re: [7.5.8] Better URL Sanitization for SEO - 09/07/2015 12:32 AM

I'm considering doing that, Gizmo.
I have time to give it a go and pick up the pieces too.
Posted By: Mark_S

Url fails - 09/27/2015 2:50 PM

On my live forum.
Running 7.5.8 with ID242's SEO hack in place.

I'm on a mobile phone, using hotel wifi. The following link fails.
Code
http://www.wikiwirral.co.uk/forums/ubbthreads.php/topics/984942/106-to-112-bentinck-street-help-photos.html#Post984942


The actual subject is thus

106 to 112 Bentinck Street help / photos

If I pull it back to this by editing it in my browser

http://www.wikiwirral.co.uk/forums/ubbthreads.php/topics/984942

it gets me there no problem.

But the original via my mobile when clicking on it eventually throws this at me

http://192.168.250.1/info

The current connection is thus. "PALACE_WIFI"(50:17:ff:f4:94:00)

IP address: 192.168.248.183
Lease duration: 900 sec
Gateway: 192.168.248.1
Netmask: 255.255.252.0
DNS1: 8.8.8.8
DNS2: 195.10.102.11
Server IP: 192.168.248.1
Link speed: 144 Mbps
Hidden SSID: No

Just feedback, as this hasn't happened to me before. As I'm away, I can't compare with a desktop PC.

The preview button truncated the URL, so it's in a code block now.

The recent post island and New Topic island and the topic subject pull the same extra long broken url.

However, I have a custom "most viewed" post island hack with the following URL:
http://www.wikiwirral.co.uk/forums/ubbthreads.php/topics/984942.html

And that's working just fine.

Just feedback, in case there is an issue with / in the subject line.
Posted By: Gizmo

Re: Url fails - 09/27/2015 10:04 PM

Well, accessing the URL from a desktop works without issue, and I tested on my cell as well without any issue. Likely the cause is the hotel wifi, if you've never experienced the issue before.

As for getting the URL http://192.168.250.1/info, that's an IP local to the network you're connected to (192.168.x.x is a private range that isn't routed on the public internet, just like 10.x).
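
For anyone who wants to check whether an address falls in one of the private ranges, PHP's filter_var() can do it. A minimal sketch; the helper name is_private_ipv4() is made up for this example.

```php
<?php
// Hypothetical helper: true when $ip is a valid IPv4 address inside one of
// the private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
function is_private_ipv4(string $ip): bool
{
    // Reject anything that is not a valid IPv4 address at all.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return false;
    }
    // FILTER_FLAG_NO_PRIV_RANGE makes validation fail for private addresses.
    return filter_var(
        $ip,
        FILTER_VALIDATE_IP,
        FILTER_FLAG_IPV4 | FILTER_FLAG_NO_PRIV_RANGE
    ) === false;
}

var_dump(is_private_ipv4("192.168.250.1")); // bool(true)
var_dump(is_private_ipv4("8.8.8.8"));       // bool(false)
```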

FYI, the code that is in Isaac's mod is what's in 7.6.0, which means it's also what we've been running here for months (prior to that we were running the mod, so years); if you've never experienced the issue here then I think it'd be safe to say it's probably the hotel's connection.
Posted By: isaac

Re: Url fails - 09/27/2015 10:16 PM

Originally Posted by Mark_S
On my live forum.

Code
http://www.wikiwirral.co.uk/forums/ubbthreads.php/topics/984942/106-to-112-bentinck-street-help-photos.html#Post984942


...the original via my mobile when clicking on it eventually throws this at me

http://192.168.250.1/info

Originally Posted by Mark_S
The recent post island and New Topic island and the topic subject pull the same extra long broken url.


It looks like the hotel's wifi connection to the outside network (internet) had a hiccup or just timed out.

Your error could have happened to any other internet website at that exact time.

Originally Posted by Mark_S
Just feedback, in case there is an issue with / in the subject line.


From the URL you've posted, it doesn't look like the topic's slash had anything to do with the problem you had. The slash didn't even make it into the URL.

The sixth line of the sanitization code in the OP says: convert anything that is not an A-Z, a-z or 0-9 character (i.e., "A-Za-z0-9") to a dash. It basically sanitizes the whole topic title for inclusion in the URL.
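
Applied to the subject from the post above, that step can be sketched in isolation like this (strtolower() and trim() stand in for the other lines of the mod, which is a simplification for this example):

```php
<?php
// The subject line from the post that was reported as failing.
$subject = "106 to 112 Bentinck Street help / photos";

// Lower-case it, then turn every run of non-alphanumeric characters,
// including the " / " in the middle, into a single dash.
$slug = preg_replace("#[^A-Za-z0-9]+#", "-", strtolower($subject));
$slug = trim($slug, "-");

echo $slug, "\n"; // 106-to-112-bentinck-street-help-photos
```

The result matches the URL that was posted, with no slash left in it.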