I've seen the subject brought up before. It would be really handy to have a script that could spider your forums, saving the pages locally on the server as static HTML pages. This would allow users to browse a super fast copy, or you could make CD-Rs of your site which are browsable offline. I know there are programs out there to save webpages, but they never seem to work all that well with forums.

The script would have to:
* Modify the links in the pages to point to other static pages instead of the usual links to the dynamic site.
* Save linked images locally and change the corresponding img tags.
etc.

It would be quite a project but could be done. Ideas?
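To make the two requirements above a bit more concrete, here is a rough Python sketch of the link-rewriting step. It is only an illustration under some assumptions: `rewrite_links` and the `to_local` callback are hypothetical names, the regex only looks at quoted `href` and `src` attributes, and it does not decode HTML entities such as `&amp;` inside URLs.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches quoted href="..." and src="..." attributes (a simplification;
# a real HTML parser would be more robust).
ATTR = re.compile(
    r'(?P<attr>\b(?:href|src))=(?P<q>["\'])(?P<url>.*?)(?P=q)',
    re.IGNORECASE,
)

def rewrite_links(html, page_url, to_local):
    """Point same-site href/src attributes at local files.

    to_local is a caller-supplied (hypothetical) function that maps an
    absolute URL to the file name the archived copy is saved under.
    """
    site = urlparse(page_url).netloc

    def swap(match):
        absolute = urljoin(page_url, match.group("url"))
        if urlparse(absolute).netloc != site:
            return match.group(0)  # leave off-site links untouched
        q = match.group("q")
        return f'{match.group("attr")}={q}{to_local(absolute)}{q}'

    return ATTR.sub(swap, html)
```

The same pass handles both page links and img tags, because the only difference is which attribute carries the URL. Actually downloading the images is a separate step.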
To me it looks like that just converts the URL to a spider-friendly one and doesn't actually save a new copy of each page as an HTML file... am I wrong? Their page doesn't exactly list all the features it includes.

I guess it would make it easier to spider with archiving software, but I would hate to have to buy that as well as the archiving software.
#260851 - 10/17/03 11:17 AM Re: Converting forum to static HTML pages
[Re: sf49rminer]
JoshPet
I type Like navaho
Registered: 11/29/01
Posts: 11330
Loc: Charlotte, NC
No, I think it just uses rewrite to make them "appear" as HTML pages. I have never seen a hack that physically exports the pages to HTML so that you could download them and browse offline, etc.
3DSHROOM said:
"To me it looks like that just converts the URL to a spider friendly one and doesn't actually save a new copy of each page as an HTML file... am I wrong? Their page doesn't exactly list what all features it includes. I guess it would make it easier to spider with archiving software but I would hate to have to buy that as well as the archiving software."

You can get more info on this here: http://www.ubbdev.com/ubbcgi/ultimatebb.cgi?ubb=get_topic;f=10;t=002400;p= and if you write the developer, Micah (info@totalwebpackage.com), he will most likely be happy to answer your questions.
I suppose it is possible to construct a script that would scan forum pages that are open to the public, extract the links, send a GET request for each link, and then store the returned information as an HTML file.

What a nightmare that would be. lol
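For what it's worth, the scan / extract / GET / store loop described above can be sketched in a few lines of Python. This is a minimal sketch, not a finished spider: `crawl`, `http_get` and the `limit` parameter are names made up for illustration, links are found with a simple regex, and it only follows links on the same host as the starting page.

```python
import re
from collections import deque
from urllib.parse import urljoin, urlparse, urldefrag
from urllib.request import urlopen

HREF = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)

def http_get(url):
    """Send a GET request and return the response body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def crawl(start_url, fetch=http_get, limit=100):
    """Breadth-first walk of same-site links; returns {url: html}."""
    site = urlparse(start_url).netloc
    queue, pages = deque([start_url]), {}
    while queue and len(pages) < limit:
        url = queue.popleft()
        if url in pages:
            continue  # already fetched via another link
        html = fetch(url)
        pages[url] = html
        for link in HREF.findall(html):
            # Resolve relative links and drop #fragments
            target = urldefrag(urljoin(url, link)).url
            if urlparse(target).netloc == site and target not in pages:
                queue.append(target)
    return pages
```

Writing each entry of the returned dictionary to disk would complete the "store as an HTML file" part. The `limit` is there because a forum's reply, sort and print links can multiply the number of distinct URLs very quickly.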
#260855 - 10/18/03 12:08 PM Re: Converting forum to static HTML pages
[Re: AleksejVL]
Gardener
Addict
Registered: 05/11/99
Posts: 1956
Loc: Sweden, Uppsala
wget is your friend.

Type this in a unix shell account:
Code:
wget -Ekpmnv http://www.threadsdev.com
And you will have a complete working local copy of threadsdev, after a while. It'll take some time to download everything. =]

If you are on Windows, you could use something like Cygwin to run the wget program. Don't ask me how Cygwin works though, because I don't know.

There is a Windows GUI available on this site: http://www.jensroesner.de/wgetgui/
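In case the bundled `-Ekpmnv` looks cryptic, here is the same command spelled out with wget's long options, wrapped in a small Python helper. `mirror_command` is a made-up name for this sketch, and the long option names are those used by current wget releases; older versions call the first one `--html-extension`.

```python
import shlex

def mirror_command(url):
    """Long-option spelling of `wget -Ekpmnv URL` as an argument list."""
    return [
        "wget",
        "--adjust-extension",  # -E: save HTML pages with an .html suffix
        "--convert-links",     # -k: rewrite links to point at the local copies
        "--page-requisites",   # -p: also fetch images, stylesheets and the like
        "--mirror",            # -m: recursive download with timestamping
        "--no-verbose",        # -nv: print less output
        url,
    ]

# Show the equivalent shell command line
print(shlex.join(mirror_command("http://www.threadsdev.com")))
```

Passing the returned list to `subprocess.run(..., check=True)` would start the download from a script, for example from a nightly cron job.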
#260856 - 10/22/03 12:44 PM Re: Converting forum to static HTML pages
[Re: c0bra]
MTO
Addict
Registered: 01/31/00
Posts: 1524
Loc: Burgos, Spain.
A nice thing about archiving would be that, once archived, the database could be trimmed, reducing server load. :P

Gardener, out of curiosity... so you are saying wget could be run on the server side? I sort of tested wget on my Firebird, as an extension, but it needed some updating, so it never worked for me. But being on the server side sounds very interesting. :P
wget seems to have a few problems converting the URLs to ones that are valid filenames. You can't have question marks in the filenames. Using the Spider script, wget would probably work quite well. And yes, he probably means server side, although it could be done from any machine.
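To illustrate the filename problem, here is one possible way to map a dynamic URL onto a name without question marks. This is only a sketch: `safe_filename` is a hypothetical helper, the character whitelist is a conservative guess, and two different URLs could in principle collapse to the same name. wget also has its own `--restrict-file-names` option for this kind of issue.

```python
import re
from urllib.parse import urlparse

def safe_filename(url):
    """Turn a dynamic URL into a name that is legal on common filesystems."""
    parsed = urlparse(url)
    name = parsed.path.strip("/") or "index"
    if parsed.query:
        name += "_" + parsed.query
    # Replace anything outside a conservative whitelist, including ? & = ; /
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return name if name.endswith((".html", ".htm")) else name + ".html"
```

For example, `safe_filename("http://forum.example/showflat.php?Number=260851")` gives `showflat.php_Number_260851.html` (the host name here is a placeholder).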
#260858 - 10/24/03 10:07 AM Re: Converting forum to static HTML pages
[Re: AleksejVL]
Gardener
Addict
Registered: 05/11/99
Posts: 1956
Loc: Sweden, Uppsala
Well, I ran wget with the arguments I gave and I got a complete local copy of my own site. It changes all links so that they work with the saved files, whether there are question marks or not.

It's not really server side, since wget is an HTTP client (like a browser), but I ran it on the same machine the site was on. As long as you have wget, you can run it from wherever you want to download the files; it'll just take a bit longer to transfer everything over the internet.