Joined: Apr 2002
Posts: 102
Journeyman
Offline
I've seen this subject brought up before. It would be really handy to have a script that could spider your forums, saving the pages locally on the server as static HTML pages. That would let users browse a super-fast copy, or you could burn CD-Rs of your site that are browsable offline. I know there are programs out there that save web pages, but they never seem to work all that well with forums.

The script would have to:
* Modify the links in the pages to point to other static pages instead of the usual links to the dynamic site.
* Save locally any images that are linked to and update the corresponding img tags (a rough sketch of this step is below).
etc..

It would be quite a project, but it could be done. Ideas?
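
For the image step, something like this rough Python sketch could work (assuming the requests and beautifulsoup4 packages; the page URL and file paths here are made up for illustration):
Code
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Hypothetical page to archive -- substitute a real thread URL.
page_url = "http://example.com/ubbthreads.php?ubb=showflat&Number=1"

soup = BeautifulSoup(requests.get(page_url).text, "html.parser")
os.makedirs("images", exist_ok=True)

for img in soup.find_all("img", src=True):
    src = urljoin(page_url, img["src"])
    # Derive a local filename from the URL (ignoring any query string).
    local = os.path.join("images", os.path.basename(src.split("?")[0]))
    with open(local, "wb") as f:
        f.write(requests.get(src).content)  # save the image locally
    img["src"] = local  # point the tag at the saved copy

with open("page.html", "w", encoding="utf-8") as f:
    f.write(str(soup))  # write the page out with rewritten img tags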

Joined: Apr 2001
Posts: 3,266
Member
Offline

Joined: Apr 2002
Posts: 102
Journeyman
Offline
To me it looks like that just converts the URLs to spider-friendly ones and doesn't actually save a new copy of each page as an HTML file... am I wrong? Their page doesn't exactly list which features it includes.

I guess it would make it easier to spider with archiving software, but I'd hate to have to buy that as well as the archiving software.

Joined: Apr 2001
Posts: 3,266
Member
Offline
I believe it converts the pages to HTML through this hack.

Here is a link from Allen's board, which uses it:

http://www.praisecafe.org/boards/ubb-get_topic-f-9-t-000240.html

Joined: Nov 2001
Posts: 10,369
I type Like navaho
No, I think it just uses URL rewriting to make them "appear" to be HTML pages. I have never seen a hack that physically exports the pages to HTML so that you could download and browse them offline, etc.
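
For illustration, that kind of hack usually boils down to an Apache mod_rewrite rule along these lines (a hypothetical pattern based on the URL style in the link above, not the actual hack's code). The URL looks like a static .html file, but the request is still served by the dynamic CGI script, so nothing is ever written to disk:
Code
# Hypothetical .htaccess rule: a "static-looking" URL such as
# ubb-get_topic-f-9-t-000240.html is rewritten back to the CGI script.
RewriteEngine On
RewriteRule ^ubb-get_topic-f-([0-9]+)-t-([0-9]+)\.html$ ubbcgi/ultimatebb.cgi?ubb=get_topic;f=$1;t=$2 [L]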

Joined: May 2003
Posts: 1,068
Junior Member
Offline
3DSHROOM said:
To me it looks like that just converts the URLs to spider-friendly ones and doesn't actually save a new copy of each page as an HTML file... am I wrong? Their page doesn't exactly list which features it includes.

I guess it would make it easier to spider with archiving software, but I'd hate to have to buy that as well as the archiving software.

You can get more info on this here: https://www.ubbdev.com/ubbcgi/ultimatebb.cgi?ubb=get_topic;f=10;t=002400;p= and if you write the developer, Micah ([email protected]), he will most likely be happy to answer your questions.

Joined: Jun 2001
Posts: 3,273
That 70's Guy
Offline
I suppose it is possible that one could construct a script that would scan the forum pages that are open to the public, extract the links, send a GET request for each one, and store the returned content as an HTML file.

What a nightmare that would be. lol
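
In outline, though, it might look something like this toy Python sketch (assuming the requests and beautifulsoup4 packages; the start URL and hashed filename scheme are invented for illustration):
Code
from collections import deque
from urllib.parse import urljoin, urlparse
import hashlib

import requests
from bs4 import BeautifulSoup

START = "http://example.com/ubbthreads.php"  # hypothetical forum root

def local_name(url):
    # Query strings can't go in filenames, so hash the whole URL instead.
    return hashlib.md5(url.encode()).hexdigest() + ".html"

seen, queue = set(), deque([START])
while queue:
    url = queue.popleft()
    if url in seen:
        continue
    seen.add(url)
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for a in soup.find_all("a", href=True):
        target = urljoin(url, a["href"])
        # Only follow links that stay on the forum's own host.
        if urlparse(target).netloc == urlparse(START).netloc:
            a["href"] = local_name(target)  # repoint at the local copy
            queue.append(target)
    with open(local_name(url), "w", encoding="utf-8") as f:
        f.write(str(soup))  # save the rewritten page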

Joined: Apr 2002
Posts: 102
Journeyman
Offline
heh, yeah


Joined: May 1999
Posts: 1,715
Addict
wget is your friend.

Type this in a Unix shell account:
Code
wget -Ekpmnv https://ubbdev.com



And you will have a complete working local copy of threadsdev. It'll take some time to download everything, though. =]
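
For reference, those combined flags break down as: -E saves pages with an .html extension (handy when the URLs have query strings), -k converts the links in the downloaded pages so they point at the local copies, -p also grabs page requisites like images and stylesheets, -m turns on mirroring (recursive download with timestamping), and -nv makes the output less verbose.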

If you are on Windows, you could use something like Cygwin to run the wget program. Don't ask me how Cygwin works, though, because I don't know.

There is a Windows GUI available on this site: http://www.jensroesner.de/wgetgui/

MTO
Joined: Jan 2000
Posts: 796
Addict
Offline
A nice thing about archiving would be that, once archived, the database could be trimmed, reducing server load.

Garderner, out of curiosity... so you are saying wget could be run on the server side? I sort of tested wget as an extension on my Firebird browser (it never worked for me); it needed some updating, so it didn't work... But running it on the server side sounds very interesting.

Joined: Apr 2002
Posts: 102
Journeyman
Offline
wget seems to have a few problems converting the URLs into valid filenames.. you can't have question marks in filenames... Used together with the Spider script, wget would probably work quite well.. and yes, he probably means server side, although it could be done from any machine..

Joined: May 1999
Posts: 1,715
Addict
Well, I ran wget with the arguments I gave and I got a complete local copy of my own site. It changes all links so that they work with the saved files, whether there are question marks or not.

It's not really server side, since wget is an HTTP client (like a browser), but I ran it on the same machine the site was on. As long as you have wget, you can run it from wherever you want to download the files; it'll just take a bit longer to transfer everything over the internet.

