wget
Paul King
pking123-rieW9WUcm8FFJ04o6PK0Fg at public.gmane.org
Fri May 26 23:09:15 UTC 2006
On Fri, 2006-05-26 at 16:15 -0400, Daniel Armstrong wrote:
> I am trying out wget to do unattended downloads by attempting to
> retrieve a series of video files, but it fails because it first tries
> to download the index.html file which - in this particular case - the
> directory does not have.
>
> The situation:
>
> If I point Firefox to http://ocw.mit.edu/ans7870/7/7.012/f04/video/ I
> get a blank page... I take this to mean there is no index.html file at
> this location?
>
You got a blank page because, more than likely, that URL is actually a
directory with no index file behind it, which is also what makes wget give up.
To do systematic downloads you have two choices: either download
specific files, or mirror the entire site. I don't know of an
"in-between" solution; I have done both.
I have downloaded specific files with wget in a systematic way in order
to get, say, MP3s of radio simulcasts of programs I like to hear/play
frequently. For those, I write a perl script that keeps the site and
path in one string and, for the specific filename, varies only the
changeable parts of the string by listing them in an array. I invoke
wget on the resulting string as part of a "system ();" command.
You can do the same thing in bash without using a system command; it's
just that dealing with arrays is more of a bother if they involve
counting (another one of mine does).
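In bash the same trick looks roughly like this (the base URL and the
date strings are invented, purely to show the shape):

    #!/bin/bash
    # Keep the site and path in one string; list only the changeable
    # parts of the filename in an array. (Base URL and dates are made up.)
    base="http://example.com/radio/simulcast"
    dates=(20060512 20060519 20060526)
    for d in "${dates[@]}"; do
        wget "${base}-${d}.mp3"
    done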
> If I use wget to download a single video file from this location:
>
> wget http://ocw.mit.edu/ans7870/7/7.012/f04/video/ocw-7.012-lec-mit-10250-22oct2004-1000-220k.rm
>
> ...it works as expected.
>
> But I would like to know how to use wget to download *all* the video
> files of a certain compression size with a single command. I checked
> the manpage and used the "-A" option to specify a filetype, using this
> command:
You need to make a list of specific filenames. I don't know of any
truly automated way to do this, just mildly clever ways that cut down
the work, such as downloading a file list, trimming it in vi down to
the ones I want, then reading that list into an array (this is easy in
bash). A wildcard on the wget command line is expanded by the shell
against the files it can find at YOUR end of the connection, so of
course it is never going to match anything at the other end.
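As a sketch, if the trimmed list ends up in a file with one filename
per line (the name files.txt and the loop are my own illustration, not
a fixed recipe), the download step is short:

    #!/bin/bash
    # Read the hand-trimmed list of filenames into an array and fetch
    # each one from the known base URL. (Assumes no spaces in the names.)
    base="http://ocw.mit.edu/ans7870/7/7.012/f04/video"
    files=($(cat files.txt))
    for f in "${files[@]}"; do
        wget "${base}/${f}"
    done

wget will also take a list of complete URLs directly with -i files.txt,
if you would rather paste full URLs into the file instead of bare
filenames.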
>
> wget -A "*220k.rm" http://ocw.mit.edu/ans7870/7/7.012/f04/video/
>
> ...which returns the following error...
>
> --16:10:53-- http://ocw.mit.edu/ans7870/7/7.012/f04/video/
> => `index.html'
> Resolving ocw.mit.edu... 209.123.81.89, 209.123.81.96
> Connecting to ocw.mit.edu|209.123.81.89|:80... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 16:10:53 ERROR 404: Not Found.
>
> How do I manage to setup wget to ignore the fact that there is no
> index.html at this location, and just download the *.rm files I
> requested? wget would be a perfect tool for downloading a series of
> files like this unattended vs. downloading each file by hand
> one-by-one... Thanks in advance for any help.