wget
Paul King
pking123-rieW9WUcm8FFJ04o6PK0Fg at public.gmane.org
Fri May 26 23:09:15 UTC 2006
On Fri, 2006-05-26 at 16:15 -0400, Daniel Armstrong wrote:
> I am trying out wget to do unattended downloads by attempting to
> retrieve a series of video files, but it fails because it first tries
> to download the index.html file which - in this particular case - the
> directory does not have.
>
> The situation:
>
> If I point Firefox to http://ocw.mit.edu/ans7870/7/7.012/f04/video/ I
> get a blank page... I take this to mean there is no index.html file at
> this location?
>
You got a blank page because, more than likely, that URL is actually a
directory with no index file behind it, which is also what makes wget give up.
To do systematic downloads you have two choices: either download
specific files, or mirror the entire site. I don't know of an
"in-between" solution; I have done both.
I have downloaded specific files with wget in a systematic way in order
to get, say, MP3s of radio simulcasts of programs I like to hear/play
frequently. For those, I write a perl script that keeps the site and
path in one string and, for the specific filename, varies only the
changeable parts of the string by listing them in an array. I invoke
wget on the resulting string as part of a "system ();" command.
You can do the same thing in bash without using a system command; it's
just that dealing with arrays is more of a bother if they involve
counting (another one of mine does).
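In bash the same trick looks roughly like this (the base URL and the
date strings are invented, purely to show the shape):

    #!/bin/bash
    # Keep the site and path in one string; list only the changeable
    # parts of the filename in an array. (Base URL and dates are made up.)
    base="http://example.com/radio/simulcast"
    dates=(20060512 20060519 20060526)
    for d in "${dates[@]}"; do
        wget "${base}-${d}.mp3"
    done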
> If I use wget to download a single video file from this location:
>
> wget http://ocw.mit.edu/ans7870/7/7.012/f04/video/ocw-7.012-lec-mit-10250-22oct2004-1000-220k.rm
>
> ...it works as expected.
>
> But I would like to know how to use wget to download *all* the video
> files of a certain compression size with a single command. I checked
> the manpage and used the "-A" option to specify a filetype, using this
> command:
You need to make a list of specific filenames. I don't know of any
truly automated way to do this, just mildly clever ways that cut down
the work, such as downloading a file list, trimming it in vi down to
the ones I want, then reading that list into an array (this is easy in
bash). A wildcard on the wget command line is expanded by the shell
against the files it can find at YOUR end of the connection, so of
course it is never going to match anything at the other end.
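As a sketch, if the trimmed list ends up in a file with one filename
per line (the name files.txt and the loop are my own illustration, not
a fixed recipe), the download step is short:

    #!/bin/bash
    # Read the hand-trimmed list of filenames into an array and fetch
    # each one from the known base URL. (Assumes no spaces in the names.)
    base="http://ocw.mit.edu/ans7870/7/7.012/f04/video"
    files=($(cat files.txt))
    for f in "${files[@]}"; do
        wget "${base}/${f}"
    done

wget will also take a list of complete URLs directly with -i files.txt,
if you would rather paste full URLs into the file instead of bare
filenames.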
>
> wget -A "*220k.rm" http://ocw.mit.edu/ans7870/7/7.012/f04/video/
>
> ...which returns the following error...
>
> --16:10:53-- http://ocw.mit.edu/ans7870/7/7.012/f04/video/
> => `index.html'
> Resolving ocw.mit.edu... 209.123.81.89, 209.123.81.96
> Connecting to ocw.mit.edu|209.123.81.89|:80... connected.
> HTTP request sent, awaiting response... 404 Not Found
> 16:10:53 ERROR 404: Not Found.
>
> How do I manage to setup wget to ignore the fact that there is no
> index.html at this location, and just download the *.rm files I
> requested? wget would be a perfect tool for downloading a series of
> files like this unattended vs. downloading each file by hand
> one-by-one... Thanks in advance for any help.