urlgrabber (1)
Leading comments
Title: urlgrabber
Generator: DocBook XSL Stylesheets v1.72.0 <http://docbook.sf.net/>
Date: 04/09/2007
NAME
urlgrabber - a high-level cross-protocol url-grabber.
SYNOPSIS
urlgrabber [OPTIONS] URL [FILE]
DESCRIPTION
urlgrabber is a binary program and Python module for fetching files. It is designed to be used in programs that need common (but not necessarily simple) url-fetching features.
OPTIONS
--help, -h
- display a help page listing the options available to the binary program.
--copy-local
- ignored except for file:// urls, in which case it specifies whether urlgrab should still make a copy of the file, or simply point to the existing copy.
--throttle=NUMBER
- if it's an int, it's the bytes/second throttle limit. If it's a float, it is first multiplied by bandwidth. If throttle == 0, throttling is disabled. If None, the module-level default (which can be set with set_throttle) is used.
--bandwidth=NUMBER
- the nominal max bandwidth in bytes/second. If throttle is a float and bandwidth == 0, throttling is disabled. If None, the module-level default (which can be set with set_bandwidth) is used.
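The interaction between --throttle and --bandwidth described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not urlgrabber's own code: the function name effective_rate is hypothetical, and the None case (module-level defaults) is left out.

```python
def effective_rate(throttle, bandwidth):
    """Return the bytes/second limit implied by the two options.

    Returns None when throttling is disabled. Hypothetical helper that
    mirrors the rules in the option descriptions; not part of urlgrabber.
    """
    if throttle == 0:
        return None                # throttle == 0 disables throttling
    if isinstance(throttle, int):
        return float(throttle)     # an int is already a bytes/second limit
    if bandwidth == 0:
        return None                # float throttle with bandwidth 0: disabled
    return throttle * bandwidth    # a float is multiplied by bandwidth


print(effective_rate(2048, 0))        # int throttle, used directly
print(effective_rate(0.5, 100000))    # float throttle, half of the bandwidth
print(effective_rate(0.5, 0))         # no nominal bandwidth, so no limit
```

Under these rules a float throttle only has an effect when a nonzero bandwidth is also given.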
--range=RANGE
- a tuple of the form first_byte,last_byte describing a byte range to retrieve. Either or both of the values may be specified. If first_byte is None, byte offset 0 is assumed. If last_byte is None, the last byte available is assumed. Note that both first and last_byte values are inclusive so a range of (10,11) would return the 10th and 11th bytes of the resource.
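To make the inclusive reading of first_byte,last_byte concrete, the sketch below applies it to an in-memory byte string and builds the matching HTTP Range header value, which is also inclusive at both ends. It assumes zero-based byte offsets; the helper names are hypothetical, and the code does not show how urlgrabber itself translates the range.

```python
def range_header(first_byte=None, last_byte=None):
    """Build an HTTP Range header value from an inclusive (first, last) pair."""
    first = 0 if first_byte is None else first_byte   # missing first: offset 0
    last = "" if last_byte is None else last_byte     # missing last: to the end
    return "bytes=%s-%s" % (first, last)


def inclusive_slice(data, first_byte=None, last_byte=None):
    """Return the bytes of data selected by an inclusive (first, last) pair."""
    first = 0 if first_byte is None else first_byte
    end = len(data) if last_byte is None else last_byte + 1  # include last_byte
    return data[first:end]


resource = b"0123456789abcdef"
print(range_header(10, 11))
print(inclusive_slice(resource, 10, 11))
```

With both ends inclusive, a range of 10,11 selects exactly two bytes.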
--user-agent=STR
- the user-agent string to provide if the URL is HTTP.
--retry=NUMBER
- the number of times to retry the grab before bailing. If this is zero, it will retry forever. This was intentional... really, it was :). If this value is not supplied, or is supplied but is None, retrying does not occur.
--retrycodes
- a sequence of error codes (values of e.errno) for which the grab should be retried. See the documentation on URLGrabError for more details. retrycodes defaults to -1,2,4,5,6,7 if not specified explicitly.
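The way --retry and --retrycodes work together can be sketched as a small retry loop. This is an illustration only: FetchError is a hypothetical stand-in for URLGrabError, fetch_with_retry is not a urlgrabber function, and the sketch assumes NUMBER counts total attempts.

```python
class FetchError(Exception):
    """Hypothetical stand-in for URLGrabError; carries an errno value."""

    def __init__(self, errno, message=""):
        super().__init__(errno, message)
        self.errno = errno


DEFAULT_RETRYCODES = (-1, 2, 4, 5, 6, 7)   # default listed under --retrycodes


def fetch_with_retry(fetch, retry=None, retrycodes=DEFAULT_RETRYCODES):
    """Call fetch() until it succeeds or the retry rules say to give up."""
    tries = 0
    while True:
        tries += 1
        try:
            return fetch()
        except FetchError as err:
            # retry=None: never retry. retry=0: tries never equals 0,
            # so retryable errors are retried forever.
            if retry is None or tries == retry:
                raise
            if err.errno not in retrycodes:
                raise   # this error code is not marked as retryable


attempts = []


def flaky():
    """Fail twice with a retryable code, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise FetchError(4, "temporary failure")
    return "payload"


result = fetch_with_retry(flaky, retry=5)
print(result, len(attempts))
```

An error whose code is not in retrycodes is raised immediately, whatever the retry count.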
MODULE USE EXAMPLES
In its simplest form, urlgrabber can be a replacement for urllib2's open, or even Python's file if you're just reading:
-
    from urlgrabber import urlopen
    fo = urlopen(url)
    data = fo.read()
    fo.close()
-
    from urlgrabber import urlgrab, urlread
    local_filename = urlgrab(url)  # grab a local copy of the file
    data = urlread(url)            # just read the data into a string
-
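The module-level functions use a shared default grabber object. You may prefer a private grabber of your own, for two reasons:
    * it's a little ugly to modify the default grabber because you have to reach into the module to do it
    * you could run into conflicts if different parts of the code modify the default grabber and therefore expect different behavior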
-
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber()
    data = g.urlread(url)
-
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url)
-
    from urlgrabber.grabber import URLGrabber
    g = URLGrabber(reget='simple')
    local_filename = g.urlgrab(url, filename=None, reget=None)
AUTHORS
Written by:
Michael D. Stenner <mstenner@linux.duke.edu>
Ryan Tomayko <rtomayko@naeblis.cx>
This manual page was written by Kevin Coyner <kevin@rustybear.com> for the Debian system (but may be used by others). It borrows heavily from the documentation included in the urlgrabber module. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU General Public License, Version 2 or any later version published by the Free Software Foundation.
RESOURCES
Main web site: linux.duke.edu/projects/urlgrabber