WWW • man page

WWW (3)

Leading comments

World Wide Web Package
WWW.3

Copyright (C) 1998  Paul J. Lucas

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  ...

(The comments found at the beginning of the groff file "man3/WWW.3".)

NAME

WWW - World Wide Web Package

SYNOPSIS

extract_description( FILE )
extract_meta( FILE, NAME )
hyperlink( LIST )

DESCRIPTION

This package provides utility functions for the World Wide Web: extracting descriptions or meta information from files, and hyperlinking text.

SUBROUTINES

The following Perl subroutines are defined and available:

extract_description( FILE )

Extracts a description from an HTML or plain text file given by the FILE name; FILE should be an absolute path. The first $description::chars (default: 2048) characters are read. If the file ends in one of the extensions htm, html, or shtml, it is presumed to be an HTML file; if the file ends in txt, it is presumed to be a plain text file. Other extensions are not recognized and no description is returned for them.
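The extension rule above can be illustrated with a minimal sketch. The helper name sketch_file_kind is hypothetical and is not part of this package; case-sensitive matching is also an assumption, since the text does not say how case is handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW): classify a file name by its
# extension as the text describes. Returns 'html', 'text', or undef.
# Assumption: extensions are matched case-sensitively.
sub sketch_file_kind {
    my ($file) = @_;
    return 'html' if $file =~ /\.(?:htm|html|shtml)$/;
    return 'text' if $file =~ /\.txt$/;
    return;    # unrecognized extension: no description would be returned
}

for my $file ('/var/www/index.shtml', '/var/www/notes.txt', '/var/www/report.pdf') {
    my $kind = sketch_file_kind($file) // 'unrecognized';
    print "$file: $kind\n";
}
```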

For HTML files, if a <META NAME="description" CONTENT="..."> or a <META NAME="DC.description" CONTENT="..."> (Dublin Core) element is found, the words given as the value of its CONTENT attribute are returned as the description.

Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and <TITLE> tags, and all other HTML tags are stripped. If <AREA ... ALT="..."> or <IMG ... ALT="..."> elements are found, then the words specified as the value of the ALT attributes are extracted.

Finally, for either HTML or plain text files, at most $description::words (default: 50) words are returned.
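The HTML steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's actual code: the name sketch_describe_html is hypothetical, it works on a string rather than a file, and the regular expressions assume double-quoted attribute values with NAME appearing before CONTENT.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the WWW package's implementation).
# Takes HTML text and an optional word limit (default 50, mirroring
# the documented default of $description::words).
sub sketch_describe_html {
    my ($html, $max_words) = @_;
    $max_words = 50 unless defined $max_words;

    my $text;
    if ( $html =~ /<META\s+NAME\s*=\s*"(?:DC\.)?description"\s+CONTENT\s*=\s*"([^"]*)"/i ) {
        # 1. A META description (plain or Dublin Core) wins outright.
        $text = $1;
    } else {
        $text = $html;
        # 2. Drop comments and the contents of SCRIPT, STYLE, and TITLE.
        $text =~ s/<!--.*?-->/ /gs;
        $text =~ s/<(SCRIPT|STYLE|TITLE)\b[^>]*>.*?<\/\1\s*>/ /gis;
        # 3. Keep the ALT text of AREA and IMG elements.
        $text =~ s/<(?:AREA|IMG)\b[^>]*\bALT\s*=\s*"([^"]*)"[^>]*>/ $1 /gi;
        # 4. Strip every remaining tag.
        $text =~ s/<[^>]*>/ /g;
    }

    # 5. Keep at most $max_words whitespace-separated words.
    my @words = split ' ', $text;
    splice @words, $max_words if @words > $max_words;
    return join ' ', @words;
}

print sketch_describe_html(
    '<TITLE>Ignored</TITLE><!-- hidden --><IMG SRC="a.gif" ALT="Company logo"> Welcome <B>home</B>'
), "\n";
```

Truncating to the word limit is done last so that it applies whether the text came from a META element or from the stripped document body.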

extract_meta( FILE, NAME )

Extracts the value of the CONTENT attribute from a META element having the given NAME attribute from an HTML file given by the FILE name; FILE should be an absolute path. The file must end in one of the extensions htm, html, or shtml to be considered an HTML file. The first $description::chars (default: 2048) characters are read. The characters are cached between consecutive calls using the same filename.
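One way the caching between consecutive calls could work is sketched below. Everything here is an assumption for illustration: the names sketch_extract_meta, $MAX_CHARS, and $reads are hypothetical, the counter exists only to make the cache observable, and read() counts bytes unless an encoding layer is set on the file handle.

```perl
use strict;
use warnings;

my $MAX_CHARS = 2048;                          # mirrors the documented default
my ($cached_file, $cached_text) = ('', '');    # state kept between calls
my $reads = 0;                                 # demonstration only: counts file reads

# Hypothetical sketch (not the WWW package's implementation).
sub sketch_extract_meta {
    my ($file, $name) = @_;
    return unless $file =~ /\.(?:htm|html|shtml)$/;

    # Read the file only when its name differs from the previous call.
    if ( $file ne $cached_file ) {
        open my $fh, '<', $file or return;
        my $buf = '';
        read $fh, $buf, $MAX_CHARS;            # first $MAX_CHARS bytes only
        close $fh;
        ($cached_file, $cached_text) = ($file, $buf);
        ++$reads;
    }

    # Assumes double-quoted attributes with NAME before CONTENT.
    return $1
        if $cached_text =~ /<META\s+NAME\s*=\s*"\Q$name\E"\s+CONTENT\s*=\s*"([^"]*)"/i;
    return;
}
```

Calling the sketch twice in a row with the same file name, for example once for "author" and once for "keywords", would open the file only once.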

hyperlink( LIST )

Adds hyperlinks to strings: substrings that are valid URLs (according to RFC 1630) have the appropriate HTML tags "wrapped" around them so that they are selectable when displayed in a browser. The ftp, gopher, http, https, mailto, news, telnet, and wais URL schemes are recognized. Example:

     Read all about it at
     www.usatoday.com

becomes:

     Read all about it at
     <A HREF="www.usatoday.com">www.usatoday.com</A>
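The wrapping can be approximated with a single substitution, shown here as a rough sketch rather than the package's own code. The name sketch_hyperlink and the regular expression are hypothetical; in particular, the sketch only recognizes substrings that begin with one of the listed schemes, and it assumes that trailing punctuation is not part of a URL.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the WWW package's implementation).
# Returns copies of the given strings with recognized URLs wrapped
# in <A HREF="...">...</A>.
sub sketch_hyperlink {
    my @strings = @_;
    for (@strings) {
        # Scheme, then any run of non-space, non-markup characters,
        # ending on a character that is not sentence punctuation.
        s{((?:(?:ftp|gopher|https?|telnet|wais)://|(?:mailto|news):)[^\s<>"]*[^\s<>".,;:!?)])}
         {<A HREF="$1">$1</A>}g;
    }
    return @strings;
}

print "$_\n" for sketch_hyperlink(
    'Read all about it at http://www.usatoday.com.',
    'Write to mailto:someone@example.com today',
);
```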

AUTHOR

Paul J. Lucas <pauljlucas@mac.com>
