tv_extractinfo_en (1)
Leading comments
Automatically generated by Pod::Man 4.09 (Pod::Simple 3.35) Standard preamble: ========================================================================
NAME
tv_extractinfo_en - read English-language listings and extract info from programme descriptions.SYNOPSIS
tv_extractinfo_en [--help] [--outputFILE
] [FILE...
]
DESCRIPTION
ReadXMLTV
data and attempt to extract information from
English-language programme descriptions, putting it into
machine-readable form. For example the human-readable text '(repeat)'
in a programme description might be replaced by the XML
element
<previously-shown>.
--output
FILE
write to FILE
rather than standard output
This tool also attempts to split multipart programmes into their constituents, by looking for a description that seems to contain lots of times and titles. But this depends on the description following one particular style and is useful only for some listings sources (Ananova).
If some text is marked with the 'lang' attribute as being some language other than English ('en'), it is ignored.
SEE ALSO
xmltv(5).AUTHOR
Ed Avis, ed@membled.comBUGS
Trying to parse human-readable text is always error-prone, more so with the simple regexp-based approach used here. But becauseTV
listing descriptions usually conform to one of a few set styles,
tv_extractinfo_en does reasonably well. It is fairly conservative,
trying to avoid false positives (extracting 'information' which
isn't really there) even though this means some false negatives
(failing to extract information and leaving it in the human-readable
text).
However, the leftover bits of text after extracting information may not form a meaningful English sentence, or the punctuation may be wrong.
On the two listings sources currently supported by the
XMLTV
package,
this program does a reasonably good job. But it has not been tested
with every source of anglophone TV
listings.