<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
use LWP::Simple;
use HTML::TreeBuilder;
# LWP POST request stores its result in $response...
...
my $content = $response->content;
my $root = HTML::TreeBuilder->new_from_content($content);
Quote: Parsing of undecoded UTF-8 will give garbage when decoding entities
(W) The first chunk parsed appears to contain undecoded UTF-8 and one or more argspecs that decode entities are used for the callback handlers.
The result of decoding will be a mix of encoded and decoded characters for any entities that expand to characters with code above 127. This is not a good thing.
The recommended solution is to apply Encode::decode_utf8() on the data before feeding it to the $p->parse(). For $p->parse_file() pass a file that has been opened in ":utf8" mode.
The alternative solution is to enable the utf8_mode and not decode before passing strings to $p->parse(). The parser can process raw undecoded UTF-8 sanely if the utf8_mode is enabled, or if the "attr", "@attr" or "dtext" argspecs are avoided.
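A minimal sketch of the recommended solution applied to the snippet above. The literal byte string is only a stand-in for what $response->content would return (an assumption for illustration, so that the example runs without a network request); the point is that the bytes are decoded before HTML::TreeBuilder sees them.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use HTML::TreeBuilder;

# Stand-in for $response->content: raw UTF-8 bytes plus an entity
# that expands to a character above 127 (hypothetical sample data).
my $bytes = "<html><body><p>Gr\xC3\xBC\xC3\x9Fe &euro;</p></body></html>";

# Turn the UTF-8 bytes into Perl characters before parsing, so that
# literal text and expanded entities end up in the same representation.
my $content = decode_utf8($bytes);

my $root = HTML::TreeBuilder->new_from_content($content);
my $text = $root->as_text;
$root->delete;    # free the tree explicitly

# Encode again on output.
binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

With a real LWP response, $response->decoded_content should do the same decoding based on the charset the server declares, so it may be enough to replace content with decoded_content.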
Quote: Do not use this pragma for anything else than telling Perl that your script is written in UTF-8.
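To illustrate that last quote: a small sketch, assuming the script file itself is saved as UTF-8. The pragma changes how string literals in the source are read; it does nothing for data that arrives at run time, which still has to be decoded explicitly. The variable names are made up for this example.

```perl
use strict;
use warnings;
use utf8;                      # only declares: this source file is UTF-8
use Encode qw(decode_utf8);

my $literal = "ü";             # one character, because of "use utf8"
my $bytes   = "\xC3\xBC";      # the same letter as raw UTF-8 bytes, as it would arrive from outside

my $len_literal = length $literal;    # counts characters
my $len_bytes   = length $bytes;      # counts bytes, nothing was decoded here
my $same        = decode_utf8($bytes) eq $literal;

print "literal: $len_literal, bytes: $len_bytes, equal after decode: ",
      ($same ? "yes" : "no"), "\n";
```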