XML::LibXML - avoiding "Malformed UTF-8 character (fatal)" (solved) - #149247 (General Perl topics)

pktm

2011-05-30 14:57

I get the same result here for latin1-encoded text and for UTF-8-encoded text:

Quote
test with latin1: 1
test with utf-8: 1

This is perl, v5.10.0 built for darwin-thread-multi-2level.

Am I perhaps doing something wrong?

Code:

#!perl

use strict;
use warnings;
use FileHandle;

print "test with latin1: " . test('test-iso.txt', 'latin1') . "\n";
print "test with utf-8: " . test('test-utf8.txt', 'utf-8') . "\n";

sub test {
	my $file = shift;
	my $encoding = shift;
	my $fh = FileHandle->new($file, "<:encoding(".$encoding.")") or die('Cannot open file: ' . $!);
	my $content = join"", ($fh->getlines());
	
	my $is_utf = is_utf8($content);
	
	return $is_utf;
} # /test

# Tests whether a text is UTF-8 encoded
sub is_utf8{
        my $text = shift;
        no warnings;
        use bytes; # bytes are being compared

        # convert the text to Latin-1 (ISO-8859-1)
        my $iso = pack('C*', unpack('U0U*', $text));
        # encode this text as UTF-8 again
        my $utf = pack('U0U*', unpack('C*', $iso));
        # if both byte strings are equal, $text is UTF-8 encoded
        return ($utf eq $text) ? 1 : 0;
} # /is_utf8

The file contains something like this:

Quote
test-iso.txt

Diese Datei ist latin1-Kodiert.
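
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the ":encoding(...)" layer already decodes both files into Perl character strings, so by the time is_utf8() runs, the original file encoding may no longer be visible in $content. A check that can tell the two files apart would have to look at the raw bytes instead. The helper name looks_like_utf8 and the in-memory sample string below are assumptions for illustration and are not part of the code above; the sketch uses the core Encode module.

```perl
#!perl

use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): returns 1 if the
# given raw bytes form valid UTF-8, 0 otherwise.
sub looks_like_utf8 {
    my ($bytes) = @_;
    # Work on a copy: with a CHECK argument, decode() may modify its input.
    my $copy = $bytes;
    # FB_CROAK makes decode() die on the first malformed sequence.
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# Sample data built in memory so the sketch does not depend on any files.
my $text   = "K\x{F6}ln";                    # character string with o-umlaut
my $latin1 = encode('ISO-8859-1', $text);    # bytes: 4B F6 6C 6E
my $utf8   = encode('UTF-8', $text);         # bytes: 4B C3 B6 6C 6E

print "test with latin1: " . looks_like_utf8($latin1) . "\n";
print "test with utf-8: "  . looks_like_utf8($utf8)   . "\n";
```

For a real file, the bytes would have to be read through a ":raw" layer instead of ":encoding(...)". Note that pure ASCII text (such as the sample line quoted above) is valid in both encodings, so a check like this can only distinguish the files if they contain non-ASCII characters.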

http://www.intergastro-service.de (my first CMS :) )