You mean stripping out the HTML?
That doesn't always work with a regex.
Try running this from the console and you'll see what I mean:
use strict;
use warnings;
my $string = <<'HTML';
<script><!--
if (a<b && a>c) alert('<');
//-->
</script>
<IMG SRC = "foo.gif" ALT = "A > B">
<IMG SRC = "foo.gif"
ALT = "A > B">
<!-- <A comment> -->
<script>if (a<b && a>c)</script>
<# Just data #>
<![INCLUDE CDATA [ >>>>>>>>>>>> ]]>
<!-- This section commented out.
<B>You can't see me!</B>
-->
HTML
# Naive approach: delete everything from a "<" up to the nearest ">"
$string =~ s/<.+?>//g;
print $string;
Output:
<!--
if (ac) alert('<');
//-->
B">
<IMG SRC = "foo.gif"
ALT = "A > B">
-->
if (ac)
>>>>>>>>>>> ]]>
<!-- This section commented out.
You can't see me!
-->
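The multi-line IMG tag survives untouched because "." does not match a newline by default. Here is a small illustration, using only the tag from the test string above. It shows that adding the /s modifier doesn't rescue the regex either. With /s the match crosses the line break, but the lazy ".+?" still stops at the first ">", which is the one inside the quoted ALT value.

```perl
use strict;
use warnings;

# The same multi-line tag as in the test string above
my $tag = qq{<IMG SRC = "foo.gif"\nALT = "A > B">};

# Without /s, "." does not match "\n", and the first line contains no ">",
# so the tag is never matched and nothing is removed
(my $without_s = $tag) =~ s/<.+?>//g;

# With /s, "." also matches "\n", but the lazy match still ends at the
# first ">", which sits inside the quoted ALT value
(my $with_s = $tag) =~ s/<.+?>//gs;

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```

This prints the tag unchanged for the first case and only ` B">` for the second. So the regex is wrong in both variants, just in different ways.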