Thread Regex for a spam filter with non-ASCII text
(33 answers)
Opened by GwenDragon at 2012-06-17 18:27
Unfortunately, the word boundary \b does not work when searching for spam words that contain non-ASCII characters.
Code (perl):
#!/usr/bin/perl
use 5.008;
use strict;
use warnings;
use locale ':not_characters';

my @words = (
    'ルイビトン!',
    'ルイビトン',
    'V1$gRa!',
    'Viagra',
    'интернет!',
    'товары',
);

while (my $line = <DATA>) {
    for my $spamword (sort @words) {
        print ($line =~ /\b\Q$spamword\E\b/i ? "Spam: $spamword -> $line" : '');
    }
}

__END__

__DATA__
CAT & CATZE
ルイビトン
ルイビトン!
8I_Iy
ViAGra!
Test.Toast
интернет!
Perl's son is not Tim Towdy!

What am I doing wrong?
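A possible explanation, offered as a sketch rather than a confirmed diagnosis: \b only matches between a word character (\w) and a non-word character (\W). A spam word that ends in "!" followed by a newline has \W on both sides of that position, so the trailing \b cannot match, whatever the script. In addition, without "use utf8" (and decoding of the input) Perl sees the Japanese and Cyrillic words as bytes, not characters. The example below assumes the source file is saved as UTF-8 and replaces \b with the lookarounds (?<!\w) and (?!\w). The helper name contains_spamword is hypothetical, and the test lines are given as literals here so that no filehandle decoding is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

# Hypothetical helper: returns 1 if $word occurs in $text and is not
# directly preceded or followed by a word character, otherwise 0.
# Unlike \b, the lookarounds also succeed when $word itself starts
# or ends with a non-word character such as "!".
sub contains_spamword {
    my ($text, $word) = @_;
    return $text =~ /(?<!\w)\Q$word\E(?!\w)/i ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';    # encode characters on output

my @words = ('ルイビトン!', 'ルイビトン', 'V1$gRa!', 'Viagra', 'интернет!', 'товары');
my @lines = ('ルイビトン!', 'ViAGra!', 'интернет!', 'Test.Toast');

for my $line (@lines) {
    for my $word (sort @words) {
        print "Spam: $word -> $line\n" if contains_spamword($line, $word);
    }
}
```

With these sample lines, the first three should each be reported as spam and "Test.Toast" should not. Input read from a file or from DATA would additionally have to be decoded, for example with binmode and an ':encoding(UTF-8)' layer, before matching.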