Thread Word frequency: analyzing a chat log (5 answers)
Opened by styx-cc at 2006-10-18 21:40

bloonix
 2006-10-18 21:59
#70984
Code:
use strict;
use warnings;
use Data::Dumper;

my @vfile = (
  "(00:10) <Henda> =(\n",
  "(00:10) <pHyL> welche denn ? seufzt :[\n",
  "(00:12) <styx> mhmh\n",
  "(00:12) <styx> jetzt passieren mhmh kommische dinge\n",
  "(00:13) <Henda> welche denn?\n",
);

my %stats;

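# Each line has the form "(hh:mm) <nick> text": the first two fields are time and nick.
while (defined (my $line = shift @vfile)) {
  my ($time, $user, @words) = split /\s+/, $line;
  foreach my $word (@words) {
     # skip punctuation smilies such as "=(" or ":[" and anything shorter than 3 characters
     next if $word =~ /[;:=]\W+/ || length($word) < 3;
     $word =~ s/(\w+)\W+$/$1/; # strip trailing punctuation so "denn" and "denn?" count as one word
     $stats{$word}++;
  }
}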

print Dumper(\%stats);

Output:


$VAR1 = {
         'jetzt' => 1,
         'seufzt' => 1,
         'kommische' => 1,
         'passieren' => 1,
         'mhmh' => 2,
         'denn' => 2,
         'welche' => 2,
         'dinge' => 1
       };


However, since many smilies are also built from letters (for example :D or xD),
getting the regexp right is not quite that simple...
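
One possible way to deal with smilies that contain letters is to match them with a separate pattern before counting. The following is only a rough sketch, not a complete solution: the pattern `$smiley_re` and the helper `count_words` are hypothetical names made up for this example, the set of smilies it recognizes is an assumption you would have to adapt to your own log, and lowercasing the words is a choice the code above does not make.

```perl
use strict;
use warnings;

# Hypothetical pattern for letter smilies such as ":D", ";P", ":-D", "xD", "xDD".
# It is an assumption, not an exhaustive list; extend it for your own chat log.
my $smiley_re = qr/^(?:[:;=]-?[dpos]+|xd+)$/i;

# Count words in chat lines of the form "(hh:mm) <nick> text".
# Returns a reference to a hash mapping lowercased word => count.
sub count_words {
    my @lines = @_;
    my %stats;
    for my $line (@lines) {
        # drop the timestamp and the nick, keep the remaining words
        my (undef, undef, @words) = split ' ', $line;
        for my $word (@words) {
            next if $word =~ $smiley_re;                     # letter smilies
            $word =~ s/^[[:punct:]]+|[[:punct:]]+$//g;       # "denn?" -> "denn", "=(" -> ""
            next if length($word) < 3;                       # also drops emptied smilies
            $stats{ lc($word) }++;
        }
    }
    return \%stats;
}

my $stats = count_words("(00:14) <styx> lol :DDD xDD\n");
# prints "lol: 1"
print "$_: $stats->{$_}\n"
    for sort { $stats->{$b} <=> $stats->{$a} || $a cmp $b } keys %$stats;
```

Punctuation-only smilies such as "=(" need no special case here, because stripping the surrounding punctuation leaves an empty string that fails the length check. Very short letter smilies like ":D" would fail that check too, so the pattern mainly matters for stretched ones like ":DDD".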

What is a good module? That's hard to say.
What is good code? That's also hard to say.
One man's Thing of Beauty is another man's Evil Hack.
