Thread Crawler: Multilingual website
(4 answers)
Opened by kimmy at 2014-05-26 16:56
Some websites evaluate the Accept-Language request header. For example, my Opera sends: Accept-Language: de,de-DE;q=0.9,en;q=0.8
Snippet: Code (perl):
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www2.mouser.com/Search/Refine.aspx?Keyword=FODM8801A';

my $ua = LWP::UserAgent->new;
$ua->default_header('Accept-Language' => 'de'); # German
$ua->agent('Mozilla/5.0'); # a sensible user agent, since some sites block LWP

my $html;
my $response = $ua->get($url);

if ($response->is_success) {
    # decoded_content undoes Content-Encoding and decodes the charset
    $html = $response->decoded_content;
}
else {
    die $response->status_line;
}

print "$html\n";
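
To see what the q-values in a header like the Opera one above mean, here is a rough sketch of how a server might rank the requested languages. The helper name rank_languages is made up for this example and is not part of LWP. Real servers may match and weight tags differently, so treat this only as an illustration of the ordering.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): split an Accept-Language value
# into language tags ordered by descending q-value.
# A tag without an explicit q parameter counts as q=1.
sub rank_languages {
    my ($header) = @_;
    $header =~ s/^\s+|\s+$//g;    # trim surrounding whitespace

    my @entries;
    my $pos = 0;
    for my $part (split /\s*,\s*/, $header) {
        next unless length $part;
        my ($tag, @params) = split /\s*;\s*/, $part;
        my $q = 1;
        for my $param (@params) {
            $q = $1 if $param =~ /^q=([01](?:\.\d{0,3})?)$/i;
        }
        push @entries, { tag => $tag, q => $q, pos => $pos++ };
    }

    # Sort by q descending; keep the original order for equal q-values.
    return map { $_->{tag} }
           sort { $b->{q} <=> $a->{q} || $a->{pos} <=> $b->{pos} } @entries;
}

my @ranked = rank_languages('de,de-DE;q=0.9,en;q=0.8');
print join(', ', @ranked), "\n";    # prints: de, de-DE, en
```

With the snippet above, sending only 'de' simply tells the server that German is the single acceptable language. A list with q-values gives it fallbacks to choose from.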