Wednesday, 27 August 2014

Using Perl to scrape a website

I am trying to write a Perl script that goes to the following link and extracts

the number 1975:




That page reports the number of white men born in 1923 who lived in San Diego

County, California, in 1940. I am trying to do this in a loop so that it generalizes

over multiple counties and birth years.

In the file locations.txt, I put the list of counties, such as San Diego County.

The current code runs, but instead of the number 1975 it displays unknown. The number

1975 should end up in $val.

I would very much appreciate any help!
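
One way to narrow down the "unknown" result is to test the pattern offline, against a saved fragment of the page, before involving the network at all. The sketch below is only an illustration: the sample markup and the helper name extract_count are hypothetical, and the real page may wrap the number differently (attributes on the tag, a line break before it, no surrounding spaces), in which case the pattern would need adjusting.

```perl
use strict;
use warnings;

# Hypothetical fragment; the real page markup is an assumption here.
my $sample = '<p>Showing 1-20 of <strong>1,975</strong> results</p>';

# Hypothetical helper: return the count that follows " of " inside
# <strong> tags with commas removed, or undef when nothing matches.
sub extract_count {
    my ($html) = @_;
    return unless defined $html;    # get() returns undef on failure
    if ($html =~ m{ of\s*<strong>\s*([0-9,]+)\s*</strong>}i) {
        (my $n = $1) =~ s/,//g;     # "1,975" becomes "1975"
        return $n;
    }
    return;
}

my $count = extract_count($sample);
print defined $count ? "$count\n" : "unknown\n";
```

If this prints a number for a fragment copied from the browser's page source but the script still reports unknown, the content returned by get() is probably not the same as what the browser shows, for example because the request failed or the number is filled in by JavaScript.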


use strict;

use warnings;

use LWP::Simple;

open(L, "locations26.txt") or die "Cannot open locations26.txt: $!";

my $url = '




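open(O, ">out26.txt") or die "Cannot open out26.txt: $!";
 my $oldh = select(O);
 $| = 1;                    # unbuffer the output file
 select($oldh);             # restore the previous default handle for plain print
 while (my $location = <L>) {
     chomp $location;       # drop the trailing newline so it does not end up in the URL
     $location =~ s/ /+/g;
     foreach my $year (1923..1923) {
         my $u = $url;
         $u =~ s/%LOCATION%/$location/;
         $u =~ s/%YEAR%/$year/;
         #print "$u\n";
         my $content = get($u);     # undef if the request fails
         my $val = 'unknown';
         if (defined $content && $content =~ / of .strong.([0-9,]+)..strong. /) {
             $val = $1;
         }
         $val =~ s/,//g;
         # Use a copy for display so $location keeps its "+" form for the next year
         (my $name = $location) =~ s/\+/ /g;
         print "'$name',$year,$val\n";
         print O "'$name',$year,$val\n";
     }
 }
 close(O);
 close(L);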
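
A line read from the locations file keeps its trailing newline unless it is removed, and that newline would be substituted into the URL along with the county name. Whether that explains the unknown result is not certain, but it is easy to rule out by testing the URL-building step on its own. In the sketch below the template URL, its query parameter names, and the helper name build_url are all hypothetical stand-ins, since the real URL is not shown.

```perl
use strict;
use warnings;

# Hypothetical template; the real URL and its parameters are assumptions.
my $template = 'http://example.com/search?place=%LOCATION%&birth_year=%YEAR%';

# Hypothetical helper: fill the %LOCATION% and %YEAR% placeholders.
sub build_url {
    my ($tmpl, $location, $year) = @_;
    $location =~ s/\s+\z//;               # strip the newline left by <L>
    (my $place = $location) =~ tr/ /+/;   # spaces become "+", as in the script
    (my $u = $tmpl) =~ s/%LOCATION%/$place/;
    $u =~ s/%YEAR%/$year/;
    return $u;
}

# The input deliberately ends in "\n", as a line read from a file would.
print build_url($template, "San Diego County\n", 1923), "\n";
```

Printing the finished URL and pasting it into a browser shows quickly whether the request itself is well formed.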

Update: API is not a viable solution. I have been in contact with the site developer.

The API does not apply to that part of the webpage. Hence, any solution pertaining to

JSON will not be applicable.

