flegscrap
François Briatte
July 6, 2012
flegscrap is a scraper for French legislative output. The first version focuses on the lower parliamentary chamber.
The code is written in R. It scrapes pages with libcurl, via the RCurl package, in a prudent way.
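As a rough illustration of what "prudent" scraping can mean, here is a minimal sketch that pauses before each real request and never downloads the same page twice. The helper name `prudent_get`, the cache directory and the delay are assumptions made for this example, not the names used in flegscrap; only `RCurl::getURL` comes from the text above.

```r
# Hypothetical helper (not from flegscrap): fetch a page at most once,
# pausing before every real request.
# `fetch` defaults to RCurl::getURL; any function(url) -> character works.
prudent_get <- function(url, cache_dir = "cache", delay = 1,
                        fetch = function(u) RCurl::getURL(u)) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  # One cache file per URL, named after a sanitized copy of the address
  # (two very similar URLs could collide; good enough for a sketch)
  file <- file.path(cache_dir, paste0(gsub("[^A-Za-z0-9]+", "_", url), ".html"))
  if (!file.exists(file)) {
    Sys.sleep(delay)          # be gentle with the server
    page <- fetch(url)        # the only place where the network is touched
    writeLines(page, file, useBytes = TRUE)
  }
  # Always return the cached copy, as a single string
  paste(readLines(file, warn = FALSE), collapse = "\n")
}
```

Calling `prudent_get()` twice on the same address hits the server only the first time, which makes it cheap to rerun the scraper while debugging the parsing steps.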
The draft code is publicly available as a GitHub repository. The whole idea was an excuse to try that out.
The repo is at https://github.com/briatte/flegscrap, but you can also get it from its GitHub page.
This is not the optimal solution.
The code could also be improved by writing more functions.
The code (getURL, regex, etc.) looks like this:
    # wrapper
    lapply(l, FUN = function(x) {
      # extract index
      ...
      # prudent hash
      ...
    })
    # merge
    ...
    # variables
    ...
    # dataset
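To make the skeleton above more concrete, here is a small self-contained sketch of the same wrapper / merge / variables sequence. The two toy pages, the link pattern and the column names (`url`, `session`, `slug`) are invented for illustration and are not taken from the actual flegscrap code; the "prudent hash" step is left out because the outline does not say what it does.

```r
# Toy pages standing in for scraped index pages (invented for illustration)
l <- list(
  '<a href="/13/dossiers/alpha.asp">Alpha</a> <a href="/13/dossiers/beta.asp">Beta</a>',
  '<a href="/12/dossiers/gamma.asp">Gamma</a>'
)

# wrapper: one data frame per page
pages <- lapply(l, FUN = function(x) {
  # extract index: every link that points to a "dossiers" page
  m <- regmatches(x, gregexpr("/\\d+/dossiers/[^\"]+\\.asp", x))[[1]]
  data.frame(url = m, stringsAsFactors = FALSE)
})

# merge: stack the per-page results
d <- do.call(rbind, pages)

# variables: derive the session number and a short identifier from the URL
d$session <- as.integer(sub("^/(\\d+)/.*", "\\1", d$url))
d$slug <- sub("\\.asp$", "", basename(d$url))

# dataset
d
```

Returning a data frame from each `lapply` call keeps the merge step down to a single `do.call(rbind, ...)`, at the cost of some speed on very long lists.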
Session | Years | Prime Ministers | # scraped |
---|---|---|---|
8 | 1986-1988 | Chirac, Rocard | 674 |
9 | 1988-1993 | Rocard, Cresson, Bérégovoy | 1168 |
10 | 1993-1995 | Balladur | 1554 |
10 | 1995-1997 | Juppé | (counted in row above) |
11 | 1997-2002 | Jospin, Raffarin | 1317 |
12 | 2002-2007 | Raffarin, Villepin, Fillon | 1648 |
13 | 2007-2012 | Fillon, Ayrault | 2001 |
Years are colored by presidential left-wing and right-wing party affiliations.
Mixed colors indicate split executive government periods.
That should be an exhaustive count.
Session | Years | Prime Ministers | # scraped |
---|---|---|---|
8 | 1986-1988 | Chirac, Rocard | 205 |
9 | 1988-1993 | Rocard, Cresson, Bérégovoy | 722 |
10 | 1993-1995 | Balladur | 512 |
10 | 1995-1997 | Juppé | (counted in row above) |
11 | 1997-2002 | Jospin, Raffarin | 585 |
12 | 2002-2007 | Raffarin, Villepin, Fillon | 617 |
13 | 2007-2012 | Fillon, Ayrault | 616 |
Years are colored by presidential left-wing and right-wing party affiliations.
Mixed colors indicate split executive government periods.
That should be an exhaustive count.
    > gsub("(.)+\\d+/dossiers/|.asp","",sort(d$url[d$ecd > .97],decreasing=TRUE))
     [1] "reforme_hopital"          "reforme_collectivites_territoriales"
     [3] "reforme_5eme"             "nomination_audiovisuel_public"
     [5] "modernisation_economie"   "loi_finances_2012"
     [7] "loi_finances_2011"        "loi_finances_2010"
     [9] "loi_finances_2009"        "loi_finances_2008"
    [11] "internet"                 "immigration_integration_nationalite"
    [13] "grenelle_environnement2"  "defenseur_droits"
    [15] "secteur_energie"          "prevention_delinquance"
    [17] "loi_finances_2007"        "loi_finances_2006"
    [19] "immigration_integration"  "engagement_national_logement"
    [21] "041800"                   "031206"
    [23] "031093"                   "020230"
    [25] "991835"                   "991805"
    [27] "991786"                   "981078"
    [29] "981071"                   "980977"
    [31] "013262"                   "002585"
    [33] "002415"                   "002131"
    [35] "temps_travail_entreprise" "981119"
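For readers unfamiliar with the `gsub` call above, the sketch below applies the same pattern to two URLs and compares it with a slightly stricter variant. The URL shape is an assumption inferred from the pattern (a session number followed by `/dossiers/` and an `.asp` file), and the variable names `u` and `slug` are invented for the example.

```r
# Assumed URL shape, for illustration only
u <- c("http://www.assemblee-nationale.fr/13/dossiers/reforme_hopital.asp",
       "http://www.assemblee-nationale.fr/12/dossiers/041800.asp")

# Original pattern: drop everything up to "<session>/dossiers/" and the extension.
# Note that the unescaped "." in ".asp" matches any character, not just a dot.
gsub("(.)+\\d+/dossiers/|.asp", "", u)

# Stricter variant: anchor both ends and escape the dot
slug <- sub("\\.asp$", "", sub("^.*/dossiers/", "", u))
slug
```

Both versions return the same identifiers on these inputs; the stricter one only matters if an identifier happens to contain the letters "asp" preceded by some character.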
The plan is now to consolidate everything and to compare with the existing literature.
I'm at f.briatte@ed.ac.uk and f.briatte.org. If you prefer Twitter, you can reach me at @phnk and @politbistro.
Kudos to Alex Storer, who is working on a scraper for the American Congress.