flegscrap

François Briatte
July 6, 2012

 

flegscrap is a scraper for French legislative output. The first version focuses on the lower parliamentary chamber.

The code is written in R. It scrapes pages with libcurl, through the RCurl package, in a prudent way.
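What "prudent" covers is not spelled out on this slide. As a minimal sketch, assuming it means pausing between requests and surviving failed downloads, a thin wrapper around RCurl's getURL could look like this. The names prudent_get, pause and fetch are hypothetical and do not come from the repository.

```r
# Hypothetical helper: fetch one page politely.
# 'pause' is the delay (in seconds) before the request;
# 'fetch' is the download function, defaulting to RCurl::getURL.
prudent_get <- function(url, pause = 1, fetch = NULL) {
  if (is.null(fetch)) {
    fetch <- function(u) RCurl::getURL(u, followlocation = TRUE)
  }
  Sys.sleep(pause)  # wait before hitting the server
  tryCatch(
    fetch(url),
    error = function(e) {
      # report the failure and return a missing value instead of stopping
      message("failed: ", url, " (", conditionMessage(e), ")")
      NA_character_
    }
  )
}
```

Passing the download function as an argument keeps the wrapper testable offline: any function that takes a URL and returns a string can stand in for getURL.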

 

The draft code is publicly available as a GitHub repository. The whole idea was an excuse to try that out.

The repo is at https://github.com/briatte/flegscrap but you can also get it from its GitHub page.

Code routine


This is not the optimal solution.

Optimal solutions


The code could also be improved by writing more functions.

The workflow in RStudio looks like this:

The screenshot probably does not reflect the current repo code.

The code (getURL, regex, etc.) looks like this:

# wrapper
lapply(l, FUN = function(x) {
  # extract index ...
  # prudent hash ...
})

# merge
...

# variables
...

# dataset
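The skeleton above can be illustrated with a small, self-contained sketch of its four stages (wrapper, merge, variables, dataset). Everything specific here is an assumption: the page contents are invented stand-ins for downloaded index pages, and the URL pattern and column names are hypothetical rather than taken from the repository.

```r
# Invented stand-ins for downloaded index pages (one string per page).
pages <- list(
  paste('<a href="/13/dossiers/reforme_hopital.asp">Hopital</a>',
        '<a href="/13/dossiers/internet.asp">Internet</a>'),
  '<a href="/12/dossiers/secteur_energie.asp">Energie</a>'
)

# wrapper: extract every dossier link from each page with a regex
links <- lapply(pages, FUN = function(html) {
  m <- gregexpr('href="[^"]*/dossiers/[^"]+\\.asp"', html)
  hrefs <- regmatches(html, m)[[1]]
  gsub('^href="|"$', "", hrefs)  # drop the attribute name and quotes
})

# merge: a single vector of unique URLs
url <- unique(unlist(links))

# variables: session number and dossier name, parsed from the URL
session <- as.integer(sub("^/(\\d+)/dossiers/.*$", "\\1", url))
dossier <- gsub("^.*/dossiers/|\\.asp$", "", url)

# dataset: one row per dossier
d <- data.frame(url, session, dossier, stringsAsFactors = FALSE)
```

In a real run, the strings in pages would come from getURL rather than being typed in, and the regular expressions would need to match the actual markup of the scraped site.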

Preliminary findings

Scraping the propositions de loi (parliamentary bills):

Session  Years      Prime Ministers              # scraped
8        1986-1988  Chirac, Rocard               674
9        1988-1993  Rocard, Cresson, Bérégovoy   1168
10       1993-1995  Balladur                     1554
         1995-1997  Juppé
11       1997-2002  Jospin, Raffarin             1317
12       2002-2007  Raffarin, Villepin, Fillon   1648
13       2007-2012  Fillon, Ayrault              2001

Years are colored by the president's left-wing or right-wing party affiliation.
Mixed colors indicate periods of split executive government.

That should be an exhaustive count.

Scraping the projets de loi (governmental bills):

Session  Years      Prime Ministers              # scraped
8        1986-1988  Chirac, Rocard               205
9        1988-1993  Rocard, Cresson, Bérégovoy   722
10       1993-1995  Balladur                     512
         1995-1997  Juppé
11       1997-2002  Jospin, Raffarin             585
12       2002-2007  Raffarin, Villepin, Fillon   617
13       2007-2012  Fillon, Ayrault              616

Years are colored by the president's left-wing or right-wing party affiliation.
Mixed colors indicate periods of split executive government.

That should be an exhaustive count.

Trends in bill submission

Most debated bills

> gsub("(.)+\\d+/dossiers/|.asp","",sort(d$url[d$ecd > .97],decreasing=TRUE))

 [1] "reforme_hopital"                     "reforme_collectivites_territoriales"
 [3] "reforme_5eme"                        "nomination_audiovisuel_public"      
 [5] "modernisation_economie"              "loi_finances_2012"                  
 [7] "loi_finances_2011"                   "loi_finances_2010"                  
 [9] "loi_finances_2009"                   "loi_finances_2008"                  
[11] "internet"                            "immigration_integration_nationalite"
[13] "grenelle_environnement2"             "defenseur_droits"                   
[15] "secteur_energie"                     "prevention_delinquance"             
[17] "loi_finances_2007"                   "loi_finances_2006"                  
[19] "immigration_integration"             "engagement_national_logement"       
[21] "041800"                              "031206"                             
[23] "031093"                              "020230"                             
[25] "991835"                              "991805"                             
[27] "991786"                              "981078"                             
[29] "981071"                              "980977"                             
[31] "013262"                              "002585"                             
[33] "002415"                              "002131"                             
[35] "temps_travail_entreprise"            "981119"
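
The filter d$ecd > .97 above keeps the dossiers in the top 3% of an empirical cumulative distribution. How the ecd column was built is not shown here; the sketch below assumes it is the ECDF of the number of public debates per dossier. The toy data, the debates column and the example.org URLs are invented for illustration.

```r
# Invented toy data: number of public debates per dossier.
d <- data.frame(
  url = sprintf("http://example.org/13/dossiers/bill_%02d.asp", 1:10),
  debates = c(1, 1, 2, 2, 3, 3, 4, 5, 8, 40),
  stringsAsFactors = FALSE
)

# ecdf() returns a function; applying it back to the data gives, for each
# row, the share of dossiers with at most that many debates.
d$ecd <- ecdf(d$debates)(d$debates)

# Keep the most debated dossiers and reduce each URL to its dossier name.
top <- d$url[d$ecd > .95]
top_names <- gsub("^.*/dossiers/|\\.asp$", "", top)
```

With these ten invented rows, only the dossier with 40 debates passes the threshold. The sketch escapes the dot in \\.asp and anchors it at the end of the string, which is slightly stricter than the pattern used in the console call above.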

Unequal debate allocations (1): debate/no debate

Showing only items for which there was at least one public debate.

Unequal debate allocations (2): ECDF visualization

Showing only items for which there was at least one public debate.

Unequal debate allocations (3): case studies

Showing only items for which there was at least one public debate.

Unequal delays in promulgation

Showing only items for which there was at least one public debate.

The plan is now to consolidate everything and to compare with the existing literature.

 

Feedback and comments are very welcome:

I'm at f.briatte@ed.ac.uk and f.briatte.org. If you prefer Twitter, you can reach me at @phnk and @politbistro.

 

Other projects are in the works:

Kudos to Alex Storer, who is working on a scraper for the American Congress.

 

Thanks!