
Antoine  Courmont  Centre  d’études  européennes  (CEE)  /  Sciences  Po  

antoine.courmont@sciences-po.org

 

 

 

 

How  to  Govern  Open  Data?    

The  politics  of  open  data  portals    

 

 

Paper presented at the IPSA Conference, Madrid, July 8-12, 2012.

Panel:  Open  Government  

 

 

 

Draft  paper,  please  do  not  cite  without  the  author’s  permission.  

 

 

Abstract:    

Using a comparative overview of open data portals, this paper analyses these portals as a means of regulating released data. The first part shows how legal, economic and technical mediations may interact to enable or restrain the re-uses of public data, and points out the plurality of ways to regulate data. The second part describes the relationships between governments and their constituents defined by the alignments of those mediations. It argues that three models of open data portals are emerging: the pipe model, the platform model and the multi-front approach model.

   


This paper presents a preliminary analysis of open data portals as instruments of regulation of public data. Because it draws on work that is still very much in progress, the essay is quite fragmented. The ambition of the current paper is limited to testing its hypotheses and research questions.

 

Introduction  

Information and data are key tools of government. Governments gather large amounts of data in order to know their territory, population and society, and to act upon them. Without data, there is no public policy. For several years, Public Sector Information (PSI) and Freedom of Information (FOI) lobbies have pressured governments to open access to their data in order to promote transparency and commercial re-uses. The US and British initiatives launched an international movement to set up governmental open data portals all over the world. Opening access to data is now tending to become a legal obligation for public administrations.

However, administrations may be reluctant to open access to their data, since doing so means losing their monopoly on public data. This exclusive control was a source of power, and governments are reluctant to lose authority and influence. Above all, opening public data raises several risks for governments (Bannister and Connolly, 2011). Firstly, transparency reveals the functioning of the state apparatuses of power and may force them to make their actions explicit. Open government data threatens the formal functioning of power, and citizens may lose confidence in state action. Secondly, re-uses of public data may contradict governmental actions and diverge from the common purpose sought by public policies. For instance, while a local government is promoting the use of public transport, a developer could use public traffic-jam data to create an app that encourages car traffic. In addition, released data may “empower the empowered” (Gurstein, 2011) rather than strengthen citizen engagement in policy. Thirdly, data are always produced in order to fulfill a specific need. There is a strong link between the method of production of the data and the privileged form of intervention (Desrosières, 2003; Didier, 2009), which allows the consolidation of the aggregate. This stabilization gives consistency and relevance to the data. Yet opening public data makes the data unstable: the chain between the production and the use of the data is broken, since the data can now be used differently. Opening data access may therefore threaten the reliability and the accuracy of the data. For all these reasons, open data is full of uncertainties and may frighten governments. That is why they seek to reduce this uncertainty and to regulate the released data.

Open data programs mark the transition from a logic of government to a logic of governance, in which public authorities regulate society rather than govern it, allowing wider participation by social and economic actors. This reordering of power does not mean a decline of public authorities; on the contrary, public authorities set the rules in a logic of regulation of society. In this context, I will present in this paper some modalities of governing open data implemented by public authorities and the different forms of relationships between governments and their constituents that they imply. To do so, it is necessary to acknowledge that there is a range of ways of opening governmental data.

 


Defining  open  data  

In December 2007, Tim O’Reilly, Lawrence Lessig, Carl Malamud, David Moore and other theorists of the Internet and Free culture established a list of principles of open government data. After two days of work, they defined 8 principles [1]:

1. Data  must  be  complete  

2. Data  must  be  primary  

3. Data  must  be  timely  

4. Data  must  be  accessible  

5. Data  must  be  machine  processable  

6. Access  must  be  non-­‐discriminatory  

7. Data  formats  must  be  non-­‐proprietary  

8. Data  must  be  license-­‐free  

These 8 principles became the basis of the canonical definition of open government data. They have been used to define what open government data should be and to evaluate governmental initiatives. However, this list is quite restrictive and limits the ways in which governments can open their data. It follows the tradition of the open source movement, which adopts a restrictive definition of the adjective “open”. This paper breaks with this normative definition and follows an extensive interpretation of open government data in order to point out the diversity of governmental choices. Here, open data means all initiatives by public authorities to make their data available to a wide audience. This definition makes it possible to account for diversity and pluralism in the ways of opening data. Our hypothesis is that these various modalities of publishing data represent different modes of open data governance.

Open  data  portals  as  political  objects  

When they release their data, governments must adopt indirect modalities of regulation to restrain the possible uses of these datasets. In his book Code is law (Lessig, 2006), Lawrence Lessig points out four modalities of regulation that interact to constrain individuals: law, social norms, markets and architecture. As expressed in the title, he argues that, in cyberspace, computer code, a subspecies of physical architecture, may regulate conduct as well as legal code does. However, as a modality of regulation, code has some particularities: it is automated, immediate, plastic and, above all, it can regulate without transparency (Grimmelmann, 2005). Lawrence Lessig’s assumption invites us to consider technical architectures as political objects. This paper will focus on the legal, economic and, especially, technical aspects of open data portals [2].

Open data portals are websites used to release public data. They are the interfaces between governmental data on one side and re-users on the other; they act as a gateway between these two sides. As interfaces, open data portals must be considered as infrastructures. Far from being confined to simple technical considerations, infrastructures are not neutral (Hughes, 1989). On the contrary, they have an inherently political nature: potential practices are inscribed in the deepest levels of the design of an infrastructure. That is why open data portals “do” something. They enable or restrain actions and define a field of possible uses of released data. This lack of neutrality justifies considering open data portals as political objects, which contribute to the governance of released data.

[1] http://www.opengovdata.org/home/8principles

[2] I will leave aside social norms as a way of regulating open data, since they do not have a major impact at the level of the open data portals.

The political nature of infrastructure is “a call to study boring things” (Star, 1999), such as technical aspects, which tend to be invisible and are therefore ignored. In this paper, I will focus on some of the mediations involved in open data portals, which make it possible to link governmental data and their re-users. As Antoine Hennion pointed out, the concept of mediation differs from that of intermediary, since it implies a “strategic relation that defines at the same time the terms of the relation and its modality” (Hennion, 1993). Indeed, mediations establish the relationship between public data and their re-users, and they define the modality of that relationship.

The comparative approach to open data portals reveals a diversity of ways in which governmental data are released [3]. To study those different ways and their political implications, I will use the model of the pluralist compass theorized by Dominique Boullier (Boullier, 2003, 2008). Derived from the works of Bruno Latour (Latour, 1997) and Isabelle Stengers (Stengers, 1996), this compass crosses two axes: attachment/detachment and certainty/uncertainty. It makes it possible to describe the diversity of the possible mediations and to better understand their contrasts [4]. Its aim is not to narrow the range of possibilities but, on the contrary, to open up the political debate on open data portals.

This paper comprises two parts. Firstly, I ask how legal, economic and technical mediations contribute to a comprehensive understanding of open data portals as means of regulation of released data. Secondly, I point out some alignments of these mediations and ask how these alignments define the relationship between governments and their constituents.

 

 

 

 

[3] To make our analysis, we have selected a representative sample of open data portals: French local governments (Paris, Rennes, Toulouse, Nantes, Montpellier, Bordeaux, Saône-et-Loire), European cities (London, Manchester, Leeds, Brussels, Berlin) and North American cities (New York, San Francisco, Chicago, Boston, Seattle, Edmonton, Vancouver, Toronto, Ottawa, Montréal).

[4] As Dominique Boullier said (Boullier, 2012), this compass “forces an identification of all the positions, even those which are sometimes hardly expressed, in order to bring to the fore possible choices which may have been overlooked or crushed by the obviousness of others. It is therefore, first and foremost, a heuristic tool and not a system for definitive classification and storage”.


Legal, economic and technical mediations as means of regulation

1. Pluralism  of  legal  architectures  

The first modality of regulation of released data is the legal aspect, i.e. the juridical architecture. Law is the State's primary modality of regulation. The legal licenses that regulate released data are relatively well known. I would like to briefly go over four licenses that may be placed on a permissive continuum: the public domain model, the authorship model, the copyleft model and the restrictive license. Two criteria distinguish these licenses: on the one hand, the obligation to credit the data-producing entity; on the other hand, the lack of certainty about the legal framework of the re-used data.

The public domain model places no limitation on re-users [5]. Data are considered as belonging to the public domain. This most permissive license offers the government no means of control.

The  authorship  model  allows  all  uses  of  the  data  provided  the  authorship  is  mentioned  (BY).   The   French   “Open   License”   and   the   British   “Open   Government   licence   for   public  sector  information”  are  such  licenses.    

The copyleft model requires re-users to credit the source and to preserve the same rights in modified versions of the data (BY + SA). This obligation to share modified datasets alike is a way for governments to obtain feedback on the uses of their data and possibly to benefit from these improvements.

The last kind of license is the most restrictive. As with the previous one, re-users must credit the source and share the dataset alike, but they cannot transform the dataset (BY + SA + ND). This license is very rarely used, since these constraints run counter to the aims of re-using open data. Nevertheless, some governments insert in their license a relatively vague clause prohibiting alteration of the data and distortion of their meaning [6]. Although this clause does not prevent the transformation of the data, it establishes a restrictive legal framework that enables governments to greatly restrain re-uses of released data.
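The four license models can be read as combinations of three clauses (BY, SA, ND). The following is a minimal sketch of that reading, not a legal implementation: the class `LicenseModel` and the functions `permissive_continuum` and `may_publish_modified` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseModel:
    """One position on the permissive continuum (hypothetical encoding)."""
    name: str
    attribution: bool     # BY: the data-producing entity must be credited
    share_alike: bool     # SA: modified versions must keep the same rights
    no_derivatives: bool  # ND: the dataset may not be transformed

    def constraints(self) -> int:
        # Number of clauses imposed on re-users; 0 means fully permissive.
        return sum([self.attribution, self.share_alike, self.no_derivatives])


LICENSE_MODELS = [
    LicenseModel("restrictive", True, True, True),
    LicenseModel("public domain", False, False, False),
    LicenseModel("copyleft", True, True, False),
    LicenseModel("authorship", True, False, False),
]


def permissive_continuum(models):
    """Order the models from most permissive to most restrictive."""
    return [m.name for m in sorted(models, key=lambda m: m.constraints())]


def may_publish_modified(model, credits_source, keeps_same_license):
    """Check whether a re-user may publish a modified dataset under `model`."""
    if model.no_derivatives:
        return False
    if model.attribution and not credits_source:
        return False
    if model.share_alike and not keeps_same_license:
        return False
    return True


if __name__ == "__main__":
    print(permissive_continuum(LICENSE_MODELS))
```

Counting clauses is only one possible way to order the licenses; it ignores the vague no-alteration clauses mentioned above, which cannot be reduced to a flag.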

[5] For instance, the PDDL license: “Public Domain for data/databases”. http://www.opendatacommons.org/licenses/pddl/1-0/

[6] For instance, this is the case of the license of the French city of Rennes.


 

 

 

2. Pluralism  of  economic  models    

The second way in which governments can restrain uses of data is through the economic model of the data. I do not want to enter here into the debate on the costs of collecting, producing and publishing data, nor on the sale of public data by administrations to make a profit. I focus only on economic models as a means of regulation: the market is used here to regulate the use of public data. I am aware that the canonical definition of open data stipulates that this information should be available for free. Nevertheless, governments may choose to sell their data in order to limit and control access to it. I distinguish here three possibilities [7], which differ in the extent to which they constitute an economic barrier to entry and enable governments to keep an explicit link with their clientele of re-users.

2.1. Allowance  

Re-users who want to use a public dataset have to pay an allowance (a fee) to the administration, according to a previously established sliding scale (depending on the cost of publication, the “value” of the data, and the sales generated by the re-user). The access fee greatly limits the availability of the information and restricts it to actors that can afford it. Furthermore, governments keep full control of their data, as they may choose which actors are allowed to use the data.

The allowance model, which used to be the norm in the administration, is tending to become the exception. For instance, in France, a bill published on 27 May 2011 stated that free access is the rule. Public data producers must justify the direct monetization of their data. A list of fee-paying data will be established. Any data not on this list will be free by default.

[7] Another possibility might be a purchase-per-unit solution, in which the re-user would purchase only one part of the dataset. As far as we know, no government has implemented this model.

2.2. Freemium  

Data may also be free by default, but fee-paying depending on their use or on the desired type of access. For instance, commercial uses can be subject to an allowance, or a charge can be imposed on access to the data if re-users exceed a fixed ceiling of use. For instance, an API might be free in order to attract developers, and then fee-paying beyond a fixed number of requests (and so once the service has met with significant success).
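The request-ceiling variant of the freemium model can be illustrated with a short calculation. This is a sketch under stated assumptions: the function name `freemium_charge_cents`, the quota and the per-request price are hypothetical and are not taken from any existing portal.

```python
def freemium_charge_cents(requests_made: int,
                          free_quota: int,
                          cents_per_extra_request: int) -> int:
    """Return the amount owed (in cents) under a request-ceiling freemium model.

    Requests up to `free_quota` are free; only requests beyond the ceiling
    are charged. Integer cents are used to avoid floating-point rounding.
    """
    if requests_made < 0 or free_quota < 0 or cents_per_extra_request < 0:
        raise ValueError("arguments must be non-negative")
    extra_requests = max(0, requests_made - free_quota)
    return extra_requests * cents_per_extra_request


if __name__ == "__main__":
    # A small application stays under the ceiling and pays nothing.
    print(freemium_charge_cents(900, free_quota=1000, cents_per_extra_request=2))
    # A successful application exceeds the ceiling and pays for the excess.
    print(freemium_charge_cents(1500, free_quota=1000, cents_per_extra_request=2))
```

The design makes the economic barrier proportional to the success of the re-using service, which is the regulatory effect described above.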

2.3. Free  

Public data may be published for free, without any constraint linked to the type of re-use or to the marginal cost of opening the data. Supporters of free access justify it by the fact that public data have already been funded by taxes: if these data were monetized, re-users would pay for them twice. When they release their data for free, governments lose this economic means of control over their data.

 

 

 

3. Pluralism  of  technical  architectures  

Since Winner’s classic paper “Do Artifacts have Politics?” (Winner, 1986), it has been recognized that “technical things have political qualities” and that “they can embody specific forms of power and authority”. That is why it is necessary to pay attention to the technical characteristics of open data portals and to the meaning of those characteristics. Due to the limits of this paper, I will focus on only three technical aspects: data access, data hosting and data formats [8].

3.1. Data  access  

Four kinds of access to the data are possible: non-selective feed, bulk download, atomic data and selective feed. Two criteria distinguish these four types of access. First, they offer either a file or a feed. A feed maintains a strong link with the entity producing the data, whereas a file leads to a detachment from it. Second, these four solutions differ in how far they respect the integrity and the structure of the dataset.

3.1.1. Non-selective feed (RSS)

RSS (RDF Site Summary) allows automatic re-uses by websites and programs. It is a well-known format, which can be used to publish data. It offers control over the data and restrains re-uses, since re-users cannot modify data obtained through an RSS feed. It is a non-selective feed: developers must use the dataset as a whole, they cannot select some of the data, and there is no data processing. Thus, the integrity of the dataset is respected. For instance, the Socrata platform (used by New York, Chicago and Seattle) allows an RSS export of the whole dataset.
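To make the idea of a non-selective feed concrete, the sketch below reads every item of a small feed with the Python standard library. The sample feed is invented for illustration and uses the common RSS 2.0 layout (`channel` and `item` elements); it is an assumption and not the output of any portal cited here.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, written in the RSS 2.0 layout.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical dataset feed</title>
    <item>
      <title>Station A</title>
      <description>12 bikes available</description>
    </item>
    <item>
      <title>Station B</title>
      <description>3 bikes available</description>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return (title, description) for every item published in the feed.

    The feed is consumed as a whole: the publisher decides which items
    appear, and the reader has no way to query a subset on the server side.
    """
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title"), item.findtext("description"))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, description in read_feed(SAMPLE_FEED):
        print(title, "-", description)
```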

3.1.2. Bulk  download  

The most common modality of accessing data is downloading the dataset file from a remote server to a local storage disk. Direct download is the simplest solution for opening up data. However, once the dataset has been downloaded, governments lose all means of control over it. Administrations have no idea who the users are or what they are doing with the data. Users can easily modify, transform and manipulate the dataset. Administrations only know the number of downloads for each file. If the dataset is modified, re-users must download it again to update their application. So, with this modality of opening data, there is a risk of uncontrolled dissemination of public data and of the use of stale data.

3.1.3. Atomic  data  

Governments can also publish data that have been extracted beforehand from the dataset. Re-users then have access to only a part of the dataset, which might be broken into several subsets (for instance, according to different geographic locations). In this case, the dataset’s integrity is not respected.

3.1.4. Selective  feed  (API)  

Some open data portals provide access to their data through an API (Application Programming Interface). An API allows direct access to the data: an application automatically requests the data it needs from another application. It makes a mass dissemination of public data practically impossible. Nevertheless, contrary to the RSS feed, the application can choose to use only some of the data in the dataset, so the dataset's integrity is not always respected. Access to the data is provided as a feed: datasets are automatically updated and the administration can cut off access to the feed. Moreover, access to the data through an API is recorded in logs. That is why an API allows fine-grained control of accesses and their potential limitation, monitoring of the use of the service by applications, and a guarantee that all re-users are using the current version of the dataset. Finally, developers must sometimes be registered to use the API. Re-users are dependent on data producers: if the latter decide to stop the API, the accessibility chain is broken and the application no longer receives data.

[8] Other technical mediations could be analyzed to complete this study, for instance: the data structure (unstructured, semi-structured, structured), the data granularity, and the data release time (immediate, recent, delayed, archive).
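The control mechanisms attributed to the API (registration, logging, access limitation, and the producer's ability to cut the feed) can be sketched as a small in-memory model. The class `DataApi`, its fields and its default quota are hypothetical and introduced only for illustration; no real portal API is being described.

```python
from dataclasses import dataclass, field


@dataclass
class DataApi:
    """Toy model of a selective feed controlled by the data producer."""
    records: dict                  # record id -> current version of the record
    quota: int = 1000              # hypothetical ceiling of requests per key
    online: bool = True            # the producer can switch the feed off
    keys: set = field(default_factory=set)   # registered developers
    log: list = field(default_factory=list)  # (key, record id) per access

    def register(self, key: str) -> None:
        self.keys.add(key)

    def revoke(self, key: str) -> None:
        self.keys.discard(key)

    def get(self, key: str, record_id: str):
        """Serve one record, enforcing registration and the request ceiling."""
        if not self.online:
            raise ConnectionError("API switched off by the data producer")
        if key not in self.keys:
            raise PermissionError("unregistered or revoked key")
        used = sum(1 for logged_key, _ in self.log if logged_key == key)
        if used >= self.quota:
            raise PermissionError("request ceiling exceeded")
        self.log.append((key, record_id))  # every access is recorded
        return self.records[record_id]     # always the current version


if __name__ == "__main__":
    api = DataApi(records={"station-a": {"bikes": 12}}, quota=2)
    api.register("app-key")
    print(api.get("app-key", "station-a"))
    api.records["station-a"] = {"bikes": 11}  # producer updates the data
    print(api.get("app-key", "station-a"))    # re-user sees the update
    print(api.log)
```

Unlike a downloaded file, every request passes through the producer, which is what makes monitoring, limitation and withdrawal possible.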

 

3.2. Data  hosting  

As with the previous aspect, I will distinguish four kinds of data hosting: internal hosting, cloud hosting, peer-to-peer and external platform. These solutions differ in the capacity they give the administration to control the whole of the opening process and to keep a hold over the released data (for instance, how easily a dataset can be removed or replaced). Moreover, data hosting raises the issue of the visibility of the governmental open data program. Indeed, governments fear that, by publishing their data through peer-to-peer networks or a multi-actor platform, their action will not be sufficiently visible and their relationships with re-users will be limited.

3.2.1. Internal  hosting  

The administration develops its own platform and stores its data on its own servers. There are various platform solutions for publishing data. Several are open source, such as the content management platform Typo3 developed for the French city of Rennes and used by the city of Nantes, or the CKAN (Comprehensive Knowledge Archive Network) solution proposed by the Open Knowledge Foundation. With these solutions, governments keep full control of their data.

3.2.2. Cloud  hosting  


Another possibility for the administration is to use a SaaS (Software as a Service) solution. The platform, created by an external company, is hosted in the cloud, as are the data. This is the easiest solution, since it requires no technical resources from the administration. Moreover, the administration keeps all its rights over its data. Nevertheless, it loses control over the location of its data. Data are stored on external servers, which could cause legal problems. Moreover, the administration has no guarantee regarding the evolution of the SaaS and is dependent on the SaaS company.

3.2.3. Peer  to  peer  

Another   way   to   store   data   is   to   share   the   file   through   a   peer-­‐to-­‐peer   network.   For  instance,  Great  Britain  chose   this   solution   to   release   its   financial  data.  The  distributed  architecture   reduces   the   bandwidth   and   storage   costs:   peers   are   both   suppliers   and  consumers   of   resources.   Nevertheless,   the   lack   of   a   centralized   server   prevents   the  administration  from  controlling  diffusion  and  use  of  its  data.      

3.2.4. External  platform  

The last possibility is for the administration to entrust its data to an external multi-actor portal. For instance, in France, the national website data.gouv.fr hosts the data of the towns of Coulommiers, Saint-Quentin and Longjumeau. By sharing the platform, administrations lose their independence and their ability to choose the means of publishing their data. All the technical aspects are left to the designers of the external platform.

 

3.3. Data  formats    

Formats are another way to limit what it is possible to do with the data. There are many kinds of formats, and the selection of one of them depends on the type of data (statistical, geographical…). However, four basic categories of formats may be distinguished: PDF, closed and proprietary formats, open and free formats, and API. They are distinguished by the interoperability they allow and by the extent to which they respect the integrity of the dataset. Contrary to the technical aspects analyzed previously, these options are not mutually exclusive: some open data portals offer several formats for each dataset [9].

3.3.1. PDF    

Publishing data as PDF documents greatly limits the ability of others to re-use that data. Indeed, while humans can read PDF documents, they are very hard for a computer to use. The fixed presentation of the PDF format produces a closure effect that prevents interoperability.

3.3.2. Closed  and  proprietary  formats  

Publishing data in closed and proprietary formats (for instance, .xls or .doc) limits re-uses by restricting interoperability. Moreover, the owner of the format can restrict its use, impose a royalty on users, or legally intimidate the developers of applications using it. So, a proprietary and/or closed format creates a barrier to access.

3.3.3. Open  and  free  formats  

Open formats are standards established by public authorities or international institutions, whose aim is to set norms that ensure interoperability between software. No entity has exclusive control over these formats, which are designed for automatic processing. For instance, the CSV (Comma Separated Values) format stores tabular data in plain-text form; records are separated by line breaks and fields by commas. Other common open formats used to publish public data are XML and RDF. Thus, providing open machine-readable formats allows the greatest re-use and, consequently, leaves governments fewer modalities of control.
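As an illustration of why an open, plain-text format such as CSV lends itself to automatic processing, the sketch below parses a small invented table with the Python standard library and aggregates one column. The sample data and the function names `load_rows` and `column_total` are assumptions made for this example.

```python
import csv
import io

# Hypothetical CSV content: one header line, then one record per line.
SAMPLE_CSV = "station,bikes\nA,12\nB,3\n"


def load_rows(csv_text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def column_total(rows, column):
    """Sum an integer column; re-users are free to compute whatever they need."""
    return sum(int(row[column]) for row in rows)


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(rows)
    print(column_total(rows, "bikes"))
```

No proprietary software or license is needed at any step, which is precisely what removes a lever of control from the publisher.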

3.3.4. API  

Another alternative for governments is to publish their data through an API. An API allows extensive re-use of datasets. Interoperability is maximal, since the API enables automatic exploitation of the data, whereas the other formats require a mediation in order to be used. However, contrary to open formats, publishing data through an API prevents any modification of the dataset structure, which preserves the accuracy of the data [10].

[9] In addition, the choice of a format depends on the technical abilities of the administration and on the software it uses. This phenomenon of path dependency leads to political decisions by default.

[10] To complete this analysis, API formats (XML, JSON, SOAP, REST) should also be taken into consideration. It is possible to subdivide the general category “API” into another compass with four types of API available.


 

 

Technical  architectures  and  mediation  alignments    

I have outlined some of the mediations that take part in linking public data and their re-users on open data portals. The nature of these mediations is very heterogeneous: they are legal, economic and technical. Having explored the diversity of the mediations, it is now necessary to examine the forms of stability that are emerging. Indeed, although several mediations are possible, not all of them are compatible. That is why compromises between the different means of regulation must be established in order to hold public data and their re-users together in a coherent way. Some of these mediation alignments will be outlined to specify more precisely the nature of the relation they define (Hennion, 1993). Three types of portals may be sketched out: the pipe model, the multi-front approach model and the platform model [11].

 

 

 

 

[11] These three models are thus “ideal types”. Not every open data portal falls neatly into one of the models. However, they help make visible the divergences between different types of open data portals and so help analyze open data governance. This work is in progress and the political implications of those models should still be developed. Moreover, we need to be cautious and avoid definitive judgments, since these processes are still unstable and questioned by the actors themselves.

| | Pipe model | Platform model | Multi-front approach model |
|---|---|---|---|
| License [12] | BY or SA + BY | BY or SA + BY | BY or SA + BY |
| Economic model [12] | Free | Free | Free |
| Data access | Bulk downloading (files) | API (feed) | Pluralist (files + feed) |
| Data formats | Closed or open formats | API | Diversity of formats |
| Data hosting | Internal hosting | Internal or cloud hosting | Cloud hosting |
| Political implication | Uncertainty | Accessibility and control | Pluralism |
| Target audience | Larger audience | Developers | Pluralist |
| Nature of the relationship between government and re-users | Vertical | Collaboration | Vertical and/or collaboration |

Table 1: Mediation alignments and political implications
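As a rough illustration, the “Data access” row of Table 1 can be read as a simple classification rule. The Python sketch below is a simplification under stated assumptions: the `Portal` class, the `classify` function and the example portals are hypothetical, and the ideal types combine more dimensions (formats, hosting, audience) than the single criterion used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Portal:
    """Hypothetical description of a portal's data access modes."""
    name: str
    bulk_download: bool
    api: bool


def classify(portal):
    """Map access modes to the three ideal types of Table 1 (simplified)."""
    if portal.bulk_download and portal.api:
        return "multi-front approach model"
    if portal.api:
        return "platform model"
    if portal.bulk_download:
        return "pipe model"
    return "unclassified"


# Relationship between government and re-users, copied from Table 1.
RELATIONSHIP = {
    "pipe model": "vertical",
    "platform model": "collaboration",
    "multi-front approach model": "vertical and/or collaboration",
}

examples = [
    Portal("files-only portal", bulk_download=True, api=False),
    Portal("API-only portal", bulk_download=False, api=True),
    Portal("mixed portal", bulk_download=True, api=True),
]

for portal in examples:
    model = classify(portal)
    print(portal.name, "->", model, "/", RELATIONSHIP[model])
```

Real portals will rarely fit such a rule exactly; the sketch only makes the divergence between the three alignments explicit.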

 

1. The  pipe  model  

The pipe model is a single access point that offers bulk downloading of the dataset. It is the traditional and easiest way of publishing governmental data. Most open data portals fall into this model. The simplicity of access enables a large audience to use the data. Indeed, downloading does not require users to be developers or to have much specialized software to handle the data [13], especially when the open data portal integrates a module for direct interaction with, or visualization of, the dataset [14].

However, the pipe model is quite uncertain for governments, since they can hardly limit re-uses of the data after having released it. Once the dataset has been downloaded, administrations have no idea who is using their data or what users are doing with it [15]. The relationship between governments and re-users defined by this type of open data infrastructure is a vertical one. This model of government resembles a “vending machine government” (O’Reilly, 2010). It is similar to a supply and demand relationship, with, on one side, governments, which provide the data, and, on the other side, users, who consume it.

[12] A large majority of open data portals publish their data freely and under authorship (BY) or copyleft (SA + BY) models. However, although these are the most common solutions, some governments might choose other possibilities, such as freemium access to the data.

[13] Nevertheless, the unequal ability to understand the data greatly restrains the appropriation of data by a wide audience and encourages the emergence of “intermediaries of open data” (Mayer-Schönberger and Zappia, 2011; Davies, 2010; Gurstein, 2011; Peugeot, 2012).

[14] This aspect was not analyzed in this paper as a mode of access to the data. However, several portals offer, in parallel with the download, a preview of the dataset (with data visualization modules), as does, for instance, the open data website of the French department of Saône-et-Loire (http://www.opendata71.fr).

2. The  platform  model  

The platform model is inspired by the current transformation of website and software companies towards service-oriented architecture. Platform architectures are gateway technologies that permit multiple systems to interoperate. Governments do not just provide an open data website; they provide open data web services. These services are governmental software development kits (SDKs): APIs that offer citizens and companies an opportunity to build applications based on these data. The strategy of the platform is to leverage the time and expertise of a globally distributed community of developers [16]. The platform model makes the data highly accessible to developers while giving administrations the opportunity to control re-uses. It is part of a strategy of collaboration and co-creation between governments and developers. The objective is to develop an ecosystem, a community of developers with whom governments can share and interact. However, publishing data through an API restricts re-use to actors who have the technical abilities to handle this type of architecture. The open data platform therefore targets a smaller audience than the pipe model.
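The paper does not specify how an API gives administrations control over re-uses. One common mechanism, assumed here purely for illustration, is to require an API key for every call, which makes it possible to log usage and enforce quotas. The `DataGateway` class, the key `dev-123` and the `budget` dataset below are hypothetical.

```python
from collections import Counter


class DataGateway:
    """Toy stand-in for an open data API (assumption): every call carries a key,
    so the administration can see and limit who uses which dataset."""

    def __init__(self, datasets, quota):
        self.datasets = datasets   # dataset name -> list of records
        self.quota = quota         # maximum number of calls allowed per key
        self.keys = set()          # registered developers
        self.usage = Counter()     # (key, dataset) -> number of calls served

    def register(self, key):
        """Record a developer key; unregistered keys are refused."""
        self.keys.add(key)

    def calls_by(self, key):
        """Total number of calls served for one key, across all datasets."""
        return sum(n for (k, _), n in self.usage.items() if k == key)

    def get(self, key, dataset):
        """Serve a dataset if the key is known and still within its quota."""
        if key not in self.keys:
            raise PermissionError("unknown API key")
        if self.calls_by(key) >= self.quota:
            raise PermissionError("quota exceeded")
        records = self.datasets[dataset]
        self.usage[(key, dataset)] += 1
        return records


gateway = DataGateway({"budget": [{"year": 2012, "amount": 100}]}, quota=2)
gateway.register("dev-123")
records = gateway.get("dev-123", "budget")
```

Unlike a downloaded file, each access here leaves a trace in `usage`, which is one way to read the combination of accessibility and control attributed to the platform model.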

With this aim in mind, the American government wants to transform its open data website data.gov into an open data platform. On May 23, 2012, the White House issued a directive that requires all agencies to establish programming interfaces for internal and external developers to use, and to make “applicable Government information open and machine-readable by default” [17]. All data should be available through web APIs by default. The American government seeks more interoperability and consistency in the manner in which it delivers information. This directive follows Tim O’Reilly’s recommendation to constitute a “government as a platform” (O’Reilly, 2010).

3. The  multi-­‐front  approach  model  

The multi-front approach model is a pluralist one. It shares characteristics with both previous models. Indeed, designed by specialized companies, it offers both bulk downloading and API access to the datasets. Moreover, data are available in several formats. Several North American cities (New York, Chicago, Seattle…) have chosen the SaaS solution provided by the company Socrata to publish their data through this type of pluralist portal.
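A minimal Python sketch of this pluralism, assuming a single hypothetical dataset exposed both as a bulk file in several formats and as a queryable feed. The records and function names are invented for illustration and do not reproduce Socrata's or any other vendor's actual interface.

```python
import csv
import io
import json

# Hypothetical dataset held by the portal.
RECORDS = [
    {"permit": "A-1", "ward": 3},
    {"permit": "B-2", "ward": 7},
]


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the records as a JSON array."""
    return json.dumps(records)


EXPORTERS = {"csv": to_csv, "json": to_json}


def bulk_download(fmt):
    """File-style access: the whole dataset in the requested format."""
    return EXPORTERS[fmt](RECORDS)


def api_query(ward):
    """Feed-style access: only the matching records, returned as JSON."""
    return to_json([r for r in RECORDS if r["ward"] == ward])
```

The same records thus reach a general audience through `bulk_download` and developers through `api_query`, which is the sense in which the model addresses several fronts at once.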

 

[15] In practice, governments seek to track the uses developers make of their data through other means, such as the organization of hackathons or open data contests. The analysis of these events as means of regulation is yet to be done.

[16] This strategy has been successfully developed by IT companies, such as Apple with its App Store, or Facebook with the Facebook platform.

[17] http://www.whitehouse.gov/sites/default/files/omb/egov/digital-government/digital-government.html

Conclusion: what arenas for discussing these political choices?

The purpose of this paper was to demonstrate that governments can regulate their released data through their open data portals. By exploring different mediations, I showed that political principles are inscribed in the design of those infrastructures and that a political pluralism of open government data is possible. Then, I pointed out three alignments of these mediations: the pipe model, the platform model and the multi-front approach model. I argued that each of those models defines a different relationship between governments and their constituents. These three types of open data portals are compatible with the principles established by the Sebastopol group, which constitute the benchmark definition of open data.

The pluralism of possible mediations and the diversity of open data portals emphasize the fact that different political choices are possible in the way governmental data are opened. However, there are few arenas in which these political choices are discussed. The mediations seem to be imposed by public authorities without any debate, although a few civil society organizations try to get their point of view across [18]. The analysis of these groups, their forms of action and their influence on governmental open data strategies is yet to be done.

Finally, I would like to mention that infrastructure is a relational concept, which must be studied in relation to organized practices (Hughes, 1989; Star and Ruhleder, 1996). This paper focuses only on the mediations that intervene in open data portals, without taking into consideration the social groups who design and use those infrastructures. Keeping this relational characteristic in mind avoids falling into technological determinism. Once the infrastructure has been conceived, the distribution of power is not definitively established. On the contrary, like all technical devices, open data portals are hackable and changeable. As they produce patterns of power and authority, they may also encounter resistance. The confrontation with users may reinvent and reshape these infrastructures (Akrich, 1992). The study of open data portals in relation to their environment is necessary to complete the analysis of the politics of open data portals.

[18] For instance, The Guardian, the Sunlight Foundation, the Open Knowledge Foundation (OKFN), Regards Citoyens and the FING (Fondation Internet Nouvelle Génération).

Bibliography:  

AKRICH, Madeleine. The De-scription of Technical Objects. In BIJKER, Wiebe E., LAW, John eds. Shaping Technology/Building Society: Studies in Sociotechnical Change. Cambridge, Mass.: MIT Press, 1992, p. 205-224.

BANNISTER, Frank, CONNOLLY, Regina. The Trouble with Transparency: A Critical Review of Openness in e-Government. Policy & Internet. 2011, vol. 3, no. 1.

BOULLIER, Dominique. Déboussolés de tous les pays ! Paris: Editions Cosmopolitiques, 2003.

---. Politiques plurielles des architectures d’internet. Sens Public. October 2008, no. 7-8.

---. Preserving diversity in social networks architectures. In MASSIT-FOLLÉA, Françoise, MÉADEL, Cécile, MONNOYER-SMITH, Laurence eds. Normative Experience in Internet Politics. Paris: Presses de l’Ecole des Mines, 2012.

DAVIES, Tim. Open data, democracy and public sector reform. A look at open government data use from data.gov.uk. Oxford Internet Institute, 2010.

DESROSIERES, Alain. Historiciser l’action publique. L’État, le marché et les statistiques. In LABORIER, Pascale, TROM, Danny eds. Historicités de l’action publique. Paris: PUF, 2003.

DIDIER, Emmanuel. En quoi consiste l’Amérique ? Les statistiques, le New Deal et la démocratie. Paris: La Découverte, 2009, 320 p.

GRIMMELMANN, James. Regulation by Software. The Yale Law Journal. 2005, vol. 114, p. 1719-1758.

GURSTEIN, Michael. Open data: Empowering the empowered or effective data use for everyone? First Monday. 2011, vol. 16, no. 2.

HENNION, Antoine. La passion musicale. Une sociologie de la médiation. Paris: Métailié, 1993.

HUGHES, Thomas P. The Evolution of Large Technological Systems. In BIJKER, Wiebe E., HUGHES, Thomas P., PINCH, Trevor eds. The Social Construction of Technological Systems. Cambridge, Mass.: MIT Press, 1989, p. 51-82.

LATOUR, Bruno. Nous n’avons jamais été modernes. La Découverte, 1997.

LESSIG, Lawrence. Code: Version 2.0. Basic Books, 2006.

MAYER-SCHÖNBERGER, Viktor, ZAPPIA, Zarino. Participation and Power: Intermediaries of Open Data. The 1st Berlin Symposium on Internet and Society, Berlin, October 2011.


O’REILLY, Tim. Government as a Platform. Innovations. 2010, vol. 6, no. 1, p. 13-40.

PEUGEOT, Valérie. Ouverture des données dans les collectivités territoriales : ambitions, racines politiques et premiers effets. Paris, 2012.

STAR, Susan Leigh. The Ethnography of Infrastructure. American Behavioral Scientist. 1999, vol. 43, no. 3, p. 377-391.

STAR, Susan Leigh, RUHLEDER, Karen. Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces. Information Systems Research. 1996, vol. 7, no. 1, p. 111-134.

STENGERS, Isabelle. Cosmopolitiques. Paris: La Découverte / Les empêcheurs de penser en rond, 1996. 7 vol.

WINNER, Langdon. Do artifacts have politics? In MACKENZIE, Donald, WAJCMAN, Judy eds. The social shaping of technology: how the refrigerator got its hum. Milton Keynes, UK: Open University Press, 1986, p. 26-37.