Data Commons

A repository of combined government transparency data sets. Our initial release will combine state and federal campaign contribution data from the National Institute on Money in State Politics (NIMSP) and the Center for Responsive Politics (CRP).

Almost all of the political data available to us can be represented as transactional records between entities:

  • A gave $2000 to X's campaign
  • A paid lobbyist B to meet with X
  • X requested a $1 million earmark for A
  • Agency D awarded a contract to A for $6 million
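One way to picture this shared shape is a single record type that covers all four examples above. This is only an illustrative sketch: the `Transaction` class and its field names are assumptions made for this example, not the project's actual schema.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """One transactional record between two entities (hypothetical schema)."""
    source: str                       # entity initiating the transaction
    target: str                       # entity on the receiving end
    kind: str                         # e.g. "contribution", "lobbying", "earmark", "contract"
    amount: Optional[Decimal] = None  # dollar amount, when the record has one
    via: Optional[str] = None         # intermediary, such as a lobbyist


# The four example records from the list above.
records = [
    Transaction("A", "X", "contribution", Decimal("2000")),
    Transaction("A", "X", "lobbying", via="B"),
    Transaction("X", "A", "earmark", Decimal("1000000")),
    Transaction("Agency D", "A", "contract", Decimal("6000000")),
]

# Total dollars flowing to entity A across all record types.
total_to_a = sum(r.amount for r in records if r.target == "A" and r.amount is not None)
```

Once every data set is expressed this way, questions such as "how much money reached A?" become simple filters, provided that "A" really refers to the same entity in every record.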

The biggest challenge is not reconciling the transactions but matching the transaction participants across data sets. Each data set has its own representation for entities, usually with different IDs and different names. We need a way to look at two data sets and decide that entity A from the CRP data is the same organization as entity Z from the NIMSP data. We also need to keep track of any attributes, such as the original CRP and NIMSP IDs, that each entity carries.
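To make the matching problem concrete, here is a deliberately naive sketch that compares two records by normalized name. Everything in it is an assumption for illustration: the suffix list, the record layout, the helper names, and the made-up IDs. Real matching would need far more care than an exact comparison of cleaned-up names.

```python
import re

# Hypothetical list of suffixes to ignore when comparing organization names.
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd"}


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    words = re.sub(r"[^a-z0-9\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def probably_same(a: dict, b: dict) -> bool:
    """Naive check: two records match when their normalized names are equal."""
    return normalize_name(a["name"]) == normalize_name(b["name"])


# Made-up records standing in for rows from each data set.
crp_record = {"source": "CRP", "id": "crp-001", "name": "Acme Corp."}
nimsp_record = {"source": "NIMSP", "id": "nimsp-042", "name": "ACME Corporation"}
other_record = {"source": "NIMSP", "id": "nimsp-043", "name": "Apex Industries"}

same = probably_same(crp_record, nimsp_record)
different = probably_same(crp_record, other_record)
```

Note that each record keeps its original source and ID, so a positive match does not discard where the entity came from.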

Our first tool, Matchbox, allows us to load and store entities from each data set. We can then merge records that are deemed to represent the same entity. You can interact with Matchbox using the included Python module or by calling the basic API over HTTP.
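The load, store, and merge workflow can be pictured with a toy in-memory store. This is not the Matchbox Python module or its HTTP API; the `EntityStore` class, its method names, and the sample IDs are all hypothetical and exist only to show the idea.

```python
import itertools


class EntityStore:
    """Toy in-memory stand-in for an entity store; not the Matchbox API."""

    def __init__(self):
        self._entities = {}
        self._ids = itertools.count(1)

    def load(self, name, **attributes):
        """Store one entity and return its new internal ID."""
        entity_id = next(self._ids)
        self._entities[entity_id] = {"name": name, "attributes": dict(attributes)}
        return entity_id

    def get(self, entity_id):
        """Return the stored entity for an internal ID."""
        return self._entities[entity_id]

    def merge(self, name, entity_ids):
        """Replace several entities with one that keeps all their attributes."""
        merged = {}
        for entity_id in entity_ids:
            merged.update(self._entities.pop(entity_id)["attributes"])
        return self.load(name, **merged)


store = EntityStore()
a = store.load("Acme Corp.", crp_id="crp-001")            # entity from CRP data
z = store.load("ACME Corporation", nimsp_id="nimsp-042")  # entity from NIMSP data
merged_id = store.merge("Acme Corporation", [a, z])
merged = store.get(merged_id)
```

After the merge, the single remaining entity carries both the CRP and NIMSP IDs. In this sketch, attributes with the same key overwrite one another in merge order, which a real tool would need to handle more carefully.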

We will be adding features to Matchbox over the next several months, including name standardization algorithms, importers and exporters, and a web-based administration interface.

Join This Project?

If you're interested in helping out on this project, you should! Here's how:

  1. Contact the project lead.

  2. Contribute some code.

  3. Join. After you've contacted and contributed to the project, click the Join This Project button below to be listed as a contributor.

1818 N Street NW, Suite 300
Washington, DC 20036
202.742.1520