Kirk Morris, Melissa Elliott, Jeannine Lemaire – SBTF
“On Friday, June 1st, USAID’s GeoCenter and Development Credit Authority (DCA) launched the Agency’s first-ever crowdsourcing initiative to pinpoint the location of USAID DCA loan data. Forty people came to USAID’s Innovation Lab throughout the day to crowdsource live. Online volunteers, working from Canada to the United Kingdom to Uganda, worked nonstop until the project was complete. The event, which was planned for the entire weekend, concluded after only 16 hours as the first 150 people completed 2,300 records. Each of these records is associated with multiple entries in the original database so the final output from the volunteers will result in approximately 10,000 unique records. The event relied heavily on partnerships from online volunteer communities – the Standby Task Force and GIS Corps who both brought many volunteers and leaders to the table. These records are part of a larger dataset containing over 100,000 records, 70,000 of which were automatically geocoded in collaboration with the Department of Defense. The initiative took place using the data.gov platform, manipulated for the first time as a crowdsourcing tool.” – Shadrock Roberts, USAID.
We have immense respect for the heavy lifting done by Shadrock Roberts and Stephanie Grosser of USAID. Navigating government bureaucracy and legal hurdles to bring the effort to fruition required tenacity and patience. We also thank the many unnamed government workers who helped make the collaboration possible.
It started here:
This partnership between USAID and the Standby Task Force was unique for a number of reasons. First, we at the SBTF had the luxury of weeks in which to prepare, inform and galvanize our membership. Second, it was the SBTF's first effort to map and clean data unrelated to a crisis. We were initially concerned that members might be put off by data cleanup that did not support a critical crisis response, and that manipulating raw data might be, well, boring. But members showed great enthusiasm and excitement for the detective work required to pin down the individual records. We dare say many members had fun meeting the challenges presented by non-standard location data. As Shadrock Roberts of USAID pointed out, “The records that were given to volunteers were records that we could not automate. This means that they contained some of the most difficult, confusing, and partial geocoded data of the whole set.”
This effort offers a significant glimpse of the future for digital volunteers. As open data becomes more accessible, a wealth of information becomes available to be used for good. We were also impressed by the global public's (the “crowd's”) response to the call for volunteers. They proved as reliable, competent and committed to the event as our seasoned volunteers. A major lesson learned: with proper workflows, clear instructions and experienced guidance, the “crowd” is an extraordinary asset that we all must learn to trust.
Trust cannot be taken for granted. Establishing it has required effort, diligence and dedicated volunteers who take pride in the accuracy of their work. This long road began with UN OCHA, Andrej Verity and the Colombia team (@ochacolombia). Along the way we made incremental advances with UNHCR, WFP, WHO, Amnesty International USA and the Harvard Humanitarian Initiative. And now we have a working partnership with USAID, made possible by two and a half years of effort from a dedicated membership. The future is bright for mapsters and open data.
Think the Volunteer Technical Community and the SBTF have not made a difference? Then consider this:
Dr. Rajiv Shah, Administrator of USAID, said this week:
“All of these developments have made me think about how crucial it is to expand the community of individuals and organizations that we listen to and work with. This past week, our GeoCenter and the Development Credit Authority hosted our Agency’s first-ever crowdsourcing event, enlisting 150 volunteers to clean up and geotag thousands of loan data records. That event not only increased our Agency’s transparency, it created a model for the entire government—our event was the first time data.gov was opened to crowdsourcing. It won’t be the last.”
“The crowdsourcing event was implemented at no cost to the Agency and is paving the way for the USG [US Government] to allow an interested public to play a role in our efforts to open more data. The substantive effects of the released data and maps will change the way our partners work with DCA in the future. After reviewing the data for quality control, the complete dataset, case study, and the associated map will be released and presented at the Woodrow Wilson International Center for Scholars on June 28th.” – Shadrock Roberts, USAID.
Results (thus far) of the USAID Crowdsourcing Event:
Volunteers processed more than 2,300 records in approximately 16 hours. Many of these records have been used to populate multiple entries in the original dataset (where there were multiple entries, volunteers were given only one “parent record”). So far, USAID has completed 8,615 records from the volunteers' work! USAID is fairly confident that the final number will be around 10,000. Only 2,393 records were labeled “bad data,” and these can still be mapped at the national level. Of the records marked “completed,” over 4,000 returned a placename match good enough to be assigned a latitude and longitude point.
We pulled together a few statistics from the crowdsourcing event. They reflect only the sixteen active hours, since, thanks to our amazing volunteers, the event concluded well ahead of the originally planned 60 hours:
- Total volunteers who actively participated in the crowdsourcing: 143
- Total USAID, GISCorps and general public volunteers: 75
- Total SBTF volunteers: 68
- Total SBTF volunteers active in Skype channels: 58
- Total SBTF volunteers who RSVPed for the full 60-hour event: 142
The Next Phases:
The first phase, using crowdsourcing to geocode and clean the data records, is now complete. USAID and GISCorps are now carrying out Phases 2 and 3. In Phase 2, GISCorps volunteers with specialized expertise in geolocation and in writing automated scripts are working further on the “hard-to-geocode” records. In Phase 3, all geocoded records will undergo quality control and analysis: locations assigned both by the “crowd” and by automated systems will be checked for accuracy. Once these phases are complete, the full data set, map and case study will be released to the public, promoting open data and transparency.
Quotes from volunteers:
“The true meaning of crowdsourcing: I’m skyping with my mom to get help with the Sri Lanka-based tasks our volunteers are having some trouble with (she’s from there)”. – Jeannine Lemaire, Standby Task Force Volunteer
“I’m between jobs right now and this is a great opportunity for me to connect with people doing similar work as me in the DC area.” – Dan, GeoDC
“I wasn’t sure what I was going to be doing but I appreciate what USAID does and wanted to help.” – Stephanie, volunteer who works full time at the National Defense University
The following Skype chat illustrates the wonder of crowdsourcing volunteers:
[6/2/2012 5:10:57 PM] Rick: I work in the field of Environmental science, with work also in Toxicity, Exposure, Epidemiology and Risk Assessment.
[6/2/2012 5:14:00 PM] Joy: Thank you Richard. Get out of town I worked for a bio montoring lab all through my under grad +5 years aquatic toxicology for NELAP compliance. I tried really hard to selll my boss on creating a GIS for his clients he just didn’t see the value in it so it was time to leave.
[6/2/2012 5:16:01 PM] Rick: I like the work I do from a task standpoint, and from the challenge. What I have not had is the fulfillment of feeling like I have done something good. I think that is why it is so hard for me to leave now:)
[6/2/2012 5:17:23 PM] Joy: Adeiu to everyone that worked vigilantly to finish ahead of schedual. I think ya’ ll broke some records. te he
[6/2/2012 5:18:30 PM] Joy: Richard you did lots of good today and you can feel good about that.
[6/2/2012 5:19:26 PM] Rick: I do. I haven’t felt like this since the soup kitchens and food drives I used to do in college. I love this.
The main offices of USAID hosted volunteer members of the “crowd” in DC:
The Platform and Tools:
Below is a list of some of the tools we were fortunate to have at our fingertips for this event. They include Rabble, a custom-built microtasking application designed specifically for this crowdsourcing event by Socrata, an open data leader that has also helped make Kenya a leader in the open data movement. This app enabled volunteers to request records from the US government’s open data site, Data.gov.
The Data.gov Dashboard:
Here volunteers were presented with the data records. Using search engines, online maps and other tools, combined with a good deal of investigation and detective work, volunteers marked each record as either complete or bad data.
USAID/ESRI Lookup Tool:
ESRI developed a tool specifically for the USAID crowdsourcing event to help volunteers find good location data matching each record.
NGA Geonames Tool:
In summary: We’ve come a long way, baby. The White House noticed! But there are still “miles to go before WE sleep.”
We also got noticed in the press. Below is a brief list: