I was recently invited to give a 5-minute talk on the SBTF in action. This was for the Frontiers in Development conference organized by USAID just a few weeks ago. They have just made the video public. In this presentation, I describe the SBTF’s recent projects in Libya and Somalia, and with USAID’s Credit Authority Program. Huge, huge thanks to all Mapsters (SBTF volunteers) who made these three groundbreaking projects possible!
[Guest post by Timo Luege - I’m passionate about information, communication and how they can be used to make the world a better place.
My two main areas of expertise are:
• Communication through digital media
• Media relations during disasters
Over the last thirteen years I have worked for the Red Cross Red Crescent Movement, the UN, German national public radio and a wire agency. Social Media 4 Good] on June 29, 2012
On June 1st, USAID launched its first ever crowdsourcing project. Yesterday, they shared their lessons learned during a webcast, in a power-point and a case study. Here are the main takeaways.
The USAID project was a little different from what many people have in mind when they think about crowdsourcing; it did not involve Ushahidi, nor did it have anything to do with mapping data submitted by beneficiaries. Instead it was a clean-up operation for data that was too messy for algorithms to comprehend.
USAID wanted to share the locations of regions around the world where it had made loans available. The problem was that these locations were not captured in a uniform format. Instead, different partners had submitted the information in different formats. In order to show all loans on a map, USAID needed a uniform data structure for all locations.
Combined human-machine approach
Before sharing the data with the volunteers, USAID had already tried to clean the data with scripts. This meant that the only datasets left were those that were too difficult to process automatically.
I like that USAID did not simply outsource all the data to the crowd, but used human intelligence only for the cases that were too hard for the algorithm. This demonstrates that human capacity is seen as a valuable resource that should only be requested and used where it can have the highest impact.
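A minimal sketch of this kind of triage, assuming a simple exact-match lookup against a gazetteer. The `GAZETTEER` dictionary and the `triage` function are illustrative stand-ins, not USAID's actual scripts: records the script can match are geocoded automatically, and everything else goes into a queue for volunteers.

```python
# Illustrative only: a tiny stand-in gazetteer, not USAID's real reference data.
GAZETTEER = {
    "nairobi": (-1.2864, 36.8172),
    "kampala": (0.3476, 32.5825),
}


def normalize(place):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(place.lower().split())


def triage(records):
    """Split free-text locations into machine-geocoded results and a human queue."""
    geocoded, needs_human = {}, []
    for raw in records:
        key = normalize(raw)
        if key in GAZETTEER:
            geocoded[raw] = GAZETTEER[key]
        else:
            # Too messy for the exact-match script: leave it for a volunteer.
            needs_human.append(raw)
    return geocoded, needs_human


geocoded, needs_human = triage(["Nairobi", "  KAMPALA ", "Near the market, Kmpala dist."])
print(geocoded)
print(needs_human)
```

Only the third record, with its misspelling and extra context, ends up in the human queue, which is the point: volunteer time is spent where the script gives up.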
Humans more accurate than algorithms
After the project, USAID asked the GISCorps to take a random sample of the machine-generated records as well as the human-generated records and compare their accuracy. According to their analysis, the volunteers were more accurate than the machines, even though most volunteers weren’t GIS experts:
While 85 per cent of the records cleaned up by the volunteers were accurate, only 64 per cent of the records treated by the algorithm were correct. The volunteers were also much faster than expected – instead of the predicted three days, it only took the volunteers 16 hours to go through the data.
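The comparison boils down to drawing a random sample from each group and counting how many sampled records a reviewer judges correct. A rough sketch, assuming each record carries a reviewer-assigned `correct` flag; the field name, the sample size and the data below are made up for illustration and are not the GISCorps sample:

```python
import random


def sample_accuracy(records, sample_size, seed=0):
    """Estimate accuracy from a random sample of reviewed records."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(records, min(sample_size, len(records)))
    correct = sum(1 for r in sample if r["correct"])
    return correct / len(sample)


# Made-up populations that mirror the reported rates (85% vs. 64%).
human = [{"correct": i < 85} for i in range(100)]
machine = [{"correct": i < 64} for i in range(100)]

# Sampling the whole population reproduces the underlying rate exactly;
# a smaller sample_size gives an estimate that varies around it.
print(sample_accuracy(human, 100))
print(sample_accuracy(machine, 100))
```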
Comparatively little room for “creativity”
As one of the volunteers involved in the clean-up operation, I think that one of the reasons for the high accuracy rate was that the project was very focused and didn’t leave the volunteers a lot of room to be “creative”. USAID asked us to do something very specific and gave us a tool that only allowed us to operate within very restrictive parameters: during the exercise, each volunteer requested five or ten datasets that were shown in a mask where they could only add the requested information. This left very little room for potentially destructive errors by the users. If USAID had done this through a Google Spreadsheet instead, I’m sure the accuracy would have been lower.
My takeaway from this is that crowdsourced tasks have to be as narrow as possible and need to use tools that help maintain data integrity.
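To illustrate what such a restrictive input mask buys you, here is a hedged sketch of a submission check. The field names and status values are hypothetical and are not taken from the actual application. The point is that a volunteer can only fill in the requested fields, and anything else is rejected before it touches the dataset.

```python
# Hypothetical field names and statuses, chosen for illustration only.
ALLOWED_FIELDS = {"record_id", "status", "place_name"}
ALLOWED_STATUSES = {"complete", "bad_data"}


def validate_submission(submission):
    """Return a list of problems; an empty list means the submission is accepted."""
    problems = []
    extra = set(submission) - ALLOWED_FIELDS
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    if "record_id" not in submission:
        problems.append("record_id is required")
    if submission.get("status") not in ALLOWED_STATUSES:
        problems.append("status must be 'complete' or 'bad_data'")
    if submission.get("status") == "complete" and not str(submission.get("place_name", "")).strip():
        problems.append("a completed record needs a place name")
    return problems


ok = {"record_id": 17, "status": "complete", "place_name": "Galle"}
bad = {"record_id": 18, "status": "done", "notes": "deleted a column by accident"}
print(validate_submission(ok))
print(validate_submission(bad))
```

A shared spreadsheet has no equivalent of this gate: any cell can be overwritten, which is why a narrow form tends to protect data integrity better.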
Walk, crawl, run
Prior to the project launch, USAID ran incrementally larger tests that allowed them to improve the workflow, the instructions (yes, you need to test your instructions!) and the application itself.
If you ask people in 24 time zones to contribute to a project, you also need to have 24-hour tech support. It is very frustrating for volunteers if they cannot participate because of technical glitches.
It’s a social experience
This was emphasized a few times during the webcast and I think it’s an extremely important point: people volunteer their time and skills because they enjoy the experience of working on a joint project together. That means you also have to nurture and create this feeling of belonging to a community. During the project, volunteer managers ran multiple Skype channels where people could ask questions, exchange information or simply share their excitement.
In addition, USAID also invited volunteers from the Washington DC area to come to their office and work from there. All of this added to making the comparatively boring task of cleaning up data a fun, shared experience.
You need time, project managers and a communications plan
During the call USAID’s Shadrock Roberts said that he “couldn’t be happier” with the results, particularly since the costs of the whole project to the agency were “zero Dollars”. But he also emphasized that three staff members had to be dedicated full time to the project. So while USAID didn’t need a specific budget to run the project, it certainly wasn’t free.
To successfully complete a crowdsourcing project, many elements need to come together and you need a dedicated project manager to pull and hold it all together.
In addition to the time needed to organize and refine the technical components of the project, you also need time to motivate people to join your project. USAID reached out to existing volunteer and tech communities, wrote blog posts and generated a buzz about the project on social media. In a way, they needed to execute a whole communications plan.
Case study and presentation
USAID published a very good case study on the project which can be downloaded here. It is a very practical document and should be read by anyone who intends to run a crowdsourced project.
In addition, here is the presentation from yesterday’s call:
PPT credit Shadrock Roberts/Stephanie Grosser – USAID
The entire case study was presented by Roberts, Grosser and Swartley at the Wilson Center on 6/28/2012. The event was livestreamed:
Event video courtesy of the Wilson Center
Kirk Morris, Melissa Elliott, Jeannine Lemaire – SBTF
“On Friday, June 1st, USAID’s GeoCenter and Development Credit Authority (DCA) launched the Agency’s first-ever crowdsourcing initiative to pinpoint the location of USAID DCA loan data. Forty people came to USAID’s Innovation Lab throughout the day to crowdsource live. Online volunteers, working from Canada to the United Kingdom to Uganda, worked nonstop until the project was complete. The event, which was planned for the entire weekend, concluded after only 16 hours as the first 150 people completed 2,300 records. Each of these records is associated with multiple entries in the original database so the final output from the volunteers will result in approximately 10,000 unique records. The event relied heavily on partnerships from online volunteer communities – the Standby Task Force and GIS Corps who both brought many volunteers and leaders to the table. These records are part of a larger dataset containing over 100,000 records, 70,000 of which were automatically geocoded in collaboration with the Department of Defense. The initiative took place using the data.gov platform, manipulated for the first time as a crowdsourcing tool.” – Shadrock Roberts, USAID.
We have immense respect for the heavy lifting done by Shadrock Roberts and Stephanie Grosser of USAID. Walking through the Government bureaucracy and legal hurdles required tenacity and patience to bring the effort to fruition. Appreciation must also be shown to the many unknown Government workers who contributed in making the collaboration possible.
It started here:
This partnership between USAID and the Standby Task Force was unique for a number of reasons. One, we, the SBTF, had the luxury of weeks in which to prepare, inform and galvanize our membership. Two, it was the first effort by the SBTF to map and clean data not related to crisis. There was an initial concern that membership might be put off by the thought of data mining knowing it wasn’t for a critical crisis response and that manipulating pure data might be, well, boring. But, membership showed great enthusiasm and excitement for the detective work required to identify the individual reports. We dare say many members had fun meeting the challenges presented by non-standard location data. As Shadrock Roberts of USAID pointed out, “The records that were given to volunteers were records that we could not automate. This means that they contained some of the most difficult, confusing, and partial geocoded data of the whole set.”
This effort offers a significant view of the future for digital volunteers. As open data becomes more readily accessible, a wealth of information becomes available to be used for good. We were also impressed with the response to the call for volunteers from the global public (the “crowd”). They proved to be just as reliable, competent and committed to the event as the seasoned volunteers. A major lesson learned is that, with proper workflows, instruction and experienced guidance, the “crowd” is an extraordinary asset we all have to learn to trust.
Trust is an element that can’t be passed by casually. Establishing it has required effort, diligence and dedicated volunteers who take pride in the veracity of their efforts. This long road began with UN OCHA, Andrej Verity and the Colombia team (@ochacolombia). Along the way we made incremental advances with UNHCR, WFP, WHO, Amnesty International USA and the Harvard Humanitarian Initiative. And now we have a working partnership with USAID, made possible by two and a half years of effort from a dedicated membership. The future is bright for mapsters and open data.
Think the Volunteer Technical Community and SBTF have not made a difference? Then ponder this:
Dr. Rajiv Shah, Administrator of USAID, is quoted this week:
“All of these developments have made me think about how crucial it is to expand the community of individuals and organizations that we listen to and work with. This past week, our GeoCenter and the Development Credit Authority hosted our Agency’s first-ever crowdsourcing event, enlisting 150 volunteers to clean up and geotag thousands of loan data records. That event not only increased our Agency’s transparency, it created a model for the entire government—our event was the first time data.gov was opened to crowdsourcing. It won’t be the last.”
“The crowdsourcing event was implemented at no cost to the Agency and is paving the way for the USG [US Government] to allow an interested public to play a role in our efforts to open more data. The substantive effects of the released data and maps will change the way our partners work with DCA in the future. After reviewing the data for quality control, the complete dataset, case study, and the associated map will be released and presented at the Woodrow Wilson Center for International Scholars on June 28th.” – Shadrock Roberts, USAID.
Results (thus far) of the USAID CrowdSourcing Event:
Volunteers processed more than 2,300 records in approximately 16 hours. Many of these records have been used to populate multiple entries in the original dataset (where there were multiple entries, only one “parent record” was given to the volunteers). At present, USAID has been able to complete 8,615 records from the work of the volunteers! They are fairly confident that the final number will be around 10,000. Only 2,393 records were labeled as “bad data,” which can still be mapped at the national level. Of the ones that were “completed,” over 4,000 returned a good enough placename match to be assigned a latitude and longitude point.
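The parent-record idea can be sketched as follows. This is only an illustration with invented identifiers and field names, not the actual Data.gov schema, and grouping by identical raw location text is an assumption about how parent records were chosen. Entries that share the same raw location are grouped, one representative is handed to a volunteer, and the volunteer's answer is then copied back to every entry in the group.

```python
from collections import defaultdict

# Invented entries: several loans share the same messy location string.
entries = [
    {"loan_id": 1, "raw_location": "Galle Dist."},
    {"loan_id": 2, "raw_location": "Galle Dist."},
    {"loan_id": 3, "raw_location": "Gulu town"},
]

# Group entries by raw location; each distinct string is one "parent record".
groups = defaultdict(list)
for entry in entries:
    groups[entry["raw_location"]].append(entry)

# Hypothetical volunteer output: one cleaned place name per parent record.
volunteer_answers = {"Galle Dist.": "Galle", "Gulu town": "Gulu"}

# Copy each answer back to every entry that shares the parent record.
for raw, cleaned in volunteer_answers.items():
    for entry in groups[raw]:
        entry["place_name"] = cleaned

print(len(groups), "parent records ->", len(entries), "completed entries")
```

This fan-out is why roughly 2,300 volunteer-processed records can translate into several thousand completed entries in the original dataset.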
We pulled together a few statistics from the crowdsourcing event (reflecting only the active sixteen hours of the event, as the event concluded earlier than the originally planned 60 hours thanks to our amazing volunteers!):
- Total volunteers who actively participated in the crowdsourcing: 143
- Total USAID, GISCorps and general public volunteers: 75
- Total SBTF volunteers: 68
- Total SBTF volunteers active in Skype channels: 58
- Total SBTF volunteers who RSVPed for the full 60-hour event: 142
The Next Phases:
The first phase of using crowdsourcing to geocode the data records and perform data cleansing is now complete. Phases 2 and 3 of the project are now being performed by USAID and GISCorps. During Phase 2, “hard-to-geocode” records are being worked on further by GISCorps volunteers who have specialized expertise in geolocation and writing automated scripts to perform these tasks. During Phase 3, quality control and analysis of all geocoded records will be performed, meaning the geocoding of data by both the “crowd” and automated systems will be checked for accuracy. Once these phases are finalized, the complete data set, map and case study will be released to the public, promoting open data and transparency.
Quotes from volunteers:
“The true meaning of crowdsourcing: I’m skyping with my mom to get help with the Sri Lanka-based tasks our volunteers are having some trouble with (she’s from there)”. – Jeannine Lemaire, Standby Task Force Volunteer
“I’m between jobs right now and this is a great opportunity for me to connect with people doing similar work as me in the DC area.” – Dan, GeoDC
“I wasn’t sure what I was going to be doing but I appreciate what USAID does and wanted to help.” – Stephanie, volunteer who works full time at National Defense University
The following Skype chat illustrates the wonder of crowdsourcing volunteers:
[6/2/2012 5:10:57 PM] Rick: I work in the field of Environmental science, with work also in Toxicity, Exposure, Epidemiology and Risk Assessment.
[6/2/2012 5:14:00 PM] Joy: Thank you Richard. Get out of town I worked for a bio montoring lab all through my under grad +5 years aquatic toxicology for NELAP compliance. I tried really hard to selll my boss on creating a GIS for his clients he just didn’t see the value in it so it was time to leave.
[6/2/2012 5:16:01 PM] Rick: I like the work I do from a task standpoint, and from the challenge. What I have not had is the fulfillment of feeling like I have done something good. I think that is why it is so hard for me to leave now:)
[6/2/2012 5:17:23 PM] Joy: Adeiu to everyone that worked vigilantly to finish ahead of schedual. I think ya’ ll broke some records. te he
[6/2/2012 5:18:30 PM] Joy: Richard you did lots of good today and you can feel good about that.
[6/2/2012 5:19:26 PM] Rick: I do. I haven’t felt like this since the soup kitchens and food drives I used to do in college. I love this.
The main offices of USAID hosted volunteer members of the “crowd” in DC:
The Platform and Tools:
Below is a list of some of the tools we were fortunate to have at our fingertips for this event, including Rabble, a custom-built microtasking application designed specifically for this crowdsourcing event by Socrata, a leader in open data that is helping to make Kenya a leader in the open data movement. This app enabled volunteers to request records from the US government’s open data site, Data.gov.
The Data.gov Dashboard:
Here the members were presented with the data records. Using various tools, including search engines and online maps, combined with much investigation and detective work, the volunteers were able to mark the data record as complete or as bad data.
USAID/ESRI Lookup Tool:
ESRI developed a tool just for the USAID crowdsourcing event to aid volunteers in their search for good location data matching the record.
NGA Geonames Tool:
A summation: We’ve come a long way, baby. The White House noticed! But, “with miles to go before WE sleep.”
We also got noticed in the press. Below is a brief list:
- USAID Impact Blog: http://blog.usaid.gov/2012/06/doing-more-for-development-through-inclusivity/ [this was actually a big deal because it’s infamously hard to get blog posts cleared on this!]
- US State Department Dip Note: http://blogs.state.gov/index.php/site/entry/doing_more_for_development_through_inclusivity
- All Africa: http://allafrica.com/stories/201205250526.html
- DevEx: http://www.devex.com/en/news/usaid-taps-crowdsourcing-for-aid-transparency/78290
- Wall Street Journal Market Watch: http://www.marketwatch.com/story/esri-supports-usaid-crowdsourcing-event-2012-05-29
- Reuters: http://www.reuters.com/article/2012/05/29/idUS206383+29-May-2012+PRN20120529
- Federal News Radio: http://www.federalnewsradio.com/85/2889652/USAID-crowdsources-to-clean-up-aid-data
- Blog post by Woodrow Wilson Center Fellow, John Crowley: http://intertwingler.com/blog/connecting-grassroots-and-government
- Original SBTF Blog post: http://blog.standbytaskforce.com/sbtf-usaid-partnership-on-poverty-alleviation-and-smarter-development/
The Standby Volunteer Task Force (SBTF) continues to break new ground in 2012. This time around we’re partnering with colleagues at the US Agency for International Development (USAID) who recognize, like we do, that equitable and sustainable economic growth is instrumental for countering extreme poverty across the globe. Being one of the biggest development organizations in the world, USAID has the resources to have significant impact on the livelihoods of millions. To this end, our colleagues want to better understand the link between their economic growth initiatives and their subsequent impact on poverty alleviation. This is where we as SBTF volunteers come in.
Our partners have access to a considerable amount of data which, if analyzed, will yield some very important empirical insights on the link between economic growth projects and poverty alleviation. The challenge, simply put, is to geo-code these datasets so that we can all better understand the geographic impact of various local economic initiatives vis-a-vis extreme poverty. Geo-code simply means finding the geographic location of said projects so that the resulting data can be mapped. USAID has already used automated methods to do this, but some datasets can only be processed by humans. But why map this data in the first place? Because maps can reveal powerful new insights that can catalyze new areas of potential collaboration with host countries, researchers, other development organizations and the public.
The local economic growth projects in question are aimed at reducing poverty and thereby changing people’s lives for the better. The results of the analysis, all of which shall be made public, will be used directly by USAID to fine-tune their programs and thus increase their impact on poverty alleviation; welcome to Smarter Development! Given the extraordinary commitment of SBTF volunteers in projects past, our USAID colleagues have approached us to help them geo-code these important datasets. This is the very first time that USAID has reached out to online volunteer communities to actively help them process data about their organization’s impact in the field.
The result of this partnership will be a unique geo-coded dataset and a case study of said dataset that will be completely public for anyone to review. We’re excited to be partners in this effort since the project will demonstrate how crowdsourcing and online volunteers can play a significant role in both opening up development data and analyzing said data for the purposes of Smarter Development. This project will also provide SBTF volunteers with the opportunity to develop new skills while refining their existing skills and learning about how to work with new technologies.