Mass collaboration to improve climate data — a new frontier in citizen science

Category: Environment & Sustainability
Published on Sep 06, 2010

Scientists meeting in the UK this week are crafting a revolutionary new project aimed at transforming their ability to predict meteorological disasters. The goal, as reported by the Guardian, “is to create an international databank that would generate forecasts of unprecedented precision.” To make that happen, the scientists behind the project are contemplating something even more radical: enlisting thousands of ordinary citizens around the world to gather, classify and even help analyze the meteorological data required to build more accurate, real-time models of the Earth’s climate.

Today, the data is too sparse and intermittent to provide reliable long-term weather forecasts, let alone predict a catastrophic weather event. But if a global climate databank could be built, scientists could better anticipate events like the floods in Pakistan that have killed about 2,000 and left millions homeless. Many scientists believe that catastrophic events of this nature will occur more frequently as the climate further destabilizes in the decades to come. The ability to give vulnerable areas accurate warnings about potential catastrophes could save millions of lives.

So where does the citizen science come in? It turns out that climate researchers have been inspired by projects like Galaxy Zoo, where scientists have dramatically increased the person power available to code and analyze telescope imagery by inviting ordinary citizens to participate in their research. Some 250,000 “citizen scientists” are helping astronomers at Yale, Oxford and other institutions classify galaxies using simple classification tools on the Web. The results so far have been impressive. Galaxy Zoo members have made nearly 75 million classifications of one million different images, far beyond the researchers’ original goal of getting the public to help classify a set of 50,000 galaxies. If the researchers were still laboring on their own, it would have taken them roughly 124 years to classify that many images!

What impressed the researchers the most, however, was the surprising ability of the community to contribute genuine scientific insights. Bill Keel, an astronomy professor at the University of Alabama who studies overlapping galaxies, decided to ask Galaxy Zoo users to contact him if they came across an example of this rare phenomenon. Throughout his career, Keel had studied the dozen or so overlapping galaxies then known to astronomers. Within a day of posting his question on the Galaxy Zoo forum, he had more than 100 responses from users who had indeed found such objects. Today, thousands have been identified.

Now climate scientists are contemplating something similar. As part of the project, they want to create a global network of weather stations that would provide daily temperature readings for any spot on the planet, a vast improvement over the monthly averages scientists currently get for data about temperatures, wind, precipitation and other variables in North America and Europe. The problem is that many countries outside these regions keep their data proprietary, preferring to sell it to commercial enterprises and news organizations. So the big challenge initially will be to convince national governments to open up their weather station data for the betterment of climate research.

The rearview perspective on the Earth’s shifting climate will be key too. The scientists envision a role for citizen scientists in digitizing old sea logs (including daily temperature readings) from British naval records that date back to the 19th century. Such newly digitized historical data would provide a welcome boost to researchers trying to more accurately model historical weather patterns. But if the Galaxy Zoo experience proves anything, it’s that merely digitizing old sea logs is probably too narrow a role for the public.

To capture the imagination of citizen scientists, the researchers should invent broader roles for contributors in collecting and analyzing data. Perhaps one day soon, volunteers could upload data directly from their mobile devices. Scientists could also host climate modelling competitions for grad students and amateur enthusiasts. Like the astronomers mentioned earlier, they could be surprised at what comes of it. As one Galaxy Zoo researcher put it to me: “Mass collaboration in the Internet is a powerful multiplier. It makes research possible that just wasn’t possible before.”


It appears to me that the recent experience with mass collaboration in science endeavours has shown that there is a spectrum of intensity for engaging participation – intersected by a spectrum of technology mechanisms – and that providing a range of ways for people to engage with science is a good strategy for maximizing contributions. Rather than say one method or another represents “too narrow a role for the public”, I’d suggest that there is a wide range of opportunities (let alone a host of challenges) for engaging the Internet-enabled public in science (and public policy development, for that matter).

At one end of this spectrum is the tradition of citizen science, albeit supported by new technology platforms: “Web2.0-enabled citizen science”, if you will. At the other end is the exploitation of powers of human perception, reasoning and pattern recognition (which still outperform the most powerful computers) – e.g., interpreting imagery or accurately transcribing handwriting – through the harnessing of the Internet-public’s “cognitive surplus” (from Clay Shirky’s work). This harnessing of human cognitive ability can be either obscure to the individual – e.g., the reCAPTCHA system for helping to digitize text – or presented in a game interface where the participant is motivated not by the output but by the game itself – e.g., the idea of “games with a purpose” or “serious games”. Somewhere between these two poles is what I’d call “science-oriented crowdsourcing”: the engagement of public participants in science endeavours through low-intensity pattern recognition tasks. This is where I’d place projects like Galaxy Zoo (though, to be clear, they do prefer to call their contributors “citizen scientists”). The best, earliest example of this approach is NASA’s Clickworkers project, which has spawned a genre of volunteer science-oriented crowdsourcing.

If we look beyond the traditional view of citizen science, the use of crowdsourcing for science projects reveals that some participants will be interested in making lower-intensity contributions, and that these interactions can have additional results beyond simple image tagging or categorization:

* Wisdom of crowds: Rather than relying on the work of a small number of experts, gathering observations from multiple independent sources allows categorizations to be cross-checked against one another, so that accuracy can be judged from the agreement among many perspectives.
* Software agent training: Large-volume observations can produce powerful training data for improving machine learning approaches to classification problems. (We need to do this soon: as the number and volume of massive data sets grow, and as the number of applications competing for the attention of volunteer contributors grows with them, the potential for crowdsourcing approaches to solve these data analysis problems will become diluted. Science will require the capacity of software agents to address this gap, which will continue to widen.)
* Serendipitous discovery: the “many eyes” concept from open source software relates to an additional outcome of crowdsourcing – exposing datasets to large numbers of users increases the opportunity for even casual observers to stumble upon unique, rare and unanticipated events.
* Outreach: crowdsourcing engages the user in an active perspective which can serve to increase the potential for learning.
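
To make the "wisdom of crowds" point above a little more concrete, here is a minimal Python sketch of one way independent volunteer classifications might be combined: a simple majority vote with an agreement threshold. This is only an illustration under stated assumptions, not a description of how Galaxy Zoo or any other project actually aggregates its data. The function name `aggregate_votes`, the `min_agreement` threshold, the image IDs and the labels are all hypothetical.

```python
from collections import Counter


def aggregate_votes(votes, min_agreement=0.6):
    """Combine independent labels for one item by majority vote.

    Returns (label, agreement) when the most common label reaches the
    threshold, or (None, agreement) when consensus is too weak.
    The 0.6 threshold is an arbitrary, illustrative choice.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


# Hypothetical volunteer classifications for three images (made-up data).
classifications = {
    "image_001": ["spiral", "spiral", "elliptical", "spiral", "spiral"],
    "image_002": ["elliptical", "spiral", "merger", "elliptical"],
    "image_003": ["merger", "merger", "merger"],
}

# One consensus result per image.
consensus = {image: aggregate_votes(v) for image, v in classifications.items()}

for image, (label, agreement) in consensus.items():
    print(image, label, round(agreement, 2))
```

In a scheme like this, items that fail to reach the threshold (returned with a label of `None`) could be passed on to an expert for review, and the items that do reach consensus could serve as labeled examples for the software agent training described above.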

Your last paragraph hints at the next frontier in mass collaboration science: distributed collaboration through autonomous (and increasingly mobile) nodes. This spectrum intersects the “citizen science / cognitive surplus” spectrum described above, and distinguishes the mechanism for contributing. At one end are distributed computing initiatives (e.g., SETI@home) where, through very little effort on my part, I can donate the unused cycles on my computer to a much larger science mission. At the other end is the new realm of distributed observation points via mobile platforms of opportunity (see, e.g., Urban Atmospheres Lab). This last bit seems to me to represent a really exciting opportunity for engaging the public in ubiquitous, autonomous and highly-automated data-gathering operations in support of scientific inquiry and policy analysis. Rather than characterize this as a narrow or menial application of a person’s interest in contributing to a broader initiative, allowing for one-button contributions from a person’s smartphone would revolutionize the concept of participatory science.

posted by Justin Longo on 09.06.10 at 11:15 pm

[…] comment Anthony Williams (of Wikinomics fame) blogged on Sep 06, 2010 on Mass collaboration to improve climate data — a new frontier in citizen science: It turns out that climate researchers have been inspired by projects like Galaxy Zoo, where […]

Justin, you’ve evidently given this all a lot of thought. I agree with your notion that there is a spectrum of engagement for citizen science projects, and collaborative projects in general. I also suspect that the most successful citizen science projects will be the ones that leverage the full spectrum by allowing for differing levels of engagement among contributors.

Thanks for sharing!

posted by Anthony D. Williams on 09.07.10 at 1:32 am

[…] Category: Environment & Sustainability Published on Sep 13, 2010 After posting about the efforts of climate scientists to build a global climate databank for predicting extreme weather even…, I got a note from Dave Jarvis who recently built a cool open source tool for exploring weather […]
