Monday, March 8, 2010

Statistics for a changing world: Google Public Data Explorer in Labs

Last year, we released a public data search feature that enables people to quickly find useful statistics in search. More recently, we expanded this service to include information from the World Bank, such as population data for every region in the world. More and more public agencies, non-profits and other organizations are looking for ways to open up their data and expand global access to this kind of information. We want to help keep that momentum going, so today we're sharing a snapshot of some of the most popular public data search topics on Google. We're also launching the Google Public Data Explorer, an experimental visualization tool in Google Labs.

Popular public data topics on Google
We know people want to be able to find reliable data and statistics on a variety of subjects. But what kind of statistics are they looking for most? To help us better prioritize which data sets to include in our public data search feature, we've analyzed anonymous search logs to find patterns in the kinds of searches people are doing, similar to the patterns you can find on Google Trends and Insights for Search. Some public data providers have asked us to share what we've learned, so we decided to put together an approximate list of the 80 most popular data and statistics search topics.

You can read the complete list (PDF), but here are the top 20 to get you started:

1. School comparisons
2. Unemployment
3. Population
4. Sales tax
5. Salaries
6. Exchange rates
7. Crime statistics
8. Health statistics (health conditions)
9. Disaster statistics
10. Gross Domestic Product (GDP)
11. Last names
12. Poverty
13. Oil price
14. Minimum wage
15. Consumer price index, inflation
16. Mortality
17. Cost of living
18. Election results
19. First names
20. Accidents, traffic violations

You'll notice some interesting entries in the list. For example, we were surprised by how many people search for data about popular first and last names. Perhaps people are trying to decide what to name a new baby boy or girl? As it turns out, people are interested in a wide range of statistical information.

To build the list, we aggregated billions of queries people typed into Google search, drawing on multiple sources, including Insights for Search, Google Trends and internal data tools, much as we do for our annual Zeitgeist. We grouped search terms into topics and filtered out spam and repeated queries to produce a list reflecting the most popular public data topics. As a statistician, I should point out that the data covers only one week of searches in the U.S., so seasonal and other confounding factors may be at play (perhaps there was an election that week). In addition, a study like this requires a fair amount of manual grouping of similar queries into topics, which is fairly subjective and prone to human error. While imperfect, we still think the list is useful.
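To make the grouping-and-counting step more concrete, here is a toy sketch in Python. It is not the process we actually used, which relied heavily on manual grouping. It assumes a hypothetical log of `(session_id, query)` pairs, a hypothetical `TOPIC_KEYWORDS` map standing in for the hand-built groupings, and a caller-supplied `is_spam` check. Repeats are treated as the same normalized query from the same session.

```python
from collections import Counter

# Hypothetical keyword-to-topic map; in practice this grouping was largely manual.
TOPIC_KEYWORDS = {
    "Unemployment": ["unemployment", "jobless rate"],
    "Population": ["population"],
    "Exchange rates": ["exchange rate", "currency conversion"],
}


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(query.lower().split())


def assign_topic(query: str):
    """Return the first topic whose keywords appear in the query, else None."""
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(keyword in query for keyword in keywords):
            return topic
    return None


def rank_topics(log, is_spam=lambda q: False):
    """Count distinct, non-spam queries per topic, most popular first."""
    seen = set()
    counts = Counter()
    for session_id, raw_query in log:
        query = normalize(raw_query)
        # Skip empty queries, spam, and repeats of a query within one session.
        if not query or is_spam(query) or (session_id, query) in seen:
            continue
        seen.add((session_id, query))
        topic = assign_topic(query)
        if topic is not None:
            counts[topic] += 1
    return counts.most_common()
```

Keyword matching like this is only a rough stand-in for human judgment; it also shows why the grouping is subjective, since the choice of keywords directly shapes the ranking.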

The Public Data Explorer
As you can see, people are interested in a wide variety of data and statistics, but this information is only useful if it's easy to access, understand and communicate. That's why today we're also releasing the Google Public Data Explorer in Labs, a new experimental product designed to help people comprehend data and statistics through rich visualizations. With the Data Explorer, you can mash up data using line graphs, bar graphs, maps and bubble charts. The visualizations are dynamic, so you can watch them move over time, change topics, highlight different entries and change the scale. Once you have a chart ready, you can easily share it with friends or even embed it on your own website or blog. We've embedded the following chart using the new feature as an example:



This chart compares life expectancy and the number of births per woman over the last 47 years for most economies of the world. The bubble sizes show population, and colors represent different geographic regions. Press the play button to see the dramatic changes over time. Click "explore data" to dig deeper.

Animated charts can bring data to life. Click the play button in the chart to watch life expectancy increase while fertility rates fall around the world. The bubble colors make it quick and easy to see clusters of countries along these variables (e.g., in 1960 the European and Central Asian countries were in the lower right and Sub-Saharan Africa in the upper left). The bubble sizes help you follow the most populous countries, such as India and China. These charts are based on the Trendalyzer technology we acquired from the Gapminder Foundation, which we've previously made available in the Motion Chart in Google Spreadsheets and the Visualization API.

With just a handful of data providers, there are already billions of possible charts to explore. We currently provide data from the same three providers available in our search feature: the World Bank, the U.S. Bureau of Labor Statistics and the U.S. Census Bureau. In addition, we've added five new data providers: the Organisation for Economic Co-operation and Development (OECD), the California Department of Education, Eurostat, the U.S. Centers for Disease Control and Prevention, and the U.S. Bureau of Economic Analysis. We're excited that new data providers around the world are choosing to make their information freely available on the Internet, enabling innovators to create interesting applications, mash up the data in new ways and discover profound meaning behind the numbers.

We hope our list and new tool help demonstrate both the public demand for more data and the potential for new applications to make that data enlightening. We want to hear from you, so please share your feedback in our discussion forum. If you're a data provider interested in becoming part of the Public Data Explorer, contact us.
