Google Launches New Dataset Search Engine



Google's newest search engine accesses datasets in thousands of data repositories around the web.

Dataset Search, Google's newest search engine, was created to help people find data. Find and access a variety of datasets, like daily weather maps, the NASA thesaurus, TechCrunch articles, IMDB movies, data.gov, ocean temperatures, ProPublica, and more.

The search engine is geared towards scientists and journalists, though anyone can use it. Dataset Search can be used to find references to most datasets in the environmental and social sciences, wherever they are hosted, whether that is a publisher's site, a digital library, or an author's personal web page. Datasets are fragmented and hard to find, so a search engine dedicated to them is a welcome addition for many, including data geeks and those of us who are intensely curious.

Dataset Search is easy to use: enter a term and Dataset Search will guide you to the available data. The number of indexed datasets is growing, and Google has done a good job of indexing what's available. As more publishers adopt Dataset markup, which lets them describe their data in a way that helps search engines understand the content of their pages, more datasets will appear in search results. To get started, have a look at the large variety of datasets that are open to the public in this article at Quora.
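For publishers curious what Dataset markup can look like, here is a minimal sketch in Python that builds a schema.org Dataset description as JSON-LD. The helper name build_dataset_markup, the dataset name, and the example.org URL are placeholders made up for illustration, and only a handful of basic properties are shown; check Google's and schema.org's own documentation for what is actually required.

```python
import json


def build_dataset_markup(name, description, url, keywords):
    """Return a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


# Hypothetical dataset: the name and URL are placeholders, not a real source.
markup = build_dataset_markup(
    name="Example ocean temperature readings",
    description="Illustrative daily sea-surface temperature measurements.",
    url="https://example.org/datasets/ocean-temperatures",
    keywords=["ocean", "temperature"],
)

# Serialize to JSON-LD text for embedding in the dataset's web page.
json_ld = json.dumps(markup, indent=2)
print(json_ld)
```

A publisher would typically place the resulting JSON inside a script tag of type application/ld+json on the page that describes the dataset.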

Google Dataset Search


(h/t Google Keyword)

You can find more Tech Treats here.

 


Comments

It searches datasets exclusively. Search engines don't index everything on the internet - thus the creation of specialty search engines such as Google Scholar, Springer, BASE, FOIA.gov, VADLO, CiteSeerx, the Icon Archive, Kaggle, refseek, Academic Index, etc.

Run a search in a standard search engine for boston education or "boston AND education".
A wide variety of returns appears, depending on the search engine, most of which are not useful if you're looking for data/datasets. Pioneers Happy Hour announcement, a Forbes article on Paul English, a pdf from Northeast Insurance Agency, and a CNN article on school gardens are not what I would want if I were looking for data/datasets on boston education.

Search Dataset for the same "boston AND education" without quotes, and it returns two datasets. Remove the operator AND, and more datasets are listed. These may not be exactly what I'm looking for, but are much closer to what I'm seeking.
(Using quotation marks in Dataset returns a '"boston AND education" - did not match any datasets' message.)

So Google's Dataset Search can find a set of docs relating to "Boston education". A regular ol' Boolean search should find any docs that have "Boston AND education" as index terms, no matter where the docs are, and those docs are the "data". So how is Google's Dataset Search an improvement?

In Boolean set theory:
AND equals ONLY the intersection, where both/all search terms are true, and excludes the subsets where only one or the other is true.
OR equals the union, the complete aggregate where any of the search terms is true.
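
To make the AND/OR distinction concrete, here is a small Python sketch using sets, where AND corresponds to set intersection and OR to set union. The index and its document IDs are invented for illustration and do not reflect how any real search engine stores its data.

```python
# Hypothetical index: each search term maps to the IDs of documents containing it.
index = {
    "boston": {1, 2, 3, 5},
    "education": {2, 3, 4},
}

# AND: only documents that contain both terms (set intersection).
both = index["boston"] & index["education"]

# OR: documents that contain at least one of the terms (set union).
either = index["boston"] | index["education"]

print(sorted(both))    # [2, 3]
print(sorted(either))  # [1, 2, 3, 4, 5]
```

With this toy index, the AND search returns fewer documents than the OR search, which matches the experience above of seeing more datasets listed once the AND operator is removed.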