Posted by John Delligatti on Wed, 09/26/2018
Google Dataset Search Engine
Google has launched a new search type specifically engineered to help people find datasets. Although the number of available datasets is currently limited because the search engine has only just launched, it shows promise: it already covers a wide array of subjects, from weather reports published by the National Weather Service to NFL player game stats.
For content to appear on Google Dataset Search, it must be formatted in accordance with the schema markup for dataset providers, which made its debut in July 2018. Google hopes the dataset search engine will help drive standardization, since its search algorithms rely heavily on this dataset schema to provide accurate search results.
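To make the markup requirement concrete, here is a minimal sketch of a dataset description in JSON-LD using the schema.org Dataset type, one common way of publishing this kind of metadata. The dataset name, description, and example.com URLs are made up for illustration, and the helper function build_dataset_markup is a hypothetical name, not part of any Google tool.

```python
import json


def build_dataset_markup(name, description, url, keywords, download_url):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; the field values are supplied
    by the caller and are not checked against Google's guidelines.
    """
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": keywords,
        # One downloadable copy of the data, described as a CSV file.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": "text/csv",
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(markup, indent=2)


# Made-up example values; example.com is a placeholder domain.
example_markup = build_dataset_markup(
    name="Monthly MRO Spend by Category",
    description="Illustrative dataset of monthly spend totals by category.",
    url="https://example.com/datasets/mro-spend",
    keywords=["MRO", "spend", "procurement"],
    download_url="https://example.com/datasets/mro-spend.csv",
)
print(example_markup)
```

The resulting text would typically be embedded in the dataset's web page inside a script tag of type application/ld+json so that a crawler can read it.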
Why Standardization is Important
All companies generate data, and in today’s connected world, these massive amounts of data can be overwhelming. Below are some examples of what Google qualifies as a dataset:
• A table or CSV with data.
• An organized collection of tables.
• A file in a proprietary format that contains data.
• A collection of files that together constitute a meaningful dataset.
• Images capturing data.
To properly leverage this data, companies must be able to work with it. This may sound simple, but because every company generates a high volume of what Google considers data, raw data must first be converted into a malleable dataset ripe for analysis. In fact, a study from Data Science Central estimates that data scientists spend 60% of their time preparing data for analysis, rather than performing the analysis itself.
Enter Dataset Markup Schema
The goal of this schema (and, some would argue, of the dataset search engine itself) is to drive standardization so that more time can be spent analyzing data rather than preparing it. By providing a schema that must be followed for a dataset to be searchable, Google is leveraging its position as the world’s leading search engine to drive data standardization.
Google is also careful to note that although the search engine is powerful, it will only be as powerful as the metadata that sources are willing to provide.
What it Means for SDI
SDI is pioneering ways to leverage big data in the MRO space using the exclusive ZEUS platform we have developed. Although ZEUS has a tremendous capacity for data analysis, the program is only as good as the data loaded into it, much as is the case with Google’s schema. Even though the data we have may be very useful, ZEUS cannot interpret it without the proper insights. Therefore, it is crucial that users integrate their data usage into ZEUS in order to minimize effort, maximize results, and in turn increase profitability.
This issue is particularly glaring when we receive data from outside sources. Unlike with internal reports, we have no idea what type of data we may receive from a vendor or client. Even if we’re being sent reports out of a popular ERP system like SAP, the actual data we receive varies widely from source to source, and time must be spent standardizing the data before ZEUS can ingest it. If a global dataset schema were successfully implemented, our analysis would become much more powerful and even more efficient.
Although Google touts the new search engine as a way for everyone to access big data that would not previously have been accessible, the conversation moving forward will center on its attempt at standardization across multiple industries. That said, this is just the first step on the journey to data standardization, and it is one that SDI and the data community will be watching very closely.
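As a rough illustration of the standardization work described above, the sketch below maps differently labeled columns from two sources onto one set of field names. It is not how ZEUS works internally; the field names, header variants, and sample rows are all hypothetical.

```python
# Hypothetical canonical field names and the header variants that map to them.
FIELD_ALIASES = {
    "part_number": ["part number", "part no", "material", "item"],
    "quantity": ["quantity", "qty", "order qty"],
    "unit_price": ["unit price", "price", "net price"],
}

# Reverse lookup: cleaned header text -> canonical field name.
ALIAS_TO_FIELD = {
    alias: field
    for field, aliases in FIELD_ALIASES.items()
    for alias in aliases
}


def normalize_header(header):
    """Lowercase a header, unify separators, and look up its canonical name.

    Returns None when the header is not recognized.
    """
    cleaned = header.replace("_", " ").replace(".", " ").lower()
    key = " ".join(cleaned.split())
    return ALIAS_TO_FIELD.get(key)


def standardize(rows):
    """Rename recognized columns in each row and drop unrecognized ones."""
    standardized = []
    for row in rows:
        clean = {}
        for header, value in row.items():
            field = normalize_header(header)
            if field is not None:
                clean[field] = value
        standardized.append(clean)
    return standardized


# Made-up sample rows from two sources that label the same data differently.
vendor_a = [{"Material": "A-100", "Qty": 5, "Net Price": 12.5}]
vendor_b = [{"part_no": "A-100", "ORDER QTY": 3, "price": 12.75}]

combined = standardize(vendor_a) + standardize(vendor_b)
print(combined)
```

With a shared schema, a lookup table like this would be unnecessary because every source would already use the same field names.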
Contact us today to learn more.