Teaching Open Datasets to Dance Together



1. Teaching Open Datasets to Dance Together. By Alon Peled, The Hebrew University of Jerusalem

2. What is Open Data? Datasets published by public authorities worldwide on the Internet

3. Economic Potential. Public sector organizations, at all levels of government, publish tens of millions of datasets on Open Data portals, as dictated by law.

4. Challenges
Discovery: Lack of uniform publication standards and difficulty finding data
Use: No tools to analyze and integrate the datasets
Classification: Inconsistent and poor tagging of datasets

5. Smart integration of datasets from multiple sources. Example: Open Data About Natural Gas Projects

6. Search Results Compared to Google’s Search Engine. Example: Data about Toyota Flaws

7. Search Results Compared to Google’s Search Engine. Example: Data about Public Sector Tenders

8. Technological Innovation: The Process
Open Data Crawlers discover open data catalogues
Visited Open Data Repositories
Open Data Portals
ETL - Extract, Transform, Load
Server (EDW)
Original Metadata of the Open Data Publications
Smart Tagging Algorithm (MECA)
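The process on this slide (crawl, then ETL, then tagging) can be sketched in a few lines. This is a minimal illustration only: the in-memory `PORTALS` dictionary, the `DatasetRecord` type, and the keyword-matching `tag` function are all hypothetical stand-ins, since the slides do not describe the crawlers, the EDW schema, or how MECA works internally.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """One row of the (hypothetical) warehouse: original metadata plus tags."""
    asset_id: int
    title: str
    source: str
    tags: list = field(default_factory=list)


# Hypothetical in-memory stand-in for the catalogues a crawler would visit.
PORTALS = {
    "portal-a": [{"id": 1, "name": "  Natural   Gas Wells "}],
    "portal-b": [{"id": 2, "name": "Public Tenders 2015"}],
}


def crawl(portals):
    """Discover catalogue entries, remembering which portal each came from."""
    for source, catalogue in portals.items():
        for entry in catalogue:
            yield source, entry


def etl(crawled):
    """Extract each entry, normalize whitespace in its title, load it by id."""
    warehouse = {}
    for source, entry in crawled:
        title = " ".join(entry["name"].split())
        warehouse[entry["id"]] = DatasetRecord(entry["id"], title, source)
    return warehouse


def tag(warehouse, keywords):
    """Placeholder tagger (NOT MECA): attach keywords found in the title."""
    for record in warehouse.values():
        lowered = record.title.lower()
        record.tags = sorted(k for k in keywords if k in lowered)
    return warehouse


warehouse = tag(etl(crawl(PORTALS)), {"gas", "tenders", "waste"})
```

Keeping the three stages as separate functions mirrors the boxes on the slide, so that a real tagger could replace `tag` without touching the crawl or load steps.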

9. Technological Innovation: Smart Tagging
Inputs: Open Dataset; Enterprise Data Warehouse (EDW)
Tagging techniques: In-Database Affinity; Crowdsourcing - Behavioral; Crowdsourcing - Survey; Context Analysis; Expert Dictionary; Text Analytics
Smart Tagging History Repository: Detailed Chronology of the Smart Tagging Process
Smart Tagging Selection Algorithm
Adding Smart Tags to the Metadata of a Specific Open Dataset
Enterprise Data Warehouse (EDW) With Smart Tagging
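One way to picture how several tagging techniques could feed a single selection step, with every contribution logged to a history, is a weighted vote. The sketch below is an assumption made for illustration: the slides do not say how the Smart Tagging Selection Algorithm combines techniques, and the scores, weights, and the `select_tags` function are hypothetical.

```python
from collections import defaultdict


def select_tags(proposals, weights, top_k=2):
    """Combine per-technique tag scores and keep the top_k tags.

    Returns the selected tags and a chronological history of every
    (technique, tag, weighted contribution) that was considered.
    """
    totals = defaultdict(float)
    history = []
    for technique, scored_tags in proposals.items():
        for tag, score in scored_tags.items():
            contribution = weights.get(technique, 1.0) * score
            totals[tag] += contribution
            history.append((technique, tag, contribution))
    # Highest total first; ties broken alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top_k]], history


# Hypothetical scores from three of the techniques named on the slide.
proposals = {
    "in_database_affinity": {"weather": 0.9, "ocean": 0.4},
    "context_analysis": {"weather": 0.6, "climate": 0.7},
    "expert_dictionary": {"climate": 0.5},
}
weights = {
    "in_database_affinity": 1.0,
    "context_analysis": 0.5,
    "expert_dictionary": 2.0,
}

selected, history = select_tags(proposals, weights)
print(selected)
```

The returned `history` list plays the role of the chronology kept in the history repository: it records which technique proposed which tag, and with what weight, before the final selection.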

10. Patents
Applications: PCT Application No. PCT/IL2016/051052; United States Patent Application No. 15/272,058
Metadata-Driven Smart Indexing: "Advanced Computer Implementation For Crawling And/Or Detecting Related Electronically Catalogued Data Using Improved Metadata Processing"
Smart Tagging: "Method of enriching metadata usable for content searching and system thereof"

11. Existing Dataset Search Solutions
Portals: Limited to a single city/state without smart tagging
Vendors: Open data upload per individual client without smart tagging
Marketplaces: Limited to data integration or data trading without smart tagging
Search Engines: Limited to a single economic vertical or a single tagging technique

12. New Dataset Search Solutions: Portals, Vendors, Marketplaces, Search Engines

13. Classification Example 01 - A NOAA Dataset

14. Classification Example 01 - MECA-in-Action
In-Database Affinity: Dataset Comparison
Context Analysis: Corpus Comparison; Twitter; WolframAlpha; Google Trends
Data Analytics: Domain & Demographics; Column Analysis; Raw Analysis
Expert Dictionary: Hints
Crowdsourcing Analytics: People (Expert/Layman); Survey

15. Dancing Together!


17. The Politics of (Very Large) Datasets

18. The Whole is Greater Than Its Parts (OR “Where in the World is my Garbage?”)
Asset #106213, Germany (Federal-German), Contractors – years of installation. XLS
Asset #86888, European Union (International-English), Composition of municipal data. API
Asset #26470, Kenya (National-Swahili), County estimates of households. CSV
Asset #84834, Buenos Aires (Municipal-Spanish), Garbage Collection – Division of Services. API
Asset #49857, Queensland (State-English), Corporate report of dumps. CSV
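To illustrate why a shared tag lets these assets "dance together" despite different publishers, languages, and formats, here is a small sketch that indexes the five assets above by tag. The asset ids, publishers, languages, and formats come from the slide; the `"waste"` tag and the `SMART_TAGS` mapping are hypothetical, since the slide does not list the tags actually assigned.

```python
from collections import defaultdict

# (asset id, publisher, language, format) as listed on the slide.
ASSETS = [
    (106213, "Germany", "German", "XLS"),
    (86888, "European Union", "English", "API"),
    (26470, "Kenya", "Swahili", "CSV"),
    (84834, "Buenos Aires", "Spanish", "API"),
    (49857, "Queensland", "English", "CSV"),
]

# Hypothetical smart tags: assume every asset received the tag "waste".
SMART_TAGS = {asset_id: {"waste"} for asset_id, *_ in ASSETS}


def index_by_tag(assets, smart_tags):
    """Build a tag -> list of assets index across all publishers."""
    index = defaultdict(list)
    for asset in assets:
        for tag in smart_tags.get(asset[0], ()):
            index[tag].append(asset)
    return index


index = index_by_tag(ASSETS, SMART_TAGS)
languages = {language for _, _, language, _ in index["waste"]}
formats = {fmt for _, _, _, fmt in index["waste"]}
print(sorted(languages), sorted(formats))
```

A single lookup on the tag returns assets in four languages and three formats, none of which a keyword search in one language would be likely to group together.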

19. Thank You!