
Data exploration

  1. Interactive Data Exploration

  2. Software

  3. See also

  4. References

Data exploration is an approach similar to initial data analysis, in which a data analyst uses visual exploration, rather than traditional data management systems, to understand what is in a dataset and what its characteristics are.[1] These characteristics can include the size or amount of data, its completeness and correctness, and possible relationships among data elements or among the files and tables in the data.

Data exploration is typically conducted using a combination of automated and manual activities.[1][2] Automated activities can include data profiling, data visualization, or tabular reports that give the analyst an initial view of the data and an understanding of its key characteristics.[3]
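As an illustration of automated data profiling, the following minimal Python sketch computes a few per-column statistics (row count, missing values, distinct values, most common value). The sample records, the column names, and the `profile` helper are hypothetical and chosen only for this example; real profiling tools report far more.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
rows = [
    {"id": 1, "city": "Oslo", "age": 34},
    {"id": 2, "city": "Lima", "age": None},
    {"id": 3, "city": "Oslo", "age": 29},
    {"id": 4, "city": None, "age": 51},
]


def profile(records):
    """Return row count, missing count, distinct count and most common value per column."""
    columns = sorted({key for record in records for key in record})
    report = {}
    for column in columns:
        values = [record.get(column) for record in records]
        present = [value for value in values if value is not None]
        counts = Counter(present)
        report[column] = {
            "rows": len(values),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


summary = profile(rows)
for column, stats in summary.items():
    print(column, stats)
```

A report like this gives the analyst a first indication of completeness (the `missing` counts) before any manual inspection.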

This is often followed by manual drill-down or filtering of the data to investigate anomalies or patterns surfaced by the automated activities. Data exploration can also require manual scripting and querying of the data (e.g. using languages such as SQL or R), or using Excel or similar tools to view the raw data.[4]
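A manual drill-down of this kind might look like the sketch below, which runs SQL queries through Python's standard `sqlite3` module. The `orders` table, its columns, and its values are invented for illustration: an aggregate query gives the overview, and a filtered query then inspects the region whose average looks unusual.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "north", 120.0),
        (2, "north", 95.5),
        (3, "south", 20000.0),
        (4, "south", 80.0),
        (5, "east", 110.0),
    ],
)

# Overview: row count and average amount per region.
by_region = conn.execute(
    "SELECT region, COUNT(*), AVG(amount) FROM orders "
    "GROUP BY region ORDER BY region"
).fetchall()

# Drill-down: list the raw rows of the region with the unusual average.
suspicious = conn.execute(
    "SELECT id, amount FROM orders WHERE region = ? ORDER BY amount DESC",
    ("south",),
).fetchall()
conn.close()

print(by_region)
print(suspicious)
```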

All of these activities aim to create a clear mental model of the data in the mind of the analyst, and to define basic metadata (statistics, structure, relationships) for the data set that can be used in further analysis.[5]

Once this initial understanding is in place, the data can be pruned or refined by removing unusable parts, correcting poorly formatted elements, and defining relevant relationships across datasets.[2] This process is also known as determining data quality.[4]
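The pruning and correction steps can be sketched as follows. This is a minimal example under stated assumptions: the records, the two accepted date formats, and the `parse_date` and `clean` helpers are hypothetical, and which rows count as "unusable" depends on the analysis at hand.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent formatting.
raw = [
    {"name": " Alice ", "signup": "2021-03-04"},
    {"name": "Bob", "signup": "04/03/2021"},
    {"name": "", "signup": "2021-05-01"},
    {"name": "Cara", "signup": "not a date"},
]

# Assumed input formats: ISO dates and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed format in turn; return None if none matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize usable records and set aside the unusable ones."""
    kept, rejected = [], []
    for record in records:
        name = record["name"].strip()
        signup = parse_date(record["signup"])
        if name and signup is not None:
            kept.append({"name": name, "signup": signup.isoformat()})
        else:
            rejected.append(record)
    return kept, rejected


kept, rejected = clean(raw)
print(kept)
print(rejected)
```

Keeping the rejected rows rather than discarding them silently lets the analyst review why they failed.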

Data exploration can also refer to the ad hoc querying and visualization of data to identify potential relationships or insights that may be hidden in the data.[1]

Traditionally, this was a key area of focus for statisticians, with John Tukey being a key evangelist in the field.[6] Today, data exploration is more widespread and is a focus of data analysts and data scientists, the latter being a relatively new role within enterprises and larger organizations.

Interactive Data Exploration

Interactive data exploration has become an area of interest in the field of machine learning. It is a relatively new area and is still evolving.[4] At its most basic level, a machine-learning algorithm can be fed a data set and used to assess whether a hypothesis is supported by that data. Common machine learning algorithms focus on identifying specific patterns in the data.[2] Common tasks include regression, classification, and clustering, but many other patterns and algorithms can be applied to data via machine learning.

By employing machine learning, it is possible to find patterns or relationships in the data that would be difficult or impossible to find via manual inspection, trial and error or traditional exploration techniques.[7]
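As one concrete instance of clustering, the sketch below implements a very small one-dimensional k-means in plain Python. It is illustrative only: the data values, the starting centers, and the `kmeans_1d` helper are assumptions made for this example, and practical work would normally rely on an established library.

```python
def kmeans_1d(values, centers, iterations=10):
    """Group numbers around the nearest center, then move each center to its group's mean."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            # Assign the value to the index of the closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical measurements that visibly fall into two groups.
values = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

With only six values the grouping is obvious by eye; the same procedure becomes useful when the data set is too large or too high-dimensional to inspect manually.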

Software

  • Trifacta – a data preparation and analysis platform
  • Paxata – self-service data preparation software
  • Alteryx – data blending and advanced data analytics software
  • IBM Infosphere Analyzer – a data profiling tool
  • Microsoft Power BI – interactive visualization and data analysis tool
  • OpenRefine – a standalone open source desktop application for data clean-up and data transformation
  • Tableau software – interactive data visualization software

See also

  • Exploratory data analysis
  • Machine learning
  • Data profiling
  • Data visualization

References

1. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri. Overview of Data Exploration Techniques. FOSTER Open Science. https://www.fosteropenscience.eu/sites/default/files/pdf/2933.pdf
2. Kandel, Paepcke, Hellerstein, Heer. Wrangler: Interactive Visual Specification of Data Transformation Scripts. Stanford.edu, 2011.
3. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri. Overview of Data Exploration Techniques. FOSTER Open Science. https://www.fosteropenscience.eu/sites/default/files/pdf/2933.pdf
4. Sean Kandel, Andreas Paepcke, Joseph Hellerstein, Jeffrey Heer. Enterprise Data Analysis and Visualization: An Interview Study. Proc. IEEE Visual Analytics Science & Technology (VAST), Oct 2012. Stanford.edu.
5. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri. Overview of Data Exploration Techniques. FOSTER Open Science. https://www.fosteropenscience.eu/sites/default/files/pdf/2933.pdf
6. Exploratory Data Analysis. Pearson. ISBN 978-0201076165. https://www.stat.berkeley.edu/~brill/Papers/EDASage.pdf
7. Machine Learning for Data Exploration

