Eds Unit 3
DATA SCIENCE
Unit III
Unit III-Data Modeling and Exploration
Data Modeling
Data Exploration: Introduction
Data Visualization Process/Workflow
The next step is collecting data. You can use existing datasets if
they’re relevant to your research question. Alternatively, you can
download open-source datasets from the internet or do web scraping to
collect data.
Real-world data are messy. So, you need to clean them before using
them for visualization. You can identify missing values and outliers
and treat them accordingly. You can perform feature selection and
remove unnecessary features from the data. You can create a new
set of features based on the original features.
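The cleaning steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the records, the column names (`height_cm`, `weight_kg`, `id_code`), the median fill, the 1.5 x IQR outlier rule and the derived `bmi` feature are all hypothetical choices made for the example, not a prescribed method.

```python
from statistics import median, quantiles

# Hypothetical toy records; None marks a missing value.
rows = [
    {"height_cm": 170.0, "weight_kg": 65.0, "id_code": "a1"},
    {"height_cm": None, "weight_kg": 72.0, "id_code": "a2"},
    {"height_cm": 165.0, "weight_kg": 58.0, "id_code": "a3"},
    {"height_cm": 180.0, "weight_kg": 80.0, "id_code": "a4"},
    {"height_cm": 175.0, "weight_kg": 400.0, "id_code": "a5"},  # suspicious value
    {"height_cm": 160.0, "weight_kg": 55.0, "id_code": "a6"},
]


def fill_missing_with_median(records, column):
    """Replace None in `column` with the median of the values that are present."""
    present = [r[column] for r in records if r[column] is not None]
    fill = median(present)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


def drop_iqr_outliers(records, column, k=1.5):
    """Keep only rows whose `column` value lies within k * IQR of the quartiles."""
    q1, _, q3 = quantiles([r[column] for r in records], n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r for r in records if low <= r[column] <= high]


def clean(records):
    records = fill_missing_with_median(records, "height_cm")  # treat missing values
    records = drop_iqr_outliers(records, "weight_kg")         # treat outliers
    cleaned = []
    for r in records:
        # Feature selection: drop a column that carries no information for the chart.
        r = {key: value for key, value in r.items() if key != "id_code"}
        # Feature creation: derive a new feature from the original ones.
        r["bmi"] = r["weight_kg"] / (r["height_cm"] / 100) ** 2
        cleaned.append(r)
    return cleaned


cleaned = clean(rows)
for record in cleaned:
    print(record)
```

Whether an outlier should be dropped, capped or kept depends on the research question, so the rule used here is one option among several.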
4. Choose a chart type
6. Prepare data
7. Create a chart
This is the final step. Here, you define the title and names for the
axes. You should also choose a proper chart background to ensure
the content is easily readable.
Data Visualization Techniques in Data Science
1. Univariate Analysis
2. Bivariate Analysis
3. Multivariate Analysis
Advantages
• Communicate your results or findings with your audience
• Tune hyperparameters
• Identify trends, patterns and correlations between variables
• Monitor the model’s performance
• Clean data
• Validate the model’s assumptions
Disadvantages
Importance of data visualization
• the ability to absorb information quickly, improve insights and make faster decisions;
• an increased understanding of the next steps that must be taken to improve the organization;
• an improved ability to maintain the audience's interest with information they can
understand;
• an easy distribution of information that increases the opportunity to share insights with
everyone involved;
• a reduced reliance on data scientists, since data becomes more accessible and understandable; and
• an increased ability to act on findings quickly and, therefore, achieve success with greater
speed and fewer mistakes.
The increased popularity of big data and data analysis projects has made visualization more
important than ever. Companies are increasingly using machine learning to gather massive
amounts of data that can be difficult and slow to sort through, comprehend and explain.
Visualization offers a means to speed this up and present information to business owners and
stakeholders in ways they can understand.
Big data visualization often goes beyond the typical techniques used in normal visualization,
such as pie charts, histograms and corporate graphs. It instead uses more complex
representations, such as heat maps and fever charts. Big data visualization requires powerful
computer systems to collect raw data, process it and turn it into graphical representations that
humans can use to quickly draw insights.
While big data visualization can be beneficial, it also has several disadvantages for
organizations. They are as follows:
• To get the most out of big data visualization tools, a visualization specialist must be hired.
This specialist must be able to identify the best data sets and visualization styles to
guarantee organizations are optimizing the use of their data.
• Big data visualization projects often require involvement from IT, as well as management,
since the visualization of big data requires powerful computer hardware, efficient storage
systems and even a move to the cloud.
• The insights provided by big data visualization will only be as accurate as the information
being visualized. Therefore, it is essential to have people and processes in place to govern
and control the quality of corporate data, metadata and data sources.