Introduction to Data Science Module 1 (1)

The document provides an introduction to data science, defining data types, sources, and the importance of data in decision-making and problem-solving. It discusses various forms of data visualization, including line charts, column charts, bar charts, pie charts, and scatter plots, along with best practices for their effective use. Additionally, it emphasizes the significance of understanding data for extracting meaningful insights and improving processes.
INTRODUCTION TO DATA SCIENCE
Instructor: Abubakar Yussuf
EXPERIENCE
ANALYTICS
What is data?
• Data are raw facts about a thing or idea. The term refers to anything
that can yield useful information.
• Data can be volunteered, observed, or inferred.
• Data can also be structured or unstructured.
Data science
• Data science is the field of study that combines domain expertise,
programming skills, and knowledge of mathematics and statistics to
extract meaningful insights from data.
Where does data come from?
• Volunteered data is data that individuals create and explicitly share
of their own free will, such as social network profiles. This type of
data might include video files, pictures, text, or audio files.
• Observed data is data captured from a person's behavior and actions,
such as location data recorded when using a cell phone.
Inferred Data
• Inferred data is data derived from other data rather than collected
directly. It is the result of the analytical processing of other data
(user data collected directly by companies or indirectly from external
sensors and sources) to infer characteristics of the data subjects and
make predictions about them.
Variety of data

Structured data
• Data that is in a standardized format, has a well-defined structure,
complies with a data model, follows a persistent order, and is easily
accessed by humans and programs.
• Examples: data in JSON format, Excel (comma-separated) files, and
databases.
• JSON example: { "name": "juma" }
• Comma-separated example: juma, elisha

Unstructured data
• Data that has no identifiable structure.
• Examples: social media content, videos, images, documents, etc.
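As a minimal sketch of why structured data is easy for programs to access, the snippet below parses a JSON record and a comma-separated row with Python's standard library. The values come from the slide's examples; the double quotes in the JSON text and the variable names are assumptions added so that the input is valid and the code runs.

```python
import csv
import io
import json

# A JSON record: keys and string values must be double-quoted to be valid JSON.
json_text = '{"name": "juma"}'
record = json.loads(json_text)      # parsed into a Python dict
print(record["name"])               # juma

# A comma-separated row, such as one exported from a spreadsheet.
csv_text = "juma, elisha\n"
reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
names = next(reader)                # parsed into a list of strings
print(names)                        # ['juma', 'elisha']
```

Because both formats follow a fixed structure, a few lines are enough to turn the text into values a program can work with. Unstructured data such as images or free text has no comparable one-step parser.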
Types of data
• Categorical (qualitative) data is non-numerical data, presented mainly
in words or categories. Example: a person's gender can be female or
male. Categorical data can be either nominal or ordinal.
• Numerical (quantitative) data is data that is presented numerically
and gives information about the quantity of a specific thing. Numerical
data can be either continuous or discrete.
Types of data
Qualitative data
• Nominal data is categorical data whose labels cannot be ordered.
Examples of nominal data are gender (female or male), a group of
fruits, a category of colors, etc.
• Ordinal data is categorical data that can be ordered. An example of
ordinal data is education levels.
Numerical/Quantitative data
• Discrete data is data that can take only distinct, countable values,
such as things that are counted in whole numbers.
• Continuous data is data that can take any of an infinite number of
possible values within a given range. Example: temperature.
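The four kinds of data can be illustrated with a short sketch. All values below are hypothetical, and the ranking in `EDUCATION_ORDER` is an assumption made for illustration; the point is only which operations make sense for each kind of data.

```python
from statistics import mean

# Nominal: labels with no natural order. Counting is meaningful, ranking is not.
fruits = ["mango", "banana", "mango", "orange"]
fruit_counts = {fruit: fruits.count(fruit) for fruit in set(fruits)}

# Ordinal: categories with an order, which has to be stated explicitly.
EDUCATION_ORDER = ["primary", "secondary", "diploma", "degree"]  # hypothetical ranking
levels = ["degree", "primary", "diploma"]
sorted_levels = sorted(levels, key=EDUCATION_ORDER.index)

# Discrete: whole-number counts.
students_per_class = [30, 28, 35]

# Continuous: measurements that can take any value within a range.
temperatures_c = [36.6, 37.2, 38.05]

print(fruit_counts["mango"])       # 2
print(sorted_levels)               # ['primary', 'diploma', 'degree']
print(sum(students_per_class))     # 93
print(mean(temperatures_c))        # average of the measurements
```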
Data types
Data Type: Definition (Examples)
• Integer (int): numeric data type for numbers without fractions
(-707, 0, 707)
• Floating point (float): numeric data type for numbers with fractions
(707.07, 0.7, 707.00)
• Character (char): single letter, digit, punctuation mark, symbol, or
blank space (a, 1, !)
• String (str or text): sequence of characters, digits, or symbols,
always treated as text (hello, +1-999-666-3333)
• Boolean (bool): true or false values (0 for false, 1 for true)
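For reference, here is how the data types listed above look in Python. This is only an illustration in one language; the variable names are hypothetical, and other languages differ (for instance, Python has no separate character type).

```python
count = -707                 # int: whole number, no fractional part
price = 707.07               # float: number with a fractional part
initial = "a"                # no char type in Python; a one-character str is used
phone = "+1-999-666-3333"    # str: digits and symbols kept as text, not used in arithmetic
is_active = True             # bool: True or False

for value in (count, price, initial, phone, is_active):
    print(type(value).__name__, repr(value))

# In Python, bool is a subclass of int, so True equals 1 and False equals 0.
print(int(is_active), True == 1, False == 0)   # 1 True True
```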


Importance of data
• For informed decision-making
• For problem-solving
• For greater understanding
• For improving processes
• For understanding behaviour
Data Visualization
• Data visualization is the representation of data through the use of
common graphics, such as charts, plots, infographics, and even
animations.
• These visual displays of information communicate complex data
relationships and data-driven insights in a way that is easy to
understand.
Factors to consider when choosing a visualization:

• The number of variables (the characteristics measured) that need to
be shown
• The number of data points, or units of information, in each variable
• Whether the data illustrates changes over time (hourly, daily, weekly)
• The need to make a comparison or correlation between different data
points
Types of Data Visualization

• Line Chart
• Column chart
• Bar chart
• Pie Chart
• Scatter plots
Line Chart
• Line charts are a type of visualization that uses lines to connect
data points.
• They are particularly useful for showing trends and changes over time.
• Tracking trends: Line charts are great for spotting trends in data,
such as growth in sales figures or changes in stock prices over time.
• Comparing data sets: You can use multiple lines on the same chart to
compare trends between different groups or categories.
• Highlighting seasonality: Line charts can reveal seasonal patterns in
data, such as fluctuations in website traffic or product sales throughout
the year.
Best practices when drawing Line Charts
• Make sure to add a title.
• Label the axes: a line chart has two axes, the X-axis and the Y-axis.
• Use solid lines to connect the data points.
• Limit the number of lines in one chart so that the chart is easier to
understand.
• Give each line a different color to distinguish which line represents
which trend.
• Make sure to add a legend that explains what each colored line
represents.
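The practices above can be sketched with matplotlib, a third-party plotting library that is assumed to be installed. The data, the titles, the function names, and the `MAX_LINES` limit of 4 are hypothetical choices for illustration, not values from the slides.

```python
MAX_LINES = 4  # hypothetical limit on lines per chart, for readability


def check_line_chart_data(x_labels, series, max_lines=MAX_LINES):
    """Check that the chart is not overcrowded and every line has one value per x label."""
    if len(series) > max_lines:
        raise ValueError(f"too many lines: {len(series)} > {max_lines}")
    for label, values in series.items():
        if len(values) != len(x_labels):
            raise ValueError(f"series {label!r} does not match the x-axis labels")
    return True


def draw_line_chart(x_labels, series, path="line_chart.png"):
    """Draw one solid, separately colored line per series and save the chart to a file."""
    import matplotlib
    matplotlib.use("Agg")                 # render to a file; no display needed
    import matplotlib.pyplot as plt

    check_line_chart_data(x_labels, series)
    fig, ax = plt.subplots()
    for label, values in series.items():
        # Solid line; matplotlib's color cycle gives each line a different color.
        ax.plot(x_labels, values, linestyle="-", marker="o", label=label)
    ax.set_title("Monthly sales by product")   # title
    ax.set_xlabel("Month")                     # X-axis label
    ax.set_ylabel("Units sold")                # Y-axis label
    ax.legend()                                # explains which color is which line
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical data: two products tracked over four months.
months = ["Jan", "Feb", "Mar", "Apr"]
series = {"Product A": [10, 12, 15, 19], "Product B": [8, 9, 9, 11]}
```

Calling `draw_line_chart(months, series)` writes the chart to `line_chart.png`.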
Column Chart
• A column chart, sometimes also called a vertical bar chart, is a type
of visualization that uses vertical bars to represent data. These bars
are helpful for comparing different categories of data and their values.
• A column chart visually displays data by using rectangles (columns),
where the height of each column corresponds to the values being
plotted.
Key elements of Column Chart
Categories: These are the groups or types of data being represented.
Each category is typically displayed on the horizontal axis (X-axis) of
the chart.
Values: These are the numerical quantities being compared. The
values are typically represented on the vertical axis (Y-axis) of the
chart. The height of each column is proportional to the value it
represents.
Columns: Rectangles that extend upwards from the X-axis for each
category. The height of each column reflects the value associated with
that category.
Axes:
• X-axis (horizontal): This axis lists the categories being compared.
• Y-axis (vertical): This axis displays the scale of the values being
measured. It usually starts at zero to accurately represent comparisons.
Bar Charts
• Bar charts, sometimes also called bar graphs, are a type of
visualization that uses rectangular bars to represent data.
• They are a versatile tool for displaying comparisons among different
categories of data.
• They are drawn horizontally.
• For a bar chart, the Y-axis typically displays a category, such as the
top-grossing movies of 2019 in the example below, whilst the X-axis
displays a discrete value.
Best practices for Bar and Column Charts
• Label the axes.
• Consider ordering the bars so that the lengths go from longest to
shortest. The data type will most likely determine whether the longest
bar should be on the bottom or the top to best illustrate the intended
pattern or trend.
• Start the value axis (the x-axis of a bar chart, the y-axis of a
column chart) at zero to accurately reflect the total value of the bars.
• The spacing between bars should be roughly half the width of a bar.
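A minimal sketch of these practices, again assuming matplotlib is installed. The categories, values, labels, and function names are hypothetical. Bars are sorted from longest to shortest, the value axis starts at zero, and a bar width of 2/3 leaves a gap of 1/3 between bars, which is half the width of a bar.

```python
def sort_for_bar_chart(values_by_category):
    """Return (category, value) pairs ordered from the largest value to the smallest."""
    return sorted(values_by_category.items(), key=lambda item: item[1], reverse=True)


def draw_bar_chart(values_by_category, horizontal=True, path="bar_chart.png"):
    """Draw a bar chart (horizontal) or column chart (vertical) and save it to a file."""
    import matplotlib
    matplotlib.use("Agg")                 # render to a file; no display needed
    import matplotlib.pyplot as plt

    pairs = sort_for_bar_chart(values_by_category)
    categories = [category for category, _ in pairs]
    values = [value for _, value in pairs]

    fig, ax = plt.subplots()
    if horizontal:
        ax.barh(categories, values, height=2 / 3)   # gap between bars = half a bar
        ax.invert_yaxis()                           # longest bar at the top
        ax.set_xlim(left=0)                         # value axis starts at zero
        ax.set_xlabel("Value")
        ax.set_ylabel("Category")
    else:
        ax.bar(categories, values, width=2 / 3)
        ax.set_ylim(bottom=0)                       # value axis starts at zero
        ax.set_xlabel("Category")
        ax.set_ylabel("Value")
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical data.
sales = {"North": 30, "South": 100, "East": 70}
print(sort_for_bar_chart(sales))   # [('South', 100), ('East', 70), ('North', 30)]
```

Passing `horizontal=False` to `draw_bar_chart` produces the column-chart form of the same data.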
Pie Charts
• A pie chart is a circular graph used to represent portions of a whole.
It's a great way to visualize data that represents categories and their
contribution to a total value.
Uses of Pie Charts:
• Showcasing proportions: Pie charts excel at highlighting the relative
sizes of different categories and their contribution to the whole.
• Simple comparisons: They are effective for comparing a few major
categories (ideally 4-6) at a glance.
Best practices for Pie Charts
• Limit the number of slices: Stick to 4-6 slices ideally. Too many
slices make the pie chart difficult to interpret and visually clutter
the data.
• Focus on proportions: Ensure the pie chart is used to represent parts
of a whole, where all slices combined add up to 100%.
• Use different colors for each segment/slice.
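One way to keep a pie chart to a handful of slices that add up to 100% is sketched below. Merging the smallest categories into an "Other" slice is an assumption made here for illustration, not a rule from the slides; the function names and the data are hypothetical, and matplotlib is assumed to be installed for the drawing step.

```python
def prepare_pie_slices(values_by_category, max_slices=6):
    """Keep the largest categories, merge the rest into 'Other', and return percentages."""
    total = sum(values_by_category.values())
    if total <= 0:
        raise ValueError("a pie chart needs a positive total")
    ranked = sorted(values_by_category.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > max_slices:
        kept = ranked[: max_slices - 1]
        other = sum(value for _, value in ranked[max_slices - 1:])
        ranked = kept + [("Other", other)]
    # Each slice is expressed as a share of the whole, so the slices sum to 100%.
    return [(name, 100 * value / total) for name, value in ranked]


def draw_pie_chart(slices, path="pie_chart.png"):
    """Draw the prepared slices, one color per slice, and save the chart to a file."""
    import matplotlib
    matplotlib.use("Agg")                 # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie([share for _, share in slices],
           labels=[name for name, _ in slices],
           autopct="%1.1f%%")             # print each slice's percentage
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical data: seven categories reduced to six slices.
budget = {"A": 40, "B": 25, "C": 15, "D": 10, "E": 5, "F": 3, "G": 2}
slices = prepare_pie_slices(budget)
print(len(slices))    # 6
print(slices[-1])     # ('Other', 5.0)
```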
Scatter plot
• A scatter plot is a type of visual representation used to display
relationships between two continuous variables.
Uses of Scatter Plots:

• Identifying relationships: Scatter plots are a powerful tool for
exploring relationships between variables and discovering potential
cause-and-effect connections.
• Highlighting outliers: Scatter plots can reveal outliers, which are data
points that fall far away from the main cluster of points. These outliers
might warrant further investigation.
• Visualizing trends: They can visually represent trends over time or
across different categories, even if the relationship isn't perfectly
linear.
Best practices for Scatter plots
• Label your axes.
• Make sure the data set is large enough to provide visualization for
clustering or outliers.
• Start the value of the y-axis at zero to represent the data accurately.
The value of the x-axis will depend on the data. For example, age
ranges might be labeled on the x-axis.
• Consider adding a trend line if a scatter plot shows a correlation
between x- and y-axes.
• Do not use more than two trend lines.
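A trend line for a scatter plot can be computed with an ordinary least-squares fit. The sketch below is one common way to do this, not the only one; the data, axis labels, and function names are hypothetical, and matplotlib is assumed to be installed for the drawing step.

```python
from math import sqrt
from statistics import mean


def fit_trend_line(xs, ys):
    """Least-squares line y = slope * x + intercept through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no trend line can be fitted")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def correlation(xs, ys):
    """Pearson correlation coefficient, between -1 and 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def draw_scatter_plot(xs, ys, path="scatter_plot.png"):
    """Draw the points with labeled axes and a single trend line, then save to a file."""
    import matplotlib
    matplotlib.use("Agg")                 # render to a file; no display needed
    import matplotlib.pyplot as plt

    slope, intercept = fit_trend_line(xs, ys)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.plot(xs, [slope * x + intercept for x in xs], label="Trend line")
    ax.set_xlabel("Hours studied")        # label the axes
    ax.set_ylabel("Exam score")
    ax.set_ylim(bottom=0)                 # y-axis starts at zero
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path


# Hypothetical data.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 64, 70]
slope, intercept = fit_trend_line(hours_studied, exam_score)
print(slope, intercept, correlation(hours_studied, exam_score))
```

A correlation close to 1 or -1 suggests that a trend line is worth adding. A value near 0 suggests the line would mislead more than it helps.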
Thank You
