Data Science Introduction
Data Science is about finding patterns in data through analysis and using those
patterns to make predictions about the future.
Data Science enables companies to make sense of very large volumes of data from multiple
sources and to derive insights that support data-driven decisions. Data Science is widely
used across industries, including marketing, healthcare, finance, banking, policy
work, and more.
Examples of where Data Science is applied:
Consumer goods
Stock markets
Industry
Politics
Logistics companies
E-commerce
A Data Scientist typically draws on expertise in several fields:
Machine Learning
Statistics
Programming (Python or R)
Mathematics
Databases
A Data Scientist must find patterns within the data. Before they can find the
patterns, they must organize the data in a standard format.
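As a minimal sketch of what finding a pattern and making a prediction can look like, the Python snippet below fits a straight line to four pairs of Average_Pulse and Calorie_Burnage values taken from the sports-watch table later in this section. The straight-line (least-squares) fit and the helper name predict_calories are assumptions made for illustration, not a method prescribed by this text.

```python
# Average pulse and calories burned, taken from the sports-watch table.
average_pulse = [80, 85, 90, 95]
calorie_burnage = [240, 250, 260, 270]

# Ordinary least-squares fit of a line: calories = slope * pulse + intercept.
n = len(average_pulse)
mean_x = sum(average_pulse) / n
mean_y = sum(calorie_burnage) / n
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(average_pulse, calorie_burnage))
sxx = sum((x - mean_x) ** 2 for x in average_pulse)
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def predict_calories(pulse):
    """Predict calories burned for a given average pulse (hypothetical helper)."""
    return slope * pulse + intercept


print(slope, intercept)       # the pattern found in the data
print(predict_calories(100))  # a prediction for a pulse value not in the data
```

On these four rows the line fits exactly. Real data is rarely this tidy, and a real analysis would also check how well the line describes the data before trusting its predictions.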
Data can be categorized into two groups:
Structured data
Unstructured data
Unstructured Data
Unstructured data has no predefined organization. It must be organized before it
can be analyzed.
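To illustrate what organizing unstructured data can mean, here is a minimal sketch that turns free-text workout notes into structured records. The notes, the regular expression, and the field names are all hypothetical and chosen only for this example.

```python
import re

# Hypothetical free-text notes (unstructured): same kind of facts, worded differently.
notes = [
    "Trained for 30 minutes, average pulse 80",
    "45 minutes today; average pulse 90",
]

# Assumed wording: a duration in minutes, followed later by an average pulse.
pattern = re.compile(r"(\d+) minutes.*?average pulse (\d+)")

records = []
for note in notes:
    match = pattern.search(note)
    if match:  # skip notes that do not fit the assumed wording
        records.append({
            "Duration": int(match.group(1)),
            "Average_Pulse": int(match.group(2)),
        })

print(records)
```

Once every note has become a record with the same fields, the data can be treated like the structured examples that follow.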
Structured Data
Structured data is organized and easier to work with.
Example of an array:
[80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
Example
Array = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]
print(Array)
Database Table
A database table is a table with structured data.
The following table shows a database table with health data extracted from a
sports watch:
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
30        80             120        240              10
30        85             120        250              10
45        90             130        260              8
45        95             130        270              8
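As a rough sketch of how such a table might be stored and queried, the snippet below loads the four rows into an in-memory SQLite database using Python's built-in sqlite3 module. The table name health_data is an assumption, and the column names are borrowed from the variable labels used in the Variables part of this section.

```python
import sqlite3

# Rows from the sports-watch table: duration, average pulse, max pulse,
# calories burned, hours of work.
rows = [
    (30, 80, 120, 240, 10),
    (30, 85, 120, 250, 10),
    (45, 90, 130, 260, 8),
    (45, 95, 130, 270, 8),
]

conn = sqlite3.connect(":memory:")  # temporary database, nothing written to disk
conn.execute(
    "CREATE TABLE health_data ("  # hypothetical table name
    "Duration INTEGER, Average_Pulse INTEGER, Max_Pulse INTEGER, "
    "Calorie_Burnage INTEGER, Hours_Work INTEGER)"
)
conn.executemany("INSERT INTO health_data VALUES (?, ?, ?, ?, ?)", rows)

# Because the data is structured, a question about it becomes a short query.
long_sessions = conn.execute(
    "SELECT Average_Pulse, Calorie_Burnage FROM health_data "
    "WHERE Duration = 45 ORDER BY Average_Pulse"
).fetchall()
print(long_sessions)
conn.close()
```

The query returns the pulse and calorie values for the two 45-minute sessions, which would be much harder to extract if the same facts were scattered through free text.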
Variables
A variable is defined as something that can be measured or counted.
In the example below, each column represents a variable.
Duration  Average_Pulse  Max_Pulse  Calorie_Burnage  Hours_Work
30        80             120        240              10
30        85             120        250              10
45        90             130        260              8
45        95             130        270              8
Note that the first row of a table is the label row: it holds the names of the
variables and is not an observation. A table with 11 rows therefore contains only
10 observations.
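The difference between the label row and the observations can be checked with a short sketch. It parses the four-observation table shown above with Python's csv module. The names table_text, header, and observations are illustrative.

```python
import csv
import io

# The table from this section as space-separated text.
table_text = """Duration Average_Pulse Max_Pulse Calorie_Burnage Hours_Work
30 80 120 240 10
30 85 120 250 10
45 90 130 260 8
45 95 130 270 8
"""

reader = csv.reader(io.StringIO(table_text), delimiter=" ")
all_rows = list(reader)

header = all_rows[0]         # first row: the variable names (labels)
observations = all_rows[1:]  # remaining rows: one observation each

print(len(all_rows))      # total rows, including the label row
print(len(header))        # number of variables (columns)
print(len(observations))  # number of observations
```

The number of observations is always one less than the number of rows, because the label row describes the data rather than being part of it.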
Python
Python is one of the most widely used programming languages for data science.
It is an open-source, easy-to-use language that has been available since 1991.