Introduction to Data Science
Unit 1
Contents
• Introduction
• Need for Data Science
• Components of Data Science
• Data Acquisition and Data Science Life-Cycle
• Basic Tools of Data Science
• Difference between BI and Data Science
• Applications of Data Science
• Role of Data Scientist
What is Data Science?
• Data science is an interdisciplinary field that uses scientific techniques, procedures, algorithms, and
structures to extract knowledge and insights from structured and unstructured data.
• It combines elements of statistics, mathematics, programming, and domain expertise to transform data into
actionable insights.
Need for Data Science
1. Informed Decision Making:
– Empowers data-driven decisions
– Enhances forecasting and planning
2. Competitive Advantage:
– Optimizes operations
– Improves customer experience
4. Personalization:
– Tailors products and services
– Increases customer satisfaction
5. Risk Management:
– Assesses and mitigates risks
– Detects fraud and anomalies
6. Healthcare Improvements:
– Enables predictive diagnostics
– Enhances patient care
7. Scientific Research:
– Accelerates discoveries
– Validates hypotheses
8. Social Good:
– Accelerates discoveries
– Validates hypotheses
9. Customer Insights:
– Understands customer behavior
– Enhances retention strategies
Components of Data Science
5. Advanced computing: Advanced computing does the heavy lifting of data science. It involves designing,
writing, debugging, and maintaining the source code of computer programs.
6. Mathematics: Mathematics is a critical part of data science. It involves the study of quantity, structure,
space, and change. A good knowledge of mathematics is essential for a data scientist.
7. Machine learning: Machine learning is the backbone of data science. It is about training a machine on data
so that it can learn and make decisions in a way that resembles a human brain. In data science, we use various
machine learning algorithms to solve problems.
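To make the idea of "training a machine" concrete, the following is a minimal sketch, not a definitive implementation. It fits a straight line to a few example points by ordinary least squares and then predicts a new value. The data (hours studied vs. exam score) and the function names fit_line and predict are made up for illustration.

```python
# Minimal sketch: "train" on examples, then predict an unseen value.
# The data and function names are hypothetical, for illustration only.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned line to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)   # training step
print(predict(model, 6))          # prediction for an unseen input
```

Real projects would normally use a library and far more data; the point here is only the train-then-predict pattern.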
Data Acquisition
• Data acquisition is the comprehensive process of systematically collecting,
measuring, and recording data from various sources to facilitate analysis and
decision-making. This process encompasses a wide range of techniques and
tools designed to gather raw data from different origins, ensuring that the
data is accurate, relevant, and suitable for further analysis.
• Data acquisition, also known as the process of collecting data, relies on
specialized software that quickly captures, processes, and stores information.
It enables scientists and engineers to perform in-depth analysis for scientific
or engineering purposes.
• Data acquisition systems are available in handheld and remote versions to
cater to different measurement requirements. Handheld systems are suitable
for direct interaction with subjects while remote systems excel at distant
measurements, providing versatility in data collection.
Components of Data Acquisition
• Sensors: Devices that gather information about physical or environmental conditions, such as
temperature, pressure, or light intensity.
• Signal Conditioning: To ensure accurate measurement, the raw sensor data undergoes preprocessing
to filter out any noise and scale it appropriately.
• Data Logger: Hardware or software that records and stores the conditioned data over time.
• Analog-to-Digital Converter (ADC): Converts analog sensor signals into digital data that
computers can process.
• Interface: Connects the data acquisition system to a computer or controller for data transfer and
control.
• Power Supply: Provides the necessary electrical power to operate the system and sensors.
• Control Unit: Manages the overall operation of the data acquisition system, including tasks such as
triggering, timing, and synchronization.
• Software: Allows users to configure, monitor, and analyze the data collected by the system.
• Communication Protocols: Define how data is transmitted and received between the system and
external devices or networks.
• Storage: A range of options is available for storing recorded data, including memory
cards, hard drives, and cloud storage. These provide both temporary and permanent storage
solutions.
• User Interface: Allows users to interact with and control the data acquisition
system effectively.
• Calibration and Calibration Standards: To ensure accuracy, the sensors and system are
periodically calibrated against known standards.
• Real-time Clock (RTC): Maintains accurate timing to ensure synchronized data
acquisition and timestamping.
• Triggering Mechanism: Initiates data capture based on predefined events or specific
conditions.
• Data Compression: Reduces the size of collected data for storage and
transmission in remote or resource-limited applications.
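Several of the components above can be sketched in software. The example below is an illustrative sketch under stated assumptions, not a real data acquisition driver. It simulates a sensor, smooths the readings (signal conditioning), quantizes them (ADC), logs only samples above a threshold (triggering) with a timestamp, and compresses the log. The function names, the 3.3 V reference, the 10-bit resolution, and the threshold are all assumptions chosen for illustration.

```python
import json
import math
import time
import zlib


def read_sensor(t):
    """Simulated analog sensor output in volts; stands in for real hardware."""
    return 1.0 + 0.5 * math.sin(t)


def condition(samples, window=3):
    """Signal conditioning: moving-average filter to smooth out noise."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def adc(voltage, v_ref=3.3, bits=10):
    """ADC: map 0..v_ref volts to integer counts 0..2**bits - 1."""
    clamped = min(max(voltage, 0.0), v_ref)
    return round(clamped / v_ref * (2 ** bits - 1))


def acquire(n_samples, threshold_counts):
    """Sample, condition, convert, and log only triggered samples."""
    raw = [read_sensor(0.5 * i) for i in range(n_samples)]
    log = []
    for i, volts in enumerate(condition(raw)):
        counts = adc(volts)
        if counts >= threshold_counts:  # triggering mechanism
            log.append({"sample": i, "timestamp": time.time(), "counts": counts})
    return log


log = acquire(20, threshold_counts=300)      # data logger output
payload = json.dumps(log).encode("utf-8")    # serialize for storage
compressed = zlib.compress(payload)          # data compression
print(len(log), len(payload), len(compressed))
```

In a real system the sensor read would come from hardware through the interface and communication protocol, and the timestamp from the real-time clock.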
Key Elements of Data Acquisition
• The model deployment stage involves creating the delivery mechanism needed to get the model out to users or to
another system. Machine learning models are also being deployed on devices, an approach that is gaining adoption and
popularity in the field of computing. From a simple model output in a Tableau dashboard to something as complex as scaling
the model to the cloud in front of millions of users, this step differs from project to project.
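One very small form of a delivery mechanism can be sketched as follows: the trained parameters are saved to a file, loaded again in the serving environment, and wrapped in a handler that takes a JSON request and returns a JSON response. This is only a sketch under assumptions. The model parameters, the file name model.json, and the handle_request function are hypothetical, and a real deployment would sit behind a web service or dashboard.

```python
import json
import os
import tempfile


def save_model(params, path):
    """Persist trained parameters so another system can load them."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)


def load_model(path):
    """Load parameters in the serving environment."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def handle_request(model, request_json):
    """Hypothetical request handler: JSON in, JSON out."""
    x = json.loads(request_json)["x"]
    y = model["slope"] * x + model["intercept"]
    return json.dumps({"prediction": y})


# Hypothetical parameters of an already trained linear model.
trained = {"slope": 2.0, "intercept": 50.0}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.json")
    save_model(trained, path)      # hand-off from training
    deployed = load_model(path)    # pick-up by the serving side

response = handle_request(deployed, '{"x": 6}')
print(response)
```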
Basic Tools of Data Science
1. Programming Languages
• Python: Widely used for its simplicity and rich ecosystem of libraries for data analysis, visualization, and
machine learning.
• R: Popular in the statistics and data analysis community, with strong visualization capabilities.
Difference between BI and Data Science
Sr No | Factor | Data Science | Business Intelligence
4 | Flexibility | Data science is much more flexible, as data sources can be added as required. | It is less flexible, as data sources for business intelligence need to be pre-planned.
5 | Method | It makes use of the scientific method. | It makes use of the analytic method.
6 | Complexity | It has a higher complexity in comparison to business intelligence. | It is much simpler when compared to data science.
7 | Expertise | Its expert is the data scientist. | Its expert is the business user.
8 | Questions | It deals with the questions of what will happen and what if. | It deals with the question of what happened.
9 | Storage | The data to be used is distributed across real-time clusters. | A data warehouse is used to hold the data.
10 | Integration of data | The ELT (Extract-Load-Transform) process is generally used for the integration of data for data science applications. | The ETL (Extract-Transform-Load) process is generally used for the integration of data for business intelligence applications.
11 | Tools | Its tools are SAS, BigML, MATLAB, Excel, etc. | Its tools are InsightSquared Sales Analytics, Klipfolio, ThoughtSpot, Cyfe, TIBCO Spotfire, etc.
12 | Usage | Companies can harness their potential by using data science to anticipate future scenarios in order to reduce risk and increase income. | Business intelligence helps in performing root cause analysis on a failure or in understanding the current status.
13 | Business Value | Greater business value is achieved with data science in comparison to business intelligence, as it anticipates future events. | Business intelligence has lesser business value, as business value is extracted statically by plotting charts and KPIs (Key Performance Indicators).
14 | Handling data sets | Technologies such as Hadoop are available, and others are evolving, for handling large data sets. | Sufficient tools and technologies are not available for handling large data sets.
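The ETL and ELT rows of the table differ only in where the transformation happens. The following sketch illustrates that difference with an in-memory SQLite database. It is an illustration under assumptions, not a production pipeline. The table names, the sample rows, and the cleaning rule (keep only amounts that start with a digit) are made up for the example.

```python
import sqlite3

# Made-up raw records: customer name and an amount stored as messy text.
raw_rows = [("alice", " 120.50 "), ("bob", "80"), ("carol", "n/a")]


def parse_amount(text):
    """Convert text to a float, or None if it is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def etl(conn, rows):
    """ETL: transform in application code, then load only clean rows."""
    conn.execute("CREATE TABLE sales_clean (customer TEXT, amount REAL)")
    cleaned = [(name, parse_amount(amount)) for name, amount in rows]
    cleaned = [row for row in cleaned if row[1] is not None]
    conn.executemany("INSERT INTO sales_clean VALUES (?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load raw data as-is, then transform inside the data store."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE VIEW sales_view AS "
        "SELECT customer, CAST(TRIM(amount) AS REAL) AS amount "
        "FROM sales_raw WHERE TRIM(amount) GLOB '[0-9]*'"
    )


conn = sqlite3.connect(":memory:")
etl(conn, raw_rows)
elt(conn, raw_rows)

etl_total = conn.execute("SELECT SUM(amount) FROM sales_clean").fetchone()[0]
elt_total = conn.execute("SELECT SUM(amount) FROM sales_view").fetchone()[0]
raw_count = conn.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
print(etl_total, elt_total, raw_count)
```

Both paths give the same cleaned total, but only the ELT path keeps the raw rows in the store, so new transformations can be defined later without reloading the source.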
Applications of Data Science
1. Image recognition and speech recognition:
• Data science is currently used for image and speech recognition. When you upload an image to Facebook, you
start getting suggestions to tag your friends. This automatic tagging suggestion uses an image recognition
algorithm, which is part of data science.
• When you speak to a voice assistant such as "Ok Google", Siri, or Cortana and the device responds to your
voice command, this is made possible by a speech recognition algorithm.
2. Gaming world:
• In the gaming world, the use of machine learning algorithms is increasing day by day. EA Sports, Sony, and
Nintendo widely use data science to enhance the user experience.
3. Internet search:
• When we want to search for something on the internet, we use search engines such as
Google, Yahoo, Bing, Ask, etc. All of these search engines use data science technology to improve the search
experience and return results within a fraction of a second.
4. Transport:
• Transport industries are also using data science technology to create self-driving cars. With self-driving cars, it will be
5. Healthcare:
• In the healthcare sector, data science provides many benefits. It is used for tumor
detection, drug discovery, medical image analysis, virtual medical bots, etc.
6. Recommendation systems:
• Many companies, such as Amazon, Netflix, and Google Play, use data science technology to
provide a better user experience through personalized recommendations. For example, when you search for
something on Amazon and start getting suggestions for similar products, this is because of data
science technology.
7. Risk detection:
• Finance industries have always had problems with fraud and the risk of losses, but with the help of data science
these can be reduced.
• Most finance companies are looking for data scientists to avoid risk and losses while
increasing customer satisfaction.
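As a rough illustration of risk detection, the sketch below flags unusual transaction amounts using a z-score, that is, how many standard deviations a value lies from the mean. This is one simple technique chosen for illustration, not the method any particular company uses. The transaction amounts and the 2.5 threshold are made-up assumptions.

```python
import statistics


def flag_anomalies(amounts, threshold=2.5):
    """Return (index, amount) pairs whose z-score magnitude exceeds threshold."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    flagged = []
    for i, amount in enumerate(amounts):
        z = (amount - mean) / stdev
        if abs(z) > threshold:
            flagged.append((i, amount))
    return flagged


# Made-up card transactions; one is far larger than the rest.
transactions = [25, 30, 28, 27, 31, 29, 26, 950, 30, 28]
flagged = flag_anomalies(transactions)
print(flagged)
```

A single extreme value also inflates the standard deviation, so production systems often prefer more robust statistics or learned models. The sketch only shows the basic idea of scoring each transaction against typical behavior.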
Role of Data Scientist
Data scientist roles and responsibilities include: