Introduction To: Data Science
Data Science
UNIT 1: LECTURE 01
Authors:
Dr. Gypsy Nandi
Dr. Rupam Kumar Sharma
Publisher:
BPB
Tagline:
Understand Why Data Science is the Next
▪ Ecommerce: Ecommerce sites rely heavily on data science to maximize revenue
and profitability. These sites analyze the shopping and purchasing behavior of
customers and recommend products accordingly to encourage further purchases
online.
▪ Finance: Finance is an emerging field in the data industry. Financial
analytics covers risk analysis, fraud detection, forecasting of shareholders’
share status, working capital management, and so on.
▪ Healthcare: The healthcare sector now relies heavily on analytics of patient data to predict
diseases and health issues. Healthcare analytics supports data-driven quality of care,
improved patient care, classification of patients' symptoms, prediction of health
deficiencies, and so on.
▪ Education: The sources of data in education are vast, including student-centric data,
enrollment in various courses, scholarship and fee details, examination results, and so on.
Education analytics plays a major role in academic institutions in improving admissions,
helping students achieve successful examination results, and supporting all-round student
performance.
▪ Human Resource (HR): HR analytics involves HR-related data that can be used for building
strong leadership, employee acquisition, employee retention, workforce optimization, and
performance management.
▪ Sports: Nowadays, sports analytics is often used in international tournaments to analyze the
performance of players, the predicted scores, prevention of injuries, and the possibility of winning
or losing a match by a particular team.
❑ Summary:
❑ The data science team learns about and investigates the problem.
❑ The team develops context and understanding.
❑ The team identifies the data sources needed and available for
the project.
❑ The team formulates initial hypotheses that can later be
tested with data.
Fig 2: The Data Analytics Life Cycle
❑ For example, with descriptive analysis, a data analyst will be able to generate
the statistical results of the performance of the cricket players of team India.
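As a rough illustration of descriptive analysis, the sketch below summarizes batting scores with Python's standard `statistics` module. The player names and run values are hypothetical placeholders chosen for this example, not real match data, and the `describe` helper is an assumption of this sketch.

```python
import statistics

# Hypothetical runs scored by three made-up players over five innings
# (illustrative values only, not real match data).
runs_by_player = {
    "Player A": [45, 72, 10, 101, 33],
    "Player B": [12, 0, 56, 23, 8],
    "Player C": [60, 64, 58, 71, 62],
}


def describe(scores):
    """Return basic descriptive statistics for a list of scores."""
    return {
        "innings": len(scores),
        "total": sum(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "best": max(scores),
    }


# One summary record per player.
summary = {player: describe(scores) for player, scores in runs_by_player.items()}

for player, stats in summary.items():
    print(
        f"{player}: mean={stats['mean']:.1f}, median={stats['median']}, "
        f"stdev={stats['stdev']:.1f}, best={stats['best']}"
    )
```

The mean and median describe a typical score, while the standard deviation indicates how consistent a player is from one innings to the next.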
❑ A few of the fundamental areas of study for mastering data science are:
❑ Machine learning
❑ Deep learning
❑ Natural language processing
❑ Statistical data analysis
❑ Knowledge discovery and data mining
❑ Text mining
❑ Recommender systems
❑ Data visualization
❑ Computer vision
❑ Spatial data management
2. Deep Learning:
❑ Deep learning is often used in data science because it is computationally very
capable compared with traditional machine learning methods, which require human
intervention before the model can be trained.
3. Natural Language Processing (NLP):
❑ NLP, as an important branch of data science, plays a vital role in extracting insights from
the input text.
4. Statistical Data Analysis:
❑ Statistical data analysis allows the execution of statistical operations using quantitative
approaches.
❑ A few important concepts in statistical data analysis are descriptive statistics,
data distributions, conditional probability, hypothesis testing, and regression.
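Two of the concepts listed above, conditional probability and regression, can be sketched in a few lines of plain Python. The student records, the marks, and the helper names `conditional_probability` and `fit_line` are all assumptions made for this illustration. The regression is ordinary least squares on a single predictor.

```python
# Hypothetical records: (hours_studied, passed_exam) for ten students.
records = [
    (1, False), (2, False), (3, False), (4, True), (5, False),
    (6, True), (7, True), (8, True), (9, True), (10, True),
]


def conditional_probability(rows, event, given):
    """Estimate P(event | given) as a relative frequency over the rows."""
    given_rows = [r for r in rows if given(r)]
    if not given_rows:
        raise ValueError("the conditioning event never occurs in the data")
    return sum(1 for r in given_rows if event(r)) / len(given_rows)


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# P(pass | studied at least 5 hours): 5 of the 6 such students passed.
p_pass_given_5h = conditional_probability(
    records,
    event=lambda r: r[1],
    given=lambda r: r[0] >= 5,
)

# Hypothetical marks that are exactly linear in hours: marks = 50 + 2 * hours.
hours = [1, 2, 3, 4, 5]
marks = [52, 54, 56, 58, 60]
slope, intercept = fit_line(hours, marks)

print(f"P(pass | hours >= 5) = {p_pass_given_5h:.3f}")
print(f"marks = {intercept:.1f} + {slope:.1f} * hours")
```

Real data would rarely lie exactly on a line. The perfectly linear marks are used here only so that the fitted slope and intercept are easy to check by hand.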
DR. GYPSY NANDI, DEPT OF COMPUTER APPLICATIONS, ADBU
Fundamental Areas of Study in Data Science
5. Knowledge Discovery and Data Mining (KDD):
❑ Data mining, a major step in Knowledge Discovery from Data (KDD), has evolved into a
prominent field over the years, driven by the demand for discovering meaningful patterns
in data that yield useful output for data analysis.
❑ Data alone has little value for analysis until it is converted and interpreted into
some meaningful form, and this is done through the process of data mining in KDD.
6. Text Mining:
❑ Text mining is the process of deriving high-quality information from text.
❑ Some of the prominent text mining tasks include text clustering, document
summarization, sentiment analysis through text, text categorization, and concept
extraction.
❑ Text analytics is extensively used for research in data science, business intelligence, or
exploratory data analysis.
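Two of the text mining tasks mentioned above, extracting frequent terms and sentiment analysis, can be sketched with the Python standard library alone. The word lists, the sample reviews, and the function names below are assumptions of this sketch. A simple word-count lexicon approach is used here in place of the trained models a production system would normally rely on.

```python
import re
from collections import Counter

# Tiny hand-made word lists, assumed here purely for illustration.
POSITIVE = {"good", "great", "excellent", "love", "fast"}
NEGATIVE = {"bad", "poor", "slow", "broken", "hate"}
STOPWORDS = {"the", "is", "and", "a", "but", "it", "i", "was", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_terms(text, k=3):
    """Return the k most frequent non-stopword terms with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(k)


def sentiment(text):
    """Label text by counting positive words minus negative words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical customer reviews.
reviews = [
    "The delivery was fast and the product is great",
    "Poor packaging and the item was broken",
    "It is a phone",
]
labels = [sentiment(review) for review in reviews]

for review, label in zip(reviews, labels):
    print(f"{label:>8}: {review}")
```

A lexicon this small misses negation ("not good") and sarcasm, which is one reason practical sentiment analysis usually relies on NLP models trained on labeled text.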
7. Recommender Systems:
❑ Nowadays, building efficient recommender systems is part and parcel of every online
business, as they indirectly help generate a large amount of revenue and help the
business flourish relative to its competitors.
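One simple way a recommender can work is to suggest items that are frequently bought together. The sketch below is a minimal co-occurrence recommender, offered as an illustration and not as the method used by any particular site. The purchase baskets and the names `build_cooccurrence` and `recommend` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per order.
baskets = [
    {"laptop", "mouse", "laptop bag"},
    {"laptop", "mouse"},
    {"phone", "phone case"},
    {"laptop", "laptop bag"},
    {"phone", "phone case", "charger"},
    {"laptop", "mouse", "keyboard"},
]


def build_cooccurrence(orders):
    """Count, for each item, how often every other item shares a basket with it."""
    co = {}
    for basket in orders:
        for a, b in combinations(sorted(basket), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co


def recommend(item, co, k=2):
    """Return up to k items most often bought together with the given item."""
    return [other for other, _ in co.get(item, Counter()).most_common(k)]


co = build_cooccurrence(baskets)

print("Bought with laptop:", recommend("laptop", co))
print("Bought with phone:", recommend("phone", co))
```

Counting co-purchases ignores individual customer preferences. Approaches such as collaborative filtering extend the same idea by also comparing customers with similar purchase histories.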
8. Data Visualization:
❑ Visualization is the graphical representation of data that can make information easy to
analyze and understand.
❑ Data visualization has the power of illustrating complex data relationships and patterns
with the help of simple designs consisting of lines, shapes, and colours.
9. Computer Vision:
❑ Computer vision differs from image processing in that it uses the three-dimensional
structure captured in images, viewed from varied angles, to gain a better understanding
of a static scene.