BA - Unit 1
1
Introduction to Data
Science
Ms. Asha Yadav
Assistant Professor
Department of Computer Science
School of Open Learning
University of Delhi
STRUCTURE
1.1 Learning Objectives
1.2 Introduction
1.3 Data and Its Types
1.4 Data Analytics and Data Analysis
1.5 Application of Analytics in Business
1.6 Big Data and Its Characteristics
1.7 Applications of Big Data
1.8 Challenges in Data Analytics
1.9 Summary
1.10 Answers to In-Text Questions
1.11 Self-Assessment Questions
1.12 References
1.13 Suggested Readings
Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
1.2 Introduction
Data Science is an interdisciplinary field that combines statistics, data
analysis, and machine learning to obtain meaningful insights and knowledge
from data. It rests on gathering and analysing data and on making informed
decisions from the patterns that emerge. It is versatile: it allows
businesses and organizations to enhance decision-making, perform predictive
analyses, and discover hidden patterns within datasets. Data science is
applied in many sectors, including banking, healthcare, manufacturing, and
e-commerce, for tasks such as optimizing routes, forecasting revenues,
creating targeted promotional offers, and even predicting election outcomes.
A data scientist combines expertise in machine learning, statistics,
programming (using tools like R), mathematics, and database management
to work with raw data. The work follows a systematic approach: asking the
right questions to define a problem, gathering and cleaning data,
standardizing it for analysis, finding trends, and presenting actionable
insights in a clear and impactful manner. By using Data Science,
organizations can discover the full potential of their data, whether by
improving operational efficiency, enhancing customer experiences, or
gaining a competitive advantage. The field is still growing and expanding
in scope and application, which keeps it in high demand in today’s
data-driven world.
This lesson will serve as your foundation for understanding data and data
analysis. We will see how analytics can be classified and how it can be
applied to various businesses. We will also learn which data falls into the
category of big data, along with the applications of big data and the
challenges that arise while dealing with it.
Apart from types of data, its quality is also an important aspect. We are
exposed to data every day, for example, in news stories, weather reports
and advertising, but how can we determine whether the data is of good
quality or not? Quality is important throughout the entire data journey.
The six aspects of data quality are relevance, accuracy, timeliness,
interpretability, coherence, and accessibility.
Relevance: The relevance of data or statistical information reflects
the degree to which it meets the needs of data users. Some questions
that must be answered are “Does this information matter?” and “Does
it fill an existing data gap?”
Accuracy: Accurate data give a true reflection of reality. Data that
are not accurate do not support sound decisions and hence have no
value.
Timeliness: It is the delay between the time when the data are
meaningful and the time when they are available to the user or
decision maker. For example, the stock information of an e-commerce
site needs to be updated and available as soon as an order is placed.
Interpretability: Information that people can’t understand has
no value and could even be misleading. To avoid such misunderstandings,
data is accompanied by metadata, which is supplementary information or
documentation that allows users to interpret the data properly.
Coherence: It can be split into two concepts: consistency and
commonality. Consistency means using the same concepts, definitions
and methods over time. Commonality means using the same or
similar concepts, definitions and methods across different statistical
programs. If there is good consistency and good commonality, then
it is easier to compare results from different studies or track how
they stay the same or change over time.
Accessibility: It is defined as how easy it is for people to find,
get, understand, and use data. When determining whether data
are accessible, make sure they are organized, available, accountable,
and interpretable.
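The timeliness aspect above can be made concrete with a small calculation. The following Python sketch is only an illustration under stated assumptions: the stock records, the field names (event_time, available_time) and the one-minute threshold are all hypothetical and are not part of this lesson's material.

```python
from datetime import datetime, timedelta

# Hypothetical stock updates: when the change happened (event_time)
# and when it became visible to users (available_time).
records = [
    {"item": "A", "event_time": datetime(2024, 1, 1, 10, 0, 0),
     "available_time": datetime(2024, 1, 1, 10, 0, 5)},
    {"item": "B", "event_time": datetime(2024, 1, 1, 10, 5, 0),
     "available_time": datetime(2024, 1, 1, 10, 9, 0)},
    {"item": "C", "event_time": datetime(2024, 1, 1, 10, 7, 0),
     "available_time": datetime(2024, 1, 1, 10, 7, 30)},
]


def timeliness_delays(rows):
    """Return the delay between each event and its availability."""
    return [row["available_time"] - row["event_time"] for row in rows]


def late_items(rows, max_delay):
    """Return the items whose delay exceeds the allowed maximum."""
    return [
        row["item"]
        for row in rows
        if row["available_time"] - row["event_time"] > max_delay
    ]


delays = timeliness_delays(records)
# Assumed threshold: an update is "timely" if visible within one minute.
late = late_items(records, timedelta(minutes=1))
print(delays)
print(late)
```

A similar check could be written for the other aspects, for instance counting missing values as a rough indicator of accuracy.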
1.4 Data Analytics and Data Analysis
Data analytics and data analysis are two terms frequently used
interchangeably, but they have different meanings in the context of working
with data and extracting useful insights. While both are relevant to
data-driven decision making, each has its own scope and purpose.
Data analytics refers to the whole process of examining datasets in
order to find trends, patterns, relationships, and other insights that might
help in decision-making. It covers the various techniques and
processes through which data is analyzed and interpreted meaningfully.
Data analytics is usually applied to answer questions, solve problems, or
predict outcomes. There are four main types of data analytics: descriptive,
diagnostic, predictive and prescriptive.
Descriptive Analytics: This type of analytics focuses on summarizing
past data and describing what happened. It includes the use of
historical data to identify trends and patterns, often through statistical
measures like mean, median, and mode. Descriptive analytics answers
questions like, “What happened?” and provides insights into the
past performance of an entity or system, like how well a business
performed last year.
Diagnostic Analytics: It goes a step further and identifies the
causes of the trends or patterns found in the descriptive
analytics phase. It addresses the question “Why did it happen?” by
analyzing the observed data more deeply to understand root causes,
such as determining the factors behind a drop in subscribers of an
Instagram account.
Predictive Analytics: It makes predictions based
on historical data using statistical models and, often, machine
learning algorithms. It answers the question, “What is
likely to happen?” by analyzing trends, patterns, and relationships to
anticipate future behaviours or outcomes. For example, predicting the
sales of a newly trending product for the coming six months.
Prescriptive Analytics: This type of analytics suggests possible
actions and outcomes based on the analysis. It combines insights
from all other types of analytics to answer “What should we
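As a small illustration of descriptive analytics as described above, the following Python sketch summarizes past data with the mean, median, and mode. The monthly sales figures are hypothetical values made up for this example; only the standard-library statistics module is used.

```python
import statistics

# Hypothetical units sold in each of the last twelve months.
monthly_sales = [120, 135, 150, 135, 160, 135, 170, 180, 150, 165, 175, 190]

# Descriptive analytics: summarize "what happened" with simple measures.
summary = {
    "mean": statistics.mean(monthly_sales),      # average monthly sales
    "median": statistics.median(monthly_sales),  # middle value when sorted
    "mode": statistics.mode(monthly_sales),      # most frequent value
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Such a summary only describes past performance; explaining or forecasting it is the job of the other types of analytics.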
processes, and gain a competitive edge in today’s fast-paced and
data-driven business world. By harnessing data through different forms of
analytics, businesses can understand their past and present as well as
forecast future trends, optimize operations, and create more personalized
customer experiences. Let’s see some examples that will help you
understand the value of analytics.
One of the most powerful applications of analytics in business
is understanding customer behavior and preferences. Business
organizations can identify trends, predict needs, and personalize their
offerings to increase customer satisfaction and loyalty by analyzing
customer data. This personalized shopping experience boosts sales
and enhances customer engagement by making it easier for customers
to find what they need. Retailers, e-commerce platforms, and even
service industries like healthcare use analytics to segment customers,
predict their future needs, and tailor their marketing campaigns or
product recommendations accordingly.
Analytics also plays an important role in improving business
processes. Analyses of inventory, supply chain
logistics, and workforce performance can help tune business processes
to lower costs and improve efficiency. The world’s largest retailer,
Walmart, makes use of predictive analytics to optimize inventory
management: it considers the seasonality of demand for particular
goods, thus ensuring that the right merchandise arrives in time at
the stores and is not overstocked. The same approach can be applied across
various sectors like manufacturing, logistics, and even hospitality,
where strong demand forecasting and resource optimization are key
to operational success.
Companies need to make wise financial decisions to stay ahead.
Analytics can help in forecasting revenue, managing budgets, and assessing
financial risks so that companies can make well-supported decisions
regarding investments and expenses. For example,
predictive analytics can help a bank assess the creditworthiness
of someone applying for a loan. By analyzing a consumer’s
financial history, spending patterns, or even social media activity,
a bank can forecast the probability of loan repayment and charge
an appropriate interest rate.
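The idea of accounting for seasonality in demand, mentioned in the inventory example above, can be sketched in a few lines of Python. This is a deliberately simple seasonal-average forecast (the mean of past sales for each calendar month), not the method any particular retailer uses; the sales history and the function name seasonal_average_forecast are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales history for one product: (year, month, units_sold).
history = [
    (2022, 11, 400), (2022, 12, 900), (2023, 1, 300),
    (2023, 11, 440), (2023, 12, 1000), (2024, 1, 320),
]


def seasonal_average_forecast(rows):
    """Forecast demand per calendar month as the mean of past sales
    observed in that same month."""
    by_month = defaultdict(list)
    for _year, month, units in rows:
        by_month[month].append(units)
    return {month: sum(units) / len(units) for month, units in by_month.items()}


forecast = seasonal_average_forecast(history)
for month in sorted(forecast):
    print(f"month {month}: expected demand about {forecast[month]:.0f} units")
```

Here the forecast for December is much higher than for January, which is the kind of signal a store could use to stock more before the peak and avoid overstocking afterwards.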
many malls, stores, and websites around the world are generating similar
information every second.
This huge amount of data is termed big data. Businesses use this
data to understand the preferences of customers, predict trends, and even
recommend products to you (like “You might also like”). For example,
when Netflix suggests shows based on what you have watched, it’s using
big data to make smart recommendations tailored just for you.
In the modern technological world, data is expanding very quickly
and people rely on it constantly. Because of the rate at which
data is growing, it has become increasingly difficult to store it
on any single server. The concept of Big Data came into the picture to
handle, process, and analyze this huge amount of data. Big data is a
collection of data that is large in volume and keeps growing rapidly
with time, since new data is generated every day. Such data is difficult
to store and manage, and the traditional methods that were
used to store, manage and handle data have proven to be inefficient.
Hence, we can say that big data refers to extremely large datasets that are
too complex, vast, or fast-moving to be processed, stored, or analyzed using
traditional data processing methods. It involves the accumulation,
management, and analysis of huge amounts of structured, semi-structured,
and unstructured data to expose patterns, trends, and insights.
Real-world examples of big data are easy to find. On Instagram, a large
number of photos and videos are shared across the world every minute.
Twitter generates billions of tweets per year, and each tweet can contain
text, images, video or audio. Gmail and Outlook are further examples:
billions of emails are sent every day, and many of them carry
attachments such as text, video, and photos. Banks, e-commerce, weather
monitoring systems, CCTVs etc. all contribute to big data.
Let’s understand the characteristics of big data, also known as the 5 V’s
of big data:
Volume: It is the huge amount of data generated, typically measured in
terabytes, petabytes, or even exabytes. For example, the
likes, comments and posts shared by billions of users of Facebook
every day.
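To get a feel for the units mentioned under Volume, the following Python sketch converts a raw byte count into terabytes, petabytes, or exabytes using decimal (base 1000) units. The user count, interactions per user, and bytes per interaction are hypothetical numbers chosen only to illustrate the arithmetic; they are not real platform statistics.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes, base=1000):
    """Express a byte count in the largest unit that keeps the value
    below `base` (decimal units: 1 KB = 1000 B)."""
    size = float(num_bytes)
    for unit in UNITS:
        if size < base or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= base


# Hypothetical daily activity on a social platform (assumed values).
users = 2_000_000_000          # active users per day
interactions_per_user = 10     # likes, comments, posts per user
bytes_per_interaction = 500    # stored size of one interaction record

daily_bytes = users * interactions_per_user * bytes_per_interaction
print(human_readable(daily_bytes))
print(human_readable(daily_bytes * 365))
```

Even with these modest assumed sizes, one day of activity reaches the terabyte range and one year reaches petabytes, which is why such data is hard to handle with traditional methods.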
1.9 Summary
This lesson introduced the basic concepts of data and its importance in
the field of data science and analytics. It distinguished between data
analysis and data analytics, with insight into the types of analytics: