
Introduction to Data Science

This document provides an overview of data science and data analytics. It defines data science as a field that combines programming, mathematics, statistics and domain expertise to extract meaningful insights from data. It describes the typical data analytics life cycle, which involves data discovery, preparation, modeling, results communication, and operationalization. Several industry applications of data science are also discussed, such as using customer data in ecommerce for recommendations and patient data in healthcare for disease prediction.

Uploaded by

RAHUL SHARMA
Copyright
© All Rights Reserved

Introduction to Data Science
UNIT 1: LECTURE 01

DR. GYPSY NANDI, DEPT OF COMPUTER APPLICATIONS, ADBU


Content
• What is Data Science?
• Why Learn Data Science?
• Data Analytics Life Cycle
• Types of Data Analysis
• Types of Jobs in Data Analytics
• Data Science Tools
• Fundamental Areas of Study in Data Science



Recommended Book
Book Title:
Data Science Fundamentals and Practical Approaches

Authors:
Dr. Gypsy Nandi
Dr. Rupam Kumar Sharma

Publisher:
BPB

Tagline:
Understand Why Data Science is the Next



Topic 1
Why Learn Data Science?



What is Data Science?
❑ Data Science:
❑ The task of scrutinizing and processing raw data to reach a meaningful conclusion.
❑ The field of study that combines domain expertise, programming skills, and
knowledge of mathematics and statistics to extract meaningful insights from data.
❑ Related to data mining, machine learning and big data.
❑ Generates insights that help organizations increase operational efficiency, identify
new business opportunities and improve marketing and sales programs, among
other benefits.



Data Science: Definition
Data science is the field of study that
combines domain expertise, programming
skills, and knowledge of mathematics and
statistics to extract meaningful insights
from data.
Data science is the domain of study that deals
with vast volumes of data using modern tools
and techniques to find unseen patterns, derive
meaningful information, and make business
decisions.



What is Data Science?

Note: Both Data Science and Artificial Intelligence may use Machine Learning techniques

Fig 1: A Venn diagram visualizing overlapping AI-related terminology



Important Facts
❑ The term ‘Artificial Intelligence’ was coined by John McCarthy, a computer scientist, in 1955.

❑ The term Machine Learning (ML) was coined by Arthur Samuel in 1959.

❑ In 1962, John Tukey described a field he called "data analysis", which resembles modern data science.

❑ You can look at Deep Learning (DL) as a subset or advancement of ML. DL comes into play when conventional ML methods cannot fully deliver the desired outcomes.

Image: Some of the attendees of the AI workshop in 1956 (Image Credits: thedartmouth.com)



Data Science Applications



Why Learn Data Science?
▪ Data science is a promising field of study in today’s fast-growing, data-driven world.
▪ A few of the industry verticals where data science has gained prominence are:
INDUSTRY VERTICALS

▪ Ecommerce: Ecommerce sites rely heavily on data science to maximize revenue and profitability. These sites analyze customers’ shopping and purchasing behavior and recommend products accordingly to encourage more online purchases.

▪ Finance: Finance is an emerging field in the data industry. The financial analytics market covers risk analysis, fraud detection, the outlook of shareholders’ holdings, working capital management, and so on.



Why Learn Data Science?
▪ Retail: Retail industries maintain a 360-degree view of their customers, including customer feedback and reviews. The retail analytics market analyzes customers’ purchasing trends and demands in order to source products that match customers’ preferences.

▪ Healthcare: The healthcare sector now relies heavily on analytics of patient data to predict diseases and health issues. Healthcare organizations use data-driven analysis to improve the quality of patient care, classify patients’ symptoms, predict health deficiencies, and so on.

▪ Education: The sources of data in education are vast, ranging from student-centric data and enrollment in various courses to scholarship and fee details, examination results, and so on. Education analytics plays a major role in academic institutions in improving admissions, helping students succeed in examinations, and supporting all-round student performance.



Why Learn Data Science?
INDUSTRY VERTICALS

▪ Human Resource (HR): HR analytics involves HR-related data that can be used for building
strong leadership, employee acquisition, employee retention, workforce optimization, and
performance management.

▪ Sports: Nowadays, sports analytics is often used in international tournaments to analyze the
performance of players, the predicted scores, prevention of injuries, and the possibility of winning
or losing a match by a particular team.



Self-Paced Learning Question - 1
QUESTION: What is Data Science? Cite an example of how data
science can be applied in the field of ecommerce and healthcare.



Topic 2
Data Analytics
Life Cycle



Data Science vs Data Analytics
❑ Data science is an umbrella term that covers a much wider variety of fields than data analytics, which is more focused and can be considered a subset of data science.
❑ Some common job titles in the data science field are:



Data Analytics Life Cycle
❑ The data analytics life cycle mainly involves six important phases that are carried out in a cycle:
❑ Data discovery
❑ Data preparation
❑ Planning of data models
❑ Building of data models
❑ Communication of results, and,
❑ Operationalization

Figure 2 illustrates the six phases of the data analytics life cycle, which are followed one after another to complete one cycle.

Fig 2: The Data Analytics Life Cycle



Data Analytics Life Cycle
1. Data Discovery:
❑ In this first phase of data analytics, the stakeholders examine business trends, study cases of similar data analytics projects, and study the domain of the business.
❑ The entire team assesses the in-house resources, the in-house infrastructure, the total time involved, and the technology requirements.
❑ Once all these assessments and evaluations are completed, the stakeholders start formulating initial hypotheses for resolving the business challenges in terms of the current market scenario.



Data Analytics Life Cycle
1. Data Discovery:

❑ Summary:
❑ The data science team learns about and investigates the problem.
❑ The team develops context and understanding.
❑ The team learns about the data sources needed and available for the project.
❑ The team formulates initial hypotheses that can later be tested with data.



Data Analytics Life Cycle
2. Data Preparation:
❑ In this second phase, data is prepared by transforming it from legacy systems into a form suitable for analytics, using a sandbox platform.
❑ A sandbox is a scalable platform commonly used by data scientists for data preprocessing. It offers substantial CPU power, high-capacity storage, and high I/O capacity.
❑ The IBM Netezza 1000, a data warehouse appliance from IBM, is one such platform and can be used for handling data marts.
❑ The stakeholders in this phase are mostly involved in preprocessing data for preliminary results on a standard sandbox platform.



Data Analytics Life Cycle
2. Data Preparation:
❑ Summary:
❑ Steps to explore, preprocess, and condition data prior to modeling and analysis.
❑ It requires an analytic sandbox; the team performs extract, load, and transform (ELT) to get data into the sandbox (Figure 3).
❑ Data preparation tasks are likely to be performed multiple times and not in a predefined order.
❑ Several tools commonly used for this phase are Hadoop, Alpine Miner, OpenRefine, etc.

Fig 3: The Data Science Sandbox Environment
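The preparation steps summarized above (explore, preprocess, condition) can be illustrated with a minimal sketch in Python using only the standard library. The CSV layout (`player`, `runs`, `balls`), the sample values, and the derived `strike_rate` feature are hypothetical; a real project would typically run such steps inside the sandbox on far larger data.

```python
import csv
import io

# Hypothetical raw extract: contains one duplicate row and one row with a missing value.
RAW_CSV = """player,runs,balls
Player A,45,30
Player B,,25
Player A,45,30
Player C,12,20
"""

def prepare(raw_text):
    """Load, clean, and condition raw CSV text into a list of records."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Drop rows with any missing field.
        if any(value is None or value.strip() == "" for value in row.values()):
            continue
        key = (row["player"], row["runs"], row["balls"])
        # Drop exact duplicate rows.
        if key in seen:
            continue
        seen.add(key)
        # Convert text fields to numeric types.
        runs, balls = int(row["runs"]), int(row["balls"])
        cleaned.append({
            "player": row["player"].strip(),
            "runs": runs,
            "balls": balls,
            # Derived feature: runs scored per 100 balls faced.
            "strike_rate": round(100 * runs / balls, 2),
        })
    return cleaned

if __name__ == "__main__":
    for record in prepare(RAW_CSV):
        print(record)
```

Because preparation is usually repeated several times, keeping each step inside one function like `prepare` makes it easy to rerun as new rules are discovered.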



Data Analytics Life Cycle
3. Model Planning:
❑ In this third phase, the data analytics team plans the methods to be adopted and the workflows to be followed during the next phase, model building.
❑ At this stage, the division of work is decided so that each team member's workload is clearly defined.
❑ The data prepared in the previous phase is explored further to understand the various features and their relationships, and feature selection is performed to choose the inputs for the model.
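One common way to explore features and their relationships before model building is to rank candidate features by their Pearson correlation with the target. The sketch below assumes a hypothetical dataset with three candidate features; correlation is only a first filter, since it captures linear relationships and says nothing about causation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_features(features, target):
    """Rank features by the absolute value of their correlation with the target."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data: three candidate features and a target (runs scored).
features = {
    "balls_faced": [30, 25, 40, 20, 50],
    "net_practice_hours": [5, 4, 6, 3, 8],
    "shirt_number": [5, 21, 20, 12, 13],
}
runs = [45, 30, 60, 20, 75]

if __name__ == "__main__":
    for name, r in rank_features(features, runs):
        print(f"{name}: r = {r:.2f}")
```

In this toy data, `shirt_number` ends up near zero and would be a natural candidate to drop during feature selection.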



Data Analytics Life Cycle
4. Model Building:
❑ In this phase, the team works on developing datasets for training and testing as well as for production purposes.
❑ The team also builds and executes the model based on the plan made in the previous phase.
❑ The team decides what kind of environment is needed to execute the model and prepares it, moving to a more robust environment if one is required.
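A minimal sketch of the model-building phase, assuming a single hypothetical feature (`balls_faced`) used to predict `runs`: the data is split reproducibly into training and test sets, a straight line is fitted by ordinary least squares on the training set only, and the mean absolute error is measured on the held-out test set. The choice of model and metric here is illustrative, not prescribed by the life cycle.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(pairs):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_absolute_error(model, pairs):
    """Average absolute difference between actual and predicted y values."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in pairs) / len(pairs)

# Hypothetical (balls_faced, runs) pairs.
data = [(10, 12), (20, 25), (30, 33), (40, 48), (50, 55), (60, 70), (70, 78), (80, 90)]

if __name__ == "__main__":
    train, test = train_test_split(data)
    model = fit_line(train)
    print("intercept, slope:", model)
    print("test MAE:", mean_absolute_error(model, test))
```

Evaluating only on rows the model never saw during fitting gives an early estimate of how it might behave on new data before the results are communicated.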



Data Analytics Life Cycle
5. Communicate Results:
❑ Phase five of the life cycle checks the results of the project to determine whether it is a success or a failure.
❑ The results are scrutinized by the entire team along with its stakeholders to draw inferences on the key findings and summarize the entire work done.
❑ Also, the business value is quantified, and a detailed narrative of the key findings is prepared and discussed among the various stakeholders.



Data Analytics Life Cycle
6. Operationalization:
❑ In this last phase, the team prepares a final report along with the briefings, source code, and related documents.
❑ The last phase also involves running a pilot project to deploy the model and test it in a live production environment.
❑ As data analytics helps build models that lead to better decision-making, it in turn adds value for individuals, customers, business sectors, and other organizations.



Topic 3
Types of
Data Analysis



Types of Data Analysis
❑ There are many different ways to analyze
data.
❑ Some forms are more complex than others, and on this basis data analysis is broadly divided into four types, namely:
❑ Descriptive analysis
❑ Diagnostic analysis
❑ Predictive analysis, and
❑ Prescriptive analysis

❑ Figure 4 demonstrates the level of complexity of each of these four types of data analysis.

Fig 4: Four types of data analysis based on the level of complexity



Types of Data Analysis
1. Descriptive Analysis:
❑ This is the simplest and the most common type of data analysis used by
companies and other sectors.
❑ This type of data analysis is mostly used in businesses to generate monthly
revenue reports, sales leads, and key performance indicators (KPI) dashboards.
❑ The results or reports generated are based on data that are already available.
❑ The main emphasis in descriptive analysis is on ‘What has happened?’, answered by analyzing valuable information found in the available past data.

❑ For example, with descriptive analysis, a data analyst can generate statistics on the performance of the players of the Indian cricket team.
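Descriptive analysis of this kind often amounts to computing summary statistics over past data. A minimal sketch with Python's `statistics` module, using hypothetical scores for three unnamed players:

```python
import statistics

# Hypothetical runs scored by three players over their last five innings.
runs_by_player = {
    "Player A": [45, 12, 78, 33, 60],
    "Player B": [5, 88, 40, 40, 22],
    "Player C": [15, 20, 18, 25, 22],
}

def describe(scores):
    """Summarize what has happened with simple descriptive statistics."""
    return {
        "innings": len(scores),
        "total": sum(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "highest": max(scores),
        # Sample standard deviation: how much the scores vary from innings to innings.
        "std_dev": round(statistics.stdev(scores), 2),
    }

if __name__ == "__main__":
    for player, scores in runs_by_player.items():
        print(player, describe(scores))
```

Comparing the mean with the median is a quick way to spot players whose averages are driven by one or two large scores.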



Types of Data Analysis
2. Diagnostic Analysis:
❑ Diagnostic analysis differs from descriptive analysis in that it emphasizes not only ‘What has happened?’ but also ‘Why did it happen?’
❑ This type of data analysis tries to gain a deeper understanding of the reasons behind the patterns found in past data.
❑ Here, business intelligence comes into play by drilling down to find the root cause of the pattern or nature of the data obtained.
❑ For example, with diagnostic analysis, a data analyst can find out why the performance of each player of the Indian cricket team has risen (or declined) over the past six months.
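A drill-down of this kind can be sketched by comparing averages across two periods and then grouping the same records by a candidate explanatory attribute. The innings log and the `batting_position` attribute below are hypothetical; a difference between groups suggests a possible reason for the change but does not by itself prove the cause.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical innings log: (month, batting_position, runs).
innings = [
    (1, 5, 18), (1, 5, 22), (2, 5, 15), (2, 5, 25), (3, 5, 20),
    (4, 3, 48), (4, 3, 55), (5, 3, 40), (5, 3, 62), (6, 3, 51),
]

def average_by(records, key_index, value_index=2):
    """Drill down: group records by one attribute and average the runs in each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key_index]].append(record[value_index])
    return {key: mean(values) for key, values in groups.items()}

if __name__ == "__main__":
    # What happened: average runs in the first vs the second three months.
    first = mean(r for m, _, r in innings if m <= 3)
    second = mean(r for m, _, r in innings if m > 3)
    print(f"Months 1-3: {first:.1f}, months 4-6: {second:.1f}")
    # Why it may have happened: the rise coincides with a change in batting position.
    print("Average by batting position:", average_by(innings, key_index=1))
```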



Types of Data Analysis
3. Predictive Analysis:
❑ Predictive analysis, as the name suggests, deals with predicting the future based on the available current and past data.
❑ The main emphasis in predictive analysis is on ‘What is likely to happen?’, answered by using previous data to estimate future outcomes.
❑ For example, with predictive analysis, a data analyst can predict the performance of each player of the Indian cricket team in the upcoming international cricket world cup.
❑ Such predictions can help the Board of Control for Cricket in India (BCCI) decide on the selection of players for the upcoming international cricket tournament.
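As a deliberately simple illustration of predictive analysis, the sketch below fits a linear trend to a hypothetical series of a player's recent averages and extends it one step ahead. Real forecasting models would use more data and features and would be validated before informing selection decisions.

```python
def fit_trend(series):
    """Fit y = a + b*t by least squares, with t = 0, 1, 2, ... as the time index."""
    n = len(series)
    ts = range(n)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, series))
    stt = sum((t - mean_t) ** 2 for t in ts)
    slope = sty / stt
    return mean_y - slope * mean_t, slope

def forecast(series, steps_ahead=1):
    """Predict a future value by extending the fitted linear trend."""
    intercept, slope = fit_trend(series)
    return intercept + slope * (len(series) - 1 + steps_ahead)

# Hypothetical batting averages over a player's last six series.
recent_averages = [30, 34, 33, 38, 41, 44]

if __name__ == "__main__":
    print(f"Forecast for next series: {forecast(recent_averages):.1f}")
```

A linear trend cannot capture form slumps or changes in conditions, which is why it serves only as a baseline here.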
Types of Data Analysis
4. Prescriptive Analysis:
❑ The final type of data analysis, which is the highest in terms of complexity, is called prescriptive analysis.
❑ In this type of data analysis, the insights gained from the other three types of data analysis are combined to determine the action to be taken in a given situation.
❑ Prescriptive analysis prescribes the steps that need to be taken to avoid a future problem.
❑ It involves a high degree of responsibility, time, and complexity to reach informed decisions.

❑ Thus, prescriptive analysis makes recommendations based on the forecasts produced by predictive analysis.
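Building on predictions, a prescriptive step turns forecasts into a recommended action. In this minimal sketch, the forecasts, the roles, and the rule that a squad must include one wicketkeeper are all hypothetical; the greedy selection is one simple decision rule, not a general optimization method.

```python
# Hypothetical forecasts from a predictive model: (player, role, predicted_runs).
forecasts = [
    ("Player A", "batter", 52.0),
    ("Player B", "batter", 47.5),
    ("Player C", "wicketkeeper", 31.0),
    ("Player D", "batter", 44.0),
    ("Player E", "wicketkeeper", 28.5),
]

def recommend_squad(candidates, size, required_role="wicketkeeper"):
    """Prescribe an action: pick the highest-forecast players while guaranteeing one of required_role."""
    ranked = sorted(candidates, key=lambda c: c[2], reverse=True)
    # Reserve a slot for the best-forecast player in the required role
    # (raises StopIteration if no candidate has that role).
    best_required = next(c for c in ranked if c[1] == required_role)
    squad = [best_required]
    # Fill the remaining slots greedily by forecast.
    for candidate in ranked:
        if len(squad) == size:
            break
        if candidate is not best_required:
            squad.append(candidate)
    return [name for name, _, _ in squad]

if __name__ == "__main__":
    print(recommend_squad(forecasts, size=3))
```

Note that the constraint changes the outcome: Player D has a higher forecast than Player C but is left out, because the rule reserves a slot for a wicketkeeper.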



Self-Paced Learning Question - 2
QUESTION: Give an example, on your own, of the four types of data
analysis.



Topic 4
Types of Jobs in
Data Analytics



Types of Jobs in Data Analytics
▪ The various key stakeholders in any data analysis project include:
▪ The data analyst
▪ The data scientist
▪ The data engineer
▪ The database administrator, and,
▪ The analytics manager

▪ Figure 5 shows some of the key stakeholders involved in any data analytics-based project.
▪ Each stakeholder has a clear role to play in a business problem: understanding the essentials of the problem, planning, implementing the project, analyzing its outcomes, resolving the bottlenecks visible in those outcomes, and generating reports that draw inferences about the success of the project.

Fig 5: Some of the key stakeholders in the Data Analytics projects



Types of Jobs in Data Analytics
1. Data Analyst:
▪ The main role of a data analyst is to extract data and interpret the information obtained from it to analyze the outcome of a given business problem.
▪ The major skills required to be a data analyst are Python and/or R programming, Structured Query Language (SQL), Statistical Analysis System (SAS), SAS Miner, Microsoft Excel, and/or Tableau.
▪ The key areas and techniques a data analyst should be well-versed in include the following:
▪ Data preprocessing
▪ Data visualization
▪ Statistical modeling
▪ Programming skills
▪ Communication and presentation skills



Types of Jobs in Data Analytics
2. Data Scientist:
▪ A data scientist has all the skills of a data analyst, plus additional skills in data wrangling, complex machine learning, Big Data tools, and software engineering.
▪ Data scientists mainly deal with large, complex, and often high-dimensional data, and apply appropriate machine learning and visualization tools to convert it into easily interpretable, meaningful information.
▪ Some of the fundamental prerequisites that a data scientist should be thorough with are as follows:
▪ Statistics
▪ Mathematics
▪ Computer Programming
▪ Database Handling



Types of Jobs in Data Analytics
3. Data Engineer:
▪ The job of a data engineer comes first; the data is then handed over to a data analyst or data scientist for analysis.
▪ Thus, the role of a data engineer is not to analyze data but rather to prepare, manage, and convert data into a form that can be readily used by a data analyst or data scientist.
▪ Some of the prominent tasks a data engineer is involved in include the following:
▪ Developing and maintaining data architectures.
▪ Aligning data architectures with the business or project requirements.
▪ Improving data quality and raising data efficiency.
▪ Performing predictive and prescriptive modeling for given input data.
▪ Engaging with the other stakeholders to explain the details of the converted data.
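The data engineer's job of converting data into a readily usable form is often organized as an extract, transform, load (ETL) pipeline. A minimal sketch using Python's built-in `csv` and `sqlite3` modules; the source export, its columns, and the `sales` table are hypothetical, and a production pipeline would typically log or quarantine rejected rows rather than silently skip them.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, with inconsistent formatting.
SOURCE_CSV = """id,name,city,amount
1, asha ,guwahati,250.50
2,Rohit,Shillong,not_available
3,meena,GUWAHATI,100
"""

def extract(text):
    """Extract: read raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize text fields and convert amounts; skip rows that cannot be parsed."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline might route this row to an error table instead
        clean.append((int(row["id"]), row["name"].strip().title(),
                      row["city"].strip().title(), amount))
    return clean

def load(records, conn):
    """Load: write the cleaned records into a table ready for analysts."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SOURCE_CSV)), conn)
    for row in conn.execute("SELECT city, SUM(amount) FROM sales GROUP BY city"):
        print(row)
```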
Types of Jobs in Data Analytics
4. Database Administrator (DBA):
▪ The DBA operates and administers the database.
▪ The technical skills required by a DBA are SQL, scripting, database
performance tuning, and system and network design.
▪ The backup and recovery of databases are also handled by a DBA.
▪ Some of the prominent tasks a database administrator is involved in include the following:
▪ Designing databases according to end-user requirements.
▪ Granting (or revoking) database access rights for end-users.
▪ Enabling efficient data backup and data recovery mechanisms.
▪ Providing database-related training to end-users.
▪ Ensuring data privacy and security.
▪ Managing data integrity for end-users.
▪ Monitoring the performance of the database.
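Two of the DBA tasks above, backup and integrity monitoring, can be sketched with SQLite through Python's `sqlite3` module, whose `Connection.backup` method copies a live database into another connection. The `students` table is hypothetical. SQLite has no user accounts, so granting or revoking rights is not shown; server databases handle that with statements such as `GRANT` and `REVOKE`.

```python
import sqlite3

def backup_database(source, target):
    """Copy every page of the source database into the target connection."""
    source.backup(target)

def integrity_ok(conn):
    """Run SQLite's built-in consistency check; it returns the single row 'ok' when healthy."""
    return conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"

if __name__ == "__main__":
    live = sqlite3.connect(":memory:")
    live.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT)")
    live.execute("INSERT INTO students VALUES (1, 'Asha'), (2, 'Rohit')")
    live.commit()

    # In practice the backup target would be a file on separate storage.
    spare = sqlite3.connect(":memory:")
    backup_database(live, spare)
    print(integrity_ok(spare), spare.execute("SELECT COUNT(*) FROM students").fetchone()[0])
```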



Types of Jobs in Data Analytics
5. Data Architect:
▪ The data architect provides the tools and platforms that data engineers require to carry out their various tests with precision.
▪ Data architects should be well equipped with knowledge of data modeling and data warehousing.
▪ Other skills required by a data architect are Extract, Transform, and Load (ETL) processes and knowledge of Hive, Pig, and Spark.



Types of Jobs in Data Analytics
6. Analytics Manager:
▪ The analytics manager is involved in the overall management of the
various data analytics operations.
▪ The major skills required to be an analytics manager are Python and/or R programming, Structured Query Language (SQL), and Statistical Analysis System (SAS).
▪ Also, an analytics manager should have good leadership and social skills.
▪ Some of the prominent tasks an analytics manager is involved in include the following:
▪ Leading the data analysts’ team.
▪ Having a thorough understanding of the business requirements and objectives.
▪ Configuring and implementing data analytics solutions.
▪ Ensuring the quality of the reports developed by every team.
▪ Keeping up to date with recent industry and business trends.



Topic 5
Data Science
Tools



Data Science Tools
❑ There are many popular tools and techniques used by data scientists and data
analysts.
❑ A major advantage of these tools is that many of them are popular, user-friendly, and free or open-source, and they perform well in the field of data science.
❑ Six such tools that any beginner or researcher who wants to explore the field of data science can learn and adopt are:
1. Python Programming
2. R Programming
3. SAS (Statistical Analysis System)
4. Tableau Public
5. Microsoft Excel
6. RapidMiner



Data Science Tools
❑ Python Programming - Python is an open-source, object-oriented scripting language. It was created by Guido van Rossum in the late 1980s (first released in 1991) and is widely used for data preprocessing, statistical analysis, machine learning, and deep learning, which are the core tasks in any data science project.
❑ R Programming - R is also an open-source programming language that is often used for data science, especially statistical computing. It was developed by Ross Ihaka and Robert Gentleman; both of their first names start with the letter R, hence the name ‘R’.
❑ SAS (Statistical Analysis System) – SAS is mainly used for integrating data from multiple sources and generating statistical results based on the input data fed into the environment.



Data Science Tools
❑ Tableau Public – Tableau is data visualization software whose free version is called Tableau Public. Tableau was founded in 2003 by three founders from the United States. The preparation, analysis, and presentation of input data can all be done in Tableau using drag-and-drop features and easily accessible menus.
❑ Microsoft Excel – Microsoft Excel is a data analytics tool that is widely used because it is simple and makes complex analytical tasks easy to interpret. Microsoft first released it in 1985 for the Macintosh, followed by the first Windows version in 1987, to handle numerical calculations efficiently.
❑ RapidMiner – RapidMiner is a data science software platform developed by the RapidMiner company in 2006. It is written in Java and has a GUI that is used for designing and executing data analytics workflows.



Topic 6
Fundamental Areas of Study in Data Science
Fundamental Areas of Study in Data Science
❑ Data science is a broad term that encompasses multiple disciplines.

❑ It is a rapidly growing field of study that uses scientific methods to extract meaningful insights from given input data.

❑ A few of the fundamental areas of study for mastering data science are:
❑ Machine learning
❑ Deep learning
❑ Natural language processing
❑ Statistical data analysis
❑ Knowledge discovery and data mining
❑ Text mining
❑ Recommender systems
❑ Data visualization
❑ Computer vision
❑ Spatial data management



Fundamental Areas of Study in Data Science
1. Machine Learning:
❑ The basic idea of machine learning is to allow machines (computers) to learn independently from the wealth of data fed to them as input.

❑ To master machine learning, a learner needs in-depth knowledge of computer fundamentals, programming, data modeling and evaluation, probability, and statistics.
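To make the idea of learning from data concrete, here is a minimal sketch of one of the simplest learning algorithms, 1-nearest-neighbor classification: the program is never given explicit rules and instead labels a new case like its most similar training example. The features and labels are hypothetical, and the features are left unscaled for brevity, which real projects usually avoid.

```python
import math

def nearest_neighbor_predict(train, query):
    """Predict the label of `query` as the label of its closest training example (1-NN)."""
    _, closest_label = min(train, key=lambda item: math.dist(item[0], query))
    return closest_label

# Hypothetical training data: (hours_studied, classes_attended) -> exam outcome.
training_data = [
    ((2.0, 5.0), "fail"),
    ((3.0, 8.0), "fail"),
    ((8.0, 20.0), "pass"),
    ((9.0, 24.0), "pass"),
]

if __name__ == "__main__":
    print(nearest_neighbor_predict(training_data, (7.5, 18.0)))
```

Adding more labeled examples changes the predictions without changing any code, which is the sense in which the machine "learns" from data.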

2. Deep Learning:
❑ Deep learning is often used in data science because, unlike traditional machine learning methods, which require human intervention (such as hand-crafting input features) before the machine is trained, it can learn useful features directly from the data.

❑ Deep learning helps analyze large volumes of data through a hierarchical learning process, in which each layer builds on the representations learned by the layer before it.



Fundamental Areas of Study in Data Science
3. Natural Language Processing:
❑ The field of NLP is highly beneficial for resolving ambiguity in the many languages spoken worldwide and is a key area of study for text analytics as well as speech recognition.

❑ NLP, as an important branch of data science, plays a vital role in extracting insights from input text.

4. Statistical Data Analysis:
❑ Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, and validation of data.

❑ Statistical data analysis applies statistical operations to data using quantitative approaches.

❑ A few important concepts in statistical data analysis include descriptive statistics, data distributions, conditional probability, hypothesis testing, and regression.
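Two of the concepts listed above can be stated compactly: the definition of conditional probability and the least-squares estimates for simple linear regression with one predictor. The customer counts in the worked example are hypothetical.

```latex
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0

% Hypothetical example: 40 of 100 customers bought a laptop (B),
% and 30 of those 40 also bought a mouse (A).
P(\text{mouse} \mid \text{laptop}) = \frac{30/100}{40/100} = 0.75

\hat{y} = \beta_0 + \beta_1 x, \qquad
\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
\beta_0 = \bar{y} - \beta_1 \bar{x}
```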
Fundamental Areas of Study in Data Science
5. Knowledge Discovery and Data Mining (KDD):
❑ Data mining, a major step in Knowledge Discovery from Data (KDD), has become a prominent field over the years, driven by the growing demand for discovering meaningful patterns in data for analysis.

❑ Raw data alone makes little sense in analysis until it is converted and interpreted into a meaningful form, and this is done through the data mining step of KDD.
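Discovering patterns often starts with counting how frequently items occur together. The sketch below computes the support of item pairs in a set of hypothetical shopping baskets; this brute-force pair counting is the first step of frequent-itemset mining (as in algorithms such as Apriori), not a complete KDD process.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions containing both) meets min_support."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each pair once per basket, in a canonical order.
        counts.update(combinations(sorted(set(basket)), 2))
    n = len(transactions)
    return {pair: count / n for pair, count in counts.items() if count / n >= min_support}

# Hypothetical shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk", "tea"],
    ["bread", "butter"],
]

if __name__ == "__main__":
    print(frequent_pairs(baskets, min_support=0.5))
```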

6. Text Mining:
❑ Text mining is the process of deriving high-quality information from text.
❑ Some of the prominent text mining tasks include text clustering, document summarization, sentiment analysis, text categorization, and concept extraction.
❑ Text analytics is extensively used in data science research, business intelligence, and exploratory data analysis.
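A minimal sketch of two text mining tasks mentioned above: word frequency counting and lexicon-based sentiment analysis. The tiny word lists and the reviews are hypothetical; practical sentiment analysis relies on much larger lexicons or trained models and must handle negation, sarcasm, and context.

```python
import re
from collections import Counter

# Hypothetical, tiny sentiment lexicon.
POSITIVE = {"good", "great", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "poor", "broken"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentiment(text):
    """Label text by counting positive minus negative lexicon words."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical customer reviews.
reviews = [
    "Great phone, excellent camera and fast delivery.",
    "Battery is bad and the charger arrived broken.",
    "The box was blue.",
]

if __name__ == "__main__":
    # Word frequencies across all reviews.
    print(Counter(t for r in reviews for t in tokenize(r)).most_common(3))
    for review in reviews:
        print(sentiment(review), "|", review)
```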



Fundamental Areas of Study in Data Science
7. Recommender Systems:
❑ Web services such as Amazon, YouTube, and Netflix, and e-commerce sites such as Flipkart and Snapdeal, use recommender systems to suggest new and relevant items to online users.

❑ Nowadays, building efficient recommender systems is part and parcel of every online business, as they indirectly help generate a large amount of revenue and help the business flourish compared to its competitors.
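One common recommendation technique is user-based collaborative filtering: find the user whose ratings are most similar to the target user's (here measured by cosine similarity) and suggest items that user rated but the target has not. The ratings below are hypothetical, and the sketch does not represent how any of the services named above actually work.

```python
import math

# Hypothetical user -> {item: rating} data on a 1-5 scale.
ratings = {
    "user1": {"laptop": 5, "mouse": 4, "keyboard": 4},
    "user2": {"laptop": 5, "mouse": 5, "headphones": 4},
    "user3": {"novel": 5, "cookbook": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    common = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data):
    """Suggest items rated by the most similar other user that `user` has not rated yet."""
    others = [u for u in data if u != user]
    most_similar = max(others, key=lambda u: cosine_similarity(data[user], data[u]))
    unseen = set(data[most_similar]) - set(data[user])
    # Highest-rated unseen items first.
    return sorted(unseen, key=lambda item: data[most_similar][item], reverse=True)

if __name__ == "__main__":
    print(recommend("user1", ratings))
```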

8. Data Visualization:
❑ Visualization is the graphical representation of data that can make information easy to
analyze and understand.
❑ Data visualization has the power of illustrating complex data relationships and patterns
with the help of simple designs consisting of lines, shapes, and colours.



Fundamental Areas of Study in Data Science
9. Computer Vision:
❑ Computer vision is a field of artificial intelligence that trains machines or computers to
understand and analyze the visual world.

❑ Computer vision differs from image processing in that it can use the three-dimensional structure of a scene, viewed from varied angles, to better understand a static scene, whereas image processing mainly transforms the image itself.

10. Spatial Data Management:


❑ Geographic information systems (GIS) technology has seen an uplift in recent years as companies focus increasingly on geospatial data generated from multiple sources.
❑ Geospatial data are structured data that describe objects in geographic space.
❑ The objects can be buildings, roads, landmarks, ecosystems, and similar entities, each described by spatial features such as the object's identity, location, orientation, and dimensions.
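Spatial features such as location are commonly stored as latitude and longitude, and a basic spatial query is the distance between two objects. A minimal sketch of the haversine great-circle distance, which assumes a spherical Earth with a mean radius of 6371 km; the two landmark coordinates are hypothetical examples.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the haversine formula assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical landmarks stored as simple spatial records: name -> (latitude, longitude).
landmarks = {
    "Landmark A": (26.14, 91.74),
    "Landmark B": (25.58, 91.89),
}

if __name__ == "__main__":
    (lat_a, lon_a), (lat_b, lon_b) = landmarks["Landmark A"], landmarks["Landmark B"]
    print(f"Distance A-B: {haversine_km(lat_a, lon_a, lat_b, lon_b):.1f} km")
```

For city-scale analysis the spherical approximation is usually adequate; surveying-grade work uses ellipsoidal models instead.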
Self-Paced Learning Question - 3
QUESTION: Differentiate between Data Science, Machine Learning,
and Artificial Intelligence

