Data Science UNIT 1 Final
• Data science is the deep study of massive amounts of data, which involves
extracting meaningful insights from raw, structured, and unstructured data.
• Online systems and payment portals capture more data in the fields of e-
commerce, medicine, finance, and every other aspect of human life. We
have text, audio, video, and image data available in vast quantities.
NEED FOR DATA SCIENCE:
DATA SCIENCE JOB ROLES
• Data Scientist
• Data Analyst
• Machine learning expert
• Data engineer
• Data Architect
• Data Administrator
• Business Analyst
• Business Intelligence Manager
DATA NATURE: QUANTITATIVE VS
QUALITATIVE
• Structured data is often referred to as quantitative data. It means
that such data commonly contains precise numbers or textual
elements that can be counted. The analysis methods are clear and
easy to apply. Among them are:
• classification or arranging stored items of data into similar classes
based on common features,
• regression or investigation of the relationships and dependencies
between variables, and
• data clustering or organizing the data points into specific groups
based on various attributes.
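A minimal sketch of the regression technique named above, fitting a least-squares line to an invented toy dataset in plain Python (the spend/sales numbers are illustrative only):

```python
# Simple least-squares linear regression on a toy structured dataset.
# Illustrates the "regression" method listed above; all data is made up.

def linear_regression(xs, ys):
    """Return slope and intercept of the least-squares fit y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Toy data: advertising spend vs. sales (perfectly linear for clarity).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]          # follows y = 2x + 1
slope, intercept = linear_regression(spend, sales)
print(slope, intercept)           # -> 2.0 1.0
```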
STRUCTURED DATA USE CASE
EXAMPLES
• Online booking : Different hotel booking and ticket reservation services leverage the
advantages of the pre-defined data model as all booking data such as dates, prices,
destinations, etc. fit into a standard data structure with rows and columns.
• ATMs : Any ATM is a great example of how relational databases and structured data work.
All the actions a user can do follow a pre-defined model.
• Inventory control systems : There are lots of variants of inventory control systems
companies use, but they all rely on a highly organized environment of relational
databases.
• Banking and accounting : Different companies and banks must process and record
huge amounts of financial transactions. Consequently, they make use of traditional
database management systems to keep structured data in place.
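The booking example above can be sketched with Python's built-in sqlite3 module; the table schema and rows here are hypothetical, standing in for a real reservation database:

```python
import sqlite3

# Hypothetical booking table: structured data with a fixed schema of
# rows and columns, as in the reservation systems described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, "
    "destination TEXT, date TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO bookings (destination, date, price) VALUES (?, ?, ?)",
    [("Paris", "2024-06-01", 120.0),
     ("Tokyo", "2024-06-03", 210.0),
     ("Paris", "2024-06-05", 95.0)],
)
# A typical structured query: average price per destination.
rows = conn.execute(
    "SELECT destination, AVG(price) FROM bookings "
    "GROUP BY destination ORDER BY destination"
).fetchall()
print(rows)  # -> [('Paris', 107.5), ('Tokyo', 210.0)]
```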
UNSTRUCTURED DATA USE CASE
EXAMPLES
• Sound recognition. Call centers use speech recognition to identify
customers and collect information about their queries and emotions.
• Image recognition. Online retailers take advantage of image recognition
so that customers can shop from their phones by posting a photo of the
desired item.
• Text analytics. Manufacturers make use of advanced text analytics to
examine warranty claims from customers and dealers and elicit specific
items of important information for further clustering and processing.
• Chatbots. Using natural language processing (NLP) for text analysis,
chatbots help different companies boost customer satisfaction with their
services. Depending on the question input, customers are routed to the
corresponding representatives who can provide comprehensive answers.
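A toy sketch of the routing idea above; the keywords and department names are invented, and real chatbots use far richer NLP than simple keyword matching:

```python
# Toy keyword-based router, sketching how a chatbot might send a
# customer query to the right department. Keywords and departments
# are invented for illustration.

ROUTES = {
    "refund": "billing",
    "invoice": "billing",
    "password": "tech support",
    "login": "tech support",
    "delivery": "shipping",
}

def route_query(text):
    """Return the department whose keyword appears first, else 'general'."""
    lowered = text.lower()
    for keyword, department in ROUTES.items():
        if keyword in lowered:
            return department
    return "general"

print(route_query("I cannot login to my account"))   # -> tech support
print(route_query("Where is my delivery?"))          # -> shipping
```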
TYPES OF DATA
QUANTITATIVE OR NUMERICAL DATA
• Height of a person
• Speed of a vehicle
• “Time-taken” to finish the work
• Wi-Fi Frequency
• Market share price
DATA SCIENCE COMPONENTS:
DATA SCIENCE LIFECYCLE
HISTORY OF DATA SCIENCE
• https://data-flair.training/blogs/data-science-in-banking/
DATA SCIENCE LIFE CYCLE
STEPS IN DATA SCIENCE
• Obtaining the Data: This stage involves using technical tools like MySQL to
process and gather the data. It can even be in simpler file formats such as
Microsoft Excel. Languages like Python and R can even import datasets directly
into a data science program.
• Scrubbing the Data: This stage involves cleaning raw data to retain only the relevant
part of the processed data. The noise is also scrubbed off, and the data is refined,
converted, and consolidated.
• Exploring the Data: This stage consists of examining the generated data. The data
and its properties are inspected since different data types demand specific
treatments. Descriptive statistics are then computed to extract the features and test
the significant variables.
• Modeling the Data: The dataset is refined further, and only the essential components
are kept. Only relevant values are kept and tested to predict accurate results.
• Interpreting the Data: At this stage, the final product is interpreted for the client or
business to analyze if it meets the requirement or answers a business question. The
insights are shared with everyone, and the results of the final stage are visualized.
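The five stages above can be compressed into a small end-to-end sketch on invented data (the readings and the outlier rule are illustrative only):

```python
from statistics import mean, stdev

# A compressed sketch of the lifecycle: obtain -> scrub -> explore ->
# model -> interpret. The raw readings are invented.

raw = ["23", "19", "", "31", "n/a", "27", "25"]            # obtained

# Scrub: drop blanks and non-numeric noise, convert types.
clean = [float(v) for v in raw if v.replace(".", "").isdigit()]

# Explore: descriptive statistics on the cleaned values.
stats = {"mean": mean(clean), "stdev": stdev(clean)}

# "Model": flag values more than one standard deviation above the mean.
threshold = stats["mean"] + stats["stdev"]
flagged = [v for v in clean if v > threshold]

# Interpret: report the summary and the flagged outliers.
print(stats["mean"], flagged)  # -> 25.0 [31.0]
```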
BIG DATA
• Big data is a collection of large datasets that cannot be processed
using traditional computing techniques. It is not a single technique
or a tool, rather it has become a complete subject, which involves
various tools, techniques and frameworks.
• What Comes Under Big Data?
• Big data involves the data produced by different devices and
applications. Given below are some of the fields that come under
the umbrella of Big Data.
• Black Box Data − It is a component of helicopters, airplanes, jets, etc.
It captures the voices of the flight crew, recordings of microphones and
earphones, and the performance information of the aircraft.
BIG DATA
• Social Media Data − Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the globe.
• Stock Exchange Data − The stock exchange data holds information
about the ‘buy’ and ‘sell’ decisions made by customers on the shares of
different companies.
• Power Grid Data − The power grid data holds information about the power
consumed by a particular node with respect to a base station.
• Transport Data − Transport data includes model, capacity, distance and
availability of a vehicle.
• Search Engine Data − Search engines retrieve lots of data from
different databases.
BIG DATA EXAMPLES
TRAITS(CHARACTERISTICS) OF BIG DATA
• Big Data contains a large amount of data that cannot be
processed by traditional data storage or processing units.
• The data flow would exceed 150 exabytes per day before
replication.
THE CHARACTERISTICS OF BIG DATA
• Big Data can be structured, unstructured, or semi-structured, collected from
different sources. In the past, data was collected only from databases and
spreadsheets, but these days data comes in a variety of forms: PDFs, emails,
audio, social media posts, photos, videos, etc.
• The data is categorized as below:
• Structured data: Structured data follows a defined schema with all the required
columns. It is in a tabular form and is stored in relational database management systems.
• Semi-structured: In Semi-structured, the schema is not appropriately defined,
e.g., JSON, XML, CSV, TSV, and email. OLTP (Online Transaction Processing)
systems are built to work with semi-structured data. It is stored in relations, i.e., tables.
• Unstructured Data: All the unstructured files, log files, audio files,
and image files are included in the unstructured data. Some organizations have much
data available but do not know how to derive value from it, since the data is
raw.
• Quasi-structured Data: Textual data with inconsistent formats that can be
formatted with some effort, time, and tools.
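The structured vs. semi-structured distinction above can be sketched with Python's standard json module: a JSON document has keys and nesting but no rigid tabular schema, so records may differ in fields (the records here are invented):

```python
import json

# Semi-structured data: JSON has structure (keys, nesting) but no fixed
# schema; note the second record has an extra "phone" field.
doc = '''
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "email": "ravi@example.com", "phone": "555-0101"}
]
'''
records = json.loads(doc)

# Flatten into a consistent tabular (structured) form, filling gaps.
table = [(r["name"], r["email"], r.get("phone", "")) for r in records]
print(table)
```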
VARIETY
VERACITY
WHAT IS WEB SCRAPING USED FOR?
• Web Scraping has multiple applications across various industries. Let’s check out
some of these now!
1. Price Monitoring
• Web Scraping can be used by companies to scrape the product data for their products
and competing products as well to see how it impacts their pricing strategies.
Companies can use this data to fix the optimal pricing for their products so that they
can obtain maximum revenue.
2. Market Research
• Web scraping can be used for market research by companies. High-quality web
scraped data obtained in large volumes can be very helpful for companies in
analyzing consumer trends and understanding which direction the company should
move in the future.
WHAT IS WEB SCRAPING USED FOR?
3. News Monitoring
• Web scraping news sites can provide detailed reports on the current news to a
company. This is even more essential for companies that are frequently in the
news or that depend on daily news for their day-to-day functioning. After all, news
reports can make or break a company in a single day!
4. Sentiment Analysis
• If companies want to understand the general sentiment for their products among
their consumers, then Sentiment Analysis is a must. Companies can use web
scraping to collect data from social media websites such as Facebook and Twitter
as to what the general sentiment about their products is. This will help them in
creating products that people desire and moving ahead of their competition.
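A toy lexicon-based sentiment scorer sketching the idea above; the word lists are invented, and production sentiment analysis uses trained models rather than fixed word lists:

```python
# Toy lexicon-based sentiment scoring for scraped social-media text.
# Word lists are invented for illustration.

POSITIVE = {"great", "love", "excellent", "good"}
NEGATIVE = {"bad", "terrible", "hate", "broken"}

def sentiment(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this product, it is great"))  # -> positive
print(sentiment("terrible battery, hate it"))         # -> negative
```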
5. Email Marketing
• Companies can also use web scraping for email marketing. They can collect email
IDs from various sites using web scraping and then send bulk promotional and
marketing emails to all the people owning these email IDs.
WEB SCRAPING
HOW DOES WEB SCRAPING WORK?
• These are the steps to perform web scraping. Let's understand how
web scraping works.
Step -1: Find the URL that you want to scrape
• First, you should understand the data requirements of your project. A
webpage or website contains a large amount of information, so you should scrape
only the relevant information. In simple words, the developer should be familiar
with the data requirements.
Step - 2: Inspecting the Page
• The data is extracted in raw HTML format, which must be carefully parsed to reduce
the noise in the raw data. In some cases, data can be as simple as a name and address
or as complex as high-dimensional weather and stock market data.
Step - 3: Write the code
• Write a code to extract the information, provide relevant information, and run the
code.
Step - 4: Store the data in the file
• Finally, store the extracted data in a required format such as CSV, JSON, or a
database.
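Steps 2 and 3 can be sketched with Python's standard-library HTMLParser; the sample markup below is a stand-in for a real fetched page (fetching itself would use a library such as urllib or requests):

```python
from html.parser import HTMLParser

# Sample markup standing in for a real fetched page; the class names
# and products are invented.
SAMPLE_HTML = """
<ul>
  <li class="product">Laptop</li>
  <li class="product">Phone</li>
  <li class="ad">Sponsored link</li>
</ul>
"""

class ProductParser(HTMLParser):
    """Collect the text of every <li class="product"> element."""
    def __init__(self):
        super().__init__()
        self.in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        if tag == "li" and ("class", "product") in attrs:
            self.in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_product = False

    def handle_data(self, data):
        if self.in_product and data.strip():
            self.products.append(data.strip())

parser = ProductParser()
parser.feed(SAMPLE_HTML)
print(parser.products)  # -> ['Laptop', 'Phone']
```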
TECHNIQUES OF WEB SCRAPING
• Techniques of Web Scraping: There are two ways of extracting data from
websites, the Manual extraction technique, and the automated extraction technique.
• Manual Extraction Techniques: Manually copy-pasting the site content comes
under this technique. Though tedious, time-consuming, and repetitive, it is an effective
way to scrape data from sites that have strong anti-scraping measures such as bot detection.
• Automated Extraction Techniques: Web scraping software is used to
automatically extract data from sites based on user requirement.
• HTML Parsing: Parsing means making something understandable by analyzing it part by
part; that is, converting information from one form into another that is easier to work
with. HTML parsing means taking in the code and extracting relevant information from it
based on the user's requirements. Mainly executed using JavaScript, the targets, as the
name suggests, are HTML pages.
• DOM Parsing: The Document Object Model is the official recommendation of the World Wide Web
Consortium. It defines an interface that enables a user to modify and update the style, structure,
and content of the XML document.
• Web Scraping Software: Nowadays, many web scraping tools are available, or can be
custom-built to a user's needs, to extract the desired information from millions of websites.
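A DOM-style parsing sketch using Python's standard xml.etree.ElementTree, showing the read-and-modify interface described above; the XML catalog is invented:

```python
import xml.etree.ElementTree as ET

# Invented XML document to demonstrate DOM-style access and updates.
xml_doc = """
<catalog>
  <book id="b1"><title>Data Science 101</title><price>30</price></book>
  <book id="b2"><title>Web Scraping</title><price>25</price></book>
</catalog>
"""
root = ET.fromstring(xml_doc)

# Read content by navigating the tree structure.
titles = [b.find("title").text for b in root.findall("book")]
print(titles)  # -> ['Data Science 101', 'Web Scraping']

# Update the tree in place: raise every price by 5.
for price in root.iter("price"):
    price.text = str(int(price.text) + 5)
print([p.text for p in root.iter("price")])  # -> ['35', '30']
```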
DIFFERENT TYPES OF WEB SCRAPERS
1. DESCRIPTIVE ANALYSIS
• Descriptive Analysis looks at data and analyzes past events for insight as to how
to approach future events. It looks at the past performance and understands the
performance by mining historical data to understand the cause of success or
failure in the past. Almost all management reporting such as sales, marketing,
operations, and finance uses this type of analysis.
• Example: Let’s take the example of DMart: we can look at a product’s history
and find out which products have sold more or which products are in large
demand by looking at product sales trends, and based on this analysis we
can decide to stock that item in large quantities
for the coming year.
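The retail example above can be sketched as a frequency count over an invented transaction log, the simplest form of mining historical sales data:

```python
from collections import Counter

# Invented transaction log: one entry per item sold. Counting the
# entries gives the best-selling products, as in the DMart example.
sales_log = ["rice", "oil", "rice", "soap", "rice", "oil", "tea"]

counts = Counter(sales_log)
top_two = counts.most_common(2)
print(top_two)  # -> [('rice', 3), ('oil', 2)]
```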
2. DIAGNOSTIC ANALYSIS
• Keep it Succinct: Organize data in a way that makes it easy for different
audiences to skim through it to find the information most relevant to them.
• Make it Visual: Use data visualization techniques, such as tables and charts, to
communicate the message clearly.
• Include an Executive Summary: This allows someone to analyze your findings
upfront and harness your most important points to influence their decisions.
DATA ANALYSIS TOOLS
1. SAS
• SAS is a statistical software suite developed by the SAS
Institute for performing advanced analytics, multivariate
analysis, business intelligence, data management, and
predictive analytics.
2. Microsoft Excel
• It is an important spreadsheet application that can be useful
for recording expenses, charting data, performing easy
manipulation and lookups, and generating pivot tables to
provide summarized reports of large datasets
that contain significant findings.
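A pivot-table-style grouping like the Excel feature described above can be sketched in plain Python; the expense rows are invented:

```python
from collections import defaultdict

# Invented expense rows: (category, amount). Grouping and summing by
# category mirrors what a pivot table produces in a spreadsheet.
expenses = [
    ("travel", 120.0),
    ("food", 45.5),
    ("travel", 80.0),
    ("office", 200.0),
    ("food", 30.0),
]

totals = defaultdict(float)
for category, amount in expenses:
    totals[category] += amount

print(dict(totals))  # -> {'travel': 200.0, 'food': 75.5, 'office': 200.0}
```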
DATA ANALYSIS TOOLS
3. R
• It is one of the leading programming languages for performing
complex statistical computations and graphics. It is a free and open-
source language that can be run on various UNIX platforms,
Windows, and macOS. It also has a command-line interface that is
easy to use.
4. Python
• It is a powerful high-level programming language that is used for
general-purpose programming. Python supports both structured and
functional programming methods.
DATA ANALYSIS TOOLS
5. Tableau Public
• Tableau Public is free software developed by the public company
“Tableau Software” that allows users to connect to any spreadsheet or
file and create interactive data visualizations.
6. RapidMiner
• RapidMiner is an extremely versatile data science platform developed
by “RapidMiner Inc”. The software emphasizes lightning-fast data
science capabilities and provides an integrated environment for the
preparation of data and application of machine learning, deep
learning, text mining, and predictive analytical techniques.
ANALYSIS VS REPORTING
TYPES OF DATA ANALYSIS METHODS