
UEIT501 – DATA SCIENCE

UNIT I INTRODUCTION TO DATA SCIENCE
SYLLABUS
 Concept of Data Science
 History
 Application areas
 Traits of Big Data
 Web scraping
 Analysis vs reporting
Concept of Data science
WHAT IS DATA?
• Measurable units of information gathered or captured from the
activity of people, places, and things.
• Data is everywhere; we need to handle and store it properly,
without error.
• Statistics operate on variables, not raw data.
• A variable is a function mapping data objects to values.
• Visualization represents data.
• There are two types of data: qualitative and quantitative.
WHAT IS DATA SCIENCE?
• Data science is the deep study of massive amounts of data. It involves
extracting meaningful insights from raw, structured, and unstructured data
that is processed using the scientific method, different technologies, and
algorithms.
• It is a multidisciplinary field that uses tools and techniques to manipulate
data so that you can find something new and meaningful.
• It is a multidisciplinary approach that combines principles and practices from
the fields of mathematics, statistics, artificial intelligence, and computer
engineering to analyze large amounts of data.
DATA SCIENCE Contd.
• Data science uses the most powerful hardware, programming
systems, and the most efficient algorithms to solve data-related
problems. It is the future of artificial intelligence.
• In short, we can say that data science is all about:
• Asking the correct questions and analyzing the raw data.
• Modeling the data using various complex and efficient
algorithms.
• Visualizing the data to get a better perspective.
• Understanding the data to make better decisions and find the
final result.
WHY IS DATA SCIENCE IMPORTANT?
• Data science is important because it combines tools, methods, and
technology to generate meaning from data.
• Online systems and payment portals capture ever more data in e-commerce,
medicine, finance, and every other aspect of human life. We have text,
audio, video, and image data available in vast quantities.
NEED FOR DATA SCIENCE:
DATA SCIENCE TECHNIQUES
• Data science professionals use computing
systems to follow the data science process.
• The top techniques used by data scientists
are listed below; a short sketch of each follows the list.
Classification
Regression
Clustering
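Below is a minimal sketch of the three techniques. The use of scikit-learn (with its toy data generators) is an assumption for illustration; the slides do not prescribe a library.

```python
# Hedged sketch: classification, regression, and clustering with
# scikit-learn (an assumed library choice; data here is synthetic).
from sklearn.datasets import make_classification, make_regression, make_blobs
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification: assign each sample to one of a set of classes.
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
print("classification accuracy:", clf.score(Xc, yc))

# Regression: model the relationship between variables.
Xr, yr = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print("regression R^2:", reg.score(Xr, yr))

# Clustering: group unlabeled points into similar groups.
Xb, _ = make_blobs(n_samples=200, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xb)
print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])
```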
FUTURE OF DATA SCIENCE
• Artificial intelligence and machine learning
innovations have made data processing faster and
more efficient.
• Industry demand has created an ecosystem of
courses, degrees, and job positions within the field
of data science.
INTRODUCTION TO DATA SCIENCE
CONCEPT OF DATA SCIENCE
TYPES OF DATA SCIENCE JOBS
• Data Scientist
• Data Analyst
• Machine Learning Expert
• Data Engineer
• Data Architect
• Data Administrator
• Business Analyst
• Business Intelligence Manager
DATA NATURE: QUANTITATIVE VS
QUALITATIVE
• Structured data is often referred to as quantitative data. Such data
commonly contains precise numbers or countable textual
elements. The analysis methods are clear and
easy to apply. Among them are:
• classification, or arranging stored items of data into similar classes
based on common features,
• regression, or investigation of the relationships and dependencies
between variables, and
• data clustering, or organizing the data points into specific groups
based on various attributes.
STRUCTURED DATA USE CASE
EXAMPLES
• Online booking: Hotel booking and ticket reservation services leverage the
advantages of a pre-defined data model, as all booking data such as dates, prices,
and destinations fit into a standard data structure with rows and columns.
• ATMs: Any ATM is a great example of how relational databases and structured data work.
All the actions a user can do follow a pre-defined model.
• Inventory control systems: There are many variants of the inventory control systems
companies use, but they all rely on a highly organized environment of relational
databases.
• Banking and accounting: Companies and banks must process and record
huge amounts of financial transactions. Consequently, they use traditional
database management systems to keep structured data in place.
UNSTRUCTURED DATA USE CASE
EXAMPLES
• Sound recognition. Call centers use speech recognition to identify
customers and collect information about their queries and emotions.
• Image recognition. Online retailers take advantage of image recognition
so that customers can shop from their phones by posting a photo of the
desired item.
• Text analytics. Manufacturers use advanced text analytics to
examine warranty claims from customers and dealers and elicit specific
items of important information for further clustering and processing.
• Chatbots. Using natural language processing (NLP) for text analysis,
chatbots help companies boost customer satisfaction with their
services. Depending on the question, customers are routed to the
corresponding representatives who can provide comprehensive answers.
TYPES OF DATA
QUALITATIVE OR CATEGORICAL DATA
• Cannot be measured or counted in the form of numbers.
• The data consists of audio, images, symbols, or text. The gender of a person (male,
female, or other) is qualitative data.
• Qualitative data tells us about the perceptions of people.
• It helps market researchers understand customers' tastes and then design their
ideas and strategies accordingly.
• Other examples of qualitative data are:
• What language do you speak?
• Favourite holiday destination
• Opinion on something (agree, disagree, or neutral)
• Colours
NOMINAL DATA
• Nominal data is used to label variables without any order or quantitative value.
We can't do any numerical tasks with it or give it any order for sorting.
• "Nominal" comes from the Latin word "nomen," which means "name."
• Examples of nominal data:
• Colour of hair (Blonde, Red, Brown, Black, etc.)
• Marital status (Single, Widowed, Married)
• Nationality (Indian, German, American)
• Gender (Male, Female, Others)
• Eye colour (Black, Brown, etc.)
ORDINAL DATA
• Ordinal data has a natural ordering, where each value takes a
position on a scale. Such data is used for observations like customer
satisfaction or happiness, but we can't do arithmetic on it.
• It shows a sequence, which limits the statistical analysis we can
apply. Compared to nominal data, ordinal data has a kind of order
that nominal data lacks.
EXAMPLES OF ORDINAL DATA
• When companies ask for feedback, experience, or
satisfaction on a scale of 1 to 10
• Letter grades in an exam (A, B, C, D, etc.)
• Ranking of people in a competition (First, Second, Third,
etc.)
• Economic status (High, Medium, and Low)
• Education level (Higher, Secondary, Primary)
A small code illustration of the nominal/ordinal distinction follows.
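The sketch below uses pandas categorical types (a library choice assumed for illustration): nominal categories carry no order, while ordered categories support comparisons but still no arithmetic.

```python
# Hedged illustration of nominal vs. ordinal data using pandas
# (an assumed tool; the example values come from the slides).
import pandas as pd

# Nominal: labels with no inherent order.
hair = pd.Series(["Blonde", "Brown", "Black", "Brown"], dtype="category")
print(hair.cat.categories)          # categories exist, but no ordering

# Ordinal: categories with a natural order, so comparisons make sense.
level = pd.Categorical(
    ["Primary", "Higher", "Secondary", "Primary"],
    categories=["Primary", "Secondary", "Higher"],
    ordered=True,
)
print(level.min(), level.max())     # order-aware; arithmetic is still undefined
```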
QUANTITATIVE DATA
• Quantitative data can be expressed in numerical values, which
makes it countable and open to statistical data analysis. This
kind of data is also known as numerical data.
• It answers questions like "how much," "how many," and "how
often." For example, the price of a phone, a computer's RAM, and the
height or weight of a person all fall under quantitative data.
• Quantitative data can be used for statistical manipulation and
can be represented on a wide variety of graphs and
charts such as bar graphs, histograms, scatter plots, box plots, pie
charts, line graphs, etc.
EXAMPLES OF QUANTITATIVE DATA:
• Height or weight of a person or object
• Room temperature
• Scores and marks (e.g., 59, 80, 60)
• Time
• Quantitative data is further classified into two
types:
DISCRETE DATA
• The term discrete means distinct or separate. Discrete
data contains values that are integers or whole
numbers; the total number of students in a class is an
example. These values can't be broken into
decimals or fractions.
• Discrete data is countable and has finite values; its
subdivision is not possible. It is represented
mainly by a bar graph, number line, or frequency table.
EXAMPLES OF DISCRETE DATA
• Total number of students present in a class
• Cost of a cell phone
• Number of employees in a company
• Total number of players who participated in a
competition
• Days in a week
CONTINUOUS DATA
• Continuous data takes fractional (real-numbered) values, such as
the height of a person or the length of an object. Continuous data
represents information that can be divided into ever smaller
levels; a continuous variable can take any value within a range.
• The key difference between discrete and continuous data is
that discrete data contains integers or whole numbers, while
continuous data stores fractional numbers to record
different kinds of measurements such as temperature, height, width,
time, speed, etc.
EXAMPLES OF CONTINUOUS DATA
• Height of a person
• Speed of a vehicle
• Time taken to finish the work
• Wi-Fi frequency
• Market share price
DATA SCIENCE COMPONENTS:
DATA SCIENCE LIFECYCLE
history of data science
HISTORY OF DATA SCIENCE
• The term "data science" was created in the early 1960s to
describe a new profession that would support the
understanding and interpretation of the large amounts of data
being amassed at the time. (At the time, there was no
way of predicting the truly massive amounts of data of the
next fifty years.)
• While data science is used in areas such as astronomy and
medicine, it is also used in business to help make smarter
decisions.
HISTORY - DATA SCIENCE TIMELINE
• In 1962, John Tukey described the merging of statistics and
computers, when computers were first being used to solve
mathematical problems and work with statistics rather than
doing the work by hand.
• In 1974, Peter Naur authored the Concise Survey of
Computer Methods, using the term "data science"
repeatedly. Naur presented his own convoluted definition of
the new concept:
"The usefulness of data and data processes derives from
their application in building and handling models of reality."
HISTORY - DATA SCIENCE TIMELINE
• In 1977, the IASC, the International Association for Statistical
Computing, was formed. The first phrase of its mission statement
reads, "It is the mission of the IASC to link traditional statistical
methodology, modern computer technology, and the
knowledge of domain experts in order to convert data into
information and knowledge."
• In 1977, Tukey wrote a second paper, titled Exploratory Data
Analysis, arguing the importance of using data in selecting
"which" hypotheses to test, and that confirmatory data
analysis and exploratory data analysis should work hand in
hand.
HISTORY - DATA SCIENCE TIMELINE
• In 1989, Knowledge Discovery in Databases, which would
mature into the ACM SIGKDD Conference on Knowledge
Discovery and Data Mining, organized its first workshop.
• In 1994, Business Week ran the cover story "Database
Marketing," revealing the ominous news that companies had
started gathering large amounts of personal information, with
plans to start strange new marketing campaigns. The flood of
data was, at best, confusing to many company managers,
who were trying to decide what to do with so much
disconnected information.
HISTORY - DATA SCIENCE TIMELINE
• In 1999, Jacob Zahavi pointed out the need for new tools to handle
the massive, and continuously growing, amounts of data available
to businesses, in Mining Data for Nuggets of Knowledge. He wrote:
"Scalability is a huge issue in data mining… Conventional statistical
methods work well with small data sets."
• In 2001, Software-as-a-Service (SaaS) was created. This was the
precursor to using cloud-based applications.
• In 2001, William S. Cleveland laid out plans for training data
scientists to meet the needs of the future, in an action plan titled
Data Science: An Action Plan for Expanding the Technical Areas of
the Field of Statistics.
HISTORY - DATA SCIENCE TIMELINE
• In 2002, the International Council for Science: Committee on Data for
Science and Technology began publishing the Data Science Journal, a
publication focused on issues such as the description of data systems,
their publication on the internet, applications, and legal issues.
• In 2006, Hadoop 0.1.0, an open-source framework for distributed
storage and processing of large datasets, was released. Hadoop grew
out of Nutch, an open-source web crawler and search project.
Two problems with processing big data are storing huge
amounts of data and then processing that stored data. (Relational
database management systems (RDBMS) cannot process non-relational
data.) Hadoop addressed both. Apache Hadoop is now an open-source
software library that supports working with big data.
HISTORY - DATA SCIENCE TIMELINE
• In 2008, the title "data scientist" became a buzzword, and
eventually a part of the language. DJ Patil and Jeff
Hammerbacher, of LinkedIn and Facebook, are given credit
for initiating its use as a buzzword.
• In 2009, the term NoSQL was reintroduced (a variation had
been used since 1998) by Johan Oskarsson, when he
organized a discussion on "open-source, non-relational
databases".
• In 2011, job listings for data scientists increased by 15,000%.
There was also an increase in seminars and conferences
devoted specifically to data science and big data.
HISTORY - DATA SCIENCE TIMELINE
• In 2013, IBM shared statistics showing 90% of the data in the
world had been created within the last two years.
• In 2015, using Deep Learning techniques, Google’s speech
recognition, Google Voice, experienced a dramatic
performance jump of 49 percent.
• In 2015, Bloomberg’s Jack Clark, wrote that it had been a
landmark year for artificial intelligence (AI).
HISTORY - DATA SCIENCE TIMELINE
Data Science Today
• In the past thirty years, data science has quietly grown to
include businesses and organizations worldwide. It is now
being used by governments, geneticists, engineers, and even
astronomers.
• Data science has become an important part of business and
academic research.
DATA SCIENCE APPLICATION
DATA SCIENCE IN BANKING SECTOR
• Risk Modeling
• Analyze the default rate and develop strategies to reinforce lending schemes.
• Fraud Detection
• Advances in ML have made it easy to detect, monitor, and analyze user
activity to find any unusual or malicious pattern.
• Customer Lifetime Value
• Retention of customers.
• Customer Segmentation
• Banks group their customers based on their behavior and common characteristics
in order to address them appropriately.
• Real-Time Predictive Analytics
• Real-time analytics
• Predictive analytics
CASE STUDY
• https://data-flair.training/blogs/data-science-in-banking/
DATA SCIENCE LIFE CYCLE
STEPS IN DATA SCIENCE
• Obtaining the Data: This stage involves using technical tools such as MySQL to
process and gather the data. It can even be in simpler file formats such as
Microsoft Excel. Languages like Python and R can import datasets directly
into a data science program.
• Scrubbing the Data: This stage involves cleaning raw data to retain only the relevant
part of the processed data. The noise is scrubbed off, and the data is refined,
converted, and consolidated.
• Exploring the Data: This stage consists of examining the generated data. The data
and its properties are inspected, since different data types demand specific
treatments. Descriptive statistics are then computed to extract features and test
the significant variables.
• Modeling the Data: The dataset is refined further, and only the essential components
are kept. Only relevant values are kept and tested to predict accurate results.
• Interpreting the Data: At this stage, the final product is interpreted for the client or
business to analyze whether it meets the requirement or answers a business question.
The insights are shared with everyone, and the results of the final stage are visualized.
A minimal code sketch of these five stages appears below.
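The sketch below walks the five stages on a hypothetical sales.csv with made-up column names (ad_spend, revenue); pandas and scikit-learn are assumed tools, not mandated by the slides.

```python
# Hedged sketch of the data science steps; file and columns are hypothetical.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Obtain: load a dataset (could equally come from MySQL or Excel).
df = pd.read_csv("sales.csv")                     # hypothetical file

# 2. Scrub: keep only the relevant columns and drop noisy/missing rows.
df = df[["ad_spend", "revenue"]].dropna()

# 3. Explore: descriptive statistics to inspect the data's properties.
print(df.describe())

# 4. Model: fit a simple model on the essential components.
X_train, X_test, y_train, y_test = train_test_split(
    df[["ad_spend"]], df["revenue"], random_state=0)
model = LinearRegression().fit(X_train, y_train)

# 5. Interpret: report a result the business can act on.
print("R^2 on held-out data:", model.score(X_test, y_test))
```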
Traits (characteristics) of big data
BIG DATA
• Big data is a collection of large datasets that cannot be processed
using traditional computing techniques. It is not a single technique
or tool; rather, it has become a complete subject involving
various tools, techniques, and frameworks.
• What Comes Under Big Data?
• Big data involves the data produced by different devices and
applications. Given below are some of the fields that come under
the umbrella of big data.
• Black Box Data − A component of helicopters, airplanes,
jets, etc. It captures the voices of the flight crew, recordings of
microphones and earphones, and the performance information of
the aircraft.
BIG DATA
• Social Media Data − Social media such as Facebook and Twitter hold
information and the views posted by millions of people across the globe.
• Stock Exchange Data − Stock exchange data holds information
about the 'buy' and 'sell' decisions made by customers on shares of
different companies.
• Power Grid Data − Power grid data holds information on the power
consumed by a particular node with respect to a base station.
• Transport Data − Transport data includes the model, capacity, distance,
and availability of a vehicle.
• Search Engine Data − Search engines retrieve lots of data from
different databases.
BIG DATA EXAMPLES
TRAITS (CHARACTERISTICS) OF BIG DATA
• Big data is data so large that it cannot be
processed by traditional data storage or processing units.
• It is used by many multinational companies to process data
and run the business of many organizations.
• The data flow can exceed 150 exabytes per day before
replication.
THE CHARACTERISTICS OF BIG DATA
• There are five V's of big data that explain its
characteristics.
5 V's of Big Data
• Volume
• Veracity
• Variety
• Value
• Velocity
THE CHARACTERISTICS OF BIG DATA
CHARACTERISTICS OF BIG DATA
VOLUME
• The name "big data" itself relates to enormous size. Big
data is a vast volume of data generated daily from many
sources, such as business processes, machines,
social media platforms, networks, human
interactions, and many more.
• Facebook, for example, generates approximately a billion
messages a day, records about 4.5 billion clicks of the "Like"
button, and sees more than 350 million new posts
uploaded each day. Big data technologies can handle such
large amounts of data.
VOLUME
VARIETY
• Big data can be structured, unstructured, or semi-structured, collected
from different sources. In the past, data was collected only
from databases and spreadsheets, but these days data comes in many
forms: PDFs, emails, audio, social media posts, photos, videos, etc.
• The data is categorized as below:
• Structured data: Data with a defined schema, including all the required columns. It is in
tabular form and is stored in relational database management systems.
• Semi-structured data: The schema is not rigidly defined, e.g., JSON, XML, CSV, TSV,
and email. Such data carries its structure in tags or keys rather than fixed tables.
• Unstructured data: Files without a fixed schema, such as log files, audio files,
and image files. Some organizations have much data available, but they do not
know how to derive value from it since the data is raw.
• Quasi-structured data: Textual data with inconsistent formats that can be
structured with effort, time, and some tools.
A small code sketch of loading structured vs. semi-structured data follows.
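A minimal sketch of how the first two categories differ in practice; the file names bookings.csv and events.json are hypothetical, and pandas is an assumed library choice.

```python
# Hedged sketch: structured (tabular) vs. semi-structured (JSON) data.
import json
import pandas as pd

# Structured: rows and columns with a fixed schema.
table = pd.read_csv("bookings.csv")            # hypothetical CSV file

# Semi-structured: JSON carries its schema loosely in keys/tags.
with open("events.json") as f:                 # hypothetical JSON file
    records = json.load(f)
events = pd.json_normalize(records)            # flatten into a table

print(table.head())
print(events.head())
```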
VARIETY
VERACITY
• Veracity refers to how reliable the data is: its accuracy,
consistency, and quality. It covers the ways we filter or translate
data, and the process of handling and managing data efficiently.
• Big data veracity is also essential in business development.
• For example, consider Facebook posts with hashtags.
VALUE
• Value is an essential characteristic of big data. It is not just any
data that we process or store: it is valuable and reliable data
that we store, process, and analyze.
VELOCITY
• Velocity plays an important role compared to the other V's. Velocity
is the speed at which data is created, in real time.
It covers the speeds of incoming data sets, the rate
of change, and activity bursts. The primary aspect of big
data is to provide demanding data rapidly.
• Big data velocity deals with the speed at which data flows from
sources like application logs, business processes,
networks, social media sites, sensors, mobile
devices, etc.
VELOCITY
Web scraping
WEB SCRAPING
What is Web Scraping?
• Web scraping is an automatic method of obtaining large
amounts of data from websites. Most of this data is
unstructured data in HTML format, which is then converted
into structured data in a spreadsheet or a database so that it
can be used in various applications.
• There are many different ways to perform web scraping to
obtain data from websites. These include using online
services, particular APIs, or even creating your own code for
web scraping from scratch.
WEB SCRAPING
Web Scraping
Web scraping extracts data from websites in an
unstructured format. It helps to collect this
unstructured data and convert it into a structured form.
WEB SCRAPING
• Web scraping requires two parts, namely the crawler and
the scraper.
• The crawler is a program that browses
the web to search for the particular data required, by
following links across the internet. A small crawler sketch
follows.
• The scraper, on the other hand, is a specific tool created to
extract data from the website. The design of the scraper can
vary greatly according to the complexity and scope of the
project, so that it can quickly and accurately extract the data.
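A hedged sketch of the crawler idea: starting from one page, follow links to discover further pages. The requests and BeautifulSoup libraries are assumptions, and the start URL is a placeholder; a production crawler would also respect robots.txt.

```python
# Minimal crawler sketch (assumed libraries; placeholder URL).
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def crawl(start_url, max_pages=10):
    """Breadth-first walk over links, collecting visited URLs."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue                      # skip pages that fail to load
        for a in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            queue.append(urljoin(url, a["href"]))
    return seen

print(crawl("https://example.com"))       # placeholder start URL
```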
WHAT IS WEB SCRAPING USED FOR?
• Web scraping has multiple applications across various
industries. Let's check out some of these now!
1. Price Monitoring
2. Market Research
3. News Monitoring
4. Sentiment Analysis
5. Email Marketing
WHAT IS WEB SCRAPING USED FOR?
• Web scraping has multiple applications across various industries. Let's check out
some of these now!
1. Price Monitoring
• Web scraping can be used by companies to scrape product data for their own
and competing products to see how it impacts their pricing strategies.
Companies can use this data to fix the optimal pricing for their products so that they
can obtain maximum revenue.
2. Market Research
• Web scraping can be used for market research by companies. High-quality web
scraped data obtained in large volumes can be very helpful for companies in
analyzing consumer trends and understanding which direction the company should
move in the future.
WHAT IS WEB SCRAPING USED FOR?
3. News Monitoring
• Web scraping news sites can provide detailed reports on the current news to a
company. This is even more essential for companies that are frequently in the
news or that depend on daily news for their day-to-day functioning. After all, news
reports can make or break a company in a single day!
4. Sentiment Analysis
• If companies want to understand the general sentiment for their products among
their consumers, then sentiment analysis is a must. Companies can use web
scraping to collect data from social media websites such as Facebook and Twitter
on what the general sentiment about their products is. This will help them in
creating products that people desire and moving ahead of their competition.
5. Email Marketing
• Companies can also use web scraping for email marketing. They can collect email
IDs from various sites using web scraping and then send bulk promotional and
marketing emails to all the people owning these email IDs.
WEB SCRAPING
HOW DOES WEB SCRAPING WORK?
• These are the steps to perform web scraping. Let's understand the working
of web scraping; a minimal code sketch follows the steps.
Step 1: Find the URL that you want to scrape
• First, you should understand the data requirements of your project. A
webpage or website contains a large amount of information, so scrape only the
relevant information. In simple words, the developer should be familiar with the
data requirements.
Step 2: Inspect the page
• The data is extracted in raw HTML format, which must be carefully parsed to
reduce the noise in the raw data. In some cases, data can be as simple as a name
and address or as complex as high-dimensional weather and stock market data.
Step 3: Write the code
• Write code to extract the information, provide relevant information, and run the
code.
Step 4: Store the data in a file
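The sketch below runs steps 1-4 end to end. The URL, the CSS selectors (.product, .name, .price), and the output file are hypothetical placeholders; requests and BeautifulSoup are assumed library choices.

```python
# Hedged scraper sketch for steps 1-4 (placeholder URL and selectors).
import csv
import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"            # Step 1: target URL (placeholder)
html = requests.get(url, timeout=10).text       # fetch the raw HTML

soup = BeautifulSoup(html, "html.parser")       # Step 2: parse/inspect the page
items = soup.select(".product")                 # hypothetical CSS selector

rows = []                                       # Step 3: extract relevant fields
for item in items:
    name = item.select_one(".name")
    price = item.select_one(".price")
    if name and price:
        rows.append([name.get_text(strip=True), price.get_text(strip=True)])

with open("products.csv", "w", newline="") as f:  # Step 4: store the data
    writer = csv.writer(f)
    writer.writerow(["name", "price"])
    writer.writerows(rows)
```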
TECHNIQUES OF WEB SCRAPING
• Techniques of web scraping: There are
two ways of extracting data from websites:
the manual extraction technique and the
automated extraction technique.
• Manual Extraction Techniques
• Automated Extraction Techniques:
• HTML Parsing
• DOM Parsing
• Web Scraping Software
TECHNIQUES OF WEB SCRAPING
• Techniques of web scraping: There are two ways of extracting data from
websites: the manual extraction technique and the automated extraction technique.
• Manual Extraction Techniques: Manually copy-pasting the site content comes
under this technique. Though tedious, time-consuming, and repetitive, it is an effective
way to scrape data from sites that have good anti-scraping measures, like bot detection.
• Automated Extraction Techniques: Web scraping software is used to
automatically extract data from sites based on user requirements.
• HTML Parsing: Parsing means making something understandable by analyzing it part by part;
that is, converting information from one form to another form that is easier to work with.
HTML parsing means taking in the code and extracting relevant information from it based
on the user requirement. Often executed using JavaScript, the targets, as the name
suggests, are HTML pages.
• DOM Parsing: The Document Object Model is the official recommendation of the World Wide Web
Consortium. It defines an interface that enables a user to modify and update the style, structure,
and content of an XML document.
• Web Scraping Software: Nowadays, many web scraping tools are available, or are custom-built
to users' needs, to extract the required information from millions of websites.
DIFFERENT TYPES OF WEB SCRAPERS
• Web scrapers can be divided on the basis of many different
criteria, including self-built or pre-built web scrapers,
browser-extension or software web scrapers, and cloud or
local web scrapers.
• You can build a self-built web scraper, but that requires
advanced knowledge of programming.
ANALYSIS VS REPORTING
ANALYSIS VS REPORTING
Analysis:
• The process of exploring data and reports in
order to extract meaningful insights, which can
be used to better understand and improve
business performance.
Reporting:
• The process of organizing data into
informational summaries in order to monitor
how different areas of a business are
performing.
TYPES OF DATA ANALYSIS METHODS
• The major data analysis methods are:
• Descriptive Analysis
• Diagnostic Analysis
• Predictive Analysis
• Prescriptive Analysis
• Statistical Analysis
1. DESCRIPTIVE ANALYSIS
• Descriptive analysis looks at data and analyzes past events for insight into how
to approach future events. It looks at past performance and understands it
by mining historical data to understand the causes of success or
failure in the past. Almost all management reporting, such as sales, marketing,
operations, and finance, uses this type of analysis.
• Example: Take DMart. We can look at a product's history
and find out which products have sold more or which products are in large
demand by looking at product sales trends, and based on that analysis we
can decide to stock that item in large quantity for the coming year.
A small pandas sketch of this kind of analysis follows.
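A small illustration of descriptive analysis with pandas (an assumed tool); the sales figures are invented for the example.

```python
# Hedged sketch of descriptive analysis: mine (made-up) historical
# sales to see which products sold most.
import pandas as pd

sales = pd.DataFrame({
    "product": ["rice", "oil", "rice", "soap", "oil", "rice"],
    "units":   [20, 5, 35, 12, 8, 28],
})

# Aggregate past sales per product, highest first.
top = sales.groupby("product")["units"].sum().sort_values(ascending=False)
print(top)   # informs how much stock to order for the coming year
```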
2. DIAGNOSTIC ANALYSIS
• Diagnostic analysis works hand in hand with descriptive analysis.
Where descriptive analysis finds out what happened in the past,
diagnostic analysis finds out why it happened, what measures
were taken at that time, or how frequently it happened. It basically
gives a detailed explanation of a particular scenario by
understanding behavior patterns.
• Example: Take DMart again. Suppose we want
to find out why a particular product is in high demand: is it
because of the brand, or is it because of quality? All this
information can be identified using diagnostic analysis.
3. PREDICTIVE ANALYSIS
• Whatever information we have received from descriptive and
diagnostic analysis, we can use to predict future data. It
basically finds out what is likely to happen in the future.
4. PRESCRIPTIVE ANALYSIS
• This is an advanced method that builds on predictive analysis. When
you predict something, or when you start thinking outside the
box, you will have a lot of options, and it is easy to become
confused about which option will actually work; prescriptive
analysis helps recommend which option to take.
5. STATISTICAL ANALYSIS
• Statistical analysis is a statistical approach or technique for
analyzing data sets in order to summarize their important and main
characteristics, generally by using some visual aids. This approach
can be used to gather knowledge about the following aspects of
data:
• Main characteristics or features of the data.
• The variables and their relationships.
• The important variables that can be used in our problem.
REPORT
• Data analysis can be reported to different people:
• A primary collaborator or client
• Executive and business leaders
• A technical supervisor
• Keep it Succinct: Organize data in a way that makes it easy for different
audiences to skim through it and find the information most relevant to them.
• Make it Visual: Use data visualization techniques, such as tables and charts, to
communicate the message clearly.
• Include an Executive Summary: This allows someone to analyze your findings
upfront and harness your most important points to influence their decisions.
DATA ANALYSIS TOOLS
1. SAS
• SAS is a programming language and software suite developed by the
SAS Institute for advanced analytics, multivariate
analyses, business intelligence, data management, and
predictive analytics.
2. Microsoft Excel
• An important spreadsheet application that can be useful
for recording expenses, charting data, performing simple
manipulation and lookup, and generating pivot tables to
provide the desired summarized reports of large datasets
that contain significant data findings.
DATA ANALYSIS TOOLS
3. R
• It is one of the leading programming languages for performing
complex statistical computations and graphics. It is a free and open-
source language that can be run on various UNIX platforms,
Windows, and macOS. It also has a command-line interface that is
easy to use.
4. Python
• It is a powerful high-level programming language that is used for
general-purpose programming. Python supports both structured and
functional programming methods.
DATA ANALYSIS TOOLS
5. Tableau Public
• Tableau Public is free software developed by the public company
“Tableau Software” that allows users to connect to any spreadsheet or
file and create interactive data visualizations.
6. RapidMiner
• RapidMiner is an extremely versatile data science platform developed
by “RapidMiner Inc”. The software emphasizes lightning-fast data
science capabilities and provides an integrated environment for the
preparation of data and application of machine learning, deep
learning, text mining, and predictive analytical techniques.