Unit 1
NOTES
INTRODUCTION TO BIG DATA PLATFORM
▪ Big Data is high-volume, high-velocity and/or high-variety information assets that
require new forms of processing for enhanced decision making, insight discovery
and process optimization.
▪ A collection of data sets so large or complex that traditional data processing
applications are inadequate.
▪ Data of a very large size, typically to the extent that its manipulation and
management present significant logistical challenges.
▪ Mobile phone users together generate about 40 exabytes of data every month.
▪ This massive amount of data is termed Big Data.
▪ Any data can be classified as Big Data using the concept of the 5 Vs:
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Value
▪ Consider the following example from the healthcare industry.
➢ Volume:
o High data volumes impose distinct data storage and processing
demands, as well as additional data preparation, curation, and
management processes.
o Hospitals and clinics across the world generate massive volumes of
data.
o 2314 Exabytes of data are collected annually.
➢ Velocity:
o In Big Data environments, data can arrive at fast speeds, and
enormous datasets can accumulate within very short periods of time.
o These data include patient records and test results.
o All this data is generated at a very high speed, which corresponds to the
velocity of big data.
➢ Variety:
o Data variety refers to the multiple formats and types of data that need
to be supported by Big Data solutions.
o Data variety brings challenges for enterprises in terms of data
integration, transformation, processing, and storage.
o It refers to the various data types:
Structured data: Excel Records.
Semi-structured data: Log files
Un-structured data: X-ray Images.
➢ Veracity:
o Veracity refers to the quality or fidelity of data.
o Data that enters Big Data environments needs to be
assessed for quality, which can lead to data processing
activities to resolve invalid data and remove noise.
o The accuracy and trustworthiness of the generated data is termed
veracity.
o Noise is data that cannot be converted into information and thus has
no value, whereas signals have value and lead to meaningful
information.
➢ Value:
o Value is defined as the usefulness of data for an enterprise.
o Analysing all these data will benefit the medical sector through:
Faster disease detection
Better treatment
Reduced cost
o These are known as the value of big data.
▪ To store and process big data, various frameworks are used.
Cassandra
Hadoop
Spark
▪ Big data is analysed for numerous applications in games like:
HALO 3
CALL OF DUTY
▪ Designers analyse users’ data to understand at which stage users pause, restart, or
quit playing.
▪ This insight can help them rework the game and improve the user experience.
▪ Big data also helped with disaster management during the 2012 hurricane in the USA,
and necessary measures were taken.
NATURE OF DATA
▪ Data are raw facts that have not been processed to explain their meaning.
▪ Data are stored in a database, and a database management system (DBMS) manages the data,
i.e., it stores, updates, and retrieves data from the database.
▪ There are 3 types of data:
1. Structured data
2. Semi-structured data
3. Unstructured data
STRUCTURED DATA
▪ Stored in tabular format.
▪ i.e., in the form of rows and columns.
▪ Structured data is clearly defined and stored in a pre-defined data model.
▪ Ex: Excel files, SQL databases.
▪ The data stored in rows and columns are related to each other.
▪ Hence we get a proper view and understanding of the data.
UNSTRUCTURED DATA
▪ No predefined structure.
▪ No data model
▪ Data is irregular and ambiguous.
▪ Ex: text, numbers, images, audio, video, messages, social media posts, etc.
▪ Not easy to extract information from such data.
▪ 80-90% of all data is unstructured data.
▪ Real life example:
▪ Content on Facebook, Instagram & YouTube is unstructured data.
▪ It is a complex task to analyse such data; hence Artificial Intelligence is used.
▪ Ex: Face recognition by Google.
▪ Previously, only structured data was used extensively, but with the help of Artificial
Intelligence, unstructured data is now commonly used as well.
▪ So, unstructured data is the most useful kind of data, and it provides a lot of
information.
SEMI-STRUCTURED DATA
▪ It falls between structured and unstructured data.
▪ It is a combination of both.
▪ Ex: Emails, XML, WWW.
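▪ A minimal sketch of how semi-structured data carries its own tags instead of a fixed
table schema (plain Python standard library; the XML fragment is invented for illustration):

    # Minimal sketch: parsing a small (invented) XML fragment with the standard library.
    import xml.etree.ElementTree as ET

    xml_text = """
    <bookings>
      <booking id="1"><destination>Delhi</destination><price>4500</price></booking>
      <booking id="2"><destination>Mumbai</destination><price>3800</price></booking>
    </bookings>
    """

    root = ET.fromstring(xml_text)            # parse the string into an element tree
    for booking in root.findall("booking"):   # iterate over the repeating elements
        dest = booking.find("destination").text
        price = booking.find("price").text
        print(booking.get("id"), dest, price)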
ANALYTIC PROCESS
▪ The steps involved in the data analytics process are:
1. Collecting data
2. Cleaning data
3. Manipulating data
4. Analysing data
5. Visualizing data
▪ Ex: Travel industry:
Collecting Data
✓ If a person is travelling to Delhi, he uses one of the aviation websites and
provides basic details like destination, date of travel, price, etc. He selects a
flight that suits his budget and confirms the payment. These data are collected by
the travel company.
✓ Similarly, when many people do the same thing, a lot of data is generated.
✓ These data are stored on the company's web servers in tabular format, so it is easier
for analysts to analyse this data, as in the sketch below.
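✓ A minimal sketch of what the collected bookings might look like as a table (Python
with pandas assumed installed; every value is invented for illustration):

    # Minimal sketch: collected booking details stored in tabular (row/column) form.
    import pandas as pd

    bookings = pd.DataFrame({
        "customer":    ["A", "B", "C", "D"],
        "destination": ["Delhi", "Delhi", "Mumbai", "Delhi"],
        "travel_date": ["2024-01-10", "2024-01-12", None, "2024-01-15"],
        "price":       [4500, 5200, 3800, None],
    })
    print(bookings)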
Cleaning Data
✓ If there are missing values or badly structured entries in the tabular data,
replacing the missing values or deleting those rows is called cleaning the data
(see the sketch below).
✓ Now the data is clean and ready for analysis.
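✓ A minimal cleaning sketch (pandas assumed installed; the tiny table with a missing
price is invented so the example runs on its own):

    # Minimal sketch: cleaning a small booking table that has a missing price.
    import pandas as pd

    bookings = pd.DataFrame({
        "destination": ["Delhi", "Delhi", "Mumbai"],
        "price":       [4500, None, 3800],
    })

    # Option 1: delete the rows that contain missing values.
    cleaned = bookings.dropna()

    # Option 2: keep the row, but fill the missing price with the median of known prices.
    filled = bookings.fillna({"price": bookings["price"].median()})

    print(cleaned)
    print(filled)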
Manipulating Data
✓ The analyst manipulates the data to create the required features and variables.
✓ Ex: the analyst adds new columns like return date, etc., as in the sketch below.
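✓ A minimal feature-creation sketch (pandas assumed installed; the columns and values
are invented): a new trip_days column is derived from the travel and return dates:

    # Minimal sketch: creating a new feature (trip length in days) from existing columns.
    import pandas as pd

    bookings = pd.DataFrame({
        "travel_date": pd.to_datetime(["2024-01-10", "2024-01-12"]),
        "return_date": pd.to_datetime(["2024-01-14", "2024-01-20"]),
        "price":       [4500, 5200],
    })

    bookings["trip_days"] = (bookings["return_date"] - bookings["travel_date"]).dt.days
    print(bookings)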
Analysing Data
✓ The data is analysed using logical methods and analytical techniques.
✓ Once it’s ready, advanced analytics processes can turn big data into big insights.
✓ Some of these big data analysis methods include:
● Data mining sorts through large datasets to identify patterns and relationships by
identifying anomalies and creating data clusters (see the sketch after this list).
● Predictive analytics uses an organization’s historical data to make predictions
about the future, identifying upcoming risks and opportunities.
● Deep learning imitates human learning patterns by using artificial intelligence and
machine learning to layer algorithms and find patterns in the most complex and
abstract data.
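✓ A minimal sketch of the data-clustering idea from data mining (scikit-learn and
NumPy assumed installed; the numbers are invented): bookings are grouped into two
clusters by advance-booking days and price:

    # Minimal sketch: grouping bookings into clusters (the "data clusters" idea).
    import numpy as np
    from sklearn.cluster import KMeans

    # Each row: [days booked in advance, ticket price] -- invented values.
    X = np.array([[2, 9000], [3, 8500], [30, 4200], [28, 4000], [60, 3500], [55, 3700]])

    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(model.labels_)           # the cluster each booking falls into
    print(model.cluster_centers_)  # the centre of each cluster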
Visualizing Data
✓ Present the data after the analysis is complete.
✓ Visualization means showing the analysed data in a visual or graphical format for
easy interpretation, as in the sketch below.
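✓ A minimal visualization sketch (matplotlib assumed installed; the figures are
invented): the analysed booking counts are shown as a bar chart:

    # Minimal sketch: presenting analysed results as a simple bar chart.
    import matplotlib.pyplot as plt

    destinations = ["Delhi", "Mumbai", "Chennai"]
    bookings_count = [120, 85, 60]

    plt.bar(destinations, bookings_count)
    plt.title("Bookings per destination")
    plt.xlabel("Destination")
    plt.ylabel("Number of bookings")
    plt.show()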
ANALYTIC TOOLS
Hadoop:
✓ It is an open-source framework that efficiently stores and processes big
datasets on clusters of commodity hardware.
✓ This framework is free and can handle large amounts of structured and
unstructured data, making it a valuable mainstay for any big data operation.
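✓ One common way to run Python code on Hadoop is Hadoop Streaming, where the map and
reduce steps are plain scripts that read standard input and write standard output.
A minimal word-count sketch (the file names mapper.py and reducer.py are just
illustrative; the job is launched with the hadoop-streaming jar that ships with the
installation):

    # mapper.py - emit "word<TAB>1" for every word read from standard input.
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py - sum the counts for each word (Hadoop sorts mapper output by key).
    import sys
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")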
Microsoft Excel:
✓ Developed by Microsoft.
✓ It is a spreadsheet program, used to create grids of numbers, text and
various formulas.
✓ Easy to use & widely used tool.
✓ Excel works with other Office software, i.e., Excel spreadsheets can be
easily added to Word documents and PowerPoint presentations.
✓ The biggest benefit of Excel is its ability to organize large amounts of
data into orderly, logical spreadsheets and charts.
RapidMiner:
✓ It is a data science software platform which helps with data
presentation and analysis.
✓ It is an integrated environment for:
• Data preparation
• Analysis
• Machine learning
• Deep learning
✓ It is widely used in every business and commercial sector.
✓ It has data exploration features such as:
• Graphs
• Descriptive statistics
• Visualization, which allows users to get valuable insights.
✓ It has more than 1500 operators for data transformation and analysis
tasks.
Talend:
✓ It is an open-source software platform which offers data integration
and management.
✓ It specializes in big data integration.
✓ It is also available in open-source and premium versions.
✓ It is one of the best tools for cloud computing and big data
integration.
KNIME:
✓ It is a free and open-source data analysis tool to create data science
applications and build machine learning models.
✓ It is an analytics, reporting and integration platform.
✓ KNIME has been used in pharmaceutical research, customer data analysis,
business intelligence, text mining & financial data analysis.
✓ It provides interactive graphical user interface to create visual
workflows.
ANALYSIS VS REPORTING
REPORTING
▪ It is the process of organizing data in the form of graphs and charts.
▪ Reporting is used to provide facts, which can be used to draw conclusions, avoid
problems or create plans.
▪ Reporting presents the actual data to end-users, after collecting, sorting and
summarizing it to make it easy to understand.
▪ Reporting offers no judgment or insight.
▪ It focuses on what is happening.
▪ High-level overview of data.
ANALYSING
▪ It is the process of exploring data in order to extract a meaningful insight.
▪ Analytics offers pre-analysed conclusions that a company can use to solve problems
and improve its performance.
▪ Analytics doesn't present the data but instead draws information from the available
data and uses it to generate insights, forecasts and recommended actions.
▪ It focuses on why something is happening within an organization.
▪ Interpret data at a deeper level.
STATISTICAL CONCEPTS
▪ Statistics is a branch of applied mathematics where we collect, organize, analyse and
interpret numerical facts.
▪ Statistical methods are the concepts, models and formulas of mathematics used in
the statistical analysis of data.
▪ It is the science of collecting, exploring and presenting large amounts of data to
identify patterns and trends.
▪ It is also called quantitative analysis.
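▪ A minimal quantitative-analysis sketch using Python's standard statistics module
(the sample values are invented for illustration):

    # Minimal sketch: basic descriptive statistics on a small, invented sample.
    import statistics

    patient_wait_minutes = [12, 15, 9, 22, 17, 14, 30, 11]

    print("mean   :", statistics.mean(patient_wait_minutes))
    print("median :", statistics.median(patient_wait_minutes))
    print("stdev  :", statistics.stdev(patient_wait_minutes))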
SAMPLING DISTRIBUTIONS
• In statistics, a population is the entire pool from which a statistical sample is
drawn.
• A population may refer to an entire group of people, objects, events, hospital
visits, or measurements.
• A population can thus be said to be an aggregate observation of subjects
grouped together by a common feature.
• The problem with the sampling process is that we only have a single estimate
of the population parameter, with little idea of the variability or uncertainty in
the estimate.
• One way to address this is by estimating the population parameter multiple
times from our data sample. This is called resampling.
• Re-sampling is the method that consists of creating or drawing repeated
samples from the original samples.
• Resampling involves the selection of randomized cases, with replacement, from the
original data sample, in such a manner that each sample drawn has a number of cases
similar to the original data sample (see the sketch below).
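• A minimal resampling (bootstrap) sketch (NumPy assumed installed; the sample values
are invented): repeated samples of the same size are drawn with replacement from the
single original sample, and the spread of the resampled means indicates the
uncertainty of the estimate:

    # Minimal sketch: estimating the population mean repeatedly by resampling
    # (bootstrap) from one original sample, drawn with replacement.
    import numpy as np

    rng = np.random.default_rng(0)
    sample = np.array([12, 15, 9, 22, 17, 14, 30, 11])   # the single original sample

    boot_means = [
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(1000)
    ]

    print("estimate of the mean:", np.mean(boot_means))
    print("uncertainty (std of resampled means):", np.std(boot_means))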
STATISTICAL INFERENCE
• Predictive analytical processes use new and historical data to forecast activity,
behaviour, and trends.
• A prediction error is the failure of some expected event to occur.
• When a prediction fails, humans can use different methods: examining the predictions
and the failures, and deciding on ways to overcome such errors in the future.
• Applying that type of knowledge can inform decisions and improve the
quality of future prediction.
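• A minimal sketch of measuring prediction error by comparing forecasts with what
actually happened (NumPy assumed installed; the numbers are invented):

    # Minimal sketch: quantifying prediction error so future predictions can be improved.
    import numpy as np

    actual    = np.array([100, 120, 130, 150])   # what actually happened
    predicted = np.array([110, 115, 140, 150])   # what the model forecast

    errors = actual - predicted
    mae  = np.mean(np.abs(errors))      # mean absolute error
    rmse = np.sqrt(np.mean(errors**2))  # root mean squared error
    print("MAE :", mae)
    print("RMSE:", rmse)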