Dsbda Unit - 1

The document provides an overview of Big Data and Data Science, highlighting their definitions, advantages, disadvantages, similarities, and differences. It discusses the challenges related to data processing infrastructure and the importance of data management in various fields. Additionally, it outlines examples of Big Data applications and the concept of data explosion.

Uploaded by

Devika Rankhambe
Copyright
© All Rights Reserved

DSBDA Unit - I

Introduction to Big Data & Data Science
Bigdata
• It is huge, large, or voluminous data, information, or relevant statistics acquired by large organizations and ventures.
• Many software tools and data stores have been created for it, because big data is difficult to compute manually.
• It is used to discover patterns and trends and to make decisions related to human behavior and interaction technology.
Data Science
• Data Science is a field or domain that involves working with a huge amount of data and using it to build predictive, prescriptive, and analytical models.
• It is about digging into and capturing data (building the model), analyzing it (validating the model), and utilizing it (deploying the best model).
• It is an intersection of data and computing.
• It is a blend of Computer Science, Business, and Statistics.
Advantages & Disadvantages of Big Data
Advantages of Big Data
• Able to handle and process large and complex data sets that cannot be easily managed with traditional database systems
• Provides a platform for advanced analytics and machine learning applications
• Enables organizations to gain insights and make data-driven decisions based on large amounts of data
• Offers potential for significant cost savings through efficient …
Disadvantages of Big Data
• Requires specialized skills and expertise in data engineering, data management, and big data tools and technologies
• Can be expensive to implement and maintain due to the need for specialized infrastructure and software
• May face privacy and security concerns when handling sensitive data
• Can be challenging to integrate with existing systems and processes
Advantages & Disadvantages of Data Science
Advantages of Data Science
• Provides a framework for extracting insights and knowledge from data through statistical analysis, machine learning, and data visualization techniques
• Offers a wide range of applications in various fields such as finance, healthcare, and marketing
• Helps organizations make informed decisions by extracting meaningful insights from data
• Offers potential for significant cost savings through efficient data management and analysis
Disadvantages of Data Science
• Requires specialized skills and expertise in statistical analysis, machine learning, and data visualization
• Can be time-consuming and resource-intensive due to the need for data cleaning and preprocessing
• May face ethical concerns when dealing with sensitive data
• Can be challenging to integrate with existing systems and processes
Similarities between Big Data and Data Science:

• Both fields deal with large amounts of data and require specialized
skills and expertise
• Both aim to extract insights and knowledge from data to inform
decision-making
• Both have a wide range of applications in various industries
• Both can lead to significant cost savings and operational efficiencies
when applied correctly
Difference: Data Science & Bigdata
Data Science
• Data Science is an area.
• It is about collecting, processing, analyzing, and utilizing data in various operations. It is more conceptual.
• It is a field of study just like Computer Science, Applied Statistics, or Applied Mathematics.
• The goal is to build data-dominant products for a venture.
Bigdata
• Big Data is a technique to collect, maintain, and process huge amounts of information.
• It is about extracting vital and valuable information from a huge amount of data.
• It is a technique for tracking and discovering trends in complex data sets.
• The goal is to make data more vital and usable, i.e. by extracting only the important information from the huge data within existing traditional aspects.
Difference: Data Science & Bigdata (continued)
Data Science
• Tools mainly used in Data Science include SAS, R, Python, etc.
• It is a superset of Big Data, as data science consists of data scraping, cleaning, visualization, statistics, and many more techniques.
• It is mainly used for scientific purposes.
• It broadly focuses on the science of the data.
Bigdata
• Tools mostly used in Big Data include Hadoop, Spark, Flink, etc.
• It is a subset of Data Science, as its mining activities form one stage in the Data Science pipeline.
• It is mainly used for business purposes and customer satisfaction.
• It is more involved with the processes of handling voluminous data.
9 Big Data Examples & Use Cases
• Transportation.
• Advertising and Marketing.
• Banking and Financial Services.
• Government.
• Media and Entertainment.
• Meteorology.
• Healthcare.
• Cybersecurity.
• Education.
Data explosion
• “Data explosion” is the rapid or exponential increase in the amount of data generated and stored in computing systems, to the point where data management becomes difficult.
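As a rough illustration of what "exponential increase" means, the sketch below projects a data volume that doubles at a fixed interval. The starting volume (10 TB) and the doubling period (2 years) are hypothetical figures chosen for the example, not values from these slides.

```python
def projected_volume(initial_tb: float, doubling_years: float, years: float) -> float:
    """Volume after `years`, assuming it doubles every `doubling_years`."""
    return initial_tb * 2 ** (years / doubling_years)


# Hypothetical figures: 10 TB today, doubling every 2 years.
for year in (0, 2, 4, 6, 8, 10):
    print(f"year {year:2d}: {projected_volume(10, 2, year):7.1f} TB")
```

Under these assumed figures the volume grows 32-fold in ten years, which is the kind of growth that outpaces storage and management practices sized for the starting volume.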
Data processing infrastructure challenges
Transportation
• One of the biggest issues is moving data between different systems and then storing it or loading it into memory for manipulation.
• This continuous movement of data is one of the reasons that structured data processing evolved to be restrictive in nature, since the data had to be transported between the compute and storage layers.
• Network technologies have allowed the bandwidth of the transport layers to become much bigger and more scalable.
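One common way to limit the cost of loading data into memory is to move it from storage to compute in small chunks instead of all at once. The sketch below is a minimal illustration of that idea, not a technique described in these slides: `read_in_chunks` and `total_of_column` are hypothetical helper names, and an in-memory `io.StringIO` stands in for a large file on a storage layer.

```python
import io
from typing import Iterator, TextIO


def read_in_chunks(source: TextIO, chunk_lines: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `chunk_lines` lines from `source`."""
    chunk: list[str] = []
    for line in source:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == chunk_lines:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def total_of_column(source: TextIO, chunk_lines: int = 2) -> int:
    """Sum a one-integer-per-line source while holding only one chunk in memory."""
    total = 0
    for chunk in read_in_chunks(source, chunk_lines):
        total += sum(int(value) for value in chunk)
    return total


# Stand-in for a large file on a storage layer (hypothetical data).
storage = io.StringIO("10\n20\n30\n40\n50\n")
print(total_of_column(storage))
```

Only `chunk_lines` lines are resident in memory at any time, so the memory needed stays constant however large the source grows.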
Processing
• Processing combines some form of logical and mathematical calculations together in one cycle of operation.
• It is divided into 3 main areas:
1. CPU or processor
2. Memory
3. Software
CPU or processor
• With each generation:
- computing speed and processing power have increased,
- leading to more processing capabilities,
- access to wider memory,
- and architecture evolution within the software layers.
Memory
• Storing data to disk for offline processing proved the need for storage evolution and data management.
• As processor evolution has improved the capability of the processor, memory has become cheaper and faster.
• The amount of memory that can be allocated to a system, and to a process residing within that system, has changed significantly.
Software
• The main component of data processing.
• Used to develop the programs that transform and process the data.
• Software across different layers, from operating systems to programming languages, has evolved generationally.
• Translates sequenced instruction sets into machine language that is used to process data with the infrastructure layers of CPU + memory + storage.
Speed or throughput
• The biggest continuing challenge.
• Speed is a combination of various architecture layers: hardware, software, networking, and storage.
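One simple way to reason about how the layers combine is a bottleneck model: if data passes through every layer in series, the end-to-end rate cannot exceed that of the slowest layer. The sketch below assumes that serial model, and the per-layer rates are hypothetical numbers for illustration only.

```python
def end_to_end_throughput(layer_mb_per_s: dict[str, float]) -> tuple[str, float]:
    """Return the slowest layer and its rate, assuming the layers run in series."""
    bottleneck = min(layer_mb_per_s, key=layer_mb_per_s.get)
    return bottleneck, layer_mb_per_s[bottleneck]


# Hypothetical sustained rates in MB/s for each architecture layer.
layers = {"hardware": 2000.0, "software": 800.0, "networking": 125.0, "storage": 500.0}
name, rate = end_to_end_throughput(layers)
print(f"bottleneck: {name} at {rate} MB/s")
```

Under this model, speeding up any layer other than the bottleneck leaves overall throughput unchanged, which is one reason speed remains a continuing challenge across all layers.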
Big Data infrastructure Challenges
• Sharing and Accessing Data
• Privacy and Security
• Analytical Challenges
• Technical challenges – Quality of Data, Fault tolerance, Scalability
Sharing and Accessing Data
• Inaccessibility of data sets from external sources.
• Sharing data can cause substantial challenges.
• These challenges include the need for inter- and intra-institutional legal documents.
• Accessing data from public repositories leads to multiple difficulties.
• Data must be available in an accurate, complete, and timely manner, because only then can the data in a company's information system be used to make accurate decisions in time.
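The "complete" and "timely" requirements can be checked mechanically before data is used for decisions. The sketch below is one minimal way to do so. The record fields (`customer_id`, `amount`, `updated_at`) and the 7-day freshness window are hypothetical, and accuracy is not covered because it needs a trusted reference to compare against.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("customer_id", "amount", "updated_at")  # hypothetical schema


def is_complete(record: dict) -> bool:
    """A record is complete when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def is_timely(record: dict, now: datetime, max_age: timedelta) -> bool:
    """A record is timely when it was updated within `max_age` of `now`."""
    updated_at = record.get("updated_at")
    return updated_at is not None and now - updated_at <= max_age


def usable_records(records: list[dict], now: datetime, max_age: timedelta) -> list[dict]:
    """Keep only the records that pass both checks."""
    return [r for r in records if is_complete(r) and is_timely(r, now, max_age)]


now = datetime(2024, 1, 10)
records = [
    {"customer_id": 1, "amount": 50.0, "updated_at": datetime(2024, 1, 9)},
    {"customer_id": 2, "amount": None, "updated_at": datetime(2024, 1, 9)},   # incomplete
    {"customer_id": 3, "amount": 75.0, "updated_at": datetime(2023, 12, 1)},  # stale
]
print([r["customer_id"] for r in usable_records(records, now, timedelta(days=7))])
```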
Datawarehouse Architecture
Shared everything Architectures
Shared nothing Architectures
