DSBDA Unit - 1
Similarities: Data Science & Big Data
• Both fields deal with large amounts of data and require specialized skills and expertise
• Both aim to extract insights and knowledge from data to inform decision-making
• Both have a wide range of applications in various industries
• Both can lead to significant cost savings and operational efficiencies when applied correctly
Difference: Data Science & Big Data
Data Science
• Data Science is an area of study, like Computer Science, Applied Statistics, or Applied Mathematics.
• It covers the collection, processing, analysis, and use of data in various operations, and it is more conceptual.
• The goal is to build data-driven products for a venture.
Big Data
• Big Data is a technique to collect, maintain, and process huge volumes of information.
• It is about extracting vital and valuable information from huge amounts of data.
• It is a technique for tracking and discovering trends in complex data sets.
• The goal is to make data more valuable and usable by extracting only the important information from huge data sets.
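The Data Science side covers the collection, processing, analysis, and use of data. The Python sketch below walks through those stages on a handful of made-up readings as a rough illustration. The function names, the sample values, and the 13.0 decision threshold are all hypothetical. A real project would use dedicated libraries and real data sources.

```python
from statistics import mean

def collect():
    # Hypothetical stand-in for real collection (files, APIs, sensors).
    return ["12.5", "13.0", "", "n/a", "14.5", "13.0"]

def process(raw):
    # Processing: keep only entries that parse as numbers.
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip empty or malformed entries
    return cleaned

def analyze(values):
    # Analysis: simple descriptive statistics.
    return {"count": len(values), "mean": mean(values), "max": max(values)}

def utilize(summary, threshold=13.0):
    # Utilization: turn the analysis into a decision (threshold is an assumption).
    return "investigate" if summary["mean"] > threshold else "no action"

summary = analyze(process(collect()))
decision = utilize(summary)
print(summary, decision)
```

Each stage is a separate function, so any one of them can be swapped out (for example, replacing `collect` with a real data source) without touching the others.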
Difference: Data Science & Big Data (continued)
Data Science
• Tools mainly used in Data Science include SAS, R, and Python.
• It is a superset of Big Data, as data science covers data scraping, cleaning, visualization, statistics, and many more techniques.
• It is mainly used for scientific purposes.
• It broadly focuses on the science of data.
Big Data
• Tools mostly used in Big Data include Hadoop, Spark, and Flink.
• It is a subset of Data Science, since its mining activities form one stage in the data science pipeline.
• It is mainly used for business purposes and customer satisfaction.
• It is more involved with the processes of handling voluminous data.
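Big Data tools such as Hadoop spread work across many machines. The sketch below simulates, in a single Python process, the map, shuffle, and reduce phases of the MapReduce model that Hadoop implements, using a word count as the example. The functions are hypothetical illustrations, not Hadoop or Spark APIs, and nothing here is actually distributed.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine the values for each key into a single result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data needs big tools", "data science uses data"]
counts = reduce_phase(shuffle_phase(chain.from_iterable(map(map_phase, lines))))
print(counts)
```

In a real cluster, each mapper and reducer would run on a different node over a different slice of the input. The single-process version only shows how data flows between the phases.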
9 Big Data Examples & Use Cases
• Transportation.
• Advertising and Marketing.
• Banking and Financial Services.
• Government.
• Media and Entertainment.
• Meteorology.
• Healthcare.
• Cybersecurity.
• Education.
Data explosion
• The rapid, often exponential, increase in the amount of data generated and stored in computing systems, to the point where data management becomes difficult, is called “Data Explosion”.
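Exponential growth makes data management difficult very quickly. Assume, hypothetically, that stored data doubles every Td years. The volume after t years is then V(t) = V0 * 2^(t / Td). The starting volume of 100 TB and the 2-year doubling period below are illustrative assumptions, not measured figures.

```python
def projected_volume(initial_tb: float, doubling_years: float, years: float) -> float:
    """Return V(t) = V0 * 2 ** (t / Td) under an assumed doubling model."""
    return initial_tb * 2 ** (years / doubling_years)

# Hypothetical scenario: 100 TB today, doubling every 2 years.
for year in (0, 2, 4, 6, 10):
    print(f"year {year}: {projected_volume(100, 2, year):.0f} TB")
```

Under these assumptions, the volume grows 32-fold in ten years (2^5). Storage and processing that were adequate at the start would therefore be far too small by the end.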
Data processing infrastructure challenges
Transportation
Processing
• Processing combines some form of logical and mathematical calculation in a single cycle of operation. It relies on three components:
1. CPU or processor
2. Memory
3. Software
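The definition of processing can be made concrete with a minimal sketch. Here each loop iteration performs a logical test and an arithmetic update together. Loosely mapped to the three components above: the CPU executes the comparisons and additions, memory holds the data and the running totals, and the software (this function) defines the operation. The order amounts and the threshold of 100 are made up for illustration.

```python
def process_orders(amounts, min_amount=100):
    """One pass over the data; each iteration is one 'cycle' of logic plus arithmetic."""
    count = 0
    total = 0
    for amount in amounts:
        if amount >= min_amount:   # logical calculation: compare against a threshold
            total += amount        # mathematical calculation: accumulate a sum
            count += 1
    average = total / count if count else 0.0  # arithmetic on the accumulated results
    return count, total, average

print(process_orders([50, 120, 300, 80]))  # (2, 420, 210.0)
```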
CPU or processor.
• With each generation of processors:
- computing speed and processing power have increased, leading to greater processing capability
- processors can access wider memory
- the architecture of the software layers has evolved
Memory.
• Storing data to disk for offline processing proved the need for storage to evolve and for data management.
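Storing data to disk and processing it later (offline) can be sketched in two steps. One step writes records to a file. A later step streams the file back line by line, so memory use stays roughly constant no matter how large the file grows. The file name and the generated records are hypothetical.

```python
import os
import tempfile

def store(records, path):
    # Write each record to disk so it can be processed later (offline).
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(f"{record}\n")

def process_offline(path):
    # Read back one line at a time instead of loading the whole file into memory.
    count = 0
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            total += int(line)  # int() ignores the trailing newline
            count += 1
    return count, total

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readings.txt")  # hypothetical file name
    store(range(1, 1001), path)
    result = process_offline(path)

print(result)  # (1000, 500500)
```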
Software
• Main component of data processing.
Speed or throughput