Big data refers to large, complex datasets that cannot be processed by traditional data processing software. It comes from various sources like smartphones, social media, and online transactions. The key aspects of big data are volume, velocity, and variety. Common big data tools include Hadoop, Spark, MapReduce, Hive, and Impala which help address challenges like data volume, velocity, variety, and veracity. ETL extracts data from sources, transforms it, and loads it into a data warehouse, while ELT loads raw data and then transforms it within the data destination.
BigData Brief
[ Big Data ]
What exactly is big data?
The definition of big data is: data that contains greater variety, arriving in increasing volumes and with more velocity.
Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.
Big data Types:
Big data use cases:
• Customer segmentation
• Marketing campaign optimization
• Product development
• Social media sentiment analysis
• Supply chain optimization
• Financial analysis
• Fraud detection
• Image and speech recognition
• Personalized medicine
• Energy management
• Cybersecurity
• Smart city planning
• Traffic analysis
Big data Technology Challenges:
Challenge | Description | Solution / Technology
Volume & Resilience | Avoid the risk of data loss from machine failure in clusters of commodity machines | Replicate segments of data on multiple machines; master nodes keep track of segment locations
Volume & Velocity | Avoid choking network bandwidth when moving large volumes of data | Move processing logic to where the data is stored, using parallel processing algorithms
Variety | Efficient storage of large and small data objects | NoSQL database (documents, graphs, key-value pairs, etc.)
Velocity | Monitoring streams too large to store | Fork-shaped architecture to process data as a stream and as a batch
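The replication idea in the "Volume & Resilience" row can be illustrated with a minimal in-memory sketch. The `Master` class, the worker names, and the replication factor of 3 are all assumptions made for illustration; this is a toy model, not how any particular distributed file system is implemented.

```python
import random

class Master:
    """Toy master node: records which workers hold a copy of each segment."""

    def __init__(self, workers, replication_factor=3):
        self.workers = {name: {} for name in workers}  # worker -> {segment_id: data}
        self.locations = {}                            # segment_id -> [worker names]
        self.replication_factor = replication_factor

    def store(self, segment_id, data):
        # Place copies of the segment on distinct, randomly chosen workers.
        targets = random.sample(sorted(self.workers), self.replication_factor)
        for name in targets:
            self.workers[name][segment_id] = data
        self.locations[segment_id] = targets

    def fail(self, name):
        # Simulate a machine failure: the worker and its copies disappear.
        del self.workers[name]

    def read(self, segment_id):
        # Return the segment from any replica holder that is still alive.
        for name in self.locations[segment_id]:
            if name in self.workers:
                return self.workers[name][segment_id]
        raise IOError(f"all replicas of {segment_id!r} are lost")


master = Master(["w1", "w2", "w3", "w4"], replication_factor=3)
master.store("seg-0", b"first block of a large file")
master.fail(master.locations["seg-0"][0])   # lose one replica holder
recovered = master.read("seg-0")            # still readable from another copy
```

Because the master tracks every replica location, the loss of one commodity machine does not lose the segment; a real system would also re-replicate the segment to restore the target number of copies, which this sketch omits.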
Big data Ecosystem:
[ Q&A ]
1- What is Big Data, and where does it come from?
• Big data is larger, more complex data sets. These data sets are so voluminous that traditional data processing software just can’t manage them. Big Data comprises unstructured and structured data sets such as videos, photos, audio, websites, and multimedia content.
• It comes from various sources like:
o Smartphones
o Internet cookies
o Social media posts
o Online purchase transactions
2- What are the V’s in Big Data?
• Volume: The huge amount of data that is stored, for example in data warehouses.
• Velocity: The pace at which data is produced in real time.
• Variety: Big Data comprises structured, unstructured, and semi-structured data collected from varied sources.
• Veracity: How reliable the data is; it can be defined as the quality of the data analyzed.
3- What is the difference between a traditional database and big data?
There are major differences in:
• Size
o Traditional data sets tend to be measured in gigabytes and terabytes.
o Big data is usually measured in petabytes, exabytes, or zettabytes.
• Sources
o Traditional data derives from enterprise-level sources like ERP and CRM.
o Big data derives from a broader range of enterprise and non-enterprise-level data.
• Organization
o Traditional relational (SQL) databases, such as Oracle DB, use a fixed schema that is static and preconfigured.
o Big data uses a dynamic schema. In storage, big data is typically raw and unstructured.
• Architecture
o Traditional data is typically managed using a centralized architecture.
o Big data uses a distributed architecture.
4- What are the tools in big data? What are (Hadoop / Spark / MapReduce / Hive / Impala / Kafka / …)?
Big data tools and technologies are used to solve the big data challenges:
Challenge | Description | Solution / Technology
Volume & Resilience | Avoid the risk of data loss from machine failure in clusters of commodity machines | Replicate segments of data on multiple machines; master nodes keep track of segment locations
Volume & Velocity | Avoid choking network bandwidth when moving large volumes of data | Move processing logic to where the data is stored, using parallel processing algorithms
Variety | Efficient storage of large and small data objects | NoSQL database (documents, graphs, key-value pairs, etc.)
Velocity | Monitoring streams too large to store | Fork-shaped architecture to process data as a stream and as a batch
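The "move processing logic to where the data is stored" row can be sketched as a word count loosely in the style of MapReduce. The partitions, the function names, and the use of threads to stand in for worker machines are assumptions for illustration; real frameworks also shuffle intermediate key-value pairs between machines, which is left out here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Each partition stands in for a block of data stored on a different machine.
partitions = [
    ["big data big", "data velocity"],
    ["variety big", "volume data"],
]

def map_partition(lines):
    """Map step: count words locally, where the partition lives."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Reduce step: merge the small per-partition summaries."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Threads stand in for worker machines, each processing its own partition.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(map_partition, partitions))

word_counts = reduce_counts(partials)
```

Only the compact per-partition counts travel to the reduce step, not the raw lines, which is the point of the row: the network carries small summaries instead of large volumes of data.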
5- What is the difference between ETL & ELT?
• ETL stands for Extract, Transform, and Load, while ELT stands for Extract, Load, and Transform.
o In ETL, data flows from the data source to staging to the data destination.
o ELT lets the data destination do the transformation, eliminating the need for data staging.
• ETL can help with data privacy and compliance by cleansing sensitive data before it is loaded into the data destination, while ELT can handle large volumes of unstructured data.
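The two flows can be compared in a small sketch that uses Python's built-in `sqlite3` module as a stand-in for the data destination. The source rows, the table names, and the `mask` helper are hypothetical and chosen only to show where the transformation happens in each approach.

```python
import sqlite3

# Hypothetical source records; the email column plays the role of sensitive data.
source_rows = [
    ("alice@example.com", " 120.50 "),
    ("bob@example.com", "80"),
]

def mask(email):
    """Keep the domain, hide the user name."""
    user, domain = email.split("@")
    return "***@" + domain

# --- ETL: transform in a staging step, then load only the cleaned rows ---
etl_db = sqlite3.connect(":memory:")
etl_db.execute("CREATE TABLE sales (email TEXT, amount REAL)")
staged = [(mask(email), float(amount)) for email, amount in source_rows]
etl_db.executemany("INSERT INTO sales VALUES (?, ?)", staged)

# --- ELT: load the raw rows first, then transform inside the destination ---
elt_db = sqlite3.connect(":memory:")
elt_db.execute("CREATE TABLE raw_sales (email TEXT, amount TEXT)")
elt_db.executemany("INSERT INTO raw_sales VALUES (?, ?)", source_rows)
elt_db.execute(
    """
    CREATE TABLE sales AS
    SELECT '***@' || substr(email, instr(email, '@') + 1) AS email,
           CAST(trim(amount) AS REAL) AS amount
    FROM raw_sales
    """
)

query = "SELECT email, amount FROM sales ORDER BY amount"
etl_rows = etl_db.execute(query).fetchall()
elt_rows = elt_db.execute(query).fetchall()
```

Both databases end up with the same `sales` table, but in the ELT case the unmasked emails also sit in `raw_sales` inside the destination. That difference is why ETL is described above as the safer option for privacy and compliance, while ELT leaves the heavy lifting to the destination's own engine.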