CC PPT
INTRODUCTION TO
Krishnan Kaushik (1CR21CS082)
Kottakota Rishina (1CR21CS081)
Mayank Kumar Singh
In-Memory Computing
How It Works:
• Spark keeps working data in RAM instead of writing intermediate results to disk
• Drastically reduces time spent on I/O operations
• Increases speed, especially for iterative algorithms that reuse the same data on every pass
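
The effect on iterative algorithms can be illustrated with a minimal sketch in plain Python. This is not Spark code: the file name `values.json`, the toy algorithm, and the helper names `load_from_disk`, `iterate_mean`, and `run_demo` are all assumptions made for illustration. It only counts how often the input is read from disk when the data is reloaded on every pass versus loaded once and kept in memory.

```python
import json
import os
import tempfile


def load_from_disk(path, counter):
    """Read and parse the dataset; counter records each disk read."""
    counter["reads"] += 1
    with open(path) as f:
        return json.load(f)


def iterate_mean(get_data, iterations):
    """Toy iterative algorithm: move an estimate halfway toward the data mean on each pass."""
    estimate = 0.0
    for _ in range(iterations):
        data = get_data()
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate


def run_demo(iterations=10):
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.json")  # hypothetical input file
        with open(path, "w") as f:
            json.dump(list(range(1000)), f)

        # Disk-bound style: every iteration reloads the input from disk.
        disk_counter = {"reads": 0}
        disk_result = iterate_mean(
            lambda: load_from_disk(path, disk_counter), iterations
        )

        # In-memory style: load once, keep the data in RAM, reuse it.
        mem_counter = {"reads": 0}
        cached = load_from_disk(path, mem_counter)
        mem_result = iterate_mean(lambda: cached, iterations)

    return disk_counter["reads"], mem_counter["reads"], disk_result, mem_result


disk_reads, mem_reads, disk_result, mem_result = run_demo()
print(f"disk reads without caching: {disk_reads}")
print(f"disk reads with caching:    {mem_reads}")
```

Both runs compute the same result; only the number of disk reads differs, and that gap grows with the number of iterations.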
SPARK SQL AND DATAFRAMES
• Spark SQL:
  • Allows querying structured data via SQL
  • Can load data from various sources such as JSON and Parquet
• DataFrames:
  • Distributed collection of data organized into named columns
  • Optimized for structured data processing
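
The idea of loading structured records and querying them by column name with SQL can be sketched as follows. This uses Python's built-in `sqlite3` module as a single-machine stand-in, not Spark SQL or the DataFrame API. The sample records, the `employees` table, and the helper names `load_rows` and `avg_salary_by_dept` are hypothetical.

```python
import json
import sqlite3

# Hypothetical JSON input; in Spark this would come from a JSON or Parquet source.
RAW_JSON = """
[
  {"name": "alice", "dept": "eng", "salary": 120},
  {"name": "bob", "dept": "eng", "salary": 100},
  {"name": "carol", "dept": "ops", "salary": 90}
]
"""


def load_rows(raw):
    """Parse JSON text into a list of records with named fields."""
    return json.loads(raw)


def avg_salary_by_dept(records):
    """Load records into an in-memory table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
        conn.executemany(
            "INSERT INTO employees (name, dept, salary) "
            "VALUES (:name, :dept, :salary)",
            records,
        )
        cursor = conn.execute(
            "SELECT dept, AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
        )
        return cursor.fetchall()
    finally:
        conn.close()


result = avg_salary_by_dept(load_rows(RAW_JSON))
print(result)
```

The point of the sketch is the workflow (structured input, named columns, declarative query); Spark applies the same model to data distributed across a cluster.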
SPARK STREAMING AND SPARK MLLIB
• Real-Time Data Processing:
  • Processes live data streams (such as logs and social media feeds)
  • Splits the stream into small batches at a fixed interval (micro-batching)
• Use Cases:
  • Log analysis, fraud detection, real-time analytics
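
Micro-batching applied to the log analysis use case can be sketched in plain Python. This is a simplified illustration, not the Spark Streaming API: it groups a finite list of timestamped events rather than a live stream, and the sample log lines, the 2-second interval, and the helper names `micro_batches` and `count_errors` are assumptions.

```python
from collections import defaultdict


def micro_batches(events, interval_seconds):
    """Group (timestamp, payload) events into fixed-width time windows."""
    batches = defaultdict(list)
    for timestamp, payload in events:
        # Each event belongs to the window that starts at or before its timestamp.
        window_start = int(timestamp // interval_seconds) * interval_seconds
        batches[window_start].append(payload)
    return [(start, batches[start]) for start in sorted(batches)]


def count_errors(batch):
    """Per-batch computation: count log lines that contain ERROR."""
    return sum(1 for line in batch if "ERROR" in line)


# Hypothetical log events as (seconds since start, log line).
LOG_EVENTS = [
    (0.4, "INFO start"),
    (1.2, "ERROR disk full"),
    (1.9, "ERROR retry failed"),
    (2.5, "INFO recovered"),
    (4.1, "ERROR timeout"),
]

error_counts = [
    (start, count_errors(batch))
    for start, batch in micro_batches(LOG_EVENTS, 2)
]
print(error_counts)
```

Each window is processed as a small batch job, which is why a shorter interval lowers latency at the cost of more frequent, smaller batches.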