Apache Spark and Ignite

This document provides an overview of Apache Spark and Apache Ignite. It begins with an introduction to Spark and sections on Spark basics, working with RDDs, aggregating data with pair RDDs, writing and deploying Spark applications, parallel processing, RDD persistence, basic and advanced Spark streaming, improving Spark performance, and Spark SQL and DataFrames. It then covers Apache Ignite, including its architecture, in-memory data grid, SQL support, in-memory compute grid, and integration with Spark. It concludes with Ignite use cases.

Uploaded by Hari Om Atul
Copyright © All Rights Reserved

Apache Spark and Apache Ignite

Introduction to Spark
1 What is Spark?
2 Review: From Hadoop MapReduce to Spark
3 Review: HDFS
4 Review: YARN
5 Spark Overview
Spark Basics
1 Using the Spark Shell
2 RDDs (Resilient Distributed Datasets)
3 Functional Programming in Spark
Working with RDDs in Spark
1 Creating RDDs
2 Other General RDD Operations
Aggregating Data with Pair RDDs
1 Key-Value Pair RDDs
2 Map-Reduce
3 Other Pair RDD Operations
Writing and Deploying Spark Applications
1 Spark Applications vs. Spark Shell
2 Creating the SparkContext
3 Building a Spark Application (Scala and Java)
4 Running a Spark Application
5 The Spark Application Web UI
6 Hands-On Exercise: Write and Run a Spark Application
7 Configuring Spark Properties
8 Logging

Parallel Processing
1 Review: Spark on a Cluster
2 RDD Partitions
3 Partitioning of File-based RDDs
4 HDFS and Data Locality
5 Executing Parallel Operations
6 Stages and Tasks
Spark RDD Persistence
1 RDD Lineage
2 RDD Persistence Overview
3 Distributed Persistence
Basic Spark Streaming
1 Spark Streaming Overview
2 Example: Streaming Request Count
3 DStreams
4 Developing Spark Streaming Applications
Advanced Spark Streaming
1 Multi-Batch Operations
2 State Operations
3 Sliding Window Operations
4 Advanced Data Sources

Improving Spark Performance
1 Shared Variables: Broadcast Variables
2 Shared Variables: Accumulators
3 Common Performance Issues
4 Diagnosing Performance Problems

Spark SQL and DataFrames
1 Spark SQL and the SQL Context
2 Creating DataFrames
3 Transforming and Querying DataFrames
4 Saving DataFrames
5 DataFrames and RDDs
6 Comparing Spark SQL, Impala and Hive-on-Spark

Apache Ignite
Introduction and Overview
Architecture
In-Memory Data Grid
SQL Support
In-Memory Compute Grid
In-Memory Service Grid
In-Memory Streaming
In-Memory Hadoop Acceleration
Distributed In-Memory File System
Advanced Clustering
Distributed Messaging
Distributed Events
Distributed Data Structures
Unified API

Integration with Spark


Use Cases
