Learning Spark

This document contains reading notes for the book "Learning Spark: Lightning-Fast Big Data Analysis". It provides an introduction to the book, biographies of the authors, and links to example code and reading notes for Chapter 2. The reading notes are shared on GitHub and are intended solely for educational purposes in learning Spark.

Uploaded by

roblim1


Learning Spark reading notes

Table of Contents
1. introduction
2. chapter 02 reading note


Learning Spark: Lightning-Fast Big Data Analysis

reading notes
These reading notes for the book Learning Spark: Lightning-Fast Big Data Analysis are intended only for the educational
purposes of Spark developers. The notes are on GitHub: https://github.com/gaoxuesong/learning-spark-lightning-fast-big-data-analysis

These Chinese-language reading notes for Learning Spark: Lightning-Fast Big Data Analysis stem purely from a personal interest in Spark and are for study only. The reading notes are shared on GitHub: https://github.com/gaoxuesong/learning-spark-lightning-fast-big-data-analysis

About the Authors

Holden Karau is a software development engineer at Databricks and is active in open source. She is the author of an
earlier Spark book. Prior to Databricks she worked on a variety of search and classification problems at Google,
Foursquare, and Amazon. She graduated from the University of Waterloo with a Bachelor of Mathematics in Computer
Science. Outside of software she enjoys playing with fire, welding, and hula hooping.

Most recently, Andy Konwinski co-founded Databricks. Before that he was a PhD student and then postdoc in the AMPLab
at UC Berkeley, focused on large-scale distributed computing and cluster scheduling. He co-created and is a committer on
the Apache Mesos project. He also worked with systems engineers and researchers at Google on the design of Omega,
their next-generation cluster scheduling system. More recently, he developed and led the AMP Camp Big Data Bootcamps
and first Spark Summit, and has been contributing to the Spark project.

Patrick Wendell is an engineer at Databricks as well as a Spark Committer and PMC member. In the Spark project, Patrick
has acted as release manager for several Spark releases, including Spark 1.0. Patrick also maintains several subsystems
of Spark's core engine. Before helping start Databricks, Patrick obtained an M.S. in Computer Science at UC Berkeley. His
research focused on low-latency scheduling for large-scale analytics workloads. He holds a B.S.E. in Computer Science
from Princeton University.

Matei Zaharia is the creator of Apache Spark and CTO at Databricks. He holds a PhD from UC Berkeley, where he started
Spark as a research project. He now serves as its Vice President at Apache. Apart from Spark, he has made research and
open source contributions to other projects in the cluster computing area, including Apache Hadoop (where he is a
committer) and Apache Mesos (which he also helped start at Berkeley).

Examples for Learning Spark

Code: https://github.com/gaoxuesong/learning-spark/ (forked from https://github.com/databricks/learning-spark)


chapter 02 reading note

https://github.com/gaoxuesong/learning-spark-lightning-fast-big-data-analysis/blob/master/chapter02.pdf
