Install Spark On Windows 10-MacOS

This document provides steps to install Spark on Windows 10 or macOS. For Windows, it involves downloading and installing 7Zip, Java, Spark, and Hadoop. It then provides steps to configure environment variables and test the Spark installation. For macOS, it recommends using Homebrew to install Java, Spark, and other dependencies and then provides commands to run Spark-shell and copy sample datasets.

Install Spark on Windows 10 or macOS
Kazi Aminul Islam
Department of Computer Science
Kennesaw State University
Acknowledgment:
Dr. Dan Lo
Install Spark on Windows 10
Steps
• Download and install 7Zip if it is not already on your computer
• Download and install a JVM if it is not already on your computer
• Download and install Spark
• Download and install Hadoop
• Configure environment variables
• Grant permission to the temp folder
• Test the installation
Download and Install 7Zip (latest version)

https://www.7-zip.org/download.html
Download and Install JVM v1.8.0_221-b11
• Check the Java version you have: java -version
• Other Java versions may not work!
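The version string printed by `java -version` can also be checked from a script. A minimal sketch (the sample output string is hard-coded here for illustration, so it runs without Java installed):

```shell
# Sketch: extract the major version from typical `java -version` output.
# The sample string below stands in for the real command's output.
ver='java version "1.8.0_221"'
major=$(echo "$ver" | sed -E 's/.*"(1\.[0-9]+)\..*".*/\1/')
echo "$major"   # prints 1.8
```

In a real check you would feed the output of `java -version 2>&1 | head -1` into the same pipeline.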
Download and Install Spark
http://spark.apache.org/downloads.html

• Choose a Spark release: 3.0.1 (Sep 02 2020)
• Package type: Pre-built for Apache Hadoop 2.7
• Download the file spark-3.0.1-bin-hadoop2.7.tgz
C:\spark
• Create a folder c:\spark
• Unzip the Spark tarball and copy everything into c:\spark
• This eases maintenance later. For example, if you want to try a
different version in the future, simply overwrite the folder with the
new release.
Copy over c:\spark
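On Windows, 7-Zip unpacks the .tgz; on macOS/Linux the same step is a single `tar -xzf`. A runnable sketch (a tiny stand-in archive is built first, since the real 200 MB tarball is assumed, not included):

```shell
# Build a tiny stand-in for spark-3.0.1-bin-hadoop2.7.tgz, then unpack it
# the same way you would the real download.
mkdir -p src/spark-3.0.1-bin-hadoop2.7
echo "Spark 3.0.1" > src/spark-3.0.1-bin-hadoop2.7/RELEASE
tar -czf spark-3.0.1-bin-hadoop2.7.tgz -C src spark-3.0.1-bin-hadoop2.7

mkdir -p dest
tar -xzf spark-3.0.1-bin-hadoop2.7.tgz -C dest   # "unzip the tarball"
ls dest                                          # prints spark-3.0.1-bin-hadoop2.7
```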
Download and install Hadoop
• Go to https://github.com/steveloughran/winutils
• Click on the green button labeled "Code" and download the ZIP
(alternatively, it is easier to clone everything to your local PC)
• Unzip the archive
• Create a folder c:\hadoop
• Copy everything under winutils-master\hadoop-2.7.1\*.* into
c:\hadoop
Copy Hadoop 2.7.1 over C:\hadoop
Configure Environment Variables
• From the Windows logo, search for and launch "Advanced system
settings", then click the "Environment Variables" button
• JAVA_HOME=C:\Program Files\Java\jre1.8.0_221
• SPARK_HOME=C:\spark
• HADOOP_HOME=C:\hadoop
• Append %SPARK_HOME%\bin to "Path"
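The same configuration in shell syntax (this is how it looks on macOS/Linux; on Windows use the dialog above, and note the Unix-style paths here are illustrative stand-ins for the C:\ paths on the slide):

```shell
# Shell equivalent of the environment-variable setup. The paths mirror the
# slide's layout but are hypothetical Unix locations.
export SPARK_HOME=/usr/local/spark
export HADOOP_HOME=/usr/local/hadoop
export PATH="$SPARK_HOME/bin:$PATH"

echo "$PATH" | cut -d: -f1   # prints /usr/local/spark/bin (now first on PATH)
```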
Grant permission to temp folder
• Create a temp folder c:\tmp\hive
• Change its mode to 777:
>winutils.exe chmod 777 c:\tmp\hive
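winutils.exe mirrors the Unix chmod semantics, so the step can be understood (and rehearsed) with plain Unix tools. A sketch using a throwaway directory instead of c:\tmp\hive:

```shell
# Unix analogue of the winutils step: create the Hive scratch dir and make it
# world-writable. The directory here is a throwaway stand-in, not c:\tmp\hive.
dir="$(mktemp -d)/hive"
mkdir -p "$dir"
chmod 777 "$dir"
stat -c '%a' "$dir"   # prints 777 (GNU stat; on macOS use: stat -f '%Lp')
```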
Test it
• Run spark-shell under command prompt
• Run pyspark under command prompt
• Run spark-submit <app_name>
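Before running the three tests above, it is worth confirming the commands are actually reachable on PATH; a small portable sketch (it only reports, it does not launch anything):

```shell
# Sketch: check that the Spark entry points from the slide are on PATH.
for cmd in spark-shell pyspark spark-submit; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: not on PATH"
  fi
done
```

If any line reports "not on PATH", revisit the environment-variable step.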
Spark-shell
pyspark
Run Hello World in Scala
• Change directory to c:\spark
Install Spark on macOS
• Open a terminal (running bash)
• Use Homebrew, a free and open-source software package
management system that simplifies the installation of software on
Apple's macOS operating system and Linux.
Steps
1. /bin/bash -c "$(curl -fsSL
https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
2. brew install java
3. brew install apache-spark
4. mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
5. spark-shell
6. Copy sample datasets
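The numbered steps above can be rehearsed as a dry run that prints each command instead of executing it (remove the `echo` to run for real; versions and paths follow the slide):

```shell
# Dry-run sketch of the macOS install steps: print, don't execute.
for s in \
  'brew install java' \
  'brew install apache-spark' \
  'sudo mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark' \
  'spark-shell'
do
  echo "+ $s"
done
```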
Install Homebrew
(base) ltksup39868mac:~ dlo2$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
Warning: The Ruby Homebrew installer is now deprecated and has been rewritten in
Bash. Please migrate to the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"

Password:
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following existing directories will be made group writable:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
/usr/local/lib/pkgconfig
/usr/local/share/doc
==> The following existing directories will have their owner set to dlo2:
/usr/local/bin
/usr/local/include
/usr/local/lib
/usr/local/share
Install Java
(base) ltksup39868mac:~ dlo2$ brew cask install java
==> Tapping homebrew/cask
Cloning into '/usr/local/Homebrew/Library/Taps/homebrew/homebrew-cask'...
remote: Enumerating objects: 543760, done.
remote: Total 543760 (delta 0), reused 0 (delta 0), pack-reused 543760
Receiving objects: 100% (543760/543760), 238.43 MiB | 29.48 MiB/s, done.
Resolving deltas: 100% (383842/383842), done.
Tapped 3790 casks (3,911 files, 255.9MB).
Error: Calling brew cask install is disabled! Use brew install [--cask] instead.
(base) ltksup39868mac:~ dlo2$ brew install java
==> Downloading https://homebrew.bintray.com/bottles/openjdk-15.0.1.catalina.bot
==> Downloading from https://d29vzk4ow07wi7.cloudfront.net/9376a1c6fdf8b0268b6cb
######################################################################## 100.0%
==> Pouring openjdk-15.0.1.catalina.bottle.tar.gz
==> Caveats
For the system Java wrappers to find this JDK, symlink it with
sudo ln -sfn /usr/local/opt/openjdk/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk.jdk

openjdk is keg-only, which means it was not symlinked into /usr/local,


because it shadows the macOS `java` wrapper.

If you need to have openjdk first in your PATH run:


echo 'export PATH="/usr/local/opt/openjdk/bin:$PATH"' >> /Users/dlo2/.bash_profile
Install Scala
(base) ltksup39868mac:~ dlo2$ brew install scala
==> Downloading https://downloads.lightbend.com/scala/2.13.4/scala-2.13.4.tgz
######################################################################## 100.0%
==> Caveats
To use with IntelliJ, set the Scala home to:
/usr/local/opt/scala/idea
==> Summary
🍺 /usr/local/Cellar/scala/2.13.4: 42 files, 23.4MB, built in 2 seconds
(base) ltksup39868mac:~ dlo2$ brew install apache-spark
==> Downloading https://homebrew.bintray.com/bottles/openjdk%4011-11.0.9.catalin
==> Downloading from https://d29vzk4ow07wi7.cloudfront.net/c640eade77c3ad69fef4d
######################################################################## 100.0%
==> Downloading https://www.apache.org/dyn/closer.lua?path=spark/spark-3.0.1/spa
==> Downloading from https://apache.osuosl.org/spark/spark-3.0.1/spark-3.0.1-bin
######################################################################## 100.0%
==> Installing dependencies for apache-spark: openjdk@11
==> Installing apache-spark dependency: openjdk@11
==> Pouring [email protected]
==> Caveats
For the system Java wrappers to find this JDK, symlink it with
sudo ln -sfn /usr/local/opt/openjdk@11/libexec/openjdk.jdk /Library/Java/JavaVirtualMachines/openjdk-11.jdk

openjdk@11 is keg-only, which means it was not symlinked into /usr/local,


Running Spark-Shell
(base) ltksup39868mac:~ dlo2$ spark-shell
21/01/13 14:29:29 WARN Utils: Your hostname, ltksup39868mac.local resolves to a loopback address: 127.0.0.1; using
192.168.1.67 instead (on interface en0)
21/01/13 14:29:29 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/usr/local/Cellar/apache-
spark/3.0.1/libexec/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/01/13 14:29:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes
where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://ltksup39868mac.attlocal.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1610566175650).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Copy Sample Dataset
(base) ltksup39868mac:Downloads dlo2$ ls spark*
spark-3.0.1-bin-hadoop2.7.tgz

spark-3.0.1-bin-hadoop2.7:
LICENSE README.md conf jars python
NOTICE RELEASE data kubernetes sbin
R bin examples licenses yarn
(base) ltksup39868mac:Downloads dlo2$ mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
mv: rename spark-3.0.1-bin-hadoop2.7 to /usr/local/spark: Permission denied
(base) ltksup39868mac:Downloads dlo2$ sudo mv spark-3.0.1-bin-hadoop2.7 /usr/local/spark
Password:
(base) ltksup39868mac:Downloads dlo2$ pwd
/Users/dlo2/Downloads
(base) ltksup39868mac:Downloads dlo2$ cd /usr/local
(base) ltksup39868mac:local dlo2$ ls
Caskroom bin jamf outset share
Cellar dockutil lib remotedesktop spark
Frameworks etc munki sal var
Homebrew include opt sbin
(base) ltksup39868mac:local dlo2$ cd spark/
(base) ltksup39868mac:spark dlo2$ ls
LICENSE README.md conf jars python
