
Open Spark Shell

Spark Core provides distributed task dispatching and scheduling capabilities. It uses RDDs (Resilient Distributed Datasets), which are logical collections of data partitioned across machines, as its primary data abstraction. RDDs can be created from external storage systems or by transforming existing RDDs. The Spark shell provides an interactive way to analyze data using RDD transformations and actions. Common transformations include map, filter, and join; common actions include count, collect, and save. Word count is a common example used to illustrate RDD programming.

Uploaded by

RamyaKrishnan


Spark Core is the base of the whole project. It provides distributed task
dispatching, scheduling, and basic I/O functionality. Spark uses a
specialized fundamental data structure known as the RDD (Resilient Distributed
Dataset), which is a logical collection of data partitioned across machines.
RDDs can be created in two ways: by referencing datasets in external
storage systems, or by applying transformations (e.g. map, filter,
reduceByKey, join) to existing RDDs.
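The two creation paths can be sketched in Scala as follows. This is a minimal example, assuming Spark 2.x or later and a local master (`local[*]`); the application name, the temporary file and its contents are illustrative. When pasted into `spark-shell`, `SparkContext.getOrCreate` simply returns the existing `sc`. The last line also shows `parallelize`, which distributes a local collection and is another common way to obtain an RDD.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Reuse a running context (e.g. `sc` in spark-shell) or start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("create-rdds").setMaster("local[*]"))

// Way 1: reference a dataset in external storage (a temporary local text file here).
val path = Files.createTempFile("input", ".txt")
Files.write(path, "hello spark\nhello rdd\n".getBytes("UTF-8"))
val fromFile = sc.textFile(path.toString)       // RDD[String], one element per line

// Way 2: apply transformations to an existing RDD; each call returns a new RDD.
val words     = fromFile.flatMap(_.split(" "))  // RDD[String] of individual words
val longWords = words.filter(_.length > 4)      // keeps words longer than 4 characters

// parallelize distributes a local Scala collection as an RDD.
val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))
```

Nothing is read from the file until an action (such as `count()` or `collect()`) is called on one of these RDDs.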

The RDD abstraction is exposed through a language-integrated API. This
simplifies programming, because applications manipulate RDDs in much the
same way as they manipulate local collections of data.
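To illustrate how closely the RDD API mirrors local collections, the sketch below runs the same filter-then-map pipeline on a Scala `List` and on an RDD built from it. It assumes Spark 2.x or later and a local master; the data and application name are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("local-vs-rdd").setMaster("local[*]"))

val data = List(1, 2, 3, 4, 5, 6)

// Local collection: evaluated eagerly, in the driver's memory.
val localResult = data.filter(_ % 2 == 0).map(_ * 10)

// RDD: the same method names and lambdas, but partitioned across executors
// and evaluated lazily until collect() returns the results to the driver.
val rddResult = sc.parallelize(data).filter(_ % 2 == 0).map(_ * 10).collect().toList
```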

Spark Shell
Spark provides an interactive shell, a powerful tool for analyzing data
interactively. It is available in either Scala or Python. Spark's
primary abstraction is a distributed collection of items called a Resilient
Distributed Dataset (RDD). RDDs can be created from Hadoop InputFormats
(such as HDFS files) or by transforming other RDDs.

Open Spark Shell

The following command is used to open Spark shell.
$ spark-shell

Create simple RDD

Let us create a simple RDD from a text file. Use the following command to
create it.

scala> val inputfile = sc.textFile("input.txt")

The output of the above command is:

inputfile: org.apache.spark.rdd.RDD[String] = input.txt MappedRDD[1] at textFile at <console>:12

The Spark RDD API provides a set of transformations and actions for
manipulating RDDs.

RDD Transformations
An RDD transformation returns a pointer to a new RDD and lets you create
dependencies between RDDs. Each RDD in the dependency chain (string of
dependencies) has a function for calculating its data and a pointer
(dependency) to its parent RDD.

Spark is lazy: nothing is executed until you call an action, which triggers
job creation and execution. The word-count example later in this document
illustrates this.

An RDD transformation is therefore not a set of data but a step in a program
(possibly the only step) telling Spark how to get data and what to do with it.
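Laziness can be observed directly. The sketch below, assuming Spark 2.x or later (for `longAccumulator`) and a local master, counts how often the map function runs; the accumulator name and data are illustrative. The count stays at zero after the transformation is defined and only moves once an action is called.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("lazy-eval").setMaster("local[*]"))

// Counts how many times the map function actually runs.
val mapCalls = sc.longAccumulator("mapCalls")

// Defining a transformation only records it in the lineage; no task runs yet.
val doubled = sc.parallelize(1 to 10).map { x => mapCalls.add(1); x * 2 }
val callsBeforeAction: Long = mapCalls.sum   // 0

// reduce is an action: it launches a job that finally evaluates the map.
val total = doubled.reduce(_ + _)            // 2 + 4 + ... + 20 = 110
val callsAfterAction: Long = mapCalls.sum    // 10, one call per element
```

The accumulator is used here only to make execution visible. Spark guarantees exactly-once accumulator updates only inside actions; updates made inside a transformation such as this map may be applied more than once if a task is re-executed.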

Given below is a list of RDD transformations.

S.No Transformations & Meaning

1 map(func)

Returns a new distributed dataset, formed by passing each element of the
source through a function func.

2 filter(func)

Returns a new dataset formed by selecting those elements of the source on
which func returns true.

3 flatMap(func)

Similar to map, but each input item can be mapped to 0 or more output
items (so func should return a Seq rather than a single item).

4 mapPartitions(func)

Similar to map, but runs separately on each partition (block) of the RDD,
so func must be of type Iterator<T> ⇒ Iterator<U> when running on an
RDD of type T.

5 mapPartitionsWithIndex(func)

Similar to mapPartitions, but also provides func with an integer value
representing the index of the partition, so func must be of type (Int,
Iterator<T>) ⇒ Iterator<U> when running on an RDD of type T.

6 sample(withReplacement, fraction, seed)

Sample a fraction of the data, with or without replacement, using a given
random number generator seed.

7 union(otherDataset)

Returns a new dataset that contains the union of the elements in the source
dataset and the argument.

8 intersection(otherDataset)

Returns a new RDD that contains the intersection of elements in the source
dataset and the argument.

9 distinct([numTasks])

Returns a new dataset that contains the distinct elements of the source
dataset.

10 groupByKey([numTasks])

When called on a dataset of (K, V) pairs, returns a dataset of (K,
Iterable<V>) pairs.

Note: If you are grouping in order to perform an aggregation (such as a
sum or average) over each key, using reduceByKey or aggregateByKey will
yield much better performance, because they combine values within each
partition before the data is shuffled.

11 reduceByKey(func, [numTasks])

When called on a dataset of (K, V) pairs, returns a dataset of (K, V) pairs
where the values for each key are aggregated using the given reduce
function func, which must be of type (V, V) ⇒ V. As with groupByKey, the
number of reduce tasks is configurable through an optional second
argument.

12 aggregateByKey(zeroValue)(seqOp, combOp, [numTasks])

When called on a dataset of (K, V) pairs, returns a dataset of (K, U) pairs
where the values for each key are aggregated using the given combine
functions and a neutral "zero" value. This allows an aggregated value type
that is different from the input value type, while avoiding unnecessary
allocations. As with groupByKey, the number of reduce tasks is configurable
through an optional second argument.

13 sortByKey([ascending], [numTasks])

When called on a dataset of (K, V) pairs where K implements Ordered,
returns a dataset of (K, V) pairs sorted by keys in ascending or descending
order, as specified in the Boolean ascending argument.

14 join(otherDataset, [numTasks])

When called on datasets of type (K, V) and (K, W), returns a dataset of (K,
(V, W)) pairs with all pairs of elements for each key. Outer joins are
supported through leftOuterJoin, rightOuterJoin, and fullOuterJoin.

15 cogroup(otherDataset, [numTasks])

When called on datasets of type (K, V) and (K, W), returns a dataset of (K,
(Iterable<V>, Iterable<W>)) tuples. This operation is also called groupWith.

16 cartesian(otherDataset)

When called on datasets of types T and U, returns a dataset of (T, U) pairs
(all pairs of elements).

17 pipe(command, [envVars])
Pipe each partition of the RDD through a shell command, e.g. a Perl or bash
script. RDD elements are written to the process's stdin and lines output to
its stdout are returned as an RDD of strings.

18 coalesce(numPartitions)

Decrease the number of partitions in the RDD to numPartitions. Useful for
running operations more efficiently after filtering down a large dataset.

19 repartition(numPartitions)

Reshuffle the data in the RDD randomly to create either more or fewer
partitions and balance it across them. This always shuffles all data over the
network.

20 repartitionAndSortWithinPartitions(partitioner)

Repartition the RDD according to the given partitioner and, within each
resulting partition, sort records by their keys. This is more efficient than
calling repartition and then sorting within each partition because it can push
the sorting down into the shuffle machinery.
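Several of the transformations in the table can be tried together. This sketch assumes Spark 2.x or later and a local master; the sample data are illustrative, and the explicit two-way split in `parallelize(1 to 6, 2)` is chosen only so that the per-partition operations have visible effects. Each value below is still an unevaluated RDD until an action such as `collect()` is called on it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("transformations").setMaster("local[*]"))

val nums  = sc.parallelize(1 to 6, 2)                    // partitions: [1,2,3] and [4,5,6]
val lines = sc.parallelize(Seq("a b", "c"))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
val other = sc.parallelize(Seq(("a", "x"), ("c", "y")))

val squares  = nums.map(x => x * x)                      // one output per input
val evens    = nums.filter(_ % 2 == 0)                   // keep elements where func is true
val tokens   = lines.flatMap(_.split(" "))               // zero or more outputs per input
val partSums = nums.mapPartitions(it => Iterator(it.sum))                  // one value per partition
val tagged   = nums.mapPartitionsWithIndex((i, it) => it.map(x => (i, x))) // (partition index, value)
val merged   = nums.union(sc.parallelize(Seq(5, 6, 7))).distinct()
val summed   = pairs.reduceByKey(_ + _)                  // combines values per key before shuffling
val grouped  = pairs.groupByKey().mapValues(_.toList.sorted)
val joined   = pairs.join(other)                         // inner join on the key
val fewer    = nums.coalesce(1)                          // fewer partitions without a full shuffle
```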

Actions
The following table gives a list of Actions, which return values.

S.No Action & Meaning

1 reduce(func)

Aggregate the elements of the dataset using a function func (which takes
two arguments and returns one). The function should be commutative and
associative so that it can be computed correctly in parallel.

2 collect()
Returns all the elements of the dataset as an array at the driver program.
This is usually useful after a filter or other operation that returns a
sufficiently small subset of the data.

3 count()

Returns the number of elements in the dataset.

4 first()

Returns the first element of the dataset (similar to take(1)).

5 take(n)

Returns an array with the first n elements of the dataset.

6 takeSample(withReplacement, num, [seed])

Returns an array with a random sample of num elements of the dataset,
with or without replacement, optionally pre-specifying a random number
generator seed.

7 takeOrdered(n, [ordering])

Returns the first n elements of the RDD using either their natural order or
a custom comparator.

8 saveAsTextFile(path)

Writes the elements of the dataset as a text file (or set of text files) in a
given directory in the local filesystem, HDFS or any other Hadoop-supported
file system. Spark calls toString on each element to convert it to a line of
text in the file.

9 saveAsSequenceFile(path) (Java and Scala)

Writes the elements of the dataset as a Hadoop SequenceFile in a given
path in the local filesystem, HDFS or any other Hadoop-supported file
system. This is available on RDDs of key-value pairs that implement
Hadoop's Writable interface. In Scala, it is also available on types that are
implicitly convertible to Writable (Spark includes conversions for basic types
like Int, Double, String, etc.).

10 saveAsObjectFile(path) (Java and Scala)

Writes the elements of the dataset in a simple format using Java
serialization, which can then be loaded using SparkContext.objectFile().

11 countByKey()

Only available on RDDs of type (K, V). Returns a hashmap of (K, Long) pairs
with the count of each key.

12 foreach(func)

Runs a function func on each element of the dataset. This is usually done
for side effects such as updating an Accumulator or interacting with external
storage systems.

Note: modifying variables other than Accumulators outside of the
foreach() may result in undefined behavior. See "Understanding closures" in
the Spark programming guide for more details.
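The actions in the table can be exercised on a small RDD. This is a sketch assuming Spark 2.x or later and a local master, with illustrative data. Unlike transformations, each call below runs a job and returns a plain Scala value to the driver; the last two lines follow the closure note by counting elements with an accumulator instead of a driver-side `var`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("actions").setMaster("local[*]"))

val nums  = sc.parallelize(Seq(5, 3, 8, 1, 9, 2))
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

val total    = nums.reduce(_ + _)                       // + is commutative and associative
val all      = nums.collect()                           // whole dataset to the driver: keep it small
val n        = nums.count()                             // Long
val head     = nums.first()
val firstTwo = nums.take(2)
val smallest = nums.takeOrdered(3)                      // natural ordering
val largest  = nums.takeOrdered(3)(Ordering[Int].reverse)
val perKey   = pairs.countByKey()                       // Map[String, Long] on the driver

// Side effects from foreach: update an accumulator, not a local variable.
val seen = sc.longAccumulator("seen")
nums.foreach(_ => seen.add(1))
```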

Programming with RDD

Let us see how a few RDD transformations and actions are used in RDD
programming with the help of an example.

Example
Consider a word count example, which counts each word appearing in a
document. Consider the following text as input, saved as an input.txt file
in the home directory.

input.txt − input file.

people are not as beautiful as they look,
as they walk or as they talk.
they are only as beautiful as they love,
as they care as they share.
Follow the procedure given below to execute the given example.

Open Spark-Shell
The following command opens the Spark shell. Spark is built using Scala,
so a Spark program in this shell runs in a Scala environment.
$ spark-shell

If the Spark shell opens successfully, you will see the following output. The
last line of the output, "Spark context available as sc", means that the shell
has automatically created a SparkContext object named sc. A SparkContext
object must exist before the first step of a program can run.
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/04 15:25:22 INFO SecurityManager: Changing view acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: Changing modify acls to: hadoop
15/06/04 15:25:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/06/04 15:25:22 INFO HttpServer: Starting HTTP Server
15/06/04 15:25:23 INFO Utils: Successfully started service 'HTTP class server' on port 43292.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_71)
Type in expressions to have them evaluated.
Spark context available as sc
scala>

Create an RDD
First, we have to read the input file using the Spark Scala API and create an RDD.

The following command reads a file from the given location. Here, a new RDD
is created with the name inputfile. The string given as an argument to the
textFile("") method is the path of the input file, which can be an absolute
path. If only the file name is given, the input file is expected to be in the
current location.
scala> val inputfile = sc.textFile("input.txt")

Execute Word count Transformation

Our aim is to count the words in a file. Use flatMap to split each line into
words (flatMap(line => line.split(" "))).

Next, map each word to a key with the value 1 (<key, value> = <word, 1>)
using the map function (map(word => (word, 1))).

Finally, reduce by key, adding up the values of identical keys
(reduceByKey(_+_)).

The following command runs the word count logic. Running it produces no
output, because it is a transformation rather than an action: it only defines
a new RDD and tells Spark what to do with the given data.
scala> val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_+_)
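The same word count can be run end to end without a file on disk. This sketch, assuming Spark 2.x or later and a local master, distributes the four lines of input.txt with `parallelize`. One deliberate difference from the shell command: splitting on a single space keeps punctuation attached to words (`look,`, `talk.`), so here the lines are split on runs of non-word characters (`\\W+`) instead, which yields plain words such as `look`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("word-count").setMaster("local[*]"))

// The four lines of input.txt, distributed directly instead of read from disk.
val inputfile = sc.parallelize(Seq(
  "people are not as beautiful as they look,",
  "as they walk or as they talk.",
  "they are only as beautiful as they love,",
  "as they care as they share."))

val counts = inputfile
  .flatMap(line => line.split("\\W+"))   // split on non-word characters, dropping punctuation
  .filter(_.nonEmpty)                    // guard against empty tokens
  .map(word => (word, 1))                // pair each word with the value 1
  .reduceByKey(_ + _)                    // add up the 1s for each distinct word

// collect() is the action that actually runs the job.
val result = counts.collect().toMap
```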

Current RDD
While working with an RDD, if you want to know about the current RDD, use
the following command. It shows a description of the current RDD and its
dependencies, which is useful for debugging.
scala> counts.toDebugString
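A self-contained sketch of inspecting lineage, assuming Spark 2.x or later and a local master; the two input lines are illustrative. `toDebugString` lists the `ShuffledRDD` produced by `reduceByKey` first, with its parent RDDs (down to the `ParallelCollectionRDD`) listed below it and the shuffle boundary marked by indentation. The `dependencies` check shows the same shuffle programmatically.

```scala
import org.apache.spark.{SparkConf, ShuffleDependency, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("lineage").setMaster("local[*]"))

val counts = sc.parallelize(Seq("a b", "b c"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Text form of the lineage; no job is run to produce it.
val lineage = counts.toDebugString
println(lineage)

// reduceByKey depends on its parent through a shuffle dependency.
val hasShuffleParent = counts.dependencies.head match {
  case _: ShuffleDependency[_, _, _] => true
  case _                             => false
}
```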

Caching the Transformations

You can mark an RDD to be persisted using the persist() or cache() methods
on it. The first time it is computed in an action, it will be kept in memory on
the nodes. Use the following command to store the intermediate
transformations in memory.
scala> counts.cache()
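A small sketch of caching outside the shell, assuming Spark 2.x or later and a local master, with illustrative data. `cache()` is shorthand for `persist(StorageLevel.MEMORY_ONLY)`; other levels go through `persist`. Spark does not allow an RDD's storage level to be changed once assigned, which is why `MEMORY_AND_DISK` is applied to a separate RDD here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("caching").setMaster("local[*]"))

val counts = sc.parallelize(Seq("a b", "b c"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

counts.cache()                               // mark for MEMORY_ONLY persistence (lazy)
val levelAfterCache = counts.getStorageLevel

val firstCount  = counts.count()             // first action computes and caches the partitions
val secondCount = counts.count()             // later actions reuse the cached partitions

// A storage level that spills partitions to disk when memory is short.
val spillable = sc.parallelize(Seq("x", "y")).persist(StorageLevel.MEMORY_AND_DISK)
```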

Applying the Action

Apply an action, such as saving the result of all the transformations to a
text file. The string argument to the saveAsTextFile("") method is the path
of the output folder. Try the following command to save the output in a text
file. In the following example, the output folder is in the current location.
scala> counts.saveAsTextFile("output")
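The save step can be checked programmatically as well. This sketch assumes Spark 2.x or later, a local master and a local filesystem; the temporary directory is illustrative. Because `saveAsTextFile` fails if the output directory already exists, it writes to a fresh path. It then confirms that there is one `part-` file per partition plus a `_SUCCESS` marker, and reads the tuples back as text lines.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("save-output").setMaster("local[*]"))

val counts = sc.parallelize(Seq("a b", "b c"), 2)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Point saveAsTextFile at a directory that does not exist yet.
val outDir = Files.createTempDirectory("wc").resolve("output").toString
counts.saveAsTextFile(outDir)

// One part-NNNNN file per partition, plus the _SUCCESS marker.
val written   = new java.io.File(outDir).list().sorted
val partFiles = written.filter(_.startsWith("part-"))

// Each line is the element's toString, e.g. "(b,2)".
val reloaded = sc.textFile(outDir).collect().sorted
```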

Checking the Output

Open another terminal and go to the home directory (where Spark is running
in the other terminal). Use the following commands to check the output
directory.
[hadoop@localhost ~]$ cd output/
[hadoop@localhost output]$ ls -1

part-00000
part-00001
_SUCCESS

The following command shows the contents of the part-00000 file.

[hadoop@localhost output]$ cat part-00000

Output
(people,1)
(are,2)
(not,1)
(as,8)
(beautiful,2)
(they,7)
(look,1)

The following command shows the contents of the part-00001 file.

[hadoop@localhost output]$ cat part-00001

Output
(walk,1)
(or,1)
(talk,1)
(only,1)
(love,1)
(care,1)
(share,1)

Unpersist the Storage

Before unpersisting, if you want to see the storage space used by this
application, open the following URL in your browser.
http://localhost:4040

You will see the following screen, which shows the storage space used by the
application running in the Spark shell.
If you want to unpersist the storage space of a particular RDD, use the
following command.
scala> counts.unpersist()

You will see output like the following:

15/06/27 00:57:33 INFO ShuffledRDD: Removing RDD 9 from persistence list
15/06/27 00:57:33 INFO BlockManager: Removing RDD 9
15/06/27 00:57:33 INFO BlockManager: Removing block rdd_9_1
15/06/27 00:57:33 INFO MemoryStore: Block rdd_9_1 of size 480 dropped from memory (free 280061810)
15/06/27 00:57:33 INFO BlockManager: Removing block rdd_9_0
15/06/27 00:57:33 INFO MemoryStore: Block rdd_9_0 of size 296 dropped from memory (free 280062106)
res7: counts.type = ShuffledRDD[9] at reduceByKey at <console>:14

To verify the storage space in the browser, use the following URL.
http://localhost:4040/

You will see the following screen. It shows the storage space used by the
application running in the Spark shell.
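The effect of `unpersist()` can also be checked from code rather than the web UI. This is a sketch assuming Spark 2.x or later and a local master, with illustrative data; `SparkContext.getPersistentRDDs` lists the RDDs currently marked as persisted. After unpersisting, the lineage is still there, so a later action simply recomputes the data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("unpersist").setMaster("local[*]"))

val counts = sc.parallelize(Seq("a b", "b c"))
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .cache()
counts.count()                                          // materialize the cached partitions

val cachedBefore = sc.getPersistentRDDs.contains(counts.id)
counts.unpersist()                                      // drop the cached blocks
val cachedAfter  = sc.getPersistentRDDs.contains(counts.id)

// Recomputed from lineage, since the cached blocks are gone.
val recomputed = counts.count()
```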
