
Apache Spark

The document contains a Scala program that uses Apache Spark to perform basic DataFrame operations. It creates a SparkSession, builds a DataFrame of student names and marks, and demonstrates filtering, grouping, aggregation, and sorting of the data. It also calculates the average and standard deviation of the marks before stopping the SparkSession.


import org.apache.spark.sql.{SparkSession, DataFrame}
import org.apache.spark.sql.functions._

// Create the SparkSession yourself when running from an IDE
object UniqueData {

  def main(args: Array[String]): Unit = {

    // Create a SparkSession
    val spark = SparkSession.builder()
      .appName("More DataFrame Operations")
      .master("local[*]") // Run locally using all available cores
      .getOrCreate()

    // Required for the toDF and $"column" syntax used below
    import spark.implicits._

    val numbersDF = Seq(
      (1, "Alex", 50.0),
      (2, "Bob", 20.0),
      (3, "Cara", 30.0),
      (4, "Devin", 45.0),
      (5, "Euro", 75.0)
    ).toDF("Sr", "Name", "Marks")

    // Display the original DataFrame
    println("Original DataFrame:")
    numbersDF.show()

    // Keep rows whose Marks are greater than 20
    val filteredDF = numbersDF.filter($"Marks" > 20)
    println("Filtered DataFrame:")
    filteredDF.show()

    // Group by name and sum the marks and serial numbers for each group
    val sumDF = numbersDF.groupBy("Name").agg(
      sum("Marks").alias("total_number"),
      sum("Sr").alias("total_value"))
    println("Sum of marks and serial numbers for each name:")
    sumDF.show()

    // Calculate the average of the Marks column
    val avgValue = numbersDF.agg(avg("Marks")).first().getDouble(0)
    println("Average value: " + avgValue)

    // Calculate the (sample) standard deviation of the Marks column
    val stdDevValue = numbersDF.agg(stddev("Marks")).first().getDouble(0)
    println("Standard deviation of value: " + stdDevValue)

    // Sort the DataFrame by Sr in descending order
    val sortedDF = numbersDF.sort($"Sr".desc)
    println("Sorted DataFrame:")
    sortedDF.show()

    // Stop the SparkSession
    spark.stop()
  }
}
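For this small dataset the aggregate results can be checked by hand: the mean of the Marks column is (50 + 20 + 30 + 45 + 75) / 5 = 44.0, and Spark's stddev (the sample standard deviation) comes out to roughly 21.04. Note also that because every Name appears exactly once here, groupBy("Name") produces one row per student; grouping is more interesting when keys repeat.

To run the program outside an IDE, a minimal sbt build is enough. The following build.sbt is a sketch, not part of the original program; the project name and the Scala and Spark versions are assumptions and should be adjusted to match the local installation.

// Minimal build.sbt sketch (name and versions are assumptions; adjust to your setup)
name := "unique-data"

scalaVersion := "2.12.18" // Spark 3.x builds are published for Scala 2.12 and 2.13

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"

With that in place, sbt run launches the main method directly; alternatively, package the project as a JAR and submit it to a cluster with spark-submit.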
