L02 - Spark SQL For Data Processing: CBG1C04 Big Data Programming
people.json
{"name":"Alice", "pcode":"94304"}
{"name":"Brayden", "age":30, "pcode":"94304"}
{"name":"Carla", "age":19, "pcode":"10036"}
{"name":"Diana", "age":46}
{"name":"Etienne", "pcode":"94104"}
• Other methods:
– distinct: returns a new DataFrame containing only the distinct rows of this DataFrame
– join: joins this DataFrame with a second DataFrame
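To make the two methods above concrete, here is a minimal plain-Python sketch of what they do to the people.json records. It models the semantics only and does not use Spark: the helper names (load_rows, distinct, inner_join) and the CITIES lookup table are hypothetical, introduced for illustration. In PySpark the corresponding calls would be along the lines of df.select("pcode").distinct() and df.join(cities, "pcode").

```python
import json

# Records from people.json, one JSON object per line.
PEOPLE_JSON = """\
{"name":"Alice", "pcode":"94304"}
{"name":"Brayden", "age":30, "pcode":"94304"}
{"name":"Carla", "age":19, "pcode":"10036"}
{"name":"Diana", "age":46}
{"name":"Etienne", "pcode":"94104"}
"""

COLUMNS = ("name", "age", "pcode")

# Hypothetical second table (pcode, city), used only to illustrate join.
CITIES = [("94304", "Palo Alto"), ("10036", "New York")]


def load_rows(text):
    """Parse JSON lines into tuples; absent fields become None,
    similar to the nulls Spark produces for missing JSON keys."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            record = json.loads(line)
            rows.append(tuple(record.get(col) for col in COLUMNS))
    return rows


def distinct(rows):
    """Model of distinct: drop duplicate rows.
    First-seen order is kept here; Spark guarantees no particular order."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def inner_join(left, right, left_key, right_key):
    """Model of an inner equi-join on one column, given by position.
    Rows whose key is None never match, as with null keys in Spark."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    result = []
    for l in left:
        key = l[left_key]
        if key is None:
            continue
        for r in index.get(key, []):
            # Append the right-hand columns, omitting the duplicate key.
            result.append(l + r[:right_key] + r[right_key + 1:])
    return result


people = load_rows(PEOPLE_JSON)

# distinct on the pcode column: "94304" appears twice but is kept once.
distinct_pcodes = distinct([(row[2],) for row in people])

# join people with CITIES on pcode (column 2 on the left, 0 on the right).
joined = inner_join(people, CITIES, left_key=2, right_key=0)

print(distinct_pcodes)
for row in joined:
    print(row)
```

Both helpers return new lists and leave their inputs untouched, which mirrors how distinct and join return a new DataFrame rather than modifying the one they are called on. Diana (no pcode) and Etienne (pcode absent from CITIES) drop out of the inner join.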
• In memory
• Partitioned
• Typed
• Lazy Evaluation
• Immutable
• Parallel
• Cacheable