EDA Python for Data Analysis
1. Data Loading
df = spark.read.parquet('filename.parquet')
df = spark.read.format("jdbc").options(url="jdbc_url", dbtable="table_name").load()
2. Show Data
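A minimal sketch of inspecting a loaded DataFrame, assuming df is the DataFrame read in the previous step:
# Preview rows, schema, and row count of the loaded DataFrame.
df.show(5)           # print the first 5 rows
df.printSchema()     # print column names and types
print(df.count())    # number of rows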
3. Data Cleaning
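A short cleaning sketch; the column name 'age' is a hypothetical example used only for illustration:
# Drop nulls and duplicates, then fill remaining nulls in a chosen column.
df_clean = df.dropna()                   # drop rows containing any null value
df_clean = df_clean.dropDuplicates()     # remove exact duplicate rows
df_clean = df_clean.fillna({'age': 0})   # 'age' is an assumed column name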
6. Statistical Analysis
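A minimal summary-statistics sketch, assuming two numeric columns 'col1' and 'col2' (hypothetical names):
# Basic descriptive statistics and a pairwise correlation.
df.describe().show()                  # count, mean, stddev, min, max per column
df.summary().show()                   # describe() plus quartiles
print(df.stat.corr('col1', 'col2'))   # Pearson correlation between two numeric columns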
7. Data Visualization
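Spark itself does not plot; a common pattern is to aggregate in Spark and hand the small result to pandas/matplotlib. The column name 'category' is an assumption for illustration:
import matplotlib.pyplot as plt

# Aggregate in Spark, convert the small result to pandas, then plot locally.
pdf = df.groupBy('category').count().toPandas()
pdf.plot(kind='bar', x='category', y='count')
plt.show()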
• Repartitioning: df.repartition(10)
13. Joins
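A minimal join sketch, assuming two DataFrames df1 and df2 sharing a key column 'id' (hypothetical names):
# Join two DataFrames on a common key column.
inner = df1.join(df2, on='id', how='inner')   # rows with matching keys in both
left = df1.join(df2, on='id', how='left')     # keep all rows from df1
inner.show(5)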
• Writing to a JDBC table: df.write.format("jdbc").options(url="jdbc_url", dbtable="table_name").save()
• Loading binary files: df = spark.read.format('binaryFile').load('path_to_binary_file')
• Loading ML Model:
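One way to load a previously saved model is via PipelineModel.load; the path below is a placeholder, and df is assumed to contain the columns the model expects:
from pyspark.ml import PipelineModel

# Load a fitted pipeline model that was saved earlier with model.save(path).
model = PipelineModel.load('path_to_saved_model')
predictions = model.transform(df)   # apply the loaded model to a DataFrame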
• Reading from Kafka: df = spark.read.format('kafka').option('kafka.bootstrap.servers', 'host1:port1').load()
• Using ML Pipelines:
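A minimal pipeline sketch chaining a feature assembler and a classifier; the column names 'f1', 'f2', and 'label' are assumptions for illustration:
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Assemble feature columns into a single vector, then fit a classifier.
assembler = VectorAssembler(inputCols=['f1', 'f2'], outputCol='features')
lr = LogisticRegression(featuresCol='features', labelCol='label')
pipeline = Pipeline(stages=[assembler, lr])
model = pipeline.fit(df)             # df must contain 'f1', 'f2', and 'label'
predictions = model.transform(df)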
• Hypothesis Testing:
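One option is the chi-squared independence test in pyspark.ml.stat; the 'features' and 'label' columns are assumptions carried over from the pipeline sketch above:
from pyspark.ml.stat import ChiSquareTest

# Chi-squared test of independence between each feature and the label.
result = ChiSquareTest.test(df, 'features', 'label')
result.show(truncate=False)          # pValues, degreesOfFreedom, statistics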
• Configuring SparkSession:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('app').config('spark.some.config.option', 'value').getOrCreate()