Multiclass Classification On IRIS

Multiclass Classification on the IRIS Dataset Using PySpark

1. Import Necessary Libraries
2. Create a SparkSession
3. Load the IRIS Dataset
4. Preprocess the Data
   - Handle Categorical Features (index the species column into a numeric label)
   - Assemble Features (combine the four measurements into one vector column)
5. Prepare the Labeled Dataset (features and label columns)
6. Split the Data
7. Create and Train the Model
8. Evaluate the Model

# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, StringIndexer
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Create a SparkSession
spark = SparkSession.builder.appName("IRIS_MultiClass").getOrCreate()

# Load the IRIS dataset (expects a header row; column types are inferred)
iris_df = spark.read.csv("iris.csv", header=True, inferSchema=True)

# Handle categorical features: index the string column "species" into a numeric "label"
indexer = StringIndexer(inputCol="species", outputCol="label").fit(iris_df)
iris_df = indexer.transform(iris_df)

# Assemble features
assembler = VectorAssembler(
    inputCols=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    outputCol="features",
)
iris_df = assembler.transform(iris_df)

# Split the data into roughly 80% training and 20% test rows (seeded for repeatability)
train_df, test_df = iris_df.randomSplit([0.8, 0.2], seed=42)

# Create and train the model
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=10)
model = rf.fit(train_df)

# Evaluate the model on the held-out test set
predictions = model.transform(test_df)
evaluator = MulticlassClassificationEvaluator(
    metricName="accuracy", labelCol="label", predictionCol="prediction"
)
accuracy = evaluator.evaluate(predictions)
print("Accuracy:", accuracy)
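
The accuracy reported by the evaluator above is the fraction of test rows whose predicted index equals the label index. The following is a minimal sketch in plain Python of that calculation, together with a count of each (label, prediction) pair, which shows which classes are being confused. The label and prediction lists are made-up values for illustration, not results from this model, and the function names are hypothetical.

```python
from collections import Counter


def accuracy(labels, predictions):
    """Return the fraction of positions where prediction equals label."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def confusion_counts(labels, predictions):
    """Count how often each (label, prediction) pair occurs."""
    return Counter(zip(labels, predictions))


# Hypothetical values for illustration only (not output of the model above)
labels = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
predictions = [0.0, 0.0, 1.0, 2.0, 2.0, 2.0]

print("Accuracy:", accuracy(labels, predictions))
for (y, p), n in sorted(confusion_counts(labels, predictions).items()):
    print(f"label={y} predicted={p} count={n}")
```

In PySpark itself, the same pair counts can be obtained with predictions.groupBy("label", "prediction").count(), and MulticlassClassificationEvaluator also accepts other metricName values such as "f1", "weightedPrecision" and "weightedRecall".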

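The StringIndexer step in the script maps each species string to a numeric index. With its default ordering (stringOrderType="frequencyDesc"), the most frequent value receives index 0, and in recent Spark versions values with equal frequency are ordered alphabetically. The sketch below mimics that rule in plain Python as an illustration only; it is not Spark's implementation, and the species names are assumed because the contents of iris.csv are not shown.

```python
from collections import Counter


def frequency_desc_index(values):
    """Assign 0.0, 1.0, ... by descending frequency, ties broken alphabetically."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: float(i) for i, value in enumerate(ordered)}


# Assumed species names; equal counts, so the order falls back to alphabetical
species = ["setosa"] * 3 + ["virginica"] * 3 + ["versicolor"] * 3
mapping = frequency_desc_index(species)
print(mapping)
```

In the script, the fitted indexer exposes the order it actually learned through indexer.labels, which is worth printing before interpreting the numeric values in the prediction column.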