CO2 Emission Project
ARTIFICIAL INTELLIGENCE
INVESTIGATORY PROJECT
PREDICTING CO2 EMISSIONS FROM CAR FEATURES
CLASS X
2024-2025
PRESENTED BY:
HARIKRISHNAN K
X-A
SAIRAM VIDYALAYA
SENIOR SECONDARY SCHOOL
AFFILIATED TO CBSE NO: 1930478
NO.5 MADIPAKKAM MAIN ROAD,
MADIPAKKAM, CHENNAI-600091
This is to certify that this Bonafide Project Work has been done by
________________________________________________________of
class X in SAIRAM VIDYALAYA Sr. Sec. School, Madipakkam,
Chennai-91 during the year 2024-2025.
Submitted for (All India Secondary Certificate Examination) Artificial
Intelligence Practical Examination held on ____________ at
___________________________________
by AISSCE, New Delhi.
Project Structure:
1. Introduction:
This project addresses the need for predicting CO2 emissions from key car features to better
understand and mitigate the environmental impact of vehicles. With the increasing focus on
sustainability, accurate predictions can aid policymakers, manufacturers, and consumers in
making informed decisions.
2. Data:
The dataset used in this project comprises information about various car models, including
their engine size, number of cylinders, fuel consumption, and CO2 emissions. The data was
sourced from [Dataset Source].
3. Data Preprocessing:
Missing values in the dataset were addressed by removing rows with missing CO2 emission
values. No imputation was performed, as the dataset size allowed for the removal of
incomplete entries. Data cleaning involved checking for outliers and ensuring consistent
formatting.
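The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the project's actual cleaning script: the small table reuses a few rows from the source code section, and the interquartile-range (IQR) rule in the hypothetical helper `iqr_outliers` is an assumption, since the report does not say which outlier check was used.

```python
import pandas as pd

# Small illustrative table using the project's column names
df = pd.DataFrame({
    "ENGINESIZE": [2.0, 2.4, 1.5, 3.5, 2.4],
    "CYLINDERS": [4, 4, 4, 6, 4],
    "FUELCONSUMPTION": [8.5, 9.6, 5.9, 11.1, 9.2],
    "CO2EMISSIONS": [196, 221, 136, 255, None],
})

# Remove rows whose CO2 emission value is missing (no imputation)
clean = df.dropna(subset=["CO2EMISSIONS"])


def iqr_outliers(series, k=1.5):
    """Hypothetical helper: mark values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Assumed outlier check: flag unusually low or high emission values
outlier_mask = iqr_outliers(clean["CO2EMISSIONS"])

print(len(df), "rows before,", len(clean), "rows after dropping missing targets")
print("Outliers flagged:", int(outlier_mask.sum()))
```

Rows flagged by the mask would still need to be inspected by hand before removal, because an unusual value is not necessarily a wrong one.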
4. Model Training:
A linear regression model was chosen due to its simplicity and interpretability. The features
used for training include engine size, number of cylinders, and fuel consumption. The model
was trained using scikit-learn's `LinearRegression` class with default parameters.
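To illustrate the interpretability mentioned above, the sketch below fits the model on the nine complete rows from the source code section and prints the learned coefficients. The variable names `train` and `feature_names` are chosen here for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# The nine complete rows from the project's source code section
train = pd.DataFrame({
    "ENGINESIZE": [2, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7],
    "CYLINDERS": [4, 4, 4, 6, 6, 6, 6, 6, 6],
    "FUELCONSUMPTION": [8.5, 9.6, 5.9, 11.1, 10.6, 10, 10.1, 11.1, 11.6],
    "CO2EMISSIONS": [196, 221, 136, 255, 244, 230, 232, 255, 267],
})
feature_names = ["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION"]

model = LinearRegression()  # default parameters
model.fit(train[feature_names], train["CO2EMISSIONS"])

# Each coefficient is the change in predicted CO2 for a one-unit increase
# in that feature while the other features are held fixed
for name, coef in zip(feature_names, model.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {model.intercept_:.3f}")
```

With only nine rows and closely related features, the individual coefficients should be read with caution.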
5. Model Evaluation:
The performance of the linear regression model was evaluated using mean squared error
(MSE) as the primary metric. The MSE provided a measure of the model's accuracy in
predicting CO2 emissions. Challenges included addressing potential multicollinearity among
features. The missing CO2 emission value (`None` in the dataset) is the value that the linear regression model predicts.
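A minimal sketch of the evaluation described above is given here. It assumes the MSE is computed on the same nine rows used for training, because the report does not describe a separate test set, so the figure is likely to be optimistic. The correlation table is one simple way to look for multicollinearity and is an assumption, not necessarily the check used in the project.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# The nine complete rows from the project's source code section
train = pd.DataFrame({
    "ENGINESIZE": [2, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7],
    "CYLINDERS": [4, 4, 4, 6, 6, 6, 6, 6, 6],
    "FUELCONSUMPTION": [8.5, 9.6, 5.9, 11.1, 10.6, 10, 10.1, 11.1, 11.6],
    "CO2EMISSIONS": [196, 221, 136, 255, 244, 230, 232, 255, 267],
})
X = train[["ENGINESIZE", "CYLINDERS", "FUELCONSUMPTION"]]
y = train["CO2EMISSIONS"]

model = LinearRegression().fit(X, y)

# MSE is the average of the squared differences between actual and predicted values
mse = mean_squared_error(y, model.predict(X))
print(f"Training MSE: {mse:.4f}")

# Pairwise correlations between features; values near 1 or -1 hint at multicollinearity
feature_corr = X.corr()
print(feature_corr.round(2))
```

A high correlation between engine size and number of cylinders means the model cannot cleanly separate the effect of one from the other, even when its predictions are accurate.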
6. Prediction:
The trained linear regression model was applied to predict the missing CO2 emission value
for a specific car model. The predicted value serves as an estimate of the environmental
impact of the vehicle based on its engine size, number of cylinders, and fuel consumption.
7. Visualization:
Scatter plots with regression lines were employed to visualize the relationship between actual
and predicted CO2 emissions. The green dashed line represents perfect prediction, while blue
points denote training data, and the red point represents the predicted value for the missing
data point.
8. Dependencies:
- Python 3.8
- scikit-learn 0.24.1
- pandas 1.2.1
- matplotlib 3.3.3
9. Usage:
1. Make sure Python is installed on your machine.
2. Install the required libraries using `pip install -r requirements.txt`.
3. Run the main script: `python main.py`.
10. Conclusion:
The project successfully demonstrated the application of linear regression in predicting CO2
emissions from car features. While the model showed reasonable accuracy, further refinement
could involve exploring more sophisticated regression techniques or incorporating additional
features.
11. References:
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning.
Springer.
SOURCE CODE:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Given data: the last car has no recorded CO2 emission value
data = {
    'ENGINESIZE': [2, 2.4, 1.5, 3.5, 3.5, 3.5, 3.5, 3.7, 3.7, 2.4],
    'CYLINDERS': [4, 4, 4, 6, 6, 6, 6, 6, 6, 4],
    'FUELCONSUMPTION': [8.5, 9.6, 5.9, 11.1, 10.6, 10, 10.1, 11.1, 11.6, 9.2],
    'CO2EMISSIONS': [196, 221, 136, 255, 244, 230, 232, 255, 267, None],
}
df = pd.DataFrame(data)

# Separate the rows with a known CO2 value (training) from the row to predict
train_data = df.dropna()  # Remove rows with missing values
test_data = df[df['CO2EMISSIONS'].isnull()]

# Train a linear regression model
X_train = train_data[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION']]
y_train = train_data['CO2EMISSIONS']
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the missing value
X_test = test_data[['ENGINESIZE', 'CYLINDERS', 'FUELCONSUMPTION']]
predicted_co2 = model.predict(X_test)

# Replace the missing value in the original dataframe
df.loc[df['CO2EMISSIONS'].isnull(), 'CO2EMISSIONS'] = predicted_co2
print("Predicted CO2 Emissions:", predicted_co2[0])
print(df)

# Scatter plot of actual vs. predicted CO2 emissions
plt.scatter(train_data['CO2EMISSIONS'], model.predict(X_train), color='blue',
            label='Training Data')
# The missing car has no actual value (NaN), so its prediction is placed on the diagonal
plt.scatter(predicted_co2, predicted_co2, color='red', label='Predicted Value')
plt.plot([df['CO2EMISSIONS'].min(), df['CO2EMISSIONS'].max()],
         [df['CO2EMISSIONS'].min(), df['CO2EMISSIONS'].max()],
         linestyle='--', color='green', label='Perfect Prediction')
plt.xlabel('Actual CO2 Emissions')
plt.ylabel('Predicted CO2 Emissions')
plt.legend()
plt.show()
OUTPUT: