Project

Uploaded by

Ekansh Girdhar
Copyright
© All Rights Reserved
CERTIFICATE

This is to certify that the project entitled
PROJECT ON DATA ANALYSIS OF THE LUNG
CANCER is a bona fide work done by EKANSH
GIRDHAR of class XII-A and ANUJ SALIL of class
XII-B, session 2024-25, and has been carried out
under my supervision and guidance.

……………………………..          ……………………………
(Signature of Teacher)          (Signature of External Examiner)

…………………………….
(Signature of Principal)
ACKNOWLEDGEMENTS

I would like to express my sincere gratitude
towards my project guide and Informatics
Practices teacher, Ms. Pooja Thakur, for
guiding me throughout the making of this
project. Her valuable advice helped me
immensely in the successful completion of
this project. I would also like to thank our
school principal, Ms. Meenu Kanwar, for her
coordination in extending every possible
support.
INTRODUCTION

Lung cancer is one of the most common and
serious types of cancer, causing a large number
of deaths worldwide. Understanding the disease,
its causes, and how it progresses is essential for
improving diagnosis, treatment, and survival
rates. However, the complexity of lung cancer
makes it a challenge for doctors and researchers.

This project uses a detailed dataset of 23,658
lung cancer patients to study the disease. The
data includes basic information like age, gender,
and ethnicity, as well as lifestyle details such as
smoking history and the number of years a
person smoked. It also contains important
medical details like the size and location of the
tumor, the stage of cancer, and the type of
treatment received. Blood test results and other
health indicators, such as hemoglobin, glucose,
and creatinine levels, provide additional insights.

By analyzing this information, the project aims to
find patterns and connections that can help
doctors make better decisions about diagnosis
and treatment. These insights could lead to
better care for patients and contribute to the
ongoing fight against lung cancer.

The dataset used for this project was obtained from
www.kaggle.com.
I have done the analysis using Python Pandas on
a Windows machine, but the project can be run on
any machine supporting Python and Pandas.
Aside from Pandas, the Matplotlib Python module has
also been used for visualization of this dataset.
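
As a rough sketch of the reading step, the snippet below loads a CSV with Pandas and limits the number of rows. The sample text and the helper name load_patients are hypothetical and stand in for the Kaggle file; only the column names follow those used in the project code.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for the Kaggle CSV.
# The column names follow the project code; the values are invented.
SAMPLE_CSV = """Age,Gender,Ethnicity,Smoking_History,Family_History
68,Male,Hispanic,Current Smoker,Yes
58,Female,Asian,Never Smoked,No
44,Male,Caucasian,Former Smoker,Yes
"""

def load_patients(source, nrows=None):
    """Read patient records from a file path or a file-like object."""
    return pd.read_csv(source, nrows=nrows)

# nrows limits how many data rows are read, as the project does with 250.
df = load_patients(io.StringIO(SAMPLE_CSV), nrows=2)
print(df.shape)          # (rows, columns)
print(list(df.columns))  # column names taken from the header line
```

Passing a path string instead of the StringIO object reads the same data from disk, so the helper works unchanged on any machine where the file path is adjusted.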

FEATURES OF THE PROJECT

The project is divided into five parts covering
reading, statistics, analysis and visualization.

It includes:
1. Reading the data of the csv file
2. Displaying the Dataframe Statistics
3. Data Analysis (Detailed)
4. Data Analysis (Comprehensive)
5. Plotting graphs using the data from the csv file
(data visualization)
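
The Dataframe Statistics part relies on a handful of DataFrame attributes. A minimal sketch on a small hypothetical frame (the values are invented, not taken from the dataset) shows what each one reports:

```python
import pandas as pd

# Hypothetical sample; the real project loads its columns from the csv file.
df = pd.DataFrame({
    "Age": [68, 58, 44],
    "Gender": ["Male", "Female", "Male"],
})

print(df.shape)    # (number of rows, number of columns)
print(df.ndim)     # number of axes; a DataFrame always has 2
print(df.size)     # total number of cells = rows * columns
print(df.dtypes)   # data type of each column
print(df.T)        # transpose: rows become columns
```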
CODE

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load the first 250 rows of the dataset; change the path to match your machine.
df = pd.read_csv("C:\\Users\\ekans\\Desktop\\lc.csv", nrows=250)

while True:
    print("Main Menu")
    print("1. Fetch data")
    print("2. Dataframe Statistics")
    print("3. Detailed and comprehensive Data Analysis")
    print("4. Data Visualization")
    print("5. Exit")
    ch = int(input("Enter your choice: "))
    if ch == 1:
        print(df)
    elif ch == 2:
        while True:
            print("Dataframe Statistics Menu")
            print("1. Display the Transpose")
            print("2. Display all column names")
            print("3. Display the indexes")
            print("4. Display the shape")
            print("5. Display the dimension")
            print("6. Display the data types of all columns")
            print("7. Display the size")
            print("8. Modify Dataset")
            print("9. Exit")
            ch2 = int(input("Enter choice: "))
            if ch2 == 1:
                print(df.T)
            elif ch2 == 2:
                print(df.columns)
            elif ch2 == 3:
                print(df.index)
            elif ch2 == 4:
                print(df.shape)
            elif ch2 == 5:
                print(df.ndim)
            elif ch2 == 6:
                print(df.dtypes)
            elif ch2 == 7:
                print(df.size)
            elif ch2 == 8:
                while True:
                    print("Modify the Dataset")
                    print("1. Add a column")
                    print("2. Add a row")
                    print("3. Remove a Row")
                    print("4. Remove a Column")
                    print("5. Exit")
                    choice = int(input("Enter your choice: "))
                    if choice == 1:
                        i = input("Enter the column name: ")
                        b = input("Enter the value to be added: ")
                        # Create column i and fill every row with the value b
                        df[i] = b
                    elif choice == 2:
                        i = input("Enter the row name: ")
                        b = input("Enter the value to be added: ")
                        # Create row i and fill every column with the value b
                        df.loc[i] = b
                    elif choice == 3:
                        a = input("Enter the row to be deleted: ")
                        # Row labels read from the csv are integers, so convert digits
                        df.drop(int(a) if a.isdigit() else a, inplace=True)
                    elif choice == 4:
                        a = input("Enter the column to be deleted: ")
                        df.drop(a, axis=1, inplace=True)
                    elif choice == 5:
                        break
            elif ch2 == 9:
                break
    elif ch == 3:
        while True:
            print('Data Extraction and Analysis:')
            print('1. Display Records')
            print('2. Extract patients data')
            print('3. Data Analysis')
            print('4. Exit')
            ch3 = int(input("Enter choice: "))
            if ch3 == 1:
                print('Display Data:')
                print(df)
            elif ch3 == 2:
                while True:
                    print('Extract Data:')
                    print('1. Top 5 Records')
                    print('2. Bottom 5 Records')
                    print('3. Specific number of records from the top')
                    print('4. Specific number of records from the bottom')
                    print('5. Sort Based on Category')
                    print('6. Exit')
                    e = int(input("Enter choice: "))
                    if e == 1:
                        print(df.head(5))
                    elif e == 2:
                        print(df.tail(5))
                    elif e == 3:
                        i_top = int(input("Enter no. of records to be extracted from the top: "))
                        print(df.head(i_top))
                    elif e == 4:
                        i_bottom = int(input("Enter no. of records to be extracted from the bottom: "))
                        print(df.tail(i_bottom))
                    elif e == 5:
                        while True:
                            print('Sort based on category:')
                            print('1. Gender')
                            print('2. Ethnicity')
                            print('3. Smoking History of patient')
                            print('4. Family History')
                            print('5. Tumour Location')
                            print('6. Treatment Used')
                            print('7. Exit')
                            oi = int(input("Enter choice: "))
                            if oi == 1:
                                gender = int(input('Gender: 1-Male | 2-Female: '))
                                if gender == 1:
                                    print(df[df.Gender == "Male"])
                                elif gender == 2:
                                    print(df[df.Gender == "Female"])
                            elif oi == 2:
                                while True:
                                    print('Ethnicity:')
                                    print('1-Hispanic')
                                    print('2-Caucasian')
                                    print('3-African American')
                                    print('4-Asian')
                                    print('5-Other')
                                    print('6. Exit')
                                    ethnicity = int(input("Enter choice: "))
                                    if ethnicity == 1:
                                        print(df[df.Ethnicity == "Hispanic"])
                                    elif ethnicity == 2:
                                        print(df[df.Ethnicity == "Caucasian"])
                                    elif ethnicity == 3:
                                        print(df[df.Ethnicity == "African American"])
                                    elif ethnicity == 4:
                                        print(df[df.Ethnicity == "Asian"])
                                    elif ethnicity == 5:
                                        print(df[df.Ethnicity == "Other"])
                                    elif ethnicity == 6:
                                        break
                            elif oi == 3:
                                while True:
                                    print('Smoking History of patient:')
                                    print('1-Current Smoker')
                                    print('2-Never Smoked')
                                    print('3-Former Smoker')
                                    print('4. Exit')
                                    smoke = int(input("Enter choice: "))
                                    if smoke == 1:
                                        print(df[df.Smoking_History == "Current Smoker"])
                                    elif smoke == 2:
                                        print(df[df.Smoking_History == "Never Smoked"])
                                    elif smoke == 3:
                                        print(df[df.Smoking_History == "Former Smoker"])
                                    elif smoke == 4:
                                        break
                            elif oi == 4:
                                history = int(input('Family History: 1-Yes | 2-No: '))
                                if history == 1:
                                    print(df[df.Family_History == "Yes"])
                                elif history == 2:
                                    print(df[df.Family_History == "No"])
                            elif oi == 5:
                                while True:
                                    print('Tumour location:')
                                    print('1-Upper Lobe')
                                    print('2-Middle Lobe')
                                    print('3-Lower Lobe')
                                    print('4. Exit')
                                    loc = int(input("Enter choice: "))
                                    if loc == 1:
                                        print(df[df.Tumor_Location == "Upper Lobe"])
                                    elif loc == 2:
                                        print(df[df.Tumor_Location == "Middle Lobe"])
                                    elif loc == 3:
                                        print(df[df.Tumor_Location == "Lower Lobe"])
                                    elif loc == 4:
                                        break
                            elif oi == 6:
                                while True:
                                    print('Treatment Used:')
                                    print('1-Surgery')
                                    print('2-Radiation Therapy')
                                    print('3-Chemotherapy')
                                    print('4-Targeted Therapy')
                                    print('5. Exit')
                                    method = int(input("Enter choice: "))
                                    if method == 1:
                                        print(df[df.Treatment == "Surgery"])
                                    elif method == 2:
                                        print(df[df.Treatment == "Radiation Therapy"])
                                    elif method == 3:
                                        print(df[df.Treatment == "Chemotherapy"])
                                    elif method == 4:
                                        print(df[df.Treatment == "Targeted Therapy"])
                                    elif method == 5:
                                        break
                            elif oi == 7:
                                break
                    elif e == 6:
                        break
            elif ch3 == 3:
                while True:
                    print('Data Analysis:')
                    print('1. Count of all ethnic groups')
                    print('2. Average age of patients')
                    print('3. Oldest patient')
                    print('4. Youngest patient')
                    print('5. Maximum smoking pack-years of a patient')
                    print('6. Minimum smoking pack-years of a patient')
                    print('7. No. of patients having both a personal smoking history and a family history')
                    print('8. Exit')
                    data = int(input('Enter the choice: '))
                    if data == 1:
                        # Number of patients in each ethnic group
                        print(df.groupby("Ethnicity").size())
                    elif data == 2:
                        print(df.Age.mean())
                    elif data == 3:
                        print(df.Age.max())
                    elif data == 4:
                        print(df.Age.min())
                    elif data == 5:
                        print(df.Smoking_Pack_Years.max())
                    elif data == 6:
                        print(df.Smoking_Pack_Years.min())
                    elif data == 7:
                        # Combine two boolean masks and count the rows where both hold
                        smoked = df.Smoking_History.isin(["Current Smoker", "Former Smoker"])
                        family = df.Family_History == "Yes"
                        print((smoked & family).sum())
                    elif data == 8:
                        break
            elif ch3 == 4:
                break
    elif ch == 4:
        while True:
            print("Data Visualization Menu")
            print("1. Bar graph of male and female patients")
            print("2. Pie chart of different ethnicities")
            print("3. Bar graph of smoking history")
            print("4. Bar graph of family history")
            print("5. Histogram for age category")
            print("6. Exit")
            ch4 = int(input("Enter choice: "))
            if ch4 == 1:
                gender = df['Gender'].value_counts()
                gender.plot(kind='bar')
                plt.title('Gender Distribution')
                plt.xlabel('Gender')
                plt.ylabel('Number of Patients')
                plt.show()
            elif ch4 == 2:
                m = df.groupby("Ethnicity").size()
                plt.pie(m.values, labels=m.index)
                plt.title("Distribution by Ethnicity")
                plt.show()
            elif ch4 == 3:
                smoking_history = df['Smoking_History'].value_counts()
                smoking_history.plot(kind='bar')
                plt.title('Distribution of Smoking History')
                plt.xlabel('Smoking History')
                plt.ylabel('Number of Patients')
                plt.xticks(rotation=45)
                plt.show()
            elif ch4 == 4:
                family_history = df['Family_History'].value_counts()
                family_history.plot(kind='bar', color='lightgreen', edgecolor='black')
                plt.title('Distribution of Family History')
                plt.xlabel('Family History')
                plt.ylabel('Number of Patients')
                plt.xticks(rotation=45)
                plt.show()
            elif ch4 == 5:
                plt.hist(df['Age'].dropna(), rwidth=0.8)
                plt.title('Distribution by Age')
                plt.xlabel('Age')
                plt.ylabel('Number of Patients')
                plt.show()
            elif ch4 == 6:
                break
    elif ch == 5:
        break
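
The category menus in the listing repeat the same comparison for every option. As a hedged sketch, the same selections can be written with two small helpers built on boolean masks. The function names filter_by and count_smokers_with_family_history, and the sample rows, are hypothetical; the column names follow the project code.

```python
import pandas as pd

# Hypothetical sample rows; the column names follow the project code.
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Smoking_History": ["Current Smoker", "Never Smoked",
                        "Former Smoker", "Former Smoker"],
    "Family_History": ["Yes", "No", "Yes", "No"],
})

def filter_by(frame, column, value):
    """Return the rows of frame whose column equals value."""
    return frame[frame[column] == value]

def count_smokers_with_family_history(frame):
    """Count current or former smokers who also report a family history."""
    smoked = frame["Smoking_History"].isin(["Current Smoker", "Former Smoker"])
    family = frame["Family_History"] == "Yes"
    # True counts as 1 when summed, so the sum is the number of matching rows.
    return int((smoked & family).sum())

print(filter_by(df, "Gender", "Female"))
print(count_smokers_with_family_history(df))
```

One helper serves every category, so adding a new option only needs a new column name and value rather than another branch.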
OUTPUT
1. Fetch Data

2. Dataframe Statistics
a. Display the Transpose
b. Display all column names
c. Display the indexes
d. Display the shape
e. Display the dimension
f. Display the data types of all columns
g. Display the size
h. Modify Dataset

3. Detailed and comprehensive Data Analysis

4. Data Visualization
a. Graph between Male and Female patients
b. Pie chart of Different Ethnicities
c. Graph b/w Smoking History
d. Family History
e. Histogram for Age Category
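
The bar charts listed under Data Visualization are built from category counts. A minimal sketch, assuming a non-interactive Matplotlib backend and a hypothetical four-value sample, shows how such a chart can be saved as an image instead of being shown in a window:

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample values standing in for the Gender column.
genders = pd.Series(["Male", "Female", "Male", "Male"], name="Gender")
counts = genders.value_counts()  # number of patients per category

fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_title("Gender Distribution")
ax.set_xlabel("Gender")
ax.set_ylabel("Number of Patients")

# Write the chart into an in-memory buffer; a file name would work as well.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```

Replacing the buffer with a file name such as "gender.png" (a hypothetical name) writes the chart to disk, which is convenient for pasting outputs into a report.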
BIBLIOGRAPHY

The following books/websites helped me in
the completion of this project.

• Informatics Practices NCERT
• Informatics Practices by Preeti Arora
• stackoverflow.com
• Kaggle.com
• www.w3schools.com
CONCLUSION

Through this project, I have gained
extensive knowledge of the Pandas and
PyPlot modules and their workings. I have
also understood the importance of data
analysis through its practical application.
