Project Documentation - InDIA Abellllll
Roll No : -----------------------------------------------------
Project Report submitted in fulfillment of Class XII
Syllabus Requirement By
--------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
Apart from our personal hard work, this project has received great support from the
We take this opportunity to thank our manager Rev. Fr. John Panchickal, our Principal
Fr. Antony T L, Vice Principal Fr. Shiju Varghese Kandaplackal and our teachers who
have helped us in our endeavor. Special mention must be made of our computer
teachers, Mr. Cherian K Abraham and Mrs. Thanuja Mathew, in this regard. We would
also like to thank our friends for their co-operation and suggestions. Finally, we thank
When confronted with a large amount of data, we seek to summarize the data into
statistics that capture the essence of the data with as few numbers as possible. This project
provides a statistical analysis of the COVID patient data collected from the states of India
over a period of one month. It contains data on people infected with the COVID-19 virus.
This data set has 36 observations that include the state-wise details of Total cases, Active
cases, Discharged cases, Number of Deaths, Active case ratio, Discharge cases ratio and
Death ratio, collected over a month.
Graphing the data has a similar goal: to reduce the data to an image that represents
all the key aspects of the raw data. Various types of plots were used in the project to
summarize the statistics.
Contents
1. INTRODUCTION
2. SYSTEM ANALYSIS
3. SYSTEM DESIGN
4. SYSTEM IMPLEMENTATION
5. CONCLUSION
6. REFERENCES
1. INTRODUCTION
What is Pandas?
Pandas is a package commonly used to deal with data analysis. It simplifies the loading
of data from external sources such as text files and databases, as well as providing ways
of analysing and manipulating data once it is loaded into your computer. The features
provided in pandas automate and simplify a lot of the common tasks that would take
many lines of code to write in the basic Python language.
Pandas is a hugely popular and still growing Python library which is used across a range
of disciplines like environmental and climate science, social science, linguistics and biology,
as well as in a number of industry applications such as data analytics, financial trading,
and many others.
Pandas is best suited for structured, labelled data, in other words tabular data that has
a heading associated with each column. Pandas is also fast. Python sometimes gets a
bad rap for being a bit slow compared to compiled languages such as C and Fortran, but
the performance-critical internals of Pandas are written in C and Cython, so it can
process large datasets efficiently.
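As a small illustration of the loading step described above, the sketch below reads a few
rows of tabular text into a DataFrame. The column names follow those used later in this
report; the state codes and figures are made-up placeholder values, not data from the
project's file.

```python
import io
import pandas as pd

# Placeholder rows (illustrative values only, not the project's data set).
csv_text = """STATE,TOTAL,ACTIVE,DISCHARGED,DEATHS
AA,1000,200,780,20
BB,500,50,440,10
CC,250,100,145,5
"""

# read_csv accepts a file path or any file-like object such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

# The column headings become labels that can be used to select data.
print(sample.columns.tolist())
print(sample["TOTAL"].sum())
```

The same call works with a path to a CSV file on disk in place of the StringIO object.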
The results obtained after analysis are used to make inferences or draw conclusions about
the data as well as to make important business decisions. Sometimes it is not easy to draw
inferences merely by looking at the results. In such cases, visualisation helps in
understanding the results of the analysis. Data visualisation means the graphical or pictorial
representation of data using graphs, charts, etc. The purpose of plotting data is to
visualize variation or show relationships between variables. Visualisation also helps to
communicate information effectively to the intended users.
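A minimal sketch of such a plot is given below. It draws a bar chart from a small Series
of hypothetical values; the state codes and counts are placeholders, and the "Agg" backend
is chosen here only so that the sketch runs without a display.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for running without a screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical death counts for three placeholder state codes.
deaths = pd.Series([20, 10, 5], index=["AA", "BB", "CC"], name="DEATHS")

# Series.plot returns the matplotlib Axes the bars were drawn on.
ax = deaths.plot(kind="bar")
ax.set_xlabel("State")
ax.set_ylabel("Deaths")

# Render the figure to an in-memory PNG instead of opening a window.
buf = io.BytesIO()
ax.get_figure().savefig(buf, format="png")
```

In an interactive session, plt.show() would display the same chart in a window.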
2. SYSTEM ANALYSIS
The first case of COVID-19 in India, which originated from China, was reported on
30 January 2020. India currently has the largest number of confirmed cases in Asia and
has the second-highest number of confirmed cases in the world after the United States.
This project analyses COVID patient data collected from the different states of India over
a period of one month. For this project, we took the data from the website of the
Government of India. It contains data on people infected with the COVID-19 virus. This data
set has 36 observations that include the state-wise details of Total cases, Active cases,
Discharged cases, Number of Deaths, Active case ratio, Discharge cases ratio and Death
ratio, collected over a month.
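The ratio columns can in principle be derived from the counts. The sketch below assumes
that each ratio is the corresponding count expressed as a percentage of total cases; the
source of the data does not state its formula in this report, so this is an assumption, and
the column names ending in _RATIO and all values are hypothetical.

```python
import pandas as pd

# Hypothetical counts for two placeholder state codes.
counts = pd.DataFrame(
    {"TOTAL": [1000, 500], "ACTIVE": [200, 50],
     "DISCHARGED": [780, 440], "DEATHS": [20, 10]},
    index=pd.Index(["AA", "BB"], name="STATE"),
)

# Assumed definition: ratio = count / total cases * 100, rounded to 2 decimals.
for col in ["ACTIVE", "DISCHARGED", "DEATHS"]:
    counts[col + "_RATIO"] = (counts[col] / counts["TOTAL"] * 100).round(2)

print(counts)
```

Under this assumption the three ratios for a state add up to 100, because every case is
either active, discharged or a death.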
2.3 SYSTEM SPECIFICATIONS
HARDWARE SPECIFICATIONS
The following is the hardware specification of the system on which the software
has been developed:
We used a system based on a 2nd generation Intel Core i5 processor, which gave
reliable and stable performance and could be run for long periods, so development
of the project was not interrupted. The system had 4 GB of RAM, which was
sufficient for reading, writing and processing the data.
SOFTWARE SPECIFICATIONS
COVIDIN.CSV
Pandas is an excellent toolkit for working with real world data that often have a
tabular structure (rows and columns). The main data structures in Pandas are
implemented with Series and DataFrame classes. The former is a one-dimensional
indexed array of some fixed data type. The latter is a two-dimensional data structure - a
table - where each column contains data of the same type.
Pandas makes it very convenient to load, process, and analyze such tabular data
using SQL-like queries.
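The sketch below illustrates the two data structures and a couple of SQL-like selections.
The values are hypothetical, and the SQL statements in the comments are only rough
equivalents given for comparison.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "STATE": ["AA", "BB", "CC"],
    "TOTAL": [1000, 500, 250],
    "DEATHS": [20, 10, 5],
})

# Selecting one column of a DataFrame gives a Series with a single data type.
total = df["TOTAL"]
print(type(total).__name__, total.dtype)

# Roughly: SELECT * FROM df WHERE TOTAL > 300 ORDER BY DEATHS
subset = df[df["TOTAL"] > 300].sort_values("DEATHS")
print(subset)

# Roughly: SELECT * FROM df WHERE TOTAL = (SELECT MAX(TOTAL) FROM df)
worst = df[df["TOTAL"] == df["TOTAL"].max()]
print(worst)
```

The last selection uses the same boolean-mask pattern that the project's source code
applies to find the state with the highest or lowest value in a column.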
When confronted with a large amount of data, we seek to summarize the data
into statistics that capture the essence of the data with as few numbers as possible.
Graphing the data has a similar goal: to reduce the data to an image that represents all
the key aspects of the raw data.
Various types of plots were used in the project to summarize the statistics.
3.4 MENU STRUCTURE
4. Exit
4. SYSTEM IMPLEMENTATION
4.1 SOURCE CODE AND MODULE
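import csv
import time
import sys
import os
import pandas as pd                # provides pd.read_csv and the DataFrame used below
import matplotlib.pyplot as plt    # provides plt.xlabel, plt.ylabel and plt.show
global covdf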
try:
    covdf=pd.read_csv("C:\\PYTHON38\\IP_PROJECTS\\COVID\\covidin.csv")
except Exception as e:
    print(e)
print('\t\t',end='')
for i in range(38):
    print('*',end='')
print()
print('\t\t',end='')
print('*',end='')
print('\n\n')
while True:
    print("4. Exit")
    if option==1:
        while True:
            print("6. Exit")
            state='''
NL Nagaland     OR Odisha       PY Puducherry
PB Punjab       RJ Rajasthan    SK Sikkim
'''
            if choice==1:
                print(st)
                covdf=covdf.set_index('STATE')
                if st in ['AN','AP','AR','AS','BR','CH','CG','DH','DL','GA','GJ','HR',
                          'HP','JK','JH','KA','KL','LK','LD','MP','MH','MN','ML','MZ',
                          'NL','OR','PY','PB','RJ','SK','TN','TL','TR','UP','UK','WB']:
                    data=covdf.loc[st]
                    print(data)
                else:
            elif choice==2:
                data=covdf[covdf.TOTAL==covdf.TOTAL.max()]
                data=covdf[covdf.ACTIVE==covdf.ACTIVE.max()]
                data=covdf[covdf.DISCHARGED==covdf.DISCHARGED.max()]
                data=covdf[covdf.DEATHS==covdf.DEATHS.max()]
            elif choice==3:
                data=covdf[covdf.TOTAL==covdf.TOTAL.min()]
                data=covdf[covdf.DISCHARGED==covdf.DISCHARGED.min()]
                data=covdf[covdf.DEATHS==covdf.DEATHS.min()]
            elif choice==4:
                COVID\\covidin.csv")
                covdf=covdf.set_index('STATE')
                ratioDF=covdf[covdf.columns[5:8]]
                print(ratioDF)
            elif choice==6:
                break
    elif option==2:
        while True:
            print("\t\t\t 5. Exit")
            if option==1:
                tot=covdf[['STATE','TOTAL']]
                tot=tot.set_index('STATE')
                print(tot)
                tot.plot(kind='bar')
                plt.xlabel('State')
                plt.show()
            elif option==2:
                tot=covdf[['STATE','ACTIVE']]
                tot=tot.set_index('STATE')
                print(tot)
                tot.plot(kind='bar')
                plt.xlabel('State')
                plt.ylabel('Active Cases')
                plt.show()
            elif option==3:
                tot=covdf[['STATE','DISCHARGED']]
                tot=tot.set_index('STATE')
                print(tot)
                tot.plot(kind='bar')
                plt.xlabel('State')
                plt.ylabel('Discharge Cases')
                plt.show()
            elif option==4:
                tot=covdf[['STATE','DEATHS']]
                tot=tot.set_index('STATE')
                print(tot)
                tot.plot(kind='bar')
                plt.xlabel('State')
                plt.ylabel('Death Cases')
                plt.show()
            elif option==5:
                break
    elif option==3:
        try:
            covdf=pd.read_csv("C:\\PYTHON38\\IP_PROJECTS\\COVID\\COVID.csv")
        except Exception as e:
            print(e)
        print(covdf)
        val=covdf.loc[rowidx,colidx]
        ch=input("do you really want to update the columns value "+str(val) +" [Y/N]")
        # apply the change only when the user explicitly answers Y or y
        if ch in ('Y','y'):
            covdf.loc[rowidx,colidx]=newval
            print("Data frame updated successfully............")
            print(covdf)
            covdf.to_csv("C:\\PYTHON38\\IP_PROJECTS\\COVID\\COVID.csv", index=False)
    elif option==4:
        print("System exiting.............")
        time.sleep(5)
        break
    else:
5. CONCLUSION
The project described here focuses mainly on the fundamentals of manipulating data (indexing,
grouping, aggregating, etc.), making use of the core DataFrame and Series objects. The
software is flexible since it is menu driven with user-friendly screens. No formal
programming knowledge is required of the user. Also, the user is not burdened with data
storing and data retrieval procedures, as both are done internally. Visualisation also helps to