PM Shri Kendriya Vidyalaya Pattom Shift II: Movie Data Analysis
Submitted by:
Name:……………………….
Class: XII ……..
This is to certify that this is the bonafide work of
………………………………….. of Class XII (Roll
No:…………..) for the AISSC Practical Examination 2024-25
in the subject Informatics Practices (065), and that he/she has
completed his/her Project on << Project name>> during
the academic session 2024-2025 as per the guidelines
issued by the Central Board of Secondary Education
(CBSE).
Teacher-In-Charge
1. Chapter 1: Project Introduction
2. Chapter 2: Objectives
3. Chapter 3: System Requirements
4. Chapter 4: System Study
5. Chapter 5: System Design
6. Chapter 6: Source Code
7. Chapter 7: Output Window
8. Chapter 8: Conclusion
9. Bibliography
CHAPTER 1
INTRODUCTION
1.1 GENERAL INTRODUCTION
This project focuses on analyzing movie data using Python. The dataset includes
information about various movies such as their titles, overviews, languages, vote
counts, and vote averages. By leveraging Python libraries like pandas and matplotlib,
this analysis explores different aspects of the dataset, such as vote distribution,
average ratings, and movie popularity.
The project is divided into several parts, including data visualization and menu-driven
functionalities. Users can interact with the program through a menu that allows them
to view specific parts of the dataset, add new records, delete existing ones, and
generate graphs.
Introduction to tools used
1.2 INTRODUCTION TO PYTHON
Python is a widely used, general-purpose, high-level programming language. It was
created by Guido van Rossum, first released in 1991, and is now developed under the
Python Software Foundation. It was designed with an emphasis on code readability, and
its syntax allows programmers to express concepts in fewer lines of code.
Python is a programming language that lets you work quickly and integrate systems
more efficiently.
Pandas is a package commonly used to deal with data analysis. It simplifies the loading
of data from external sources such as text files and databases, as well as providing
ways of analyzing and manipulating data once it is loaded into your computer. The
features provided in pandas automate and simplify a lot of the common tasks that
would take many lines of code to write in the basic Python language.
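As a minimal sketch of the kind of task pandas simplifies, the snippet below builds a
small DataFrame and computes the average rating per language. The column names (title,
language, vote_count, vote_average) and the sample rows are assumptions made for
illustration; they are not taken from the project's CSV file.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset is loaded from a CSV file.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "language": ["en", "en", "hi", "ml"],
    "vote_count": [1200, 800, 450, 300],
    "vote_average": [7.0, 8.0, 6.5, 7.5],
})

# Group the rows by language and take the mean of the vote_average column.
avg_rating = movies.groupby("language")["vote_average"].mean()
print(avg_rating)

# Find the title of the row with the highest vote_count.
most_voted = movies.loc[movies["vote_count"].idxmax(), "title"]
print("Most voted:", most_voted)
```

Written in basic Python, the same grouping would need an explicit loop and a dictionary
of running totals; here it is a single chained expression.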
Data visualization in Python can be done with many packages; one example is matplotlib.
matplotlib is a Python library for 2D graphics that offers several interfaces,
including one modeled on MATLAB's plotting commands.
The matplotlib library offers several named collections of methods; PyPlot is one such
interface. PyPlot is a collection of methods within matplotlib that allows users to
construct 2D plots and graphs easily and interactively.
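A minimal sketch of the PyPlot interface is shown below: it draws a bar chart from two
short lists. The language codes and counts are hypothetical sample values. The Agg
backend and the in-memory buffer are used only so that the sketch runs without a
display; an interactive program would typically call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt

languages = ["en", "hi", "ml"]  # hypothetical categories
movie_counts = [5, 3, 2]        # hypothetical counts

# Create a figure with one set of axes and draw one bar per language.
fig, ax = plt.subplots()
bars = ax.bar(languages, movie_counts)
ax.set_xlabel("Language")
ax.set_ylabel("Count")
ax.set_title("Language-wise Movie Count")

# Render the chart as PNG data into memory instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
print("PNG size in bytes:", buffer.getbuffer().nbytes)
```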
CHAPTER 2
OBJECTIVES
2.1 OBJECTIVE OF THE PROJECT
This project ‘MOVIE DATA ANALYSIS’ aims to work with the data and visualize it in the
form of different charts and graphs using Python and its modules. The software is
written in Python IDLE script mode and uses a CSV file as its data store.
Movies are an integral part of our lives. We love to watch movies, but it is very hard
to find good movies from world cinema. TMDB.org is a crowd-sourced movie information
database used by many film-related consoles, sites and apps, such as XBMC, MythTV and
Plex. Dozens of media managers, mobile apps and social sites make use of its API.
TMDb lists some 80,000 films at time of writing, which is considerably fewer than
IMDb. While not as complete as IMDb, it holds extensive information for most
popular/Hollywood films. The dataset used here, covering the 10,000 most popular movies
across the world, was fetched through the read API. TMDb's free API lets developers and
their teams fetch and use TMDb's data programmatically. The API is free to use as long
as you attribute TMDb as the source of the data and/or images, and it is updated from
time to time. The dataset was fetched using exception handling, so it contains some
null values where fields are missing in the TMDb database. Still, it is good practice
for a young analyst to deal with missing values.
In this project we analyse this dataset using Python pandas on a Windows machine, but
the project can be run on any machine that supports Python and pandas. Besides pandas,
we also use the matplotlib module to visualize the dataset. The data is stored in CSV
format and loaded into the project with the read_csv() function, so that different
graphs can be plotted as per the choice of the end user.
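The loading step and the missing values mentioned above can be sketched as follows. The
CSV text, its column names and the "unknown" default are assumptions made for
illustration; an in-memory string stands in for the real movie file so that the sketch
is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the movie file; two fields are missing.
csv_text = """title,language,vote_count,vote_average
Movie A,en,1200,7.0
Movie B,,800,8.0
Movie C,hi,450,
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Count the missing (null) values in each column.
missing = df.isnull().sum()
print(missing)

# Two common ways to handle them: drop incomplete rows, or fill in a default.
complete_rows = df.dropna()
filled = df.fillna({"language": "unknown"})
print(filled)
```

Which option is appropriate depends on the analysis: dropping rows keeps averages
honest, while filling a default keeps every movie in the language-wise counts.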
CHAPTER 3
SYSTEM REQUIREMENTS
3.1 MINIMUM HARDWARE REQUIREMENTS
• x86 64-bit CPU (Intel / AMD architecture)
• 4 GB RAM
• 5 GB free disk space
• Display – Monochrome/VGA
• Standard QWERTY Keyboard
• Mouse
CHAPTER 4
SYSTEM STUDY
4.1 EXISTING SYSTEM
The existing system forces us to rely on third-party software or online websites for
storing and analysing the data to obtain output. This limits the possibilities, as the
software available for such data analysis is always restricted to its developer’s
choices and ideas.
Also, the collected data has to be entered into such software, which is time consuming.
It can also lead to confusion, a greater workload and, at times, tedium. A program
developed in-house can always be customised and changed as per one's own decisions and
requirements. This is not always possible with most existing software/systems.
CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM DESIGN
System design is an approach to the creation of the system. It involves the selection
of system functions, the drawing of the system, flowcharts, etc. It provides the
understanding and procedural details necessary for implementing the system recommended
in the feasibility study.
The activities carried out in the design phase are as follows:
1. Output design
2. Input design
3. Database design
5.3 INPUT DESIGN
Input design is a phase of system design. It is the process of converting the collected
input data into a computer-based format. The aim of input design is to make data entry
easy. This phase is often the first place where errors are detected and corrected.
During data entry one should know the space allocated for each field, the field
sequence that matches the source document, and the format in which the data is to be
entered. The objective of input design is to create an input layout that is easy to
follow and that does not induce operator errors.
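The error-detection goal described above can be sketched as a small helper that keeps
prompting until the user enters a whole number within the allowed range. The name
read_choice and its reader parameter are hypothetical, introduced only for
illustration; the project's own code validates input inline with try/except.

```python
def read_choice(prompt, low, high, reader=input):
    """Ask repeatedly until the reply is an integer between low and high."""
    while True:
        raw = reader(prompt)
        try:
            value = int(raw)
        except ValueError:
            # The reply was not a whole number at all.
            print(f"Invalid input. Please enter a number between {low} and {high}.")
            continue
        if low <= value <= high:
            return value
        # The reply was a number, but outside the menu range.
        print(f"Choice out of range. Please enter a number between {low} and {high}.")


# Demonstration with scripted answers instead of the keyboard.
scripted = iter(["abc", "12", "3"])
choice = read_choice("Enter your choice: ", 1, 10, reader=lambda _prompt: next(scripted))
print("Accepted choice:", choice)
```

Passing the reader as a parameter is a design choice that lets the helper be exercised
without a keyboard; in the menu program it would simply default to input().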
CHAPTER 6
SOURCE CODE
import pandas as pd
import matplotlib.pyplot as plt

# NOTE: The page of the report that defined csv_file, read_csv_file() and the
# first lines of data_analysis_menu() is missing. The definitions below are
# reconstructed from how the rest of the program uses them (an assumption).
csv_file = 'movies.csv'  # placeholder path; replace with the actual CSV file


# Read the CSV file and display it
def read_csv_file():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return
    print(df)


# Data analysis menu
def data_analysis_menu():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return
    while True:
        print('\n\nData Analysis MENU')
        print('=' * 100)
        print('1. Show Whole DataFrame')
        print('2. Show Columns')
        print('3. Show Top Rows')
        print('4. Show Bottom Rows')
        print('5. Show Specific Column')
        print('6. Add a New Record')
        print('7. Add a New Column')
        print('8. Delete a Column')
        print('9. Delete a Record')
        print('10. Exit (Move to main menu)')
        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 10.")
            continue
        if ch == 1:
            print(df)
        elif ch == 2:
            print(df.columns)
        elif ch == 3:
            n = int(input('Enter the number of top rows to display: '))
            print(df.head(n))
        elif ch == 4:
            n = int(input('Enter the number of bottom rows to display: '))
            print(df.tail(n))
        elif ch == 5:
            print(df.columns)
            col_name = input('Enter the column name you want to display: ')
            if col_name in df.columns:
                print(df[col_name])
            else:
                print(f"Column '{col_name}' not found.")
        elif ch == 6:
            # Collect one value per column for the new row
            new_record = {}
            for col in df.columns:
                new_record[col] = input(f"Enter value for {col}: ")
            # DataFrame.append() was removed in pandas 2.0; use pd.concat() instead
            df = pd.concat([df, pd.DataFrame([new_record])], ignore_index=True)
            print("New record added.")
        elif ch == 7:
            col_name = input('Enter the new column name: ')
            col_value = input('Enter the default value for the new column: ')
            df[col_name] = col_value  # the same value is assigned to every row
            print(f"Column '{col_name}' added.")
        elif ch == 8:
            col_name = input('Enter the column name to delete: ')
            if col_name in df.columns:
                del df[col_name]
                print(f"Column '{col_name}' deleted.")
            else:
                print(f"Column '{col_name}' not found.")
        elif ch == 9:
            index_no = int(input('Enter the index number of the record to delete: '))
            # Check against the index labels, which stay valid after earlier deletions
            if index_no in df.index:
                df = df.drop(index_no, axis=0)
                print(f"Record at index {index_no} deleted.")
            else:
                print("Invalid index.")
        elif ch == 10:
            break
        else:
            print("Invalid choice. Please try again.")
# Graph menu
def graph():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return
    while True:
        print('\nGRAPH MENU')
        print('=' * 100)
        print('1. Line Graph')
        print('2. Bar Graph')
        print('3. Horizontal Bar Graph')
        print('4. Exit')
        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 4.")
            continue
        if ch == 1:
            # Number of movies per language, computed once
            counts = df.groupby('language')['language'].count()
            x = counts.index
            y = counts.values
            plt.plot(x, y)
            plt.xlabel('Language')
            plt.ylabel('Count')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 2:
            counts = df.groupby('language')['language'].count()
            x = counts.index
            y = counts.values
            plt.bar(x, y)
            plt.xlabel('Language')
            plt.ylabel('Count')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 3:
            counts = df.groupby('language')['language'].count()
            x = counts.index
            y = counts.values
            # Horizontal bars: languages on the y-axis, counts on the x-axis
            plt.barh(x, y)
            plt.xlabel('Count')
            plt.ylabel('Language')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 4:
            break
        else:
            print("Invalid choice. Please try again.")
# Export menu
def export_menu():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return
    while True:
        print('\nEXPORT MENU')
        print('=' * 100)
        print('1. Export to CSV')
        print('2. Export to Excel')
        print('3. Exit')
        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 3.")
            continue
        if ch == 1:
            # The c:/backup folder must already exist
            df.to_csv('c:/backup/newMovies.csv', index=False)
            print('Data exported to c:/backup/newMovies.csv.')
        elif ch == 2:
            # Writing .xlsx files requires an Excel engine such as openpyxl
            df.to_excel('c:/backup/newMovies.xlsx', index=False)
            print('Data exported to c:/backup/newMovies.xlsx.')
        elif ch == 3:
            break
        else:
            print("Invalid choice. Please try again.")
# Main menu
def main_menu():
    while True:
        print('\nMAIN MENU')
        print('=' * 100)
        print('1. Read CSV File')
        print('2. Data Analysis Menu')
        print('3. Graph Menu')
        print('4. Export Data')
        print('5. Exit')
        try:
            choice = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 5.")
            continue
        if choice == 1:
            read_csv_file()
        elif choice == 2:
            data_analysis_menu()
        elif choice == 3:
            graph()
        elif choice == 4:
            export_menu()
        elif choice == 5:
            print("Exiting program.")
            break
        else:
            print("Invalid choice. Please try again.")


# Start the program (this call is not shown in the extracted listing)
if __name__ == '__main__':
    main_menu()
CHAPTER 7
OUTPUT WINDOW
7.1 MAIN MENU WINDOW
7.3 DATAFRAME DISPLAYED FROM CSV
7.5 OUTPUT CHART
CHAPTER 8
CONCLUSION
This program analyses the data stored in a CSV
file using pandas DataFrames and matplotlib
visualization techniques. The program generates a line
chart or a bar graph based on the choice given by the user.
CHAPTER 9
BIBLIOGRAPHY
Data Handling with Pandas. NCERT. Informatics
Practices Textbook for Class 12, National Council of
Educational Research and Training,
https://ncert.nic.in/textbook/pdf/keip103.pdf. Accessed
6 January 2025.
Acknowledgement
I would like to express my special thanks and
gratitude to my physics teacher, Padmaja
madam, for giving me this golden opportunity
to do this project. I would also like to thank
my parents and friends, who helped me
finish this project. I sincerely thank the
Principal, Shri R Giri Sankaran Thampi sir, for
providing an excellent environment and
facilities to complete the project.
Sreesathya bharadwaj ks
12:B