PM Shri Kendriya Vidyalaya Pattom Shift II: Movie Data Analysis

Uploaded by

muhdismail8921

PM SHRI KENDRIYA VIDYALAYA

PATTOM SHIFT II

PROJECT REPORT FOR


INFORMATICS PRACTICES (065) - 2024-2025

Movie data analysis

Submitted by:
Name:……………………….
Class: XII ……..

Under the Guidance of


Mrs. Ambily Krishna, PGT CS

This is to certify that this is the bona fide work of
………………………………….. of Class XII (Roll
No:…………..) for the AISSC Practical Examination 2024-25
in the subject Informatics Practices (065), and that
he/she has completed the project "Movie Data Analysis" during
the academic session 2024-2025 as per the guidelines
issued by the Central Board of Secondary Education
(CBSE).

Teacher-In-Charge

Examiner’s signature Principal

Institution Rubber Stamp


Contents

1. Chapter 1: Project Introduction
2. Chapter 2: Objectives
3. Chapter 3: System Requirements
4. Chapter 4: System Study
5. Chapter 5: System Design
6. Chapter 6: Source Code
7. Chapter 7: Output Window
8. Chapter 8: Conclusion
9. Bibliography

CHAPTER 1

INTRODUCTION
1.1 GENERAL INTRODUCTION
This project focuses on analyzing movie data using Python. The dataset includes
information about various movies such as their titles, overviews, languages, vote
counts, and vote averages. By leveraging Python libraries like pandas and matplotlib,
this analysis explores different aspects of the dataset, such as vote distribution,
average ratings, and movie popularity.

The project is divided into several parts, including data visualization and menu-driven
functionalities. Users can interact with the program through a menu that allows them
to view specific parts of the dataset, add new records, delete existing ones, and
generate graphs.
Introduction to tools used
1.2 INTRODUCTION TO PYTHON
Python is a widely used, general-purpose, high-level programming language. It was
created by Guido van Rossum, first released in 1991, and is now developed under the
Python Software Foundation. It was designed with an emphasis on code readability, and
its syntax allows programmers to express concepts in fewer lines of code.
Python lets you work quickly and integrate systems more efficiently.

1.3 INTRODUCTION TO PANDAS


Pandas is an open-source Python library that provides high-performance data
manipulation and analysis tools built on its powerful data structures. The name pandas
is derived from the term "panel data", an econometrics term for data sets that include
observations over multiple time periods for the same individuals. Pandas offers several
data structures to handle a variety of data.

Pandas is a package commonly used to deal with data analysis. It simplifies the loading
of data from external sources such as text files and databases, as well as providing
ways of analyzing and manipulating data once it is loaded into your computer. The
features provided in pandas automate and simplify a lot of the common tasks that
would take many lines of code to write in the basic Python language.
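
As a small illustration of that saving, the sketch below computes an average rating first in basic Python and then with pandas. The ratings and the name `vote_average` are made-up values for illustration, not taken from the project dataset.

```python
import pandas as pd

# Made-up ratings, used only for illustration
ratings = [7.5, 8.1, 6.4, 9.0]

# Basic Python: add up the values and divide by how many there are
total = 0
for r in ratings:
    total += r
plain_mean = total / len(ratings)

# pandas: one method call on a Series
series = pd.Series(ratings, name="vote_average")
pandas_mean = series.mean()

print(plain_mean, pandas_mean)
```

Both approaches give the same average; the pandas version simply needs less code as the calculations grow.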

1.4 INTRODUCTION TO PANDAS DATAFRAME


A DataFrame in Pandas stores data in a two-dimensional way. It is a two-dimensional
labelled array, which is an ordered collection of columns where each column may
store a different type of data, e.g., integer, string, floating point or Boolean.
i. It has two indexes, or two axes: a row index (axis=0) and a
column index (axis=1).
ii. Each value is identifiable by the combination of row index and column index.
iii. The row index is generally known as the index, and the column index is called the
column name.
iv. The indexes can be numbers, letters or strings.
v. There is no requirement that all columns hold the same type of data; its
columns can have data of different types.
vi. You can easily change its values, i.e., it is value-mutable.
vii. You can add or delete rows/columns in a DataFrame. In other words, it is
size-mutable.
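
The properties listed above can be tried out on a tiny DataFrame. This is a minimal sketch; the records and the column names (`title`, `language`, `vote_average`, `vote_count`) are made up for illustration and are only assumed to resemble the project dataset.

```python
import pandas as pd

# Made-up records; column names are assumptions for illustration
df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "language": ["en", "hi", "en"],
        "vote_average": [7.5, 8.1, 6.4],
    }
)

print(df.index)    # row index (axis=0)
print(df.columns)  # column index (axis=1)
print(df.dtypes)   # each column keeps its own data type

# A value is identified by its row label and column name
print(df.loc[1, "title"])

# Value-mutable: change one value in place
df.loc[2, "vote_average"] = 6.9

# Size-mutable: add a column, then delete a row
df["vote_count"] = [120, 340, 95]
df = df.drop(0, axis=0)

print(df)
```

After the row with label 0 is dropped, the remaining rows keep their original labels 1 and 2; the index is not renumbered automatically.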

1.5 INTRODUCTION TO DATA VISUALIZATION


Data visualization basically refers to the graphical or visual representation of
information and data using visual elements like charts, graphs, maps, etc.

Data visualization in Python can be done with many packages; one example is
matplotlib. matplotlib is a 2D plotting library for Python that provides many interfaces
and functions for drawing figures, similar to MATLAB's plotting commands.
The matplotlib library offers several named collections of methods; PyPlot is
one such interface. PyPlot is a collection of methods within matplotlib that allows the
user to construct 2D plots and graphs easily and interactively.
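
A minimal sketch of the PyPlot interface is given below. The language codes and counts are made up, and the `Agg` backend is an assumption chosen so that the sketch runs without opening a window; in an interactive session the backend line and the buffer can be dropped and `plt.show()` called instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so no display is needed
import matplotlib.pyplot as plt

# Made-up counts for illustration
languages = ["en", "hi", "ml"]
counts = [5, 3, 2]

bars = plt.bar(languages, counts)     # one bar per language
plt.xlabel("Language")
plt.ylabel("Count")
plt.title("Language-wise Movie Count")

# Save into an in-memory buffer; pass a file name such as "chart.png"
# to savefig to write an image file instead
buf = io.BytesIO()
plt.savefig(buf, format="png")
```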

1.6 INTRODUCTION TO CSV FILES


A simple way to store large data sets is to use CSV (comma-separated values) files.
CSV files contain plain text in a well-known format that can be read by almost any
program, including Pandas. In this project the csv file acts as the database where all
the data related to the program is stored for saving, accessing and modifying.
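
The sketch below shows the round trip between CSV text and a DataFrame. The CSV content is made up, and an in-memory string stands in for a file on disk so that the example is self-contained.

```python
import io

import pandas as pd

# Made-up CSV text standing in for a file on disk
csv_text = """title,language,vote_count,vote_average
Movie A,en,120,7.5
Movie B,hi,340,8.1
Movie C,en,95,6.4
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# to_csv with no path returns the CSV text; index=False leaves out the row index
round_trip = df.to_csv(index=False)
print(round_trip)
```

With a real file, the same calls take a path instead, for example `pd.read_csv("movies.csv")`, where the file name is hypothetical.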

CHAPTER 2
OBJECTIVES
2.1 OBJECTIVE OF THE PROJECT
This project, 'MOVIE DATA ANALYSIS', aims to work with the data and visualize it
in the form of different charts and graphs using Python and its modules. The
software is written in Python IDLE script mode and uses a CSV file as its back end.

Movies are an integral part of our lives. We love to watch movies, but it is very hard to
pick out good ones from world cinema. TMDB.org is a crowd-sourced movie
information database used by many film-related consoles, sites and apps, such as
XBMC, MythTV and Plex. Dozens of media managers, mobile apps and social sites make
use of its API.
TMDb lists some 80,000 films at the time of writing, which is considerably fewer than
IMDb. While not as complete as IMDb, it holds extensive information for most
popular/Hollywood films. This dataset of the 10,000 most popular movies across the
world has been fetched through the read API. TMDb's free API allows developers
and their teams to fetch and use TMDb's data programmatically. The API is free to use as
long as you attribute TMDb as the source of the data and/or images. TMDb also updates
the API from time to time. The dataset was fetched with exception handling,
so it contains some null values where fields are missing in the TMDb
database. Still, it is good practice for a young analyst to deal with missing values.
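
A minimal sketch of how such missing values can be found and handled with pandas is given below. The records are made up, and the column names `overview` and `vote_average` are assumptions based on the fields named in the introduction.

```python
import numpy as np
import pandas as pd

# Made-up records with gaps; column names are assumptions
df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "overview": ["A heist.", None, "A road trip.", None],
        "vote_average": [7.5, np.nan, 6.4, 8.0],
    }
)

# Count the missing values in each column
missing = df.isnull().sum()
print(missing)

# Option 1: drop every row that has any missing value
complete_rows = df.dropna()

# Option 2: keep all rows and fill the gaps
filled = df.fillna(
    {"overview": "Not available", "vote_average": df["vote_average"].mean()}
)
print(filled)
```

Dropping rows is the simpler choice but loses records; filling keeps every movie at the cost of introducing estimated values, so which option suits depends on the analysis.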

In this project we analyse this dataset using Python Pandas on a
Windows machine, but the project can be run on any machine that supports Python and
Pandas. Besides pandas, we also used the matplotlib Python module to visualize the
dataset. The data is stored in csv format and loaded into the project with the
read_csv() function to plot different graphs as per the choice of the end user.

CHAPTER 3
SYSTEM REQUIREMENTS
3.1 MINIMUM HARDWARE REQUIREMENTS
• x86 64-bit CPU (Intel / AMD architecture)
• 4 GB RAM
• 5 GB free disk space
• Display – Monochrome/VGA
• Standard QWERTY Keyboard
• Mouse

3.2 MINIMUM SOFTWARE REQUIREMENTS


• Modern operating system:
    • Windows 8 or higher
    • Mac OS X 10.11 or higher, 64-bit
    • Linux: RHEL 6/7, 64-bit (almost all libraries also work in Ubuntu)
• Python IDLE v3.7 or later
• Spreadsheet program, such as Microsoft Excel or OpenOffice Calc, for csv file
operations
• Modules: pandas, matplotlib.pyplot, numpy
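
One way to confirm that a machine meets the software requirements is a short check like the sketch below. It only reads the version attributes that Python and the three libraries expose; it is an illustrative check, not part of the project program.

```python
import sys

import matplotlib
import numpy
import pandas

# The report asks for Python 3.7 or later
if sys.version_info < (3, 7):
    print("Python 3.7 or later is needed; found", sys.version.split()[0])
else:
    print("Python version:", sys.version.split()[0])

# Each library exposes its version as __version__
for module in (pandas, matplotlib, numpy):
    print(module.__name__, module.__version__)
```

If one of the imports fails with ModuleNotFoundError, the missing module can be installed with `pip install pandas matplotlib numpy`.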

CHAPTER 4
SYSTEM STUDY
4.1 EXISTING SYSTEM
The existing system forces us to rely on third-party software or online websites for
storing and analysing the data to obtain output. This limits the possibilities, as
the software available for such data analysis is always restricted to its developer's
choices and ideas.
Also, the collected data has to be entered into such software, which is time consuming.
It can also lead to confusion, a greater workload and, at times, tedium. A program of
our own can always be customised and changed as per our own decisions and
requirements. This is not always possible with most existing software/systems.

4.2 PROPOSED SYSTEM


In the proposed system, we use Python to create the data and store it with the help
of DataFrames and csv files. This data can then be used for data visualization. The
project incorporates functions that access the data as per the user's
choice, which helps to obtain the desired output. We also
allow the user to continue or exit the program at any point of time.
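
The idea of calling a function according to the user's choice can be sketched with a small menu loop. This is an illustrative alternative to an if/elif chain, not the project's own code: `run_menu`, `show_message` and the two options are hypothetical names, and scripted answers stand in for keyboard input so the sketch runs on its own.

```python
def show_message(text):
    """Hypothetical menu action that just prints a message."""
    print(text)


def run_menu(options, read=input):
    """Show the options until the user picks the exit number.

    options maps a choice number to (label, action). `read` defaults to
    input() and can be swapped for a scripted function. Returns the list
    of choices that were carried out.
    """
    exit_choice = len(options) + 1
    done = []
    while True:
        for number, (label, _) in options.items():
            print(f"{number}. {label}")
        print(f"{exit_choice}. Exit")
        try:
            choice = int(read("Enter your choice: "))
        except ValueError:
            print("Please enter a number.")
            continue
        if choice == exit_choice:
            return done
        if choice in options:
            options[choice][1]()  # call the action chosen by the user
            done.append(choice)
        else:
            print("Invalid choice. Please try again.")


# Hypothetical options for illustration
options = {
    1: ("Show data", lambda: show_message("showing data")),
    2: ("Draw graph", lambda: show_message("drawing graph")),
}

# Scripted answers stand in for keyboard input: 2, a typing mistake, 1, then exit
answers = iter(["2", "abc", "1", "3"])
result = run_menu(options, read=lambda prompt: next(answers))
print(result)
```

Keeping the options in a dictionary means a new menu entry is one extra line, while the loop that lets the user continue or exit stays unchanged.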

CHAPTER 5
SYSTEM DESIGN
5.1 SYSTEM DESIGN
System design is an approach to the creation of the system. It involves the selection of
system functions, the drawing of the system, flowcharts, etc. It provides the understanding
and procedural details necessary for implementing the system recommended in the
feasibility study.
The activities carried out in the design phase are as follows:
1. Output design
2. Input design
3. Database design

5.2 OUTPUT DESIGN


Computer output is the means of directing information to the user. Efficient output
design helps in decision making and improves the relationship between the system and the
user. The design team refines the sketches into a detailed description of the output by
planning the output for the specific medium. Printouts are designed based on the output
requirements of the user.

5.3 INPUT DESIGN
Input design is a phase of system design. It is the process of converting the collected
input data into a computer-based format. The aim of input design is to make
data entry easy. This phase is often the first place where errors are detected and corrected.
During data entry one should know the space allocated for each field, the field
sequence that matches the source document and the format in which the data is to be
entered. The objective of input design is to create an input layout that is easy to
follow and that does not induce operator errors.
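
One way to catch entry errors at the point of input is a small helper that keeps asking until the value is valid. The helper name `read_int` and its parameters are hypothetical, and scripted answers stand in for keyboard input so that the sketch runs on its own.

```python
def read_int(prompt, low, high, read=input):
    """Keep asking until the user types a whole number from low to high."""
    while True:
        text = read(prompt)
        try:
            value = int(text)
        except ValueError:
            print("Please enter a whole number.")
            continue
        if low <= value <= high:
            return value
        print(f"Please enter a number between {low} and {high}.")


# Scripted answers stand in for keyboard input: a word, a number out of range, a valid number
answers = iter(["ten", "42", "7"])
rows = read_int(
    "Enter the number of top rows to display: ", 1, 10,
    read=lambda prompt: next(answers),
)
print(rows)
```

In a real session the `read` argument is left out, so the helper falls back to `input()`.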

5.4 DATABASE DESIGN


The database stores the integrated data that the user inputs. It is organized so that
various files can be accessed through a single reference. Here we use csv files to
store the data and access them using Python.

CHAPTER 6
SOURCE CODE
import pandas as pd
import matplotlib.pyplot as plt

# Global variable for the CSV file path
csv_file = "F://imbd_updated.csv"


# Function to read and display the CSV file
def read_csv_file():
    try:
        df = pd.read_csv(csv_file)
        print(df)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")

# Data analysis menu
def data_analysis_menu():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return

    # Changes made through this menu stay in memory; they are not written back to csv_file
    while True:
        print('\n\nData Analysis MENU')
        print('=' * 100)
        print('1. Show Whole DataFrame')
        print('2. Show Columns')
        print('3. Show Top Rows')
        print('4. Show Bottom Rows')
        print('5. Show Specific Column')
        print('6. Add a New Record')
        print('7. Add a New Column')
        print('8. Delete a Column')
        print('9. Delete a Record')
        print('10. Exit (Move to main menu)')

        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 10.")
            continue

        if ch == 1:
            print(df)
        elif ch == 2:
            print(df.columns)
        elif ch == 3:
            n = int(input('Enter the number of top rows to display: '))
            print(df.head(n))
        elif ch == 4:
            n = int(input('Enter the number of bottom rows to display: '))
            print(df.tail(n))
        elif ch == 5:
            print(df.columns)
            col_name = input('Enter the column name you want to display: ')
            if col_name in df.columns:
                print(df[col_name])
            else:
                print(f"Column '{col_name}' not found.")
        elif ch == 6:
            new_record = {}
            for col in df.columns:
                new_record[col] = input(f"Enter value for {col}: ")
            # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
            df = pd.concat([df, pd.DataFrame([new_record])], ignore_index=True)
            print("New record added.")
        elif ch == 7:
            col_name = input('Enter the new column name: ')
            col_value = input('Enter the default value for the new column: ')
            df[col_name] = col_value
            print(f"Column '{col_name}' added.")
        elif ch == 8:
            col_name = input('Enter the column name to delete: ')
            if col_name in df.columns:
                del df[col_name]
                print(f"Column '{col_name}' deleted.")
            else:
                print(f"Column '{col_name}' not found.")
        elif ch == 9:
            index_no = int(input('Enter the index number of the record to delete: '))
            # Check the actual index labels, which can have gaps after earlier deletions
            if index_no in df.index:
                df = df.drop(index_no, axis=0)
                print(f"Record at index {index_no} deleted.")
            else:
                print("Invalid index.")
        elif ch == 10:
            break
        else:
            print("Invalid choice. Please try again.")

# Graph menu
def graph():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return

    while True:
        print('\nGRAPH MENU')
        print('=' * 100)
        print('1. Line Graph')
        print('2. Bar Graph')
        print('3. Horizontal Bar Graph')
        print('4. Exit')

        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 4.")
            continue

        if ch == 1:
            g = df.groupby('language')
            x = g['language'].count().index
            y = g['language'].count().values
            plt.plot(x, y)
            plt.xlabel('Language')
            plt.ylabel('Count')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 2:
            g = df.groupby('language')
            x = g['language'].count().index
            y = g['language'].count().values
            plt.bar(x, y)
            plt.xlabel('Language')
            plt.ylabel('Count')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 3:
            g = df.groupby('language')
            x = g['language'].count().index
            y = g['language'].count().values
            plt.barh(x, y)
            plt.xlabel('Count')
            plt.ylabel('Language')
            plt.title('Language-wise Movie Count')
            plt.grid(True)
            plt.show()
        elif ch == 4:
            break
        else:
            print("Invalid choice. Please try again.")

# Export menu
def export_menu():
    try:
        df = pd.read_csv(csv_file)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        return

    while True:
        print('\nEXPORT MENU')
        print('=' * 100)
        print('1. Export to CSV')
        print('2. Export to Excel')
        print('3. Exit')

        try:
            ch = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 3.")
            continue

        # The folder c:/backup must already exist before exporting
        if ch == 1:
            df.to_csv('c:/backup/newMovies.csv', index=False)
            print('Data exported to c:/backup/newMovies.csv.')
        elif ch == 2:
            # Writing .xlsx files needs the openpyxl package to be installed
            df.to_excel('c:/backup/newMovies.xlsx', index=False)
            print('Data exported to c:/backup/newMovies.xlsx.')
        elif ch == 3:
            break
        else:
            print("Invalid choice. Please try again.")

# Main menu
def main_menu():
    while True:
        print('\nMAIN MENU')
        print('=' * 100)
        print('1. Read CSV File')
        print('2. Data Analysis Menu')
        print('3. Graph Menu')
        print('4. Export Data')
        print('5. Exit')

        try:
            choice = int(input('Enter your choice: '))
        except ValueError:
            print("Invalid input. Please enter a number between 1 and 5.")
            continue

        if choice == 1:
            read_csv_file()
        elif choice == 2:
            data_analysis_menu()
        elif choice == 3:
            graph()
        elif choice == 4:
            export_menu()
        elif choice == 5:
            print("Exiting program.")
            break
        else:
            print("Invalid choice. Please try again.")


# Run the main menu
main_menu()
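
The language-wise count that the graph menu plots can be tried on its own with a small DataFrame. The sketch below uses made-up records and assumes, as the program does, that the data has a `language` column; it also shows `value_counts()` as an equivalent way to get the same counts.

```python
import pandas as pd

# Made-up records; the real file is expected to have a 'language' column
df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "language": ["en", "hi", "en", "ml"],
    }
)

# Same steps as the graph menu: group by language, then count the rows
g = df.groupby("language")
x = g["language"].count().index
y = g["language"].count().values
print(list(x), list(y))

# value_counts gives the same counts; sort_index puts them in the same order
counts = df["language"].value_counts().sort_index()
print(counts)
```

Here `x` holds the language labels for one axis and `y` the number of movies for the other, which is what `plt.plot`, `plt.bar` and `plt.barh` receive in the program.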

CHAPTER 7
OUTPUT WINDOW
7.1 MAIN MENU WINDOW

7.2 CSV FILE

7.3 DATAFRAME DISPLAYED FROM CSV

7.4 RUN-TIME SAMPLE OUTPUT WINDOWS

7.5 OUTPUT CHART

CHAPTER 8
CONCLUSION
This program is used to analyse the data stored in a csv
file using Python's DataFrame and visualization
techniques provided by matplotlib. The program generates a line
chart or bar graph based on the choice given by the user.

The program stores data in the csv file and uses
functions to carry out the different choices opted for by the user. The
advantage is that a particular function or block can be
modified easily or added as per the need of the service.
The program works fine and meets all basic comparison
techniques, and through visualization we are able to
represent the result graphically.
The program can be used in general analysis and also in the
real world for the comparison of ratings of different
movies.

CHAPTER 9
BIBLIOGRAPHY
• Data Handling with Pandas. NCERT. Informatics
Practices Textbook for Class 12, National Council of
Educational Research and Training,
https://ncert.nic.in/textbook/pdf/keip103.pdf. Accessed
6 January 2025.

• Data Handling – II. NCERT. Informatics Practices
Textbook for Class 12, National Council of Educational
Research and Training,
https://ncert.nic.in/textbook/pdf/keip104.pdf. Accessed
6 January 2025.

• Plotting Data Using Matplotlib. NCERT. Informatics
Practices Textbook for Class 12, National Council of
Educational Research and Training,
https://ncert.nic.in/textbook/pdf/keip105.pdf. Accessed
6 January 2025.

Acknowledgement
I would like to express my special thanks and
gratitude to my physics teacher, Padmaja
madam, for giving me this golden opportunity
to do this project. I would also like to thank
my parents and friends, who helped me
finish this project. I sincerely thank the
Principal, Shri R Giri Sankaran Thampi sir, for
providing an excellent environment and
facilities to complete the project.

Sreesathya bharadwaj ks
12:B
