IT Project: Data Analysis and Visualization with Pandas, Matplotlib, and SQL
Name: [Your Name]
Class: 12th CBSE
Subject: Informatics Practices
School: [Your School's Name]
Project Title: Data Analysis and Visualization for Real-World Applications
Project Overview
The aim of this project is to build a small IT application that addresses a real-world problem using
data handling, visualization, and database management. I chose to analyze and visualize student
performance data with Pandas and Matplotlib and to manage student records using SQL.
1. Data Handling (Using Pandas)
1.1 Create a Series from a Dictionary and ndarray
import pandas as pd
import numpy as np
# Series from dictionary
data_dict = {'Math': 85, 'Science': 90, 'English': 78}
series_dict = pd.Series(data_dict)
# Series from ndarray
data_array = np.array([10, 20, 30, 40])
series_array = pd.Series(data_array)
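The difference between the two constructors is easiest to see in the index each one produces. The following is a minimal sketch that reuses the values above; it only inspects the resulting objects and adds nothing new to the data.

```python
import numpy as np
import pandas as pd

# Dictionary keys become the index labels of the Series.
series_dict = pd.Series({'Math': 85, 'Science': 90, 'English': 78})

# An ndarray carries no labels, so pandas assigns the default index 0, 1, 2, ...
series_array = pd.Series(np.array([10, 20, 30, 40]))

print(list(series_dict.index))   # labels taken from the dictionary keys
print(list(series_array.index))  # default integer labels
print(series_dict['Science'])    # access by label
print(series_array[2])           # access by default integer label
```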
1.2 Print All Elements Above 75th Percentile in a Series
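series = pd.Series([10, 50, 30, 70, 80, 90, 100])
# Value below which 75% of the data falls
threshold = series.quantile(0.75)
# Keep only the elements strictly above that value
above_75 = series[series > threshold]
print(above_75)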
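To see where the threshold comes from, the calculation can be traced by hand. This sketch assumes the default behaviour of `quantile()`, which interpolates linearly between the two nearest sorted values.

```python
import pandas as pd

series = pd.Series([10, 50, 30, 70, 80, 90, 100])

# Sorted values: 10, 30, 50, 70, 80, 90, 100 (7 items).
# The 0.75 quantile sits at position 0.75 * (7 - 1) = 4.5,
# which is halfway between the values 80 and 90.
threshold = series.quantile(0.75)
above_75 = series[series > threshold]

print(threshold)          # 85.0
print(above_75.tolist())  # [90, 100]
```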
1.3 Data Frame for Quarterly Sales
sales_data = {
    'Category': ['Electronics', 'Electronics', 'Furniture', 'Furniture'],
    'Item': ['TV', 'Laptop', 'Table', 'Chair'],
    'Expenditure': [500, 1200, 300, 150]
}
sales_df = pd.DataFrame(sales_data)
# Sum only the numeric Expenditure column for each category
total_expenditure = sales_df.groupby('Category')[['Expenditure']].sum()
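The grouping step can be checked against totals worked out by hand: Electronics is 500 + 1200 and Furniture is 300 + 150. The sketch below uses a separate name, `per_category`, which is only an illustration and not part of the project code.

```python
import pandas as pd

sales_df = pd.DataFrame({
    'Category': ['Electronics', 'Electronics', 'Furniture', 'Furniture'],
    'Item': ['TV', 'Laptop', 'Table', 'Chair'],
    'Expenditure': [500, 1200, 300, 150],
})

# Select the numeric column before summing so that the text column
# 'Item' is left out of the aggregation.
per_category = sales_df.groupby('Category')['Expenditure'].sum()

print(per_category.to_dict())  # {'Electronics': 1700, 'Furniture': 450}
```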
1.4 Data Frame for Examination Results
exam_data = {
    'Student': ['A', 'B', 'C'],
    'Math': [90, 75, 82],
    'Science': [88, 67, 93],
    'English': [78, 85, 88]
}
exam_df = pd.DataFrame(exam_data)
row_labels = exam_df.index
column_labels = exam_df.columns
data_types = exam_df.dtypes
dimensions = exam_df.shape
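The four attributes are easier to interpret once they are printed. This is a small sketch using the same examination data; the exact dtype names shown for each column may vary with the pandas version, so they are printed without an expected value.

```python
import pandas as pd

exam_df = pd.DataFrame({
    'Student': ['A', 'B', 'C'],
    'Math': [90, 75, 82],
    'Science': [88, 67, 93],
    'English': [78, 85, 88],
})

print(list(exam_df.index))    # row labels: [0, 1, 2]
print(list(exam_df.columns))  # ['Student', 'Math', 'Science', 'English']
print(exam_df.dtypes)         # one data type per column
print(exam_df.shape)          # (3, 4): 3 rows and 4 columns
```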
1.5 Filter Rows Based on Criteria (e.g., Duplicate Rows)
# Keep the first occurrence of each row and drop exact repeats
filtered_df = exam_df.drop_duplicates()
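Because the examination data above has no repeated rows, the effect of `drop_duplicates()` is not visible there. The sketch below uses a hypothetical DataFrame, `dup_df`, in which student 'B' is entered twice, and also shows a second criterion (Math above 80) applied with a boolean mask.

```python
import pandas as pd

# Hypothetical data: the row for student 'B' appears twice.
dup_df = pd.DataFrame({
    'Student': ['A', 'B', 'B', 'C'],
    'Math': [90, 75, 75, 82],
})

# duplicated() marks every repeat of an earlier row as True.
print(dup_df.duplicated().tolist())  # [False, False, True, False]

# drop_duplicates() keeps the first occurrence of each row.
unique_df = dup_df.drop_duplicates()

# Filtering on a condition uses a boolean mask in the same way.
high_math = unique_df[unique_df['Math'] > 80]
print(high_math['Student'].tolist())  # ['A', 'C']
```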
1.6 Importing and Exporting Data between Pandas and CSV File
exam_df.to_csv('exam_data.csv', index=False)
loaded_df = pd.read_csv('exam_data.csv')
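A simple way to confirm that nothing was lost in the export and import is to compare the DataFrame read back with the original. This sketch writes to a temporary folder, which is an assumption made only so the example leaves no files behind.

```python
import os
import tempfile

import pandas as pd

exam_df = pd.DataFrame({
    'Student': ['A', 'B', 'C'],
    'Math': [90, 75, 82],
    'Science': [88, 67, 93],
    'English': [78, 85, 88],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'exam_data.csv')
    # index=False leaves the row labels 0, 1, 2 out of the file
    exam_df.to_csv(csv_path, index=False)
    loaded_df = pd.read_csv(csv_path)

# Compare column names and values of the two DataFrames
print(list(loaded_df.columns) == list(exam_df.columns))
print(loaded_df.values.tolist() == exam_df.values.tolist())
```

Without `index=False`, the file would gain an extra unnamed column holding the old row labels when it is read back.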
2. Data Visualization (Using Matplotlib)
2.1 Analyze and Plot School Performance Data
import matplotlib.pyplot as plt
subjects = ['Math', 'Science', 'English']
scores = [90, 75, 82]
plt.bar(subjects, scores, color='skyblue')
plt.xlabel('Subjects')
plt.ylabel('Scores')
plt.title('Student Performance')
plt.show()
2.2 Plotting Charts with Data from Data Frames
categories = total_expenditure.index
expenditures = total_expenditure['Expenditure']
plt.pie(expenditures, labels=categories, autopct='%1.1f%%')
plt.title('Expenditure by Category')
plt.show()
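The same idea works for the examination results. The sketch below is one possible way to plot values computed from a DataFrame: it takes the average score per subject and draws a bar chart. The file name `average_scores.png` is an assumption; `plt.show()` could be used instead in an interactive session.

```python
import matplotlib.pyplot as plt
import pandas as pd

exam_df = pd.DataFrame({
    'Student': ['A', 'B', 'C'],
    'Math': [90, 75, 82],
    'Science': [88, 67, 93],
    'English': [78, 85, 88],
})

# Average score per subject across the three students
averages = exam_df[['Math', 'Science', 'English']].mean()

fig, ax = plt.subplots()
bars = ax.bar(list(averages.index), list(averages.values), color='skyblue')
ax.set_xlabel('Subjects')
ax.set_ylabel('Average score')
ax.set_title('Average Score by Subject')

# Assumed output file name; saves the chart as an image
fig.savefig('average_scores.png')
```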
3. Data Management (Using SQL)
3.1 Create a Student Table with Student ID, Name, and Marks
CREATE TABLE Students (
student_id INT PRIMARY KEY,
name VARCHAR(50),
marks INT
);
3.2 Insert Details of a New Student
INSERT INTO Students (student_id, name, marks) VALUES (1, 'Alice', 85);
3.3 Delete Details of a Student
DELETE FROM Students WHERE student_id = 1;
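The statements in 3.1 to 3.3 can be tried out from Python with the built-in `sqlite3` module. This is a sketch under two assumptions: it uses a temporary in-memory SQLite database rather than a MySQL server, and the second student, 'Bob', is a hypothetical record added so that a row remains after the delete.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # temporary database, gone when closed
cur = conn.cursor()

cur.execute("""
    CREATE TABLE Students (
        student_id INT PRIMARY KEY,
        name VARCHAR(50),
        marks INT
    )
""")

cur.execute("INSERT INTO Students (student_id, name, marks) VALUES (1, 'Alice', 85)")
# Hypothetical second student, passed in through ? placeholders
cur.execute("INSERT INTO Students (student_id, name, marks) VALUES (?, ?, ?)",
            (2, 'Bob', 72))
conn.commit()
rows_before = cur.execute("SELECT * FROM Students").fetchall()

cur.execute("DELETE FROM Students WHERE student_id = 1")
conn.commit()
rows_after = cur.execute("SELECT * FROM Students").fetchall()

print(rows_before)  # both students
print(rows_after)   # only the student with student_id 2 remains
conn.close()
```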
3.4 Select Students with Marks Greater than 80
SELECT * FROM Students WHERE marks > 80;
3.5 Find Minimum, Maximum, Sum, and Average of Marks
SELECT MIN(marks), MAX(marks), SUM(marks), AVG(marks) FROM Students;
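The filter in 3.4 and the aggregate functions in 3.5 can be checked on a few rows. In this sketch the three students and their marks are hypothetical sample data, and an in-memory SQLite database is assumed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE Students (student_id INT PRIMARY KEY, name VARCHAR(50), marks INT)")

# Hypothetical sample rows
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(1, 'Alice', 85), (2, 'Bob', 72), (3, 'Chitra', 91)])
conn.commit()

# 3.4: names of students with marks greater than 80
high_scorers = [row[0] for row in
                cur.execute("SELECT name FROM Students WHERE marks > 80")]

# 3.5: all four aggregates come back as a single row
stats = cur.execute(
    "SELECT MIN(marks), MAX(marks), SUM(marks), AVG(marks) FROM Students"
).fetchone()

print(sorted(high_scorers))  # ['Alice', 'Chitra']
print(stats[:3])             # (72, 91, 248)
print(round(stats[3], 2))    # 82.67
conn.close()
```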
3.6 Count Total Customers from Each Country
SELECT country, COUNT(customer_id) FROM Customers GROUP BY country;
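The `GROUP BY` query can be tested in the same way. The `Customers` table is not created anywhere in this project, so the sketch below assumes a minimal version with only the two columns the query needs, filled with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Assumed table layout: only customer_id and country
cur.execute("CREATE TABLE Customers (customer_id INT PRIMARY KEY, country VARCHAR(50))")
cur.executemany("INSERT INTO Customers VALUES (?, ?)",
                [(1, 'India'), (2, 'India'), (3, 'Japan')])
conn.commit()

# One output row per country, with the number of customers in it
rows = cur.execute(
    "SELECT country, COUNT(customer_id) FROM Customers GROUP BY country"
).fetchall()
counts = dict(rows)

print(counts)
conn.close()
```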
3.7 Order Students by Marks in Descending Order
SELECT student_id, marks FROM Students ORDER BY marks DESC;
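The result of a query can also be loaded straight into a DataFrame, which connects the SQL part of the project with the Pandas part. This is a sketch using `pd.read_sql_query` with an in-memory SQLite database; the three student rows are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE Students (student_id INT PRIMARY KEY, name VARCHAR(50), marks INT)")

# Hypothetical sample rows
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                 [(1, 'Alice', 85), (2, 'Bob', 72), (3, 'Chitra', 91)])
conn.commit()

# Run the ORDER BY query and receive the rows as a DataFrame
ranked_df = pd.read_sql_query(
    "SELECT student_id, marks FROM Students ORDER BY marks DESC", conn
)
conn.close()

print(ranked_df['student_id'].tolist())  # [3, 1, 2]
print(ranked_df['marks'].tolist())       # [91, 85, 72]
```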
Conclusion
In this project, I applied my knowledge of data handling, visualization, and database management to
analyze and visualize data meaningfully. This project demonstrates the practical application of
Python and SQL in analyzing real-world data.