Rajni Ip File Final
NAME : Rajni
CLASS : 12 "A"
ROLL NO. : 25
ADMISSION NO. : 3869
SUBMITTED TO : MS. HARSHA
CERTIFICATE
This is to certify that Ms. Rajni, a student of
class 12th, has successfully completed the
practical research on the topic 'Data
Handling using Python & SQL' under the
guidance of Ms. Harsha during the year
2024-2025.
PANDAS
Introduction to Pandas
Pandas (the name is derived from PANel DAta) is a high-level
data manipulation library used for analysing data. It makes
importing and exporting data easy and provides a rich set of
functions.
SERIES
A Series is a one-dimensional labeled array in Pandas, capable
of holding data of any type (e.g., integers, strings, floats). It is
similar to a column in a spreadsheet or a single Python list,
but with labels (indices) for each element.
CREATION OF SERIES
• FROM SCALAR VALUES
• FROM NUMPY ARRAY
• FROM DICTIONARY
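The three ways of creating a Series listed above can be sketched as follows. The values and index labels are made up for illustration and are not taken from the original file.

```python
import numpy as np
import pandas as pd

# From a scalar value: the scalar is repeated for every index label.
s_scalar = pd.Series(5, index=["a", "b", "c"])

# From a NumPy array: a default index 0, 1, 2, ... is used unless one is given.
s_array = pd.Series(np.array([10, 20, 30]))

# From a dictionary: keys become the index, values become the data.
s_dict = pd.Series({"Jan": 31, "Feb": 28, "Mar": 31})

print(s_scalar)
print(s_array)
print(s_dict)
```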
ACCESSING ELEMENTS OF SERIES
• INDEXING
o By using defined index
• SLICING
o By using defined index
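A small sketch of indexing and slicing with a defined (label) index. The Series `marks` and its labels are assumed sample data; positional slicing with `iloc` is shown only for comparison.

```python
import pandas as pd

marks = pd.Series([85, 92, 78, 64], index=["eng", "maths", "ip", "eco"])

# Indexing with a defined (label) index returns a single element.
one = marks["maths"]

# Slicing with labels includes BOTH end points.
by_label = marks["maths":"eco"]

# Positional slicing with iloc excludes the stop position, like a Python list.
by_position = marks.iloc[1:3]

print(one)
print(by_label)
print(by_position)
```

Note that the label slice returns three elements while the positional slice returns two, because only the label slice includes its end point.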
ATTRIBUTES OF SERIES
Attributes:
● .name
● .index.name
● .values
● .size
● .empty
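The attributes listed above can be tried on a small Series. The Series below is assumed sample data, since the one used in the original file is not available.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"], name="marks")
s.index.name = "student"

print(s.name)        # name of the Series
print(s.index.name)  # name of the index
print(s.values)      # the data as a NumPy array
print(s.size)        # number of elements
print(s.empty)       # True only when the Series has no elements
```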
MATHEMATICAL OPERATION ON SERIES
Operations:
• ADDITION
o By using + operator
• SUBTRACTION
o By using - operator
o By using sub() function
• MULTIPLICATION
o By using * operator
• DIVISION
o By using / operator
o By using div() function
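A minimal sketch of the operations listed above, using two assumed sample Series `a` and `b` with the same index. Pandas matches elements by index label before applying the operation.

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=["x", "y", "z"])
b = pd.Series([1, 2, 5], index=["x", "y", "z"])

total = a + b      # addition with the + operator
diff_op = a - b    # subtraction with the - operator
diff_fn = a.sub(b) # subtraction with the sub() function
product = a * b    # multiplication with the * operator
quot_op = a / b    # division with the / operator
quot_fn = a.div(b) # division with the div() function

print(total)
print(diff_fn)
print(product)
print(quot_fn)
```

The operator and the function give the same result here; the function form is useful when extra options such as `fill_value` are needed.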
METHODS OF SERIES
Methods:
• head(n)
• count()
• tail(n)
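The three methods above can be illustrated on an assumed sample Series that contains one missing value, so that the effect of count() is visible.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, 8, np.nan, 15, 16, 23, 42])

first_two = s.head(2)    # first 2 elements (n defaults to 5 if omitted)
last_three = s.tail(3)   # last 3 elements (n defaults to 5 if omitted)
non_missing = s.count()  # number of values that are not NaN

print(first_two)
print(last_three)
print(non_missing)
```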
DATAFRAME
A DataFrame is a two-dimensional labeled data structure in
Pandas, similar to a table in a database or a spreadsheet. It
consists of rows and columns, where each column is a Series,
and it supports various data types and operations like
filtering, grouping, and statistical analysis.
CREATION OF DATAFRAME
• EMPTY DATAFRAME
• FROM LIST OF DICTIONARIES
• RENAMING A ROW
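A short sketch of the items listed above. The names, marks and row labels are assumed sample data, and rename() is one possible way of renaming a row; the original file may have used a different approach.

```python
import pandas as pd

# Empty DataFrame: no rows and no columns.
empty_df = pd.DataFrame()

# From a list of dictionaries: each dictionary is one row, keys become
# column names, and a key missing from a row is filled with NaN.
rows = [
    {"name": "Asha", "marks": 82},
    {"name": "Ravi", "marks": 91, "grade": "A"},
]
df = pd.DataFrame(rows, index=["r1", "r2"])

# Renaming a row label returns a new DataFrame; df itself is unchanged.
renamed = df.rename(index={"r1": "first"})

print(empty_df)
print(df)
print(renamed)
```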
ACCESSING DATAFRAME ELEMENT THROUGH INDEXING
• LABEL BASED INDEXING
• BOOLEAN INDEXING
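Both kinds of indexing can be sketched on a small DataFrame. The data below is assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"marks": [82, 91, 67], "city": ["Delhi", "Pune", "Agra"]},
    index=["asha", "ravi", "meena"],
)

# Label based indexing with loc: select by row label and column label.
ravi_row = df.loc["ravi"]            # the whole row as a Series
one_value = df.loc["meena", "city"]  # a single cell

# Boolean indexing: a Series of True/False values picks the rows to keep.
high = df[df["marks"] > 80]

print(ravi_row)
print(one_value)
print(high)
```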
JOINING OF DATAFRAME
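The method used for joining in the original file is not shown. As an assumption, the sketch below uses pd.concat(), which stacks one DataFrame below another; the two DataFrames are sample data.

```python
import pandas as pd

df1 = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 91]})
df2 = pd.DataFrame({"name": ["Meena"], "marks": [67]})

# Stack df2 below df1; ignore_index=True renumbers the rows 0, 1, 2.
joined = pd.concat([df1, df2], ignore_index=True)

print(joined)
```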
ATTRIBUTES OF DATAFRAME
Dataframe:
Attributes:
ATTRIBUTE NAME    OUTPUT
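The attribute table of the original file is not available, so the sketch below shows a few commonly used DataFrame attributes on assumed sample data. Which attributes the original covered is not known.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 91]})

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.dtypes)   # data type of each column
print(df.shape)    # (number of rows, number of columns)
print(df.size)     # total number of elements (rows x columns)
print(df.T)        # transpose: rows and columns swapped
```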
CSV FILE
A COMMA SEPARATED VALUES (CSV) file is a text file that uses
commas to separate values and newlines to separate records.
A CSV file stores tabular data in plain text, where each line of
the file typically represents one record.
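A minimal sketch of writing a DataFrame to a CSV file and reading it back. The file name marks.csv, the temporary folder and the data are assumptions made for this example.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [82, 91]})

# Write the DataFrame to a CSV file; index=False leaves out the row labels.
path = os.path.join(tempfile.mkdtemp(), "marks.csv")
df.to_csv(path, index=False)

# Each line of the file is one record, with values separated by commas.
with open(path) as f:
    text = f.read()
print(text)

# Read the file back into a DataFrame.
loaded = pd.read_csv(path)
print(loaded)
```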
MATPLOTLIB
Matplotlib is a Python library used for plotting graphs and
visualizing data. With just a few lines of code we can generate
publication-quality plots such as histograms, bar charts and
scatter plots.
PLOTTING MATPLOTLIB
COMPONENTS OF PLOT
CUSTOMISATION OF PLOTS
The pyplot module gives us numerous functions which can be
used to customise charts, such as adding titles or legends.
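A short sketch of a plot customised with pyplot functions. The data, the labels and the output file name are assumed; the "Agg" backend and savefig() are used only so that the script runs without a display, and plt.show() can be used instead in an interactive session.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt

months = [1, 2, 3, 4]
sales = [12, 18, 9, 22]

plt.figure()
plt.plot(months, sales, label="Sales")
plt.title("Monthly sales")   # title of the chart
plt.xlabel("Month")          # label of the x-axis
plt.ylabel("Units sold")     # label of the y-axis
plt.legend()                 # legend built from the label= of each line
plt.grid(True)               # grid lines

out_path = os.path.join(tempfile.mkdtemp(), "monthly_sales.png")
plt.savefig(out_path)
```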
PANDAS PLOT FUNCTIONS
We can call the plot method on a Series s or a DataFrame df
by writing:
s.plot() or df.plot()
We will learn to use the plot() function to create various
types of charts. They are:
• LINE CHART
• BAR CHART
• HISTOGRAM
LINE CHART
A LINE CHART displays the evolution of one or several
numeric variables.
BAR CHART
BAR plots are a type of data visualization used to represent
data in the form of rectangular bars.
HISTOGRAM
A HISTOGRAM represents the distribution of a continuous dataset.
PLOTTING LINE CHART
A LINE chart joins the data points with straight line
segments, showing how a value changes along the x-axis.
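A minimal sketch of a line chart drawn with the Pandas plot() function. The DataFrame and its column names are assumed sample data; the "Agg" backend and savefig() are used only so that the script runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {"maths": [70, 75, 82, 90], "ip": [65, 80, 78, 88]},
    index=["T1", "T2", "T3", "T4"],
)

# kind="line" is the default; each numeric column is drawn as one line.
ax = df.plot(kind="line", title="Marks across terms")
ax.set_xlabel("Term")
ax.set_ylabel("Marks")

plt.savefig(os.path.join(tempfile.mkdtemp(), "line_chart.png"))
```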
CUSTOMISING LINE CHART
We can replace the default ticks on the x-axis with our own
list of values by using plt.xticks(ticks), where ticks is a list of
locations on the x-axis at which ticks should be placed.
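A small sketch of plt.xticks() on a line chart. The Series and the chosen tick locations are assumed for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([3, 7, 4, 9, 6, 8], index=[0, 5, 10, 15, 20, 25])
s.plot(kind="line")

# Place ticks only at these locations on the x-axis.
plt.xticks([0, 10, 20])

tick_locations = list(plt.gca().get_xticks())
print(tick_locations)
```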
PLOTTING BAR CHART
To plot a BAR chart, we specify kind="bar". We can also
specify the DATAFRAME columns to be used as the X and Y axes.
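A minimal sketch of a bar chart with the x and y columns named explicitly. The DataFrame and its column names are assumed sample data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Pune", "Agra"], "sales": [40, 25, 31]})

# One bar per row: "city" gives the x-axis labels, "sales" the bar heights.
ax = df.plot(kind="bar", x="city", y="sales")
ax.set_ylabel("Sales")

heights = [bar.get_height() for bar in ax.patches]
print(heights)
```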
CUSTOMISING BAR CHART
We can customize the bar chart by adding certain
parameters to the plot function. We can control the edge
color, line style and line width of the bars with the
edgecolor, linestyle and linewidth parameters.
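A short sketch of a customised bar chart. The data and the chosen colours and widths are assumed; the extra keyword arguments are passed on by Pandas to Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"city": ["Delhi", "Pune", "Agra"], "sales": [40, 25, 31]})

ax = df.plot(
    kind="bar",
    x="city",
    y="sales",
    color="lightblue",
    edgecolor="black",  # colour of the border of each bar
    linestyle="--",     # dashed border
    linewidth=2,        # border thickness in points
)
```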
PLOTTING HISTOGRAM CHART
HISTOGRAMS are column charts where each column
represents a range of values and the height of the column
corresponds to how many values are in that range.
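A minimal sketch of a histogram drawn from a Series. The ten values are assumed sample data; with bins=4 the range 12 to 44 is divided into four equal intervals.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([12, 15, 21, 22, 25, 31, 33, 35, 38, 44])

# Four equal-width ranges between the smallest and the largest value.
ax = s.plot(kind="hist", bins=4)

# The height of each column is the number of values in that range.
counts = [bar.get_height() for bar in ax.patches]
print(counts)
```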
CUSTOMISING HISTOGRAM CHART
Pandas lets us customize a histogram through the options of
the plot function, such as the bins and the colors, to make it
clearer and better looking.
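One possible customisation, sketched on assumed sample data: the bin edges are given as a list, and the fill colour, edge colour and title are set. The options used in the original file are not known.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([12, 15, 21, 22, 25, 31, 33, 35, 38, 44])

ax = s.plot(
    kind="hist",
    bins=[10, 20, 30, 40, 50],  # explicit range boundaries
    color="orange",
    edgecolor="black",
    title="Distribution of values",
)
ax.set_xlabel("Value")

counts = [bar.get_height() for bar in ax.patches]
print(counts)
```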
DATA HANDLING USING SQL
Data handling using SQL involves managing and analyzing
data in relational databases. It includes storing, retrieving,
modifying, filtering, and combining data efficiently, ensuring
integrity and enabling insightful analysis for various
applications.
SQL
SQL (Structured Query Language) is a powerful tool for
managing and manipulating data in relational databases. It
includes operations like:
• Defining database structure (DDL)
• Querying and retrieving data (DQL)
• Modifying data (DML)
It also manages user access and permissions through DCL,
making it essential for database management and analysis.
Database query using SQL
(Mathematical, String, Date and Time functions in SQL)
Table:
Consider table SALESMAN with following data:
SNO  SNAME          SALARY  BONUS  DATEOFJOIN
A01  Beena Mehta    30000   45.23  2019-10-29
A02  K. L. Sahay    50000   25.34  2018-03-13
B03  Nisha Thakkar  30000   35.00  2017-03-18
B04  Leela Yadav    80000   NULL   2018-12-31
C05  Gautam Gola    20000   NULL   1989-01-23
C06  Trapti Garg    70000   12.37  1987-06-15
D07  Neena Sharma   50000   27.89  1999-03-18
Queries:
• Display salesman name and bonus after rounding
off to zero decimal places.
SELECT SNAME, ROUND(BONUS, 0) FROM SALESMAN;
• Display name and total salary of all salesmen
(salary plus bonus), truncated to 1 decimal place.
SELECT SNAME, TRUNCATE(SALARY + BONUS, 1) FROM SALESMAN;
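The two queries above can be tried from Python as a rough check. This sketch is an assumption-laden stand-in: it uses the built-in sqlite3 module instead of MySQL, and since SQLite has no TRUNCATE() function, truncation to one decimal place is imitated with CAST, which works for the non-negative values in this table. It also shows that rows with a NULL bonus give NULL in both results.

```python
import sqlite3

rows = [
    ("A01", "Beena Mehta", 30000, 45.23, "2019-10-29"),
    ("A02", "K. L. Sahay", 50000, 25.34, "2018-03-13"),
    ("B03", "Nisha Thakkar", 30000, 35.00, "2017-03-18"),
    ("B04", "Leela Yadav", 80000, None, "2018-12-31"),
    ("C05", "Gautam Gola", 20000, None, "1989-01-23"),
    ("C06", "Trapti Garg", 70000, 12.37, "1987-06-15"),
    ("D07", "Neena Sharma", 50000, 27.89, "1999-03-18"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SALESMAN "
    "(SNO TEXT, SNAME TEXT, SALARY INTEGER, BONUS REAL, DATEOFJOIN TEXT)"
)
con.executemany("INSERT INTO SALESMAN VALUES (?, ?, ?, ?, ?)", rows)

# Same ROUND() call as in the first query.
rounded = con.execute("SELECT SNAME, ROUND(BONUS, 0) FROM SALESMAN").fetchall()

# Stand-in for TRUNCATE(SALARY + BONUS, 1): scale by 10, drop the
# fraction with CAST, then scale back.
truncated = con.execute(
    "SELECT SNAME, CAST((SALARY + BONUS) * 10 AS INTEGER) / 10.0 FROM SALESMAN"
).fetchall()

for row in rounded:
    print(row)
for row in truncated:
    print(row)
```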
Queries:
• Display the average price of each type of vehicle
having quantity more than 20.
SELECT Type, AVG(price) FROM Vehicle WHERE qty > 20 GROUP BY Type;
• Count the types of vehicles manufactured by each
company.
SELECT Company, COUNT(DISTINCT Type) FROM Vehicle GROUP BY Company;