PYTHON

The document provides an overview of Python programming and data science concepts, including file types and modes, as well as the Pandas library's Series and DataFrame structures. It outlines the data science lifecycle, which consists of six stages: discovery, data preparation, model planning, model building, operationalize, and communicate results. Additionally, it explains data visualization techniques using various plots such as scatter plots, bar charts, histograms, box plots, and heat maps.

Uploaded by

Poorna Chandu


UNIT-III

SUBJECT: PYTHON PROGRAMMING AND DATA SCIENCE

1. Define a file. Explain the types of files. What are the different file modes?

Ans.) A file is the common storage unit in a computer: all programs and data are "written" to a file and "read" from a file.
We use files to store data permanently for future use.
Python supports two types of files:
1. Text files
2. Binary files
• Text files are used to store character data. Ex. File1.txt
• Binary files are used to store binary data such as images, audio and video files.
• Binary file formats may include multiple types of data in the same file, such as image, video, and audio data.
The different file modes in Python are:
1) r --> Opens an existing file for reading. The file pointer is positioned at the beginning of the file. If the specified file does not exist, we get FileNotFoundError. This is the default mode.
2) w --> Opens a file for writing. If the file already contains data, that data is erased before writing. If the specified file does not exist, this mode creates it.
3) a --> Opens the file for appending; data is always written at the end of the file. If the file does not exist, it creates a new file.
4) r+ --> Opens an existing file for reading and writing. The previous data in the file is not deleted. The file pointer is placed at the beginning of the file.
5) w+ --> Opens the file for reading and writing. If the file does not exist, it creates a new file. If the file already exists, its contents are erased.
6) a+ --> Opens the file for reading and appending. If the file already exists, new data is appended to it. If the file does not exist, it creates a new file.
7) x --> Opens a file in exclusive creation mode for writing. If the file already exists, we get FileExistsError.
Adding b to any of these modes (for example rb or wb) opens the file in binary mode; without it the file is opened in text mode.

EXAMPLES :
1. >>> file_handler = open("example.txt","x")
2. >>> file_handler = open("moon.txt","r")
3. >>> file_handler = open("C:\\fauna\\bison.txt","r")
4. >>> file_handler = open(r"C:\network\computer.txt","r")
5. >>> file_handler = open("titanic.txt","w+")
6. >>> file_handler = open("titanic.txt","a+")
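The behaviour of the x, a, r and w modes can be checked with a short script. This is only a minimal sketch: the file name example.txt and the variable names are chosen for illustration, and a temporary directory is used so that no real file is touched.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

    # "x": exclusive creation; the file must not exist yet.
    with open(path, "x") as f:
        f.write("first line\n")

    # A second "x" on the same path raises FileExistsError.
    try:
        open(path, "x")
    except FileExistsError:
        second_create_failed = True
    else:
        second_create_failed = False

    # "a": append at the end without touching existing data.
    with open(path, "a") as f:
        f.write("second line\n")

    # "r": read-only, the pointer starts at the beginning.
    with open(path, "r") as f:
        after_append = f.read()

    # "w": erase the existing contents, then write.
    with open(path, "w") as f:
        f.write("replaced\n")

    with open(path) as f:  # "r" is the default mode
        after_write = f.read()

print(second_create_failed)
print(repr(after_append))
print(repr(after_write))
```

Using the with statement closes each file automatically, which is why no explicit close() call appears in the sketch.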
2. Explain Series and DataFrames of Pandas with examples.
ANS.) Pandas is one of the most widely used Python libraries in data science.

1. Series:
• A Series is a one-dimensional array structure that stores homogeneous data, i.e., data of a single type.
• The elements of a Series are value-mutable, but its size is immutable.
• The data can come from several sources such as an ndarray, list, constant, Series, dict, etc.
A pandas Series can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
data - data takes various forms like ndarray, list, constants.
index - Index values must be hashable and the index must have the same length as data. Duplicate labels are allowed. If no index is passed, the labels 0, 1, ..., n-1 are used.
dtype - dtype is for the data type. If None, the data type will be inferred.
copy - Copy the data. Default is False.
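A Series can be created using various inputs like:
1. Scalar value or constant
2. Dict
3. Array
• Creating an empty Series.
import pandas as pd
# Pass dtype explicitly; recent pandas versions do not default an empty Series to float64
s = pd.Series(dtype="float64")
print(s)
Output:
Series([], dtype: float64)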

Creating a Series from a list of scalar values.
Ex-1:
import pandas as pd
data = [10, 30, 40, 50, 60, 20, 90]
# Creating a Series with default index values
s = pd.Series(data)
print(s)

Output:

0    10
1    30
2    40
3    50
4    60
5    20
6    90
dtype: int64
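The example above builds the Series from a list. When the data is a single scalar constant, pandas repeats that constant for every label in the index. A minimal sketch, with the labels a, b, c chosen only for illustration:

```python
import pandas as pd

# A single constant is repeated for every label in the index.
s = pd.Series(5, index=["a", "b", "c"])  # labels are illustrative
print(s)
```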

Creating a Series from an ndarray:
Example:
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)

Output:

0    a
1    b
2    c
3    d
dtype: object

Creating a Series from a dict:
A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (in insertion order in current pandas versions; older versions sorted the keys).

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.

Example:

import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)

Output:
a    0.0
b    1.0
c    2.0
dtype: float64
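The case where an index is passed together with a dict can be sketched as follows. The index labels, including the extra label d, are chosen here only for illustration.

```python
import pandas as pd

data = {"a": 0.0, "b": 1.0, "c": 2.0}
# Each label is looked up in the dict; "d" has no entry, so it becomes NaN.
s = pd.Series(data, index=["b", "c", "d", "a"])
print(s)
```

The result follows the order of the index that was passed, not the order of the dictionary keys.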

2. DataFrame:

A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

Features of DataFrame
1. Columns can potentially be of different types
2. Size-mutable
3. Labeled axes (rows and columns)
4. Can perform arithmetic operations on rows and columns.
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
data - data takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
index - The row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns - The column labels. Optional; defaults to np.arange(n) if no column labels are passed.
dtype - Data type of each column.
copy - Whether to copy the input data. The default is False in older pandas versions; newer versions decide based on the type of input when copy is not given.

A pandas DataFrame can be created using various inputs like:

1. Lists
2. Dict
3. Series
4. NumPy ndarrays
5. Another DataFrame
Creating a DataFrame
An empty DataFrame is created as follows:
import pandas as pd
df = pd.DataFrame()
print(df)
OUTPUT:
Empty DataFrame
Columns: []
Index: []

Create a DataFrame from Dict of ndarrays/Lists

Ex:
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

Output:
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
Note:
All the ndarrays must be of the same length. If an index is passed, then the length of the index should equal the length of the arrays.
If no index is passed, then by default the index will be range(n), where n is the array length.

Creating a DataFrame from a list of dicts:
A list of dictionaries can be passed as input data to create a DataFrame.
The dictionary keys are by default taken as column names.
Ex:
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)

Output:
   a   b     c
0  1   2   NaN
1  5  10  20.0
NaN (Not a Number) is inserted where values are missing.

Creating a DataFrame from a dict of Series:

A dictionary of Series can be passed to form a DataFrame.

The resultant index is the union of all the Series indexes passed.
Example:
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)

Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

For Series one, no label 'd' is passed, so in the result NaN is inserted for the d label.
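Two of the DataFrame features listed earlier, size mutability and arithmetic on columns, can be illustrated with a small sketch. It reuses the Name/Age data from the dict-of-lists example; the new column name AgeNextYear and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Tom", "Jack", "Steve", "Ricky"], "Age": [28, 34, 29, 42]},
    index=["rank1", "rank2", "rank3", "rank4"],
)

# Arithmetic on a column is applied element by element.
# Assigning the result to a new name adds a column, so the frame grows.
df["AgeNextYear"] = df["Age"] + 1

# Select one value by row label and column name.
jack_age = df.loc["rank2", "Age"]

# Aggregate a whole column.
mean_age = df["Age"].mean()

print(df)
print(jack_age, mean_age)
```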

3. Explain Data Science life cycle.

ANS. Data Science:

Data science is the in-depth study of massive amounts of data. It involves extracting meaningful insights from raw, structured, and unstructured data using the scientific method, different technologies, and algorithms.
Data Science Lifecycle:
The life-cycle of data science consists of 6 stages.
1. Discovery:
The first phase is discovery, which involves asking the right questions.
When we start any data science project, we need to determine the basic requirements, priorities, and project budget.
In this phase, we need to determine all the requirements of the project such as the number of people, technology, time, data, and the end goal. We can then frame the business problem at an initial hypothesis level.

2. Data preparation:
Data preparation is also known as Data Munging.
In this phase, we need to perform the following tasks:
1. Data cleaning
2. Data reduction
3. Data integration
4. Data transformation
After performing all the above tasks, we can easily use this data for our further processes.
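The four tasks above can be sketched with pandas. This is one possible interpretation under stated assumptions, not a fixed recipe: the tables, column names and the choice of mean filling and min-max scaling are all made up for illustration.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row and a missing value.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "age": [25.0, float("nan"), float("nan"), 40.0],
    "city": ["Pune", "Delhi", "Delhi", "Chennai"],
})
orders = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "amount": [250.0, 100.0, 400.0],
})

# 1. Data cleaning: drop duplicate rows, fill the missing age with the mean.
clean = customers.drop_duplicates().copy()
clean["age"] = clean["age"].fillna(clean["age"].mean())

# 2. Data reduction: keep only the columns needed for the analysis.
reduced = clean[["customer_id", "age"]]

# 3. Data integration: combine the two sources on their shared key.
merged = reduced.merge(orders, on="customer_id")

# 4. Data transformation: rescale amount to the 0-1 range (min-max scaling).
amount = merged["amount"]
merged["amount_scaled"] = (amount - amount.min()) / (amount.max() - amount.min())

print(merged)
```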

3. Model Planning:
In this phase, we need to determine the various methods and techniques to establish the relation between input variables.
We will apply exploratory data analysis (EDA), using various statistical formulae and visualization tools, to understand the relation between variables and to see what the data can tell us.
Common tools used for model planning are:
1. SQL Analysis Services
2. R
3. SAS
4. Python

4. Model-building:
In this phase, the process of model building starts.
We will create datasets for training and testing purposes.

We will apply different techniques such as association, classification, and clustering to build the model.
Following are some common model building tools:
1. SAS Enterprise Miner
2. WEKA
3. SPSS Modeler
4. MATLAB
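Creating the training and testing datasets mentioned above can be sketched with NumPy. The 80/20 ratio, the fixed seed and the tiny made-up dataset are assumptions for illustration; the notes do not prescribe a particular split.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the split is repeatable

# Hypothetical dataset: 10 samples, 2 input features, 1 label each.
features = np.arange(20).reshape(10, 2)
labels = np.arange(10) % 2

# Shuffle the row positions, then cut them 80/20.
order = rng.permutation(len(features))
split_at = int(0.8 * len(features))
train_idx, test_idx = order[:split_at], order[split_at:]

X_train, y_train = features[train_idx], labels[train_idx]
X_test, y_test = features[test_idx], labels[test_idx]

print(X_train.shape, X_test.shape)
```

Shuffling before splitting keeps the test rows from simply being the last rows of the original data.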

5. Operationalize:
In this phase, we will deliver the final reports of the project, along with briefings, code, and technical documents.
This phase provides us a clear overview of complete project performance and other components on a small scale
before the full deployment.

6. Communicate results:
In this phase, we check whether we have reached the goal that we set in the initial phase. We communicate the findings and final result to the business team.

4. What is Data Visualization? Explain scatter plot, bar chart, histogram, boxplot and heat maps with examples.

ANS. Data Visualization:

Data visualization is the presentation of data in a graphical format.
It helps us understand the significance of data by summarizing and presenting a huge amount of data in a simple, easy-to-understand format, and it helps communicate information clearly and effectively.
It enables stakeholders and decision makers to analyze data visually.
The data in a graphical format allows them to identify new trends and patterns easily.
Types of Plots:
The following types of plots can be created using Matplotlib.
1. Scatter plot
2. Bar chart
3. Histogram
4. Heat map
5. Box plot

Scatter Plot:
Scatter plots are used to compare variables.
The data is displayed as a collection of points. For each point, the value of one variable determines the position on the horizontal axis and the value of the other variable determines the position on the vertical axis.
The scatter() function is used to draw a scatter plot.
It has several advantages:
1. It shows the correlation between variables.
2. It is suitable for large data sets.
3. It is easy to find clusters.
4. It is possible to represent each piece of data as a point on the plot.

Example:
import matplotlib.pyplot as plt
x = [1, 1.5, 2, 2.5, 3, 3.5, 3.6]
y = [7.5, 8, 8.5, 9, 9.5, 10, 10.5]
x1 = [8, 8.5, 9, 9.5, 10, 10.5, 11]
y1 = [3, 3.5, 3.7, 4, 4.5, 5, 5.2]
plt.scatter(x, y, label='high income low saving', color='r')
plt.scatter(x1, y1, label='low income high savings', color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()
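The correlation that a scatter plot shows visually can also be put into a number. A minimal sketch, assuming the Pearson correlation coefficient is the measure of interest and using the first group of points from the example above:

```python
import numpy as np

# Same illustrative points as the first group in the scatter plot example.
x = [1, 1.5, 2, 2.5, 3, 3.5, 3.6]
y = [7.5, 8, 8.5, 9, 9.5, 10, 10.5]

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r for (x, y).
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```

A value close to 1 means the points lie close to an upward-sloping straight line, which matches what the plot shows for this group.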

Bar Graph:
A bar graph uses bars to compare data among different categories.
It is well suited when we want to measure the changes over a period of time.
It can be represented horizontally or vertically.
The bar() function is used to draw bar graphs.
Example:
from matplotlib import pyplot as plt
plt.bar([0.25, 1.25, 2.25, 3.25, 4.25], [50, 40, 70, 80, 20],
        label="BMW", width=.5)
plt.bar([.75, 1.75, 2.75, 3.75, 4.75], [80, 20, 20, 50, 60],
        label="Audi", color='r', width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()
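The example above draws vertical bars. A horizontal version can be sketched with barh(). The day labels are hypothetical, the distances are the BMW values from the example, and the Agg backend is an assumption made so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

days = ["Day 1", "Day 2", "Day 3", "Day 4", "Day 5"]  # hypothetical category labels
distance = [50, 40, 70, 80, 20]  # BMW values from the bar() example

fig, ax = plt.subplots()
bars = ax.barh(days, distance, label="BMW")  # barh draws horizontal bars
ax.set_xlabel("Distance (kms)")  # the value axis is now horizontal
ax.set_ylabel("Days")
ax.set_title("Information")
ax.legend()
fig.canvas.draw()  # render the figure in memory
```

To see the chart in a window, remove the matplotlib.use("Agg") line and finish with plt.show() as in the other examples.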

Histograms:
A histogram is a graphical representation of the distribution of numerical data: the values are grouped into intervals (bins), and the height of each bar shows how many values fall in that bin.
A histogram is a kind of bar chart.
In Matplotlib, a histogram is created using the hist() function.
The hist() function takes an array of numbers as an argument and uses it to create the histogram.
A histogram has several advantages:
1. It displays the number of values within a specified interval.
2. It is suitable for large data sets, as the values can be grouped within the intervals.
Example:
import matplotlib.pyplot as plt
population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95,
                  85, 55, 110, 120, 70, 65, 55, 111, 115, 80, 75, 65, 54, 44, 43, 42, 48]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# Values above 100 fall outside the last bin and are not counted
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
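The bar heights that hist() draws are simply counts per bin, and they can be inspected without plotting. A minimal sketch using np.histogram on the same ages and bins as the example:

```python
import numpy as np

population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95,
                  85, 55, 110, 120, 70, 65, 55, 111, 115, 80, 75, 65, 54, 44, 43, 42, 48]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(population_age, bins=bins)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {count}")

# Ages above 100 lie outside the bins, so they are left out of the counts.
print("counted:", counts.sum(), "of", len(population_age))
```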

Heat Maps
A heat map is a useful way to visualize two-dimensional data.
Using heat maps, we can often gain deeper and quicker insight into data than with other types of plots.
It has several advantages:
1. It draws attention to risk-prone areas.
2. It uses the entire data set to draw bigger and more meaningful insights.
3. It is used for cluster analysis and can deal with large data sets.

EXAMPLE:
import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.random.randn(4096)
y = np.random.randn(4096)
# Create heatmap
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(64, 64))
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
# Plot heatmap
plt.clf()
plt.title('Python heatmap example')
plt.ylabel('y')
plt.xlabel('x')
# histogram2d puts x along the first axis, so transpose and place the origin at the bottom
plt.imshow(heatmap.T, extent=extent, origin='lower')
plt.show()

Box Plot:
A box plot, also known as a whisker plot, displays a summary of a set of data values: the minimum, first quartile, median, third quartile and maximum.
In the box plot, a box is drawn from the first quartile to the third quartile, and a line goes through the box at the median.
With the default vertical orientation, the x-axis identifies the data set being plotted while the y-axis shows its values.
The boxplot() function is used to create box plots.

matplotlib.pyplot.boxplot(data, notch=None, vert=None, patch_artist=None, widths=None)

Example:

import matplotlib.pyplot as plt
import numpy as np
# Creating dataset
np.random.seed(10)
data = np.random.normal(100, 20, 200)
fig = plt.figure(figsize=(10, 7))
# Creating plot
plt.boxplot(data)
# show plot
plt.show()
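The five values that a box plot summarizes can be computed directly. A minimal sketch using np.percentile on the same seeded data as the example; the variable names are chosen for illustration.

```python
import numpy as np

np.random.seed(10)
data = np.random.normal(100, 20, 200)  # same data as the boxplot() example

# First quartile, median and third quartile mark the box and the line inside it.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

print("minimum:", data.min())
print("first quartile:", q1)
print("median:", median)
print("third quartile:", q3)
print("maximum:", data.max())
```

Note that with Matplotlib's default settings the whiskers extend at most 1.5 times the interquartile range beyond the box, so extreme values may be drawn as separate points rather than as the whisker ends.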
