PYTHON
1. Define a file. Explain the types of files. What are the different file modes?
Ans.) A file is the common storage unit in a computer; all programs and data are
"written" to a file and "read" from a file.
We use files to store data permanently for future use.
Python supports two types of files:
1. Text files
2. Binary files
Text files are used to store character data. Ex. File1.txt
Binary files are used to store binary data such as images, audio and video files.
Binary file formats may include multiple types of data in the same file,
such as image, video, and audio data.
The different file modes in Python are:
1) r --> Opens an existing file for reading. The file pointer is positioned at the
beginning of the file. If the specified file does not exist, we get
FileNotFoundError. This is the default mode.
2) w --> Opens a file for writing. If the file already contains data, the file is
truncated and the old data is lost. If the specified file does not exist, this mode creates
it.
3) a --> Opens the file for appending; data is always written at the end of the file. If the file
does not exist, it creates a new file.
4) r+ --> Opens an existing file for reading and writing. The previous data in the file is not
deleted. The file pointer is placed at the beginning of the file.
5) w+ --> Opens the file for reading and writing. If the file does not exist, it creates a new
file. If the file already exists, it is truncated.
6) a+ --> Opens the file for reading and appending. If the file already exists, new data is
appended at the end. If the file does not exist, it creates a new file.
7) x --> Opens a file in exclusive creation mode for writing. If the file already exists,
we get FileExistsError.
EXAMPLES :
1. >>> file_handler = open("example.txt","x")
2. >>> file_handler = open("moon.txt","r")
3. >>> file_handler = open("C:\\fauna\\bison.txt","r")
4. >>> file_handler = open(r"C:\network\computer.txt","r")
5. >>> file_handler = open("titanic.txt","w+")
6. >>> file_handler = open("titanic.txt","a+")
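The behaviour of the modes described above can be checked with a short script. This is a minimal sketch, assuming a temporary working directory and a hypothetical file name demo.txt (neither comes from the notes):

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "demo.txt")  # hypothetical file name

# "x" creates the file and fails if it already exists.
with open(path, "x") as f:
    f.write("first line\n")

try:
    open(path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# "a" keeps the existing data and writes at the end.
with open(path, "a") as f:
    f.write("second line\n")

with open(path, "r") as f:
    after_append = f.read()

# "w" truncates the file before writing.
with open(path, "w") as f:
    f.write("replaced\n")

with open(path, "r") as f:
    after_write = f.read()

# "r" on a missing file raises FileNotFoundError.
try:
    open(os.path.join(workdir, "missing.txt"), "r")
    missing_failed = False
except FileNotFoundError:
    missing_failed = True

print(exclusive_failed, missing_failed)
print(repr(after_append))
print(repr(after_write))
```

Using the with statement closes each file automatically, which the interactive examples above leave to the caller.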
2. Explain Series and DataFrames of Pandas with examples.
Ans.) Pandas is one of the most widely used Python libraries in data science.
1. Series:
A Series is a one-dimensional array structure that stores homogeneous data,
i.e., data of a single type.
A Series is value-mutable and size-immutable.
The input data can take multiple forms such as ndarray, list, constant, Series, dict, etc.
A pandas Series can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
data - data takes various forms like ndarray, list, constants
index - Index values must be hashable and the same length as data (non-unique values are allowed).
dtype - dtype is the data type. If None, the data type will be inferred.
copy - Copy data. Default is False.
A Series can be created using various inputs like:
1. Scalar value or constant
2. Dict
3. Array
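Creating an empty Series:
import pandas as pd
# An explicit dtype avoids the version-dependent default dtype of an empty Series
s = pd.Series(dtype="float64")
print(s)
Output:
Series([], dtype: float64)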
Creating a Series from a list of scalar values:
Ex-1:
import pandas as pd
data = [10, 30, 40, 50, 60, 20, 90]
# Creating a Series with default index values
s = pd.Series(data)
print(s)
Output:
0    10
1    30
2    40
3    50
4    60
5    20
6    90
dtype: int64
Create a Series from an ndarray:
Example:
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data)
print(s)
Output:
0    a
1    b
2    c
3    d
dtype: object
Create a Series from a dict:
A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index (in insertion order in current pandas versions; older versions sorted the keys).
If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
Example:
import pandas as pd
data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data)
print(s)
Output:
a    0.0
b    1.0
c    2.0
dtype: float64
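The case where an index is passed together with a dict can be sketched as follows. The index labels here are illustrative assumptions; the label 'd' is deliberately absent from the dict to show what happens to a missing key:

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
# Labels are looked up in the dict; 'd' has no entry, so it becomes NaN.
s = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(s)
```

The resulting Series follows the order of the index that was passed, not the order of the dict keys.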
2. DataFrame:
A DataFrame is a two-dimensional labeled data structure with rows and columns.
Features of DataFrame:
1. Columns can potentially be of different types
2. Size-mutable
3. Labeled axes (rows and columns)
4. Can perform arithmetic operations on rows and columns
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
data - data takes various forms like ndarray, Series, map, lists, dict, constants and also another DataFrame.
index - The row labels to use for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns - The column labels. Optional; defaults to np.arange(n) if no column labels are passed.
dtype - Data type of each column.
copy - Controls whether the input data is copied; the default is False.
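The code for the output that follows is missing from these notes. A minimal sketch that would produce a table of that shape is given here, assuming the data was a dict of equal-length lists and that the rank labels were passed as the index (both are assumptions, not recovered source):

```python
import pandas as pd

# Assumed input: a dict of equal-length lists, one list per column.
data = {'Name': ['Tom', 'Jack', 'steve', 'Ricky'],
        'Age': [28, 34, 29, 42]}
# Assumed row labels; columns are ordered explicitly to match the table.
df = pd.DataFrame(data,
                  index=['rank1', 'rank2', 'rank3', 'rank4'],
                  columns=['Age', 'Name'])
print(df)
```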
Output:
       Age   Name
rank1   28    Tom
rank2   34   Jack
rank3   29  steve
rank4   42  Ricky
Note:
All the ndarrays must be of the same length. If an index is passed, then the length of the index should equal
the length of the arrays.
If no index is passed, then by default the index will be range(n), where n is the array length.
Create a DataFrame from a List of Dicts:
A list of dictionaries can be passed as input data to create a DataFrame.
The dictionary keys are taken as column names by default.
Ex:
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Output:
   a   b     c
0  1   2   NaN
1  5  10  20.0
NaN (Not a Number) is filled in where values are missing.
Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
For the column one, no label 'd' was passed, but in the result
NaN is filled in for the d label.
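The code for this last output is not present in the notes. A table like it is consistent with building a DataFrame from a dict of Series whose indexes differ; the sketch below assumes that input, and the column values are chosen only to match the table:

```python
import pandas as pd

# Assumed input: two Series with different index labels.
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
```

The row index of the result is the union of the Series indexes, and the column one becomes float because an integer column cannot hold NaN.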
2. Data preparation:
Data preparation is also known as Data Munging.
In this phase, we need to perform the following tasks:
1. Data cleaning
2. Data reduction
3. Data integration
4. Data transformation
After performing all the above tasks, we can easily use this data for our further processes.
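The four tasks can be illustrated with pandas. This is only a sketch on made-up data: the table names, column names, and the specific choices (mean imputation, min-max scaling) are assumptions for illustration, not methods prescribed by the notes:

```python
import pandas as pd

# Hypothetical raw data from two sources (all names and values are made up).
customers = pd.DataFrame({'id': [1, 1, 2, 3],
                          'age': [25.0, 25.0, None, 40.0],
                          'notes': ['x', 'x', 'y', 'z']})
orders = pd.DataFrame({'id': [1, 2, 3], 'amount': [100.0, 250.0, 50.0]})

# 1. Data cleaning: drop duplicate rows, fill the missing age with the mean.
clean = customers.drop_duplicates().copy()
clean['age'] = clean['age'].fillna(clean['age'].mean())

# 2. Data reduction: keep only the columns needed later.
reduced = clean[['id', 'age']]

# 3. Data integration: combine the two sources on the shared key.
merged = reduced.merge(orders, on='id')

# 4. Data transformation: min-max scale the amount to the range [0, 1].
amount = merged['amount']
merged['amount_scaled'] = (amount - amount.min()) / (amount.max() - amount.min())
print(merged)
```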
3. Model Planning:
In this phase, we need to determine the various methods and techniques to establish the relation between input
variables.
We will apply exploratory data analysis (EDA), using various statistical formulae and visualization tools, to
understand the relation between variables and to see what the data can tell us.
Common tools used for model planning are:
1. SQL Analysis Services
2. R
3. SAS
4. Python
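In Python, a first pass at EDA often starts with summary statistics and pairwise correlations. The sketch below uses a made-up data set; the column names and values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical data set; column names and values are made up for illustration.
df = pd.DataFrame({'income': [20, 35, 50, 65, 80],
                   'saving': [2, 4, 7, 9, 12],
                   'age': [52, 23, 41, 35, 60]})

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()

# Pairwise Pearson correlation between the variables.
corr = df.corr()
print(summary)
print(corr.round(2))
```

A correlation close to 1 or -1 between two columns suggests a strong linear relation that is worth examining in the model planning phase.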
4. Model-building:
In this phase, the process of model building starts.
We will create datasets for training and testing purposes.
We will apply different techniques such as association, classification, and clustering to build the model.
Following are some common model building tools:
1. SAS Enterprise Miner
2. WEKA
3. SPSS Modeler
4. MATLAB
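Creating the training and testing datasets can be sketched with pandas alone. The data, the 80/20 ratio, and the fixed random_state are assumptions chosen for illustration:

```python
import pandas as pd

# Hypothetical labelled data set (values are made up for illustration).
df = pd.DataFrame({'feature': range(10),
                   'label': [0, 1] * 5})

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
test = df.sample(frac=0.2, random_state=0)
train = df.drop(test.index)

print(len(train), len(test))
```

The model is then fitted on train only, and test is kept aside to estimate how well it generalizes.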
5. Operationalize:
In this phase, we will deliver the final reports of the project, along with briefings, code, and technical documents.
This phase provides us a clear overview of complete project performance and other components on a small scale
before the full deployment.
6. Communicate results:
In this phase, we will check whether we have reached the goal that we set in the initial phase. We will communicate the
findings and final result to the business team.
4. What is Data Visualization? Explain scatter plot, bar chart, histogram, boxplot and heat maps with examples.
Scatter Plot:
Scatter plots are used to compare variables.
The data is displayed as a collection of points. For each point, the value of one
variable determines the position on the horizontal axis and the value of the other
variable determines the position on the vertical axis.
The scatter() function is used to draw a scatter plot.
It has several advantages:
1. It shows the correlation between variables
2. It is suitable for large data sets
3. It is easy to find clusters
4. It is possible to represent each piece of data as a point on the plot.
Example:
import matplotlib.pyplot as plt
x = [1,1.5,2,2.5,3,3.5,3.6]
y = [7.5,8,8.5,9,9.5,10,10.5]
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()
Bar Graph:
A bar graph uses bars to compare data among different categories.
It is well suited when we want to measure changes over a period of time.
It can be represented horizontally or vertically.
The bar() function is used to draw bar graphs.
Example:
from matplotlib import pyplot as plt
plt.bar([0.25,1.25,2.25,3.25,4.25],[50,40,70,80,20],
label="BMW",width=.5)
plt.bar([.75,1.75,2.75,3.75,4.75],[80,20,20,50,60],
label="Audi", color='r',width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()
Histograms:
Histograms are graphical representations of the frequency distribution of numerical data.
A histogram is a kind of bar chart.
In Matplotlib, a histogram is created using the hist() method.
The hist() function uses an array of numbers to create a histogram; the array is passed to the function as an
argument.
A histogram chart has several advantages:
1. It displays the number of values within a specified interval.
2. It is suitable for large data sets as they can be grouped within the intervals.
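The counts that a histogram displays can also be computed without plotting. The sketch below uses numpy.histogram on a small sample; the values and interval edges are chosen only for illustration:

```python
import numpy as np

ages = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2]  # sample values
bins = [0, 20, 40, 60, 80]                         # interval edges

# counts[i] is the number of values in [bins[i], bins[i+1]);
# the last interval also includes its right edge.
counts, edges = np.histogram(ages, bins=bins)
print(counts)
```

Values that fall outside the first and last edge are not counted, which also applies to the bins argument of hist().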
Example :
import matplotlib.pyplot as plt
population_age= [22,55,62,45,21,22,34,42,42,4,2,102,95,
85,55,110,120,70,65,55,111,115,80,75,65,54,44,43,42,48]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, histtype='bar',
rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
Heat Maps
A heat map is a way to visualize two-dimensional data, using color to represent the values.
Using heat maps, we can gain deeper and quicker insight into data than that
afforded by other types of plots.
It has several advantages:
1. It draws attention to risk-prone areas.
2. It uses the entire data set to draw bigger and more meaningful insights.
3. It is used for cluster analysis and deals with large data sets.
EXAMPLE:
import numpy as np
import matplotlib.pyplot as plt
# Create data
x = np.random.randn(4096)
y = np.random.randn(4096)
# Create heatmap: a 2D histogram of the points on a 64x64 grid
heatmap, xedges, yedges = np.histogram2d(x, y, bins=(64, 64))
extent = [xedges[0], xedges[-1], yedges[0], yedges[-1]]
# Plot heatmap
plt.clf()
plt.title('Python heatmap example')
plt.ylabel('y')
plt.xlabel('x')
# histogram2d puts x along the first axis, so transpose and use origin='lower'
plt.imshow(heatmap.T, extent=extent, origin='lower')
plt.show()
Box Plot:
A box plot, also known as a whisker plot, is created to display a summary of a set of data
values: the minimum, first quartile, median, third quartile and maximum.
In the box plot, a box is drawn from the first quartile to the third quartile, and a line
goes through the box at the median.
Here x-axis denotes the data to be plotted while the y-axis shows the frequency distribution.
The boxplot() function is used to create box plots.
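The five summary values that a box plot displays can be computed directly. This is a minimal sketch using numpy.percentile with its default linear interpolation; the sample values are made up for illustration:

```python
import numpy as np

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # sample values

minimum = np.min(data)
# First quartile, median and third quartile in one call.
q1, median, q3 = np.percentile(data, [25, 50, 75])
maximum = np.max(data)
print(minimum, q1, median, q3, maximum)
```

Note that Matplotlib's boxplot() by default limits the whiskers to 1.5 times the interquartile range beyond the box and draws points outside that range individually, so the whisker ends do not always coincide with the minimum and maximum.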
Example: