Lab Manual Data Science

Uploaded by

saisateeshwar
LIST OF EXPERIMENTS

Sl. No. | Name of the Experiment
1.  | Working with Numpy arrays
2.  | Working with Pandas data frames
3.  | Basic plots using Matplotlib
4.  | Frequency distributions
5.  | Averages
6.  | Variability
7.  | Normal curves
8.  | Correlation and scatter plots
9.  | Correlation coefficient
10. | Regression

CONTENTS

Sl. No. | Name of the Experiment | Page No. | Marks (100) | Staff Signature
1  | Working with Numpy arrays | | |
2  | Working with Pandas data frames | | |
3  | Develop python program for Basic plots using Matplotlib | | |
4  | Develop python program for Frequency distributions | | |
5  | Develop python program for Variability | | |
6  | Develop python program for Averages | | |
7  | Develop python program for Normal Curves | | |
8  | Develop python program for Correlation and scatter plots | | |
9  | Develop python program for Correlation coefficient | | |
10 | Develop python program for Simple Linear Regression | | |

Ex no: 1  Working with Numpy arrays

AIM
To work with Numpy arrays

ALGORITHM
Step 1: Start
Step 2: Import numpy module
Step 3: Print the basic characteristics and operations of array
Step 4: Stop

PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3], [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)

OUTPUT
Array is of type:  <class 'numpy.ndarray'>
No. of dimensions:  2
Shape of array:  (2, 3)
Size of array:  6
Array stores elements of type:  int32

Program to Perform Array Slicing
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print(a)
print("After slicing")
# rows from index 1 onwards
print(a[1:])

Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]

Program to Perform Array Slicing
# array to begin with
import numpy as np
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print('Our array is:')
print(a)
# this returns array of items in the second column
print('The items in the second column are:')
print(a[..., 1])
print('\n')
# Now we will slice all items from the second row
print('The items in the second row are:')
print(a[1, ...])
print('\n')
# Now we will slice all items from column 1 onwards
print('The items column 1 onwards are:')
print(a[..., 1:])

Output:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]

Result:
Thus the working with Numpy arrays was successfully completed.

Ex no: 2  Create a dataframe using a list of elements

Aim:
To work with Pandas data frames

ALGORITHM
Step 1: Start
Step 2: Import numpy and pandas module
Step 3: Create a dataframe using the dictionary
Step 4: Print the output
Step 5: Stop

PROGRAM
import numpy as np
import pandas as pd
data = np.array([['', 'Col1', 'Col2'], ['Row1', 1, 2], ['Row2', 3, 4]])
# first row holds the column names, first column holds the row labels
print(pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))
# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi", "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# Use the `shape` property
print(df.shape)
# Or use the `len()` function with the `index` property
print(len(df.index))

Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2

Result:
Thus the working with Pandas data frames was successfully completed.

Ex. No.: 3  Basic plots using Matplotlib

Aim:
To draw basic plots in Python program using Matplotlib

ALGORITHM
Step 1: Start
Step 2: Import Matplotlib module
Step 3: Create basic plots using Matplotlib
Step 4: Print the output
Step 5: Stop

Program: 3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()

Output:
[Figure: line plot titled "My first graph!", x-axis "x - axis", y-axis "y - axis"]

Program: 3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# o is for circles and r is
# for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')
# get current axes command
ax = plt.gca()
# get command over the individual
# boundary line of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the range or the bounds of
# the left boundary line to fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval by which
# the x-axis set the marks
plt.xticks(list(range(-3, 10)))
# set the intervals by which y-axis
# set the marks
plt.yticks(list(range(-3, 20, 3)))
# legend denotes that what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate command helps to write
# ON THE GRAPH any text xy denotes
# the position on the graph
plt.annotate('Temperature V/s Days', xy=(1.01, -2.15))
# gives a title to the Graph
plt.title('All Features Discussed')
plt.show()

Output:
[Figure: line plot titled "All Features Discussed", x-axis "Day ->", y-axis "Temp ->", legend entries "1st Rep" to "4th Rep"]

Program: 3c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever you want the
# output in a new window; also
# specify the window size you
# want the answer to be displayed in
fig = plt.figure(figsize=(10, 10))
# creating multiple plots in a
# single plot
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)
sub1.plot(a, 'sb')
# sets how the display subplot
# x axis values advances by 1
# within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')
sub2.plot(b, 'or')
# sets how the display subplot x axis
# values advances by 2 within the
# specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')
# can directly pass a list in the plot
# function instead adding the reference
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')
sub4.plot(c, 'Dm')
# similarly we can set the ticks for
# the y-axis range(start(inclusive),
# end(exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without writing plt.show() no plot
# will be visible
plt.show()

Output:
[Figure: 2 x 2 grid of subplots titled "1st Rep", "2nd Rep", "3rd Rep" and "4th Rep"]

Result:
Thus the basic plots using Matplotlib in Python program was successfully completed.

Ex. No.: 4  Frequency distributions

Aim:
To count the frequency of occurrence of a word in a body of text, as is often needed during text processing.

ALGORITHM
Step 1: Start the Program
Step 2: Create text file blake-poems.txt
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process

Program:
# needs the NLTK gutenberg corpus and tokenizer data (fetched with nltk.download)
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# keep only the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# count of each token within those 50 tokens
wordfreq = [wlist.count(w) for w in wlist]
# zip() is lazy in Python 3, so wrap it in list() before printing
print("Pairs\n" + str(list(zip(token, wordfreq))))

Output:
Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1), ('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1), ('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]

Result:
Thus counting the frequency of occurrence of a word in a body of text, as is often needed during text processing, and the Conditional Frequency Distribution program using Python was
successfully completed.

Averages

Aim:
To compute weighted averages in Python, either by defining your own functions or using Numpy.

ALGORITHM
Step 1: Start the Program
Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employees_salary table itself
Step 4: Calculate weighted sum and average using Numpy average() function
Step 5: Stop the process

Program: 6c
# Method: Using Numpy average() function
# (df is the employees_salary table loaded with pandas in Step 3)
from numpy import average
weighted_avg_m3 = round(average(df['salary_p_year'], weights=df['employees_number']), 2)
weighted_avg_m3

Output:
44225. 5

Result:
Thus the computation of weighted averages in Python, either by defining your own functions or using Numpy, was successfully completed.

Variability

Aim:
To write a python program to calculate the variance.

ALGORITHM
Step 1: Start the Program
Step 2: Import statistics module (from statistics import variance)
Step 3: Import fractions as parameter values (from fractions import Fraction as fr)
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each sample
Step 6: Stop the process

Program:
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# importing fractions as parameter values
from fractions import Fraction as fr
# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)
# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)
# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)
# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))
# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)
# Print the variance of each sample
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))

Output:
Variance of Sample1 is 15.80952380952381
Variance of Sample2 is 3.5
Variance of Sample3 is 61.125
Variance of Sample4 is 1/45
Variance of Sample5 is 0.17613000000000006

Result:
Thus the computation for variance was successfully completed.

Ex. No.: 7  Normal Curve

Aim:
To create a normal curve using python program.

ALGORITHM
Step 1: Start the Program
Step 2: Import package scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualize the distribution
Step 6: Stop the process

Program:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
# Creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)
# Visualizing the distribution
sb.set_style('whitegrid')
# x and y are passed by keyword, as newer seaborn versions require
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()

Output:
[Figure: normal curve, x-axis "Heights", y-axis "Probability Density"]

Result:
Thus the normal curve using python program was successfully completed.
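
The curve drawn in the Normal Curve program can also be checked numerically without any plotting library. The following is a minimal sketch, assuming the same mean (5.3) and standard deviation (1) as the program above; the helper name normal_pdf and the constants MEAN and SD are introduced here only for illustration and are not part of the original manual. It writes the density out from its formula and compares it with the standard library's statistics.NormalDist.

```python
import math
from statistics import NormalDist

# Assumed parameters: same loc and scale as the Normal Curve program
MEAN, SD = 5.3, 1.0


def normal_pdf(x, mean=MEAN, sd=SD):
    """Density of the normal curve, written out from its formula (illustrative helper)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Reference implementation from the standard library
dist = NormalDist(mu=MEAN, sigma=SD)

# Same grid as np.arange(1, 10, 0.01): 900 points from 1.00 to 9.99
step = 0.01
xs = [1 + step * i for i in range(900)]
densities = [normal_pdf(x) for x in xs]

# The curve peaks at the mean
peak_x = xs[densities.index(max(densities))]

# A crude Riemann sum of the density over the grid should be close to 1
area = sum(d * step for d in densities)

print("Peak at x =", round(peak_x, 2))
print("Approximate area under the curve =", round(area, 3))
print("Formula and NormalDist agree at the mean:",
      math.isclose(normal_pdf(MEAN), dist.pdf(MEAN)))
```

The peak falling at the mean and the area being close to 1 are the two properties that make the plotted shape a valid probability density.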

Ex. No.: 8  Correlation and scatter plots

Aim:
To write a python program for correlation with scatter plot

ALGORITHM
Step 1: Start the Program
Step 2: Create variable y1, y2
Step 3: Create variable x, y3 using random function
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process

Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)
# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2 Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3 Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

Output:
[Figure: scatter plot titled "Scatterplot and Correlations", x-axis "x"]

Result:
Thus the Correlation and scatter plots using python program was successfully completed.

Ex. No.: 9  Correlation coefficient

Aim:
To write a python program to compute correlation coefficient.

ALGORITHM
Step 1: Start the Program
Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5: Print the result
Step 6: Stop the process

Program:
# Python Program to find correlation coefficient.
import math
# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0
    i = 0
    while i
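
The listing above breaks off in this copy, so the rest of the author's function is not available. As a hedged sketch of what Steps 3 and 4 describe, the Pearson correlation coefficient can be computed from the same five running sums the listing initialises. The function name pearson_r and the sample data are assumptions made here for illustration; this is not a reconstruction of the original code.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from running sums (illustrative helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # The five sums that appear in the formula
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    square_sum_x = sum(x * x for x in xs)
    square_sum_y = sum(y * y for y in ys)

    # r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * square_sum_x - sum_x ** 2) * (n * square_sum_y - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return numerator / denominator


# Made-up sample data: one perfectly increasing and one perfectly decreasing relation
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print("r for y_up:", pearson_r(x, y_up))
print("r for y_down:", pearson_r(x, y_down))
```

A perfectly increasing linear relation gives a coefficient of 1 and a perfectly decreasing one gives -1; real data falls between these two values.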
