Data Science Lab Manual

Data science lab

Uploaded by

Sasi Praba
Copyright
© All Rights Reserved
LIST OF EXPERIMENTS

83362 DATA SCIENCE LABORATORY    L T P C: 0 0 4 2

COURSE OBJECTIVES:
• To understand the Python libraries for data science
• To understand the basic statistical and probability measures for data science
• To learn descriptive analytics on the benchmark data sets
• To apply correlation and regression analytics on standard data sets
• To present and interpret data using visualization packages in Python

LIST OF EXPERIMENTS:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages.
2. Working with NumPy arrays.
3. Working with Pandas data frames.
4. Reading data from text files, Excel, and the web and exploring various commands for doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and the Pima Indians Diabetes data set for performing the following:
   a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness, and Kurtosis.
   b. Bivariate analysis: Linear and logistic regression modeling
   c. Multiple Regression analysis
   d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
   a. Normal curves
   b. Density and contour plots
   c. Correlation and scatter plots
   d. Histograms
   e. Three-dimensional plotting
7. Visualizing Geographic Data with Basemap

List of Equipments: (30 Students per Batch)
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

EX. NO.: 1    PACKAGES FOR DATA SCIENCE IN PYTHON
DATE:

AIM:
To download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages in Python.

ALGORITHM:
Python is an open-source, object-oriented, and cross-platform programming language. Compared to programming languages like C++ or Java, Python is very concise. It allows us to build a working software prototype in a very short time.
It has become the most used language in the data scientist's toolbox. It is also a general-purpose language, and it is very flexible thanks to the variety of available packages that solve a wide spectrum of problems. To install the necessary packages, use 'pip'.

Anaconda (http://continuum.io/downloads) is a Python distribution offered by Continuum Analytics that includes nearly 200 packages, among them NumPy, SciPy, pandas, Jupyter, Matplotlib, Scikit-learn, and NLTK. It is a cross-platform distribution (Windows, Linux, and Mac OS X) that can be installed on machines with other existing Python distributions and versions. Its base version is free, whereas add-ons that contain advanced features are charged separately. Anaconda introduces 'conda', a binary package manager, as a command-line tool to manage your package installations. Anaconda's goal is to provide an enterprise-ready Python distribution for large-scale processing, predictive analytics, and scientific computing.

1. Download Anaconda
2. Install Anaconda
3. Start Anaconda
4. Install data science packages

1. Download Anaconda

This step downloads the Anaconda Python package for the Windows platform. Anaconda is a free and easy-to-use environment for scientific Python.
   1. Visit the Anaconda homepage.
   2. Click "Anaconda" from the menu and click "Download" to go to the download page.
   3. Choose the download suitable for your platform (Windows, OSX, or Linux):
      • Choose Python 3.5
      • Choose the Graphical Installer

2. Install Anaconda

This step installs the Anaconda Python software on the system. It assumes that you have sufficient administrative privileges to install software on the system.
   1. Double click the downloaded file.
   2. Follow the installation wizard.

3. Start Anaconda

Anaconda comes with a suite of graphical tools called Anaconda Navigator.
Start Anaconda Navigator by opening it from the application launcher.

First, start with the Anaconda command line environment called conda. Conda is fast and simple, it's hard for error messages to hide, and you can quickly confirm your environment is installed and working correctly.
   1. Open a terminal (command line window).
   2. Confirm conda is installed correctly by typing:
        conda -V
      conda 4.2.9
   3. Confirm Python is installed correctly by typing:
        python -V
      Python 3.5.2 :: Anaconda 4.2.0 (x86_64)

4. Install packages

With pip, to install the generic <package-name> package, run this command:
    $> pip install <package-name>
With conda, to install the generic <package-name> package, you just need to run the following command:
    $> conda install <package-name>
To install a particular version of the package:
    $> conda install <package-name>=1.11.0
To install multiple packages at once, list all their names:
    $> conda install <package-name-1> <package-name-2>
To update a package that you previously installed, you can keep on using conda:
    $> conda update <package-name>
To update all the available packages, use the --all argument:
    $> conda update --all
To uninstall packages using conda:
    $> conda remove <package-name>

Installing Data Science Packages

a. NumPy
NumPy is the true analytical workhorse of the Python language. It provides the user with multidimensional arrays, along with a large set of functions to perform a wide range of mathematical operations on these arrays. Arrays are blocks of data arranged along multiple dimensions, which implement mathematical vectors and matrices. Characterized by optimal memory allocation, arrays are useful not just for storing data, but also for fast matrix operations (vectorization), which are indispensable when solving ad hoc data science problems.
    $> conda install numpy

b. SciPy
SciPy completes NumPy's functionalities, offering a larger variety of scientific algorithms for linear algebra, sparse matrices, signal and image processing, optimization, fast Fourier transformation, and much more.
    $> conda install scipy

c. Statsmodels
Statsmodels is a complement to SciPy's statistical functions. It features generalized linear models, discrete choice models, time series analysis, and a series of descriptive statistics as well as parametric and nonparametric tests.
    $> conda install -c conda-forge statsmodels

d. Pandas
The pandas package deals with everything that NumPy and SciPy cannot do. Thanks to its specific data structures, namely DataFrames and Series, pandas allows us to handle complex tables of data of different types and time series. It enables the easy and smooth loading of data from a variety of sources. The data can then be sliced, diced, handled for missing elements, added, renamed, aggregated, reshaped, and finally visualized.
    $> conda install pandas

e. Jupyter
A scientific approach requires the fast experimentation of different hypotheses in a reproducible fashion. Initially named IPython and limited to working only with the Python language, Jupyter was created to address the need for an interactive command shell for several languages (based on the shell, web browser, and application interface), featuring graphical integration, customizable commands, rich history (in the JSON format), and computational parallelism for enhanced performance.

Steps to install Jupyter using Anaconda:
   • Launch Anaconda Navigator
   • Click on the Install Jupyter Notebook button

RESULT:
Thus, the Anaconda Navigator has been successfully downloaded and installed, and the data science packages NumPy, SciPy, Jupyter, Statsmodels, and Pandas were explored in Python.

WORKING WITH NUMPY ARRAYS

EX. NO.: 2.a    BASIC NUMPY OPERATIONS
DATE:

AIM:
To perform basic NumPy operations in Python for
(i) creating different types of NumPy arrays and displaying basic information, such as the data type, shape, size, and strides
(ii) creating an array using built-in NumPy functions
(iii) performing file operations with NumPy arrays

ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy library.
Step 3: Define a one-dimensional array, a two-dimensional array, and a three-dimensional array.
Step 4: Print the memory address, the shape, the data type, and the stride of the array.
Step 5: Then, create an array using built-in NumPy functions.
Step 6: Perform file operations with NumPy arrays.
Step 7: Display the output.
Step 8: Stop the program.

PROGRAM:

(i) Creation of different types of NumPy arrays and displaying basic information

    # Importing numpy
    import numpy as np
    # Defining 1D array
    my1DArray = np.array([1, 8, 27, 64])
    print(my1DArray)
    # Defining and printing 2D array
    my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
    print(my2DArray)
    # Defining and printing 3D array
    my3Darray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [9, 10, 11, 12]]])
    print(my3Darray)
    # Print out memory address
    print(my2DArray.data)
    # Print the shape of array
    print(my2DArray.shape)
    # Print out the data type of the array
    print(my2DArray.dtype)
    # Print the stride of the array
    print(my2DArray.strides)

(ii) Creation of an array using built-in NumPy functions

    # Array of ones
    ones = np.ones((3, 4))
    print(ones)
    # Array of zeros
    zeros = np.zeros((2, 3, 4), dtype=np.int16)
    print(zeros)
    # Array with random values
    np.random.random((2, 2))
    # Empty array
    emptyArray = np.empty((3, 2))
    print(emptyArray)
    # Full array
    fullArray = np.full((2, 2), 7)
    print(fullArray)
    # Array of evenly-spaced values
    evenSpacedArray = np.arange(10, 25, 5)
    print(evenSpacedArray)
    # Array of evenly-spaced values
    evenSpacedArray2 = np.linspace(0, 2, 9)
    print(evenSpacedArray2)

(iii) Performing file operations with NumPy arrays

    import numpy as np
    # initialize an array
    arr = np.array([[[11, 11, 9, 9], [11, 0, 2, 0]], [[10, 14, 9, 14], [0, 1, 11, 11]]])
    # open a binary file in write mode
    file = open("arr", "wb")
    # save array to the file
    np.save(file, arr)
    # close the file
    file.close()
    # open the file in read binary mode
    file = open("arr", "rb")
    # read the file to numpy array
    arr1 = np.load(file)
    # close the file
    file.close()
    print(arr1)

OUTPUT:

(i) Creation of different types of NumPy arrays and displaying basic information

    [ 1  8 27 64]
    [[ 1  2  3  4]
     [ 2  4  9 16]
     [ 4  8 18 32]]
    [[[ 1  2  3  4]
      [ 5  6  7  8]]
     [[ 1  2  3  4]
      [ 9 10 11 12]]]
    (3, 4)
    int32
    (16, 4)

(ii) Creation of an array using built-in NumPy functions

    [[1. 1. 1. 1.]
     [1. 1. 1. 1.]
     [1. 1. 1. 1.]]
    [[[0 0 0 0]
      [0 0 0 0]
      [0 0 0 0]]
     [[0 0 0 0]
      [0 0 0 0]
      [0 0 0 0]]]
    [[0. 0.]
     [0. 0.]
     [0. 0.]]
    [[7 7]
     [7 7]]
    [10 15 20]
    [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

(iii) Performing file operations with NumPy arrays

    [[[11 11  9  9]
      [11  0  2  0]]
     [[10 14  9 14]
      [ 0  1 11 11]]]

RESULT:
Thus, the program to implement NumPy operations with arrays using Python has been executed and the output was verified successfully.

EX. NO.: 2.b    BASIC ARITHMETIC OPERATIONS WITH NUMPY
DATE:

AIM:
To implement arithmetic operations with NumPy arrays using Python.

ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy library.
Step 3: Initialize the NumPy arrays to two different variables.
Step 4: Perform the arithmetic operations on the two arrays using NumPy.
Step 5: Display the output.
Step 6: Stop the program.

PROGRAM:

    import numpy as np
    a = np.arange(9, dtype=np.float64).reshape(3, 3)
    print('First array:')
    print(a)
    print('\n')
    print('Second array:')
    b = np.array([10, 10, 10])
    print(b)
    print('\n')
    print('Add the two arrays:')
    print(np.add(a, b))
    print('\n')
    print('Subtract the two arrays:')
    print(np.subtract(a, b))
    print('\n')
    print('Multiply the two arrays:')
    print(np.multiply(a, b))
    print('\n')
    print('Divide the two arrays:')
    print(np.divide(a, b))

OUTPUT:

    First array:
    [[0. 1. 2.]
     [3. 4. 5.]
     [6. 7. 8.]]

    Second array:
    [10 10 10]

    Add the two arrays:
    [[10. 11. 12.]
     [13. 14. 15.]
     [16. 17. 18.]]

    Subtract the two arrays:
    [[-10.  -9.  -8.]
     [ -7.  -6.  -5.]
     [ -4.  -3.  -2.]]

    Multiply the two arrays:
    [[ 0. 10. 20.]
     [30. 40. 50.]
     [60. 70. 80.]]

    Divide the two arrays:
    [[0.  0.1 0.2]
     [0.3 0.4 0.5]
     [0.6 0.7 0.8]]

RESULT:
Thus, the program to implement NumPy arithmetic operations with arrays using Python has been executed and the output was verified successfully.

EX. NO.: 3    WORKING WITH PANDAS DATAFRAMES
DATE:

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns); that is, data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns. In pandas, data structures can be created in two ways: series and dataframes.

AIM:
(i) To create a dataframe from a series
(ii) To create a dataframe from a dictionary
(iii) To create a dataframe from n-dimensional arrays
(iv) To load a dataset from an external source into a pandas dataframe

ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy and pandas packages.
Step 3: Create a dataframe for the list of elements (numbers, dictionary, and n-dimensional arrays).
Step 4: Load a dataset from an external source into a pandas dataframe.
Step 5: Display the output.
Step 6: Stop the program.

PROGRAM:

(i) CREATION OF A DATAFRAME FROM A SERIES

    import numpy as np
    import pandas as pd
    print("Pandas Version:", pd.__version__)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.max_rows', 500)
    series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
    print(series)
    series_df = pd.DataFrame({
        'A': range(1, 5),
        'B': pd.Timestamp('20190526'),
        'C': pd.Series(5, index=list(range(4)), dtype='float64'),
        'D': np.array([3] * 4, dtype='int64'),
        'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "... Disorder"]),
        'F': 'Mental health',
        'G': 'is challenging'
    })
    print(series_df)

(ii) CREATION OF A DATAFRAME FROM A DICTIONARY

    import numpy as np
    import pandas as pd
    dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
    dict_df = pd.DataFrame(dict_df)
    print(dict_df)

(iii) CREATION OF A DATAFRAME FROM N-DIMENSIONAL ARRAYS

    import numpy as np
    import pandas as pd
    sdf = {
        'County': ['Ostfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
        'ISO-Code': [1, 2, 3, 4, 5, 6],
        'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
        'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer", "Drammen"]
    }
    sdf = pd.DataFrame(sdf)
    print(sdf)

(iv) LOADING A DATASET FROM AN EXTERNAL SOURCE INTO A PANDAS DATAFRAME

    import numpy as np
    import pandas as pd
    columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
               'marital_status', 'occupation', 'relationship', 'ethnicity', 'gender',
               'capital_gain', 'capital_loss', 'hours_per_week', 'country_of_origin', 'income']

OUTPUT:

    ...
    3  4 2019-05-26

(ii) Creation of a dataframe from a dictionary

               A     B    C
    0      Apple  Ball  NaN
    1  Aeroplane   Bat  Cat

(iii) Creation of a dataframe from n-dimensional array

          County  ISO-Code      Area Administrative centre
    0    Ostfold         1   4180.69             Sarpsborg
    1  Hordaland         2   4917.94                  Oslo
    2       Oslo         3    454.07          City of Oslo
    3    Hedmark         4  27397.76                 Hamar
    4    Oppland         5  25192.10           Lillehammer
    5   Buskerud         6  14910.94               Drammen

RESULT:
Thus, the programs for creating and loading pandas dataframes using Python have been implemented and the output was verified successfully.
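
The last program of Exercise 3 (loading a dataset from an external source) defines the column names, but the listing does not show the call that reads the data. One possible way to complete it, assuming the data is comma-separated with no header row, is to pass the names to pandas.read_csv. This is only a sketch: the two sample rows below are made up for illustration and are read from an in-memory buffer so that the example runs on its own. In the lab, a file path or URL to the actual data set would be passed instead.

```python
import io

import pandas as pd

# Column names as listed in the exercise
columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'ethnicity',
           'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
           'country_of_origin', 'income']

# Hypothetical sample rows (illustrative values only, not real records)
sample_csv = io.StringIO(
    "30, Private, 100000, Bachelors, 13, Never-married, Sales, "
    "Not-in-family, GroupA, Female, 0, 0, 40, CountryX, <=50K\n"
    "45, Self-employed, 200000, Masters, 14, Married, Manager, "
    "Husband, GroupB, Male, 5000, 0, 50, CountryY, >50K\n"
)

# header=None: the data has no header row, so the names come from `columns`.
# skipinitialspace=True: drop the blank that follows each comma.
df = pd.read_csv(sample_csv, names=columns, header=None, skipinitialspace=True)

print(df.shape)   # number of rows and columns
print(df.head())  # first rows of the loaded dataframe
```

Replacing sample_csv with a path or URL leaves the rest of the call unchanged, since read_csv accepts file paths, URLs, and file-like objects alike.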
