Data Science Lab Manual

Data science lab

Uploaded by

Sasi Praba


LIST OF EXPERIMENTS

83362 DATA SCIENCE LABORATORY    L T P C: 0 0 4 2

COURSE OBJECTIVES:
* To understand the Python libraries for data science
* To understand the basic statistical and probability measures for data science
* To learn descriptive analytics on the benchmark data sets
* To apply correlation and regression analytics on standard data sets
* To present and interpret data using visualization packages in Python

LIST OF EXPERIMENTS:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages.
2. Working with NumPy arrays.
3. Working with Pandas data frames.
4. Reading data from text files, Excel, and the web and exploring various commands for doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the following:
   a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness, and Kurtosis.
   b. Bivariate analysis: Linear and logistic regression modeling
   c. Multiple Regression analysis
   d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
   a. Normal curves
   b. Density and contour plots
   c. Correlation and scatter plots
   d. Histograms
   e. Three-dimensional plotting
7. Visualizing Geographic Data with Basemap.

List of Equipments: (30 Students per Batch)
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

EX. NO.: 1    PACKAGES FOR DATA SCIENCE IN PYTHON
DATE:

AIM:
To download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages in Python.

ALGORITHM:
Python is an open-source, object-oriented, and cross-platform programming language. Compared to programming languages like C++ or Java, Python is very concise. It allows us to build a working software prototype in a very short time.
It has become the most used language in the data scientist's toolbox. It is also a general-purpose language, and it is very flexible thanks to the variety of available packages that solve a wide spectrum of problems. To install the necessary packages, use pip.

Anaconda (http://continuum.io/downloads) is a Python distribution offered by Continuum Analytics that includes nearly 200 packages, among them NumPy, SciPy, pandas, Jupyter, Matplotlib, Scikit-learn, and NLTK. It is a cross-platform distribution (Windows, Linux, and Mac OS X) that can be installed on machines with other existing Python distributions and versions. Its base version is free, whereas add-ons that contain advanced features are charged separately. Anaconda introduces conda, a binary package manager, as a command-line tool to manage your package installations. Anaconda's goal is to provide an enterprise-ready Python distribution for large-scale processing, predictive analytics, and scientific computing.

The installation consists of four steps:
1. Download Anaconda
2. Install Anaconda
3. Start Anaconda
4. Install data science packages

1. Download Anaconda
This step downloads the Anaconda Python package for the Windows platform. Anaconda is a free and easy-to-use environment for scientific Python.
1. Visit the Anaconda homepage.
2. Click "Anaconda" from the menu and click "Download" to go to the download page.
3. Choose the download suitable for your platform (Windows, OSX, or Linux).
   * Choose Python 3.5
   * Choose the Graphical Installer

2. Install Anaconda
This step installs the Anaconda Python software on the system. It assumes that you have sufficient administrative privileges to install software on the system.
1. Double click the downloaded file.
2. Follow the installation wizard.

3. Start Anaconda
Anaconda comes with a suite of graphical tools called Anaconda Navigator.
Start Anaconda Navigator by opening it from the application launcher.

First, start with the Anaconda command-line environment called conda. Conda is fast and simple, it is hard for error messages to hide, and you can quickly confirm that your environment is installed and working correctly.
1. Open a terminal (command line window).
2. Confirm conda is installed correctly by typing:
    conda -V
    conda 4.2.9
3. Confirm Python is installed correctly by typing:
    python -V
    Python 3.5.2 :: Anaconda 4.2.0 (x86_64)

4. Install packages
To install a generic package <package-name> with pip, run this command:
    $> pip install <package-name>
To install the same generic package with conda, run the following command:
    $> conda install <package-name>
To install a particular version of the package:
    $> conda install <package-name>=1.11.0
To install multiple packages at once, list all their names:
    $> conda install <package-name-1> <package-name-2>
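Once packages have been installed from the command line, the result can also be checked from inside Python. The following is a minimal sketch, not part of the original manual: the helper name `installed_versions` is hypothetical, and the package list is simply taken from the AIM of this experiment. It relies only on the standard-library `importlib.metadata` module (Python 3.8 or later).

```python
from importlib import metadata

# Package names taken from the AIM of this experiment (an assumption;
# adjust the list to whatever was actually installed).
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def installed_versions(names):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:12s} {version or 'not installed'}")
```

A package reported as "not installed" can then be added with one of the install commands shown above.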

To update a package that you previously installed, you can keep on using conda:
    $> conda update <package-name>
To update all the available packages, use the --all argument:
    $> conda update --all
To uninstall packages using conda:
    $> conda remove <package-name>

Installing Data Science Packages

a. NumPy
NumPy is the true analytical workhorse of the Python language. It provides the user with multidimensional arrays, along with a large set of functions to perform a wide range of mathematical operations on these arrays. Arrays are blocks of data arranged along multiple dimensions, which implement mathematical vectors and matrices. Characterized by optimal memory allocation, arrays are useful not just for storing data, but also for fast matrix operations (vectorization), which are indispensable when solving ad hoc data science problems.
    $> conda install numpy

b. SciPy
SciPy completes NumPy's functionalities, offering a larger variety of scientific algorithms for linear algebra, sparse matrices, signal and image processing, optimization, fast Fourier transformation, and much more.
    $> conda install scipy

c. Statsmodels
Statsmodels is a complement to SciPy's statistical functions. It features generalized linear models, discrete choice models, time series analysis, and a series of descriptive statistics as well as parametric and nonparametric tests.
    $> conda install -c conda-forge statsmodels

d. Pandas
The pandas package covers the tabular data handling that NumPy and SciPy do not provide. Thanks to its specific data structures, namely DataFrames and Series, pandas allows us to handle complex tables of data of different types and time series. It enables the easy and smooth loading of data from a variety of sources. The data can then be sliced, diced, cleaned of missing elements, added, renamed, aggregated, reshaped, and finally visualized.
    $> conda install pandas

e. Jupyter
A scientific approach requires the fast experimentation of different hypotheses in a reproducible fashion.
Initially named IPython and limited to working only with the Python language, Jupyter was created to address the need for an interactive command shell for several languages (based on the shell, web browser, and application interface), featuring graphical integration, customizable commands, rich history (in the JSON format), and computational parallelism for enhanced performance.

Steps to install Jupyter using Anaconda:
* Launch Anaconda Navigator
* Click on the Install Jupyter Notebook button

RESULT:
Thus, the Anaconda Navigator has been successfully downloaded and installed, and the data science packages NumPy, SciPy, Jupyter, Statsmodels, and Pandas were explored in Python.

WORKING WITH NUMPY ARRAYS

EX. NO.: 2.a    BASIC NUMPY OPERATIONS
DATE:

AIM:
To perform basic NumPy operations in Python for
(i) creating different types of NumPy arrays and displaying basic information, such as the data type, shape, size, and strides
(ii) creating an array using built-in NumPy functions
(iii) performing file operations with NumPy arrays

ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy library.
Step 3: Define a one-dimensional array, a two-dimensional array, and a three-dimensional array.
Step 4: Print the memory address, the shape, the data type, and the stride of the array.
Step 5: Then, create an array using built-in NumPy functions.
Step 6: Perform file operations with NumPy arrays.
Step 7: Display the output.
Step 8: Stop the program.

PROGRAM:
(i) Creation of different types of NumPy arrays and displaying basic information

    # Importing numpy
    import numpy as np
    # Defining and printing 1D array
    my1DArray = np.array([1, 8, 27, 64])
    print(my1DArray)
    # Defining and printing 2D array
    my2DArray = np.array([[1, 2, 3, 4], [2, 4, 9, 16], [4, 8, 18, 32]])
    print(my2DArray)
    # Defining and printing 3D array
    my3Darray = np.array([[[1, 2, 3, 4], [5, 6, 7, 8]], [[1, 2, 3, 4], [9, 10, 11, 12]]])
    print(my3Darray)
    # Print out memory address
    print(my2DArray.data)
    # Print the shape of array
    print(my2DArray.shape)
    # Print out the data type of the array
    print(my2DArray.dtype)
    # Print the stride of the array
    print(my2DArray.strides)

(ii) Creation of an array using built-in NumPy functions

    # Array of ones
    ones = np.ones((3, 4))
    print(ones)
    # Array of zeros
    zeros = np.zeros((2, 3, 4), dtype=np.int16)
    print(zeros)
    # Array with random values (not printed)
    np.random.random((2, 2))
    # Empty array (contents are uninitialized)
    emptyArray = np.empty((3, 2))
    print(emptyArray)
    # Full array
    fullArray = np.full((2, 2), 7)
    print(fullArray)
    # Array of evenly-spaced values (fixed step)
    evenSpacedArray = np.arange(10, 25, 5)
    print(evenSpacedArray)
    # Array of evenly-spaced values (fixed number of samples)
    evenSpacedArray2 = np.linspace(0, 2, 9)
    print(evenSpacedArray2)

(iii) Performing file operations with NumPy arrays

    import numpy as np
    # initialize an array
    arr = np.array([[[11, 11, 9, 9], [11, 0, 2, 0]], [[10, 14, 9, 14], [0, 1, 11, 11]]])
    # open a binary file in write mode
    file = open("arr", "wb")
    # save array to the file
    np.save(file, arr)
    # close the file
    file.close()
    # open the file in read binary mode
    file = open("arr", "rb")
    # read the file to numpy array
    arr1 = np.load(file)
    # close the file
    file.close()
    print(arr1)

OUTPUT:
(i) Creation of different types of NumPy arrays and displaying basic information

    [ 1  8 27 64]
    [[ 1  2  3  4]
     [ 2  4  9 16]
     [ 4  8 18 32]]
    [[[ 1  2  3  4]
      [ 5  6  7  8]]
     [[ 1  2  3  4]
      [ 9 10 11 12]]]
    <memory at 0x...>    (the address varies between runs)
    (3, 4)
    int32
    (16, 4)

(ii) Creation of an array using built-in NumPy functions

    [[1. 1. 1. 1.]
     [1. 1. 1. 1.]
     [1. 1. 1. 1.]]
    [[[0 0 0 0]
      [0 0 0 0]
      [0 0 0 0]]
     [[0 0 0 0]
      [0 0 0 0]
      [0 0 0 0]]]
    [[0. 0.]
     [0. 0.]
     [0. 0.]]
    [[7 7]
     [7 7]]
    [10 15 20]
    [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

(iii) Performing file operations with NumPy arrays

    [[[11 11  9  9]
      [11  0  2  0]]
     [[10 14  9 14]
      [ 0  1 11 11]]]

RESULT:
Thus, the program to implement NumPy operations with arrays using Python has been executed and the output was verified successfully.

EX. NO.: 2.b    BASIC ARITHMETIC OPERATIONS WITH NUMPY

AIM:
To implement arithmetic operations with NumPy arrays using Python.
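The program in this experiment combines a 3x3 array with a one-dimensional array of length 3. NumPy handles the differing shapes through broadcasting: the shorter array is treated as if it were repeated along the missing dimension. The sketch below is an added illustration, not part of the original manual; the name `explicit_b` is hypothetical, and `np.broadcast_to` is used only to make the implicit expansion visible.

```python
import numpy as np

a = np.arange(9, dtype=np.float64).reshape(3, 3)  # shape (3, 3)
b = np.array([10, 10, 10])                        # shape (3,)

# Broadcasting treats b as if it were repeated once per row of a.
# np.broadcast_to builds that expanded (read-only) view explicitly.
explicit_b = np.broadcast_to(b, a.shape)

total = np.add(a, b)  # b is broadcast implicitly here

print(explicit_b.shape)
print(np.array_equal(total, a + explicit_b))
```

The same rule applies to `np.subtract`, `np.multiply`, and `np.divide`, so none of them needs the second array to be reshaped by hand.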
ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy library.
Step 3: Initialize the NumPy arrays to two different variables.
Step 4: Perform the arithmetic operations on the two arrays using NumPy.
Step 5: Display the output.
Step 6: Stop the program.

PROGRAM:

    import numpy as np
    # np.float64 is used because the np.float_ alias was removed in NumPy 2.0
    a = np.arange(9, dtype=np.float64).reshape(3, 3)
    print('First array:')
    print(a)
    print('\n')
    print('Second array:')
    b = np.array([10, 10, 10])
    print(b)
    print('\n')
    print('Add the two arrays:')
    print(np.add(a, b))
    print('\n')
    print('Subtract the two arrays:')
    print(np.subtract(a, b))
    print('\n')
    print('Multiply the two arrays:')
    print(np.multiply(a, b))
    print('\n')
    print('Divide the two arrays:')
    print(np.divide(a, b))

OUTPUT:

    First array:
    [[0. 1. 2.]
     [3. 4. 5.]
     [6. 7. 8.]]

    Second array:
    [10 10 10]

    Add the two arrays:
    [[10. 11. 12.]
     [13. 14. 15.]
     [16. 17. 18.]]

    Subtract the two arrays:
    [[-10.  -9.  -8.]
     [ -7.  -6.  -5.]
     [ -4.  -3.  -2.]]

    Multiply the two arrays:
    [[ 0. 10. 20.]
     [30. 40. 50.]
     [60. 70. 80.]]

    Divide the two arrays:
    [[0.  0.1 0.2]
     [0.3 0.4 0.5]
     [0.6 0.7 0.8]]

RESULT:
Thus, the program to implement NumPy arithmetic operations with arrays using Python has been executed and the output was verified successfully.

EX. NO.: 3    WORKING WITH PANDAS DATAFRAMES
DATE:

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns), i.e., data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns. Pandas provides two main data structures: Series and DataFrames.

AIM:
(i) To create a dataframe from a series
(ii) To create a dataframe from a dictionary
(iii) To create a dataframe from n-dimensional arrays
(iv) To load a dataset from an external source into a pandas dataframe

ALGORITHM:
Step 1: Start the program.
Step 2: Import the NumPy and pandas packages.
Step 3: Create a dataframe for the list of elements (numbers, dictionary, and n-dimensional arrays).
Step 4: Load a dataset from an external source into a pandas dataframe.
Step 5: Display the output.
Step 6: Stop the program.

PROGRAM:
(i) CREATION OF A DATAFRAME FROM A SERIES

    import numpy as np
    import pandas as pd
    print("Pandas Version:", pd.__version__)
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.max_rows', 500)
    series = pd.Series([2, 3, 7, 11, 13, 17, 19, 23])
    print(series)
    series_df = pd.DataFrame({
        'A': range(1, 5),
        'B': pd.Timestamp('20190526'),
        'C': pd.Series(5, index=list(range(4)), dtype='float64'),
        'D': np.array([3] * 4, dtype='int64'),
        # the first word of the fourth label is illegible in the source
        'E': pd.Categorical(["Depression", "Social Anxiety", "Bipolar Disorder", "Disorder"]),
        'F': 'Mental health',
        'G': 'is challenging'
    })
    print(series_df)

(ii) CREATION OF A DATAFRAME FROM DICTIONARY

    import numpy as np
    import pandas as pd
    dict_df = [{'A': 'Apple', 'B': 'Ball'}, {'A': 'Aeroplane', 'B': 'Bat', 'C': 'Cat'}]
    dict_df = pd.DataFrame(dict_df)
    print(dict_df)

(iii) CREATION OF A DATAFRAME FROM N-DIMENSIONAL ARRAYS

    import numpy as np
    import pandas as pd
    sdf = {
        'County': ['Ostfold', 'Hordaland', 'Oslo', 'Hedmark', 'Oppland', 'Buskerud'],
        'ISO-Code': [1, 2, 3, 4, 5, 6],
        'Area': [4180.69, 4917.94, 454.07, 27397.76, 25192.10, 14910.94],
        'Administrative centre': ["Sarpsborg", "Oslo", "City of Oslo", "Hamar", "Lillehammer", "Drammen"]
    }
    sdf = pd.DataFrame(sdf)
    print(sdf)

(iv) LOADING A DATASET FROM AN EXTERNAL SOURCE INTO A PANDAS DATAFRAME

    import numpy as np
    import pandas as pd
    columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
               'marital_status', 'occupation', 'relationship', 'ethnicity',
               'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
               'country_of_origin', 'income']
    ...

OUTPUT:
(i) Creation of a dataframe from a series

    ...
    3  4 2019-05-26 ...

(ii) Creation of a dataframe from a dictionary

               A     B    C
    0      Apple  Ball  NaN
    1  Aeroplane   Bat  Cat

(iii) Creation of a dataframe from n-dimensional arrays

          County  ISO-Code      Area Administrative centre
    0    Ostfold         1   4180.69             Sarpsborg
    1  Hordaland         2   4917.94                  Oslo
    2       Oslo         3    454.07          City of Oslo
    3    Hedmark         4  27397.76                 Hamar
    4    Oppland         5  25192.10           Lillehammer
    5   Buskerud         6  14910.94               Drammen

RESULT:
Thus, the programs for creating and loading pandas dataframes using Python have been implemented and the output was verified successfully.
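The program for part (iv) survives only as a list of column names; the call that actually reads the external data is missing from this copy. As a hedged sketch of how such a load is commonly done, the example below passes the column names to `pd.read_csv` through its `names` parameter. Everything about the data source is an assumption: the manual does not name the file or URL, so an in-memory `io.StringIO` buffer with two made-up placeholder rows (not real data) stands in for it, and the weight column is written as `fnlwgt`.

```python
import io

import pandas as pd

columns = ['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'ethnicity',
           'gender', 'capital_gain', 'capital_loss', 'hours_per_week',
           'country_of_origin', 'income']

# Hypothetical stand-in for the external source: two invented rows,
# used only so the example runs without a file or network access.
sample_csv = io.StringIO(
    "39,Private,100000,Bachelors,13,Never-married,Sales,Not-in-family,A,Female,0,0,40,X,<=50K\n"
    "52,Self-employed,200000,Masters,14,Married,Tech-support,Husband,B,Male,0,0,45,Y,>50K\n"
)

# header=None: the data has no header row; names= supplies the column labels.
df = pd.read_csv(sample_csv, names=columns, header=None, skipinitialspace=True)

print(df.shape)
print(df.head())
```

To load a real data set, the `sample_csv` buffer would be replaced by a file path or URL; `pd.read_csv` accepts either as its first argument.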
