04 DS 2023
Lab Journal
by
Mr. Suyog Avhad
Roll no : 04
Seat No : 3604
S. N.  Experiment Name                       Date
1      Data Preparation                      26-02-2023
2      Data Visualization – EDA              08-03-2023
3      Data Modeling & Hypothesis Testing    08-03-2023
5      Assignment - 01                       13-03-2023
6      Assignment - 02                       20-03-2023
Experiment - 01
Data preparation using NumPy and Pandas
Problem Statement:
a. Derive an index field and add it to the data set.
b. Find out the missing values.
c. Obtain a listing of all records that are outliers according to any field. Print
out a listing of the 10 largest values for that field.
d. Do the following for any field. i. Standardize the variable. ii. Identify how
many outliers there are and identify the most extreme outlier.
● Data science is the study of data to extract meaningful insights for business.
● NumPy is a Python library for numerical computing that provides N-dimensional arrays and
linear algebra routines.
● Pandas is a Python library that contains high-level data structures and manipulation
tools designed to make data analysis fast and easy in Python.
● A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered
collection of columns, each of which can be a different value type (numeric, string,
boolean, etc.).
● Exploratory data analysis (EDA) is used by data scientists to analyze and investigate
data sets and summarize their main characteristics, often employing data visualization
methods.
2. How to read a CSV file with read_csv (answer this for the platform which you are using)
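A minimal sketch of reading a CSV with pandas. The CSV text below is made-up sample data standing in for the real file; on Kaggle the argument would instead be the path of the attached dataset, which is normally mounted under /kaggle/input/.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """ID,Year_Birth,Marital_Status,Income
1,1957,Single,58138
2,1954,Single,46344
3,1965,Together,71613
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)
print(df.head())
```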
4. How to select one/multiple column/s from dataset
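As an illustration, single brackets with one column name return a Series, while a list of names returns a DataFrame. The data below is a made-up sample, not the journal's dataset.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Year_Birth": [1957, 1954, 1965],
    "Marital_Status": ["Single", "Single", "Together"],
    "Income": [58138, 46344, 71613],
})

one_col = df["Income"]                     # one column -> pandas Series
many_cols = df[["Year_Birth", "Income"]]   # list of columns -> DataFrame
print(type(one_col).__name__, many_cols.shape)
```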
6. Use of df.columns, head(), tail(), df.dtypes, value_counts(), isnull(), sum(), df.size,
df.shape, len(df), hasnans, dropna(), df.count(), astype(int), describe(), max(), mean(),
median(), std(), unique()
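The calls listed above can be tried on a small frame. This is a sketch on made-up data with one deliberately missing Income value; the column names are borrowed from later experiments only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({
    "Marital_Status": ["Single", "Married", "Married", "Single", "Together"],
    "Income": [58138.0, 46344.0, np.nan, 26646.0, 58293.0],
})

print(df.columns.tolist())                   # column names
print(df.head(2))                            # first rows
print(df.tail(2))                            # last rows
print(df.dtypes)                             # type of each column
print(df["Marital_Status"].value_counts())   # frequency of each category
missing = df.isnull().sum()                  # missing values per column
print(df.size, df.shape, len(df))            # cells, (rows, cols), rows
print(df["Income"].hasnans)                  # True if the Series contains NaN
clean = df.dropna()                          # drop rows with any missing value
print(df.count())                            # non-null values per column
print(clean["Income"].astype(int))           # cast after removing NaN
print(df.describe())                         # summary statistics
income = df["Income"]
print(income.max(), income.mean(), income.median(), income.std())
print(df["Marital_Status"].unique())         # distinct values
```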
7. What are quantiles and percentiles, and what does df.count(axis=1).head() mean?
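A short sketch on made-up numbers: the 0.25 quantile and the 25th percentile are the same value, and count(axis=1) counts the non-missing entries in each row, of which head() shows the first five.

```python
import numpy as np
import pandas as pd

# Hypothetical data; column "b" has two missing entries.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, np.nan, 30.0, np.nan, 50.0],
})

q1 = df["a"].quantile(0.25)        # quantile expressed as a fraction in [0, 1]
p25 = np.percentile(df["a"], 25)   # percentile expressed on a 0-100 scale

# Number of non-null values in each row, first five rows only.
row_counts = df.count(axis=1).head()
print(q1, p25)
print(row_counts)
```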
8. Syntax of replacing values in column?
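Two common ways to replace values in a column, sketched on made-up category labels: Series.replace with a mapping, and assignment through a boolean mask with .loc.

```python
import pandas as pd

# Hypothetical category labels.
df = pd.DataFrame({"Marital_Status": ["Single", "Alone", "Married", "YOLO"]})

# Map several old values to a new value in one call.
df["Marital_Status"] = df["Marital_Status"].replace(
    {"Alone": "Single", "YOLO": "Single"}
)

# Conditional replacement with a boolean mask.
df.loc[df["Marital_Status"] == "Married", "Marital_Status"] = "Together"
print(df)
```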
10. fillna(), bfill(), ffill()
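The three fill strategies differ in where the replacement value comes from. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled_const = s.fillna(0)   # replace NaN with a constant
filled_fwd = s.ffill()       # forward fill: carry the last valid value forward
filled_bwd = s.bfill()       # backward fill: pull the next valid value back
print(filled_const.tolist(), filled_fwd.tolist(), filled_bwd.tolist())
```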
Link of Execution: https://www.kaggle.com/code/suyog045/ai-ds-exp1
Screenshots of Code with Output:
a. Derive an index field and add it to the data set.
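The screenshot for this step is not reproduced here. One possible way to derive an index field, assuming a 1-based running number is wanted and using a made-up column, is:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Income": [58138, 46344, 71613]})

# Derive a 1-based index field from the row position and place it first.
df.insert(0, "Index", np.arange(1, len(df) + 1))
print(df)
```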
c. Obtain a listing of all records that are outliers according to any field. Print out a listing of the
10 largest values for that field.
● Outliers
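The journal does not state which outlier criterion was used. The sketch below assumes the common 1.5 × IQR rule and uses made-up Income values, then lists the 10 largest values with nlargest.

```python
import pandas as pd

# Hypothetical Income values with one obvious outlier.
income = pd.Series(
    [30000, 32000, 35000, 36000, 38000, 40000,
     41000, 42000, 45000, 47000, 50000, 666666],
    name="Income",
)

# Assumed criterion: values beyond 1.5 * IQR from the quartiles are outliers.
q1, q3 = income.quantile(0.25), income.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = income[(income < lower) | (income > upper)]

# The 10 largest values of the field, in descending order.
top10 = income.nlargest(10)
print(outliers)
print(top10)
```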
d. Do the following for any field. i. Standardize the variable. ii. Identify how many outliers there
are and identify the most extreme outlier.
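A sketch of both parts on made-up Income values. It assumes z-score standardization and the usual |z| > 3 cut-off for outliers; the journal does not state the threshold that was actually used.

```python
import pandas as pd

# Hypothetical Income values with one obvious outlier.
income = pd.Series(
    [30000, 32000, 35000, 36000, 38000, 40000,
     41000, 42000, 45000, 47000, 50000, 666666],
    name="Income",
)

# i. Standardize: subtract the mean and divide by the sample standard deviation.
z = (income - income.mean()) / income.std()

# ii. Count outliers (assumed cut-off |z| > 3) and find the most extreme one.
n_outliers = int((z.abs() > 3).sum())
most_extreme = income.loc[z.abs().idxmax()]
print(n_outliers, most_extreme)
```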
Conclusion:
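Data preparation was carried out using NumPy and Pandas: an index field was derived, missing
values were identified, and outliers were listed after standardizing a field.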
Experiment - 02
Title:
Exp 2 : Data Visualization / Exploratory Data Analysis for the selected data set using Matplotlib
and Seaborn
Problem Statement:
a. Create a bar graph, contingency table using any 2 variables.
b. Create a normalized histogram.
c. Describe what the graph and tables indicate.
Link of Execution:
https://www.kaggle.com/code/suyog045/ai-ds2
Screenshots of Code with Output:
a. Create a bar graph, contingency table using any 2 variables.
● Bar Graph
● Contingency table
Variable 1 = Year_Birth
Variable 2 = Marital_Status
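The screenshots are not reproduced here. As a sketch on made-up rows, the bar-graph data can be obtained with value_counts and the contingency table with pd.crosstab; the margins make it easy to verify that the totals match the number of records.

```python
import pandas as pd

# Hypothetical rows using the two variables named above.
df = pd.DataFrame({
    "Year_Birth": [1957, 1957, 1965, 1965, 1965, 1984],
    "Marital_Status": ["Single", "Married", "Married", "Married", "Single", "Single"],
})

# Bar-graph data: frequency of each category.
# counts.plot(kind="bar") would draw the bars (requires Matplotlib).
counts = df["Marital_Status"].value_counts()

# Contingency table of the two variables, with row and column totals.
table = pd.crosstab(df["Year_Birth"], df["Marital_Status"], margins=True)
print(counts)
print(table)
```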
● Verifying Contingency Table
● Creating Histogram
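A normalized histogram scales the bar heights so that the total area is 1. The sketch below computes it with NumPy on made-up birth years; plt.hist(year_birth, bins=5, density=True) would draw the same bars with Matplotlib.

```python
import numpy as np

# Hypothetical birth years.
year_birth = np.array([1957, 1954, 1965, 1984, 1981, 1967, 1971, 1985, 1974, 1950])

# density=True normalizes the counts so the histogram integrates to 1.
density, edges = np.histogram(year_birth, bins=5, density=True)

# Check: sum of (bar height * bar width) over all bins.
area = float(np.sum(density * np.diff(edges)))
print(density, edges, area)
```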
c. Describe what the graph and tables indicate.
● The bar graph draws one bar per category, so the categories can be compared by bar height.
E.g., in our data we plotted Marital_Status against Income. Since Marital_Status is categorical,
each bar summarizes the Income recorded for one Marital_Status category.
● The histogram groups a numeric variable into ranges and plots the frequency of each range.
E.g., in our data we plotted the histogram for Year_Birth, so each bar shows how many records
fall in the corresponding range of Year_Birth values.
Conclusion:
Data Visualization / Exploratory Data Analysis for the selected data set was performed using
Matplotlib and Seaborn.
Experiment - 03
Title:
Exp 3 : Data Modeling
Problem Statement:
a. Partition the data set, for example 75% of the records are included in the training data set and
25% are included in the test data set. Use a bar graph to confirm your proportions.
Dataset : drug200.csv
3. Partitioning dataset into training and testing
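A minimal sketch of a 75% / 25% partition. It uses a plain random sample with pandas, which may differ from the splitter used in the original notebook, and a made-up 200-row frame with hypothetical column names.

```python
import numpy as np
import pandas as pd

# Hypothetical 200-row data set; "feature" and "target" are placeholder names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.integers(15, 75, size=200),
    "target": rng.choice(["A", "B", "C"], size=200),
})

# Draw 75% of the rows at random for training; the rest form the test set.
train = df.sample(frac=0.75, random_state=42)
test = df.drop(train.index)
print(len(train), len(test))
```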
4. Flagging the x_train and x_test records in a separate column called ‘isstrain’, set to 1 for
training records and 0 for test records
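One way to build such a flag column and confirm the proportions, sketched on a made-up 20-row frame. The column name isstrain follows the journal; everything else is a placeholder.

```python
import pandas as pd

# Hypothetical 20-row data set.
df = pd.DataFrame({"feature": range(20)})
train = df.sample(frac=0.75, random_state=0)

# 1 marks a training record, 0 marks a test record.
df["isstrain"] = df.index.isin(train.index).astype(int)

# Share of each flag value; .plot(kind="bar") on this Series would give the bar graph.
proportions = df["isstrain"].value_counts(normalize=True)

# Total number of records in the training data set.
n_train = int(df["isstrain"].sum())
print(proportions, n_train)
```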
b. Identify the total number of records in the training data set.
c. Validate your partition by performing a two‐sample Z‐test.
Dataset : marketing_AB.csv
1. Loading Dataset
2. Dividing data into input and output variable
4. Calculating length of x_train and x_test & calculating difference in their means
5. Calculating z_score
6. Calculating p_value
If p_value < 0.05, we reject the null hypothesis, i.e. the means of the target variable for the
two datasets are not equal;
otherwise we fail to reject the null hypothesis, i.e. the means of the target variable for the
two datasets can be treated as equal.
Here, p_value for converted and most ads hour is > 0.05, so we fail to reject the null hypothesis
for them, signifying that the means of these variables in the two datasets are not significantly
different. The p_value for total ads is < 0.05, so its null hypothesis is rejected, signifying that
its means in the two datasets are not equal.
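The z-score and p-value steps can be sketched as follows. This is an illustration on made-up numbers, assuming a two-sided test with the standard normal distribution and sample variances; the function name is hypothetical and the original notebook's code may differ.

```python
import math
import numpy as np

def two_sample_z_test(a, b):
    """Return (z, two-sided p-value) for the difference in means of a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    z = (a.mean() - b.mean()) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Identical samples: no evidence against equal means.
z_same, p_same = two_sample_z_test(np.arange(100), np.arange(100))

# Shifted sample: the means clearly differ.
z_diff, p_diff = two_sample_z_test(np.arange(100), np.arange(100) + 50)

for name, p in [("same", p_same), ("shifted", p_diff)]:
    decision = "reject" if p < 0.05 else "fail to reject"
    print(f"{name}: p = {p:.4f} -> {decision} the null hypothesis")
```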
Link of Execution:
https://www.kaggle.com/code/suyog045/ai-ds-exp3-1
https://www.kaggle.com/code/suyog045/ai-ds-exp3-2
Conclusion:
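Data modeling was performed: the data set was partitioned into training and test sets, and the
partition was validated with a two-sample Z-test.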
AIML Mini - Project
Introduction:
The problem that this project aims to address is the generation of new paintings in the style of the
famous artist Claude Monet, which can be a time-consuming and challenging task for human
artists. Generative Adversarial Networks (GANs) can be used to generate new images that mimic
Monet's style, which can save time and effort in creating new paintings in this style.
Additionally, the generated paintings can serve as a source of inspiration for artists, as well as
being used in various applications, such as interior design, fashion, and advertising. However,
there are challenges in training the GAN model to accurately capture the intricate details of
Monet's style and produce high-quality paintings that are visually appealing and aesthetically
pleasing. This project aims to address these challenges and create a GAN model that can
generate high-quality Monet-style paintings. Since our group is taking this project forward as our
final-year project, this semester we have implemented the concept of DCGAN and generated
handwritten digits using the MNIST dataset. Through this project, we have learned the basic
concepts of GANs, and we can now use that knowledge to generate Monet-style paintings.
Algorithms:
There are several algorithms that can be used for image generation in machine learning. Some
popular ones are:
1. Generative Adversarial Networks (GANs): GANs consist of two neural networks, a generator
and a discriminator, that work together to generate new images that look similar to the training
data. The generator learns to create new images, while the discriminator learns to differentiate
between real and fake images.
2. Convolutional Neural Networks (CNNs): CNNs are a type of neural network that are often
used for image classification, but they can also be used for image generation. CNNs can learn to
generate new images by learning the patterns in the input data and generating new images that
follow the same patterns.
3. DCGAN: DCGAN (Deep Convolutional Generative Adversarial Network) is a variant of the
generative adversarial network (GAN) architecture, specifically designed for generating high-
quality images. DCGAN uses convolutional neural networks (CNNs) in both the generator and
discriminator networks. The generator network takes a random noise vector as input and
generates an image, while the discriminator network takes an image as input and classifies it as
real or fake. The two networks are trained together in an adversarial manner, where the generator
tries to generate more realistic images to fool the discriminator, and the discriminator tries to
accurately classify real and fake images.
There are also many other algorithms and techniques for image generation in machine learning,
and the choice of algorithm depends on the specific task and dataset.
Monet Dataset:
Architecture:
The proposed architecture for this project to create a GAN to generate Monet-style paintings
would involve the following components:
1. Generator network: The generator network would be responsible for generating Monet-style
paintings from random noise vectors. It would consist of multiple layers of convolutional,
upsampling, and activation functions, and would output an image with the same dimensions as
the input image.
Figure 1 : Generator
Figure 2 : Discriminator
3. Loss function: The loss function would be responsible for guiding the training process of the
GAN model. It would consist of two parts - the generator loss and the discriminator loss. The
generator loss would encourage the generator network to generate paintings that are similar to
the real Monet paintings, while the discriminator loss would encourage the discriminator
network to correctly distinguish between real and fake paintings.
Optimization algorithm: The optimization algorithm would be responsible for updating the
weights of the generator and discriminator networks during training. It would use
backpropagation and stochastic gradient descent techniques to minimize the loss function.
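The two loss terms can be illustrated with a small NumPy sketch. It assumes the standard binary cross-entropy formulation of the GAN losses, which the journal does not spell out; the function names are hypothetical, and d_real / d_fake stand for the discriminator's output probabilities on real and generated images.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    # Real images should be scored 1 and generated images 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # The generator is rewarded when its images are scored as real (1).
    return bce(d_fake, np.ones_like(d_fake))

# A discriminator that is confidently right: low D loss, high G loss.
d_real = np.array([0.9, 0.9])
d_fake = np.array([0.1, 0.1])
d_loss = discriminator_loss(d_real, d_fake)
g_loss = generator_loss(d_fake)
print(d_loss, g_loss)
```

In this formulation the two losses pull in opposite directions: whatever makes d_fake closer to 1 lowers the generator loss and raises the discriminator loss.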
4. Style transfer and image filtering techniques: These techniques would be used to fine-tune
and optimize the GAN model to improve the quality of the generated paintings. Style transfer
techniques would be used to transfer the style of different Monet paintings to the generated
paintings, while image filtering techniques would be used to enhance the visual quality of the
generated paintings.
5. User interface: The user interface would provide a user-friendly interface for generating
Monet-style paintings. It would allow users to select the input parameters like size and style,
display the generated paintings, and save or download the paintings.
6. Deployment platform: The deployment platform would host the GAN model and user
interface as a web application or a standalone desktop application. It would ensure the security
and reliability of the application, and optimize the performance of the application for real-time
use.
The second component we need to create is the discriminator.
We will use 3 layers in our discriminator's neural network.
We will train our GAN! For each epoch, we will process the entire dataset in batches. For every
batch, we will update the discriminator and generator. Then, we can see DCGAN's results!
You can see the output after every 500 steps.
Result:
This project implementing DCGAN to generate MNIST digits was successful in generating
reasonable-quality digit images. The model was able to learn the features of the training data and
generate new samples that closely resemble the real MNIST digits.
The discriminator and generator loss curves show that the model is learning and improving over
time. The discriminator loss decreases as the model becomes better at distinguishing real from
fake images, while the generator loss decreases as the model learns to generate images that better
fool the discriminator.
Assignment no : 01
Installation of Python
● Download the current production version of Python (2.7.1) from the Python Download
site.
● Double click on the icon of the file that you just downloaded.
● Accept the default options given to you until you get to the Finish button. Your
installation is complete.
● Starting at My Computer go to the following directory C:\Python27. In that folder you
should see all the Python files.
● Copy that address starting with C: and ending with 27 and close that window.
● Click on Start. Right Click on My Computer.
● Click on Properties. Click on Advanced System Settings or Advanced.
● Click on Environment Variables.
● Under System Variables search for the variable Path.
● Select Path by clicking on it. Click on Edit.
● Scroll all the way to the right of the field called Variable value using the right arrow.
● Add a semicolon (;) to the end and paste the path (to the Python folder) that you
previously copied. Click OK.
● Create a folder called PythonPrograms on your C:\ drive. You will be storing all your
Python programs in this folder.
● Go to Start and either type Run in the Start Search box at the bottom or click on Run.
● Type in notepad in the field called Open.
● In Notepad type in the following program exactly as written:
# File: Hello.py
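The body of the program did not survive in this copy. Judging only from the expected output stated in the run step ("Hello World!"), a minimal sketch of Hello.py could look like this; the main function is an assumption for illustration.

```python
def main():
    # Print the greeting the instructions expect to see.
    message = "Hello World!"
    print(message)
    return message

main()
```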
● Type cd PythonPrograms and hit Enter. It should take you to the PythonPrograms folder.
● Type dir and you should see the file Hello.py.
● To run the program, type python Hello.py and hit Enter.
● You should see the line Hello World!
Python comes bundled with Mac OS X. But the version that you have is quite likely an older
version. Download the latest binary version of Python that runs on both Power PC and Intel
systems and install it on your system.
● In the empty TextEdit window type in the following program, exactly as given:
# File: Hello.py
● In a Terminal window, type python. This will start the Python shell. The prompt for that
is >>>
● At the Python shell prompt type import idlelib.idle
● This will start the IDLE IDE
● Start IDLE
● Type your program in
● Go to the File menu and click on Save. Type in filename.py This will save it as a plain
text file, which can be opened in any editor you choose (like Notepad or TextEdit).
● To run your program go to Run and click Run Module
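from google.colab import auth
from oauth2client.client import GoogleCredentials
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
import pandas as pd

# Authenticate the Colab session and build a Google Drive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
# `downloaded` must be a Drive file handle created earlier (that step is not shown in this journal).
downloaded.GetContentFile('resources.csv')
xyz = pd.read_csv('resources.csv')
print(xyz.head(1))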
It is possible to install and run Python/TensorFlow entirely from your own computer. Google
provides TensorFlow for Windows, Mac and Linux. Previously, TensorFlow did not support
Windows. However, as of December 2016, TensorFlow supports Windows for both CPU and
GPU operation.
The first step is to install Python 3.7. As of August 2019, this is the latest version of Python 3. I
recommend using the Miniconda (Anaconda) release of Python, as it already includes many of
the data science related packages that will be needed by this class. Anaconda directly supports:
Windows, Mac and Linux. Miniconda is the minimal set of features from the very large
Anaconda Python distribution. Download Miniconda from the following URL:
· Miniconda
Dealing with TensorFlow incompatibility with Python 3.7
*Note: I will remove this section once all needed libraries add support for Python 3.7.
VERY IMPORTANT Once Miniconda has been downloaded you must create a Python 3.6
environment. Not all TensorFlow 2.0 packages currently (as of August 2019) support Python 3.7.
This is not unusual; you will usually need to stay one version back from the latest Python to
maximize compatibility with common machine learning packages. So you must execute the
following command:
conda create -y --name tensorflow python=3.6
To enter this environment, you must use the following command (for Windows). This command
must be run every time you open a new Anaconda/Miniconda terminal window:
activate tensorflow
Installing Jupyter
pip install --exists-action i --upgrade scikit-learn
pip install --exists-action i --upgrade pandas
pip install --exists-action i --upgrade pandas-datareader
pip install --exists-action i --upgrade matplotlib
pip install --exists-action i --upgrade pillow
pip install --exists-action i --upgrade tqdm
pip install --exists-action i --upgrade requests
pip install --exists-action i --upgrade h5py
pip install --exists-action i --upgrade pyyaml
pip install --exists-action i --upgrade tensorflow_hub
pip install --exists-action i --upgrade bayesian-optimization
pip install --exists-action i --upgrade spacy
pip install --exists-action i --upgrade gensim
pip install --exists-action i --upgrade flask
pip install --exists-action i --upgrade boto3
pip install --exists-action i --upgrade gym
pip install --exists-action i --upgrade tensorflow==2.0.0-beta1
pip install --exists-action i --upgrade keras-rl2 --user
conda update -y --all
Notice that I am installing a specific version of TensorFlow. As of the current semester, this is
the latest version of TensorFlow. It is very likely that Google will upgrade this during this
semester. The newer version may have some incompatibilities, so it is important that we start
with this version and end with the same.
You should also link your new tensorflow environment to Jupyter so that you can choose it as a
Kernel. Always make sure to run your Jupyter notebooks from your 3.6 kernel. This is
demonstrated in the video.
python -m ipykernel install --user --name tensorflow --display-name "Python 3.6 (tensorflow)"
Python Introduction
● Anaconda v3.6 Scientific Python Distribution, including:
○ Scikit-Learn
○ Pandas
○ Others: csv, json, numpy, scipy
● Jupyter Notebooks
● PyCharm IDE
● Cx_Oracle
● MatPlotLib
Jupyter Notebooks
Even :
Python Versions
● If you see xrange instead of range, you are dealing with Python 2
● If you see print x instead of print(x), you are dealing with Python 2
● This class uses Python 3.6!
3.10.0
import sys
import tensorflow.keras
import pandas as pd
import sklearn as sk
import tensorflow as tf
print(f"Tensor Flow Version: {tf.__version__}")
print()
print(f"Python {sys.version}")
print(f"Pandas {pd.__version__}")
print(f"Scikit-Learn {sk.__version__}")
Pandas 0.25.0
Scikit-Learn 0.21.3
Assignment no : 02
Print the top 10 rows
To print dtypes
To print the shape
To print describe
To find the median and filling the missing values with this median
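A minimal sketch of this step on a made-up Series. The median is computed from the non-missing values only, so it is not pulled up by the single large value.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value and one large value.
s = pd.Series([1.0, np.nan, 3.0, 100.0])

median = s.median()        # NaN is skipped by default
filled = s.fillna(median)  # replace missing entries with the median
print(median, filled.tolist())
```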
To plot the histogram with equal-frequency bins using equalObs
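The name equalObs suggests the widely shared helper that derives bin edges by interpolating over the sorted data; the sketch below assumes that helper and made-up skewed data. Passing the returned edges as bins to plt.hist would draw the histogram.

```python
import numpy as np

def equalObs(x, nbin):
    """Bin edges such that each bin holds roughly the same number of values."""
    nlen = len(x)
    return np.interp(np.linspace(0, nlen, nbin + 1), np.arange(nlen), np.sort(x))

# Hypothetical skewed data: squares of 0..99.
x = np.arange(100.0) ** 2

edges = equalObs(x, 4)
counts, _ = np.histogram(x, bins=edges)
print(edges)    # unequal bin widths
print(counts)   # equal bin counts
```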
To find quartiles and IQR
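A short sketch on made-up values: the quartiles are the 0.25, 0.5 and 0.75 quantiles, and the IQR is the distance between the first and third.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1 = s.quantile(0.25)   # first quartile
q2 = s.quantile(0.50)   # second quartile (median)
q3 = s.quantile(0.75)   # third quartile
iqr = q3 - q1           # interquartile range
print(q1, q2, q3, iqr)
```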
Rescaling data
Normalization
Binarization
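The three transformations can be sketched with plain NumPy on a made-up matrix. The definitions are assumptions, since the journal does not state them: rescaling is taken as per-column min-max scaling to [0, 1], normalization as scaling each row to unit L2 norm, and binarization as thresholding (here at 3.0).

```python
import numpy as np

# Hypothetical feature matrix: 3 rows, 2 columns.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Rescaling: map each column to the range [0, 1].
col_min, col_max = X.min(axis=0), X.max(axis=0)
rescaled = (X - col_min) / (col_max - col_min)

# Normalization: scale each row to unit length (L2 norm).
normalized = X / np.linalg.norm(X, axis=1, keepdims=True)

# Binarization: 1 where the value exceeds the threshold, else 0.
threshold = 3.0
binarized = (X > threshold).astype(int)

print(rescaled)
print(normalized)
print(binarized)
```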