

APTECHSOFT
Task 1

Submitted By: Syeda Fatima Sajid
Submitted To: Mentor Summayyea Salahuddin
Date: 29 July, 2024


Notebook 1:
Overview of the Iris Dataset
Dataset:
In machine learning and AI, a dataset is a collection of data used to train and test algorithms and models,
playing a crucial role in their development. Datasets can be structured, like those in spreadsheets or
databases, or unstructured, such as text or images, which require additional processing. These datasets can
be public, proprietary, or generated for specific training and testing purposes, enabling researchers and
developers to evaluate and enhance AI systems effectively.
Iris Dataset:
The Iris dataset is a classic dataset in the field of machine learning and statistics. It was first used by Sir R.A. Fisher, and the data are taken from Fisher's paper. It contains measurements of iris flowers from three different species: Iris setosa, Iris versicolor, and Iris virginica, arranged in 150 rows and 5 columns.

Import the scikit-learn library:


To access the Iris dataset, first import the scikit-learn library. The Iris dataset is included in the datasets module of scikit-learn. It can be loaded using datasets.load_iris(), which returns a dictionary-like object with keys such as 'data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', and 'data_module'. The dataset is provided by the sklearn.datasets.data module, and its filename is accessible via the 'filename' key.
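
A minimal sketch of these steps (assuming scikit-learn is installed):

from sklearn import datasets

data = datasets.load_iris()   # returns a dictionary-like Bunch object
print(data.keys())            # dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])
print(data["filename"])       # path to the CSV file bundled with scikit-learn
print(data["DESCR"])          # full description of the dataset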
Dataset Structure:
A description of the Iris dataset can be obtained using print(data["DESCR"]).
The Iris dataset contains 150 samples, each representing a single iris flower, with 50 instances in each class. The target values (species) are encoded as integers:
 0: Iris setosa
 1: Iris versicolor
 2: Iris virginica
Each feature and the corresponding target value help in classifying the type of iris flower.


Features
The dataset includes the following features:
 Sepal Length (cm)
 Sepal Width (cm)
 Petal Length (cm)
 Petal Width (cm)
Each flower in the dataset is described by these four features.

Summary:
In this notebook, I learned about the following:
 The Iris dataset, its classes, and the number of instances it contains.
 How to import the scikit-learn datasets module to access various built-in datasets for machine
learning.
 How to import the Iris dataset from the scikit-learn library.
 How to print the keys of the data dictionary.
 How to print the filename of the Iris dataset.
 How to print a description of the Iris dataset.

Notebook 2
Sections


This analysis consists of three main sections:


1. Importing Required Libraries
2. Loading the Dataset and Exploring Features and Targets
3. Creating Pandas DataFrame and Visualizing Data
Data Visualization:
Data visualization is the graphical representation of information and data using visual elements like
charts, graphs, and maps. It helps to see and understand trends, outliers, and patterns in data.
Importance: Data visualization makes data accessible and understandable, and supports data-driven decisions. It is valuable across many fields, enhancing communication and understanding of data among non-technical audiences.
Data Visualization and Big Data: In the era of Big Data, visualization is crucial for analyzing large
amounts of information. Effective data visualization balances form and function, telling a story by
highlighting useful information and removing noise.
Visualization of the Iris Dataset
To visualize the Iris dataset, we first import the necessary libraries:
 Scikit-learn: Provides a wide range of machine learning algorithms and tools for evaluating and
tuning models.
 Pandas: Powerful for data analysis, particularly useful for working with tabular data.
 Matplotlib: Comprehensive library for plotting graphs and visualizing data.
 Seaborn: Builds on top of Matplotlib, providing a more concise and expressive way to create
statistical plots
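
A minimal sketch of the imports used throughout this notebook:

from sklearn import datasets       # built-in datasets and ML tools
import pandas as pd                # tabular data analysis
import matplotlib.pyplot as plt    # plotting
import seaborn as sns              # statistical plots built on Matplotlib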

Load the Dataset, check features, and targets


The Iris dataset from scikit-learn was loaded and inspected, revealing four features:
1. sepal length (cm)
2. sepal width (cm)
3. petal length (cm)
4. petal width (cm)
The target values represent three iris species:
 Setosa
 Versicolor
 Virginica
The target array consists of 150 entries. By examining the dataset's features, target values,
and the shape of the target array, one gains an understanding of the data structure. This
understanding allows for the creation of visualizations, such as scatter plots, to show the
relationships between different features and how they distinguish the three iris species.
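
A sketch of this loading and inspection step, assuming the imports above:

data = datasets.load_iris()
print(data["feature_names"])   # the four feature names listed above
print(data["target_names"])    # ['setosa' 'versicolor' 'virginica']
print(data["target"].shape)    # (150,)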


Creating Pandas DataFrame and Visualizing Data


DATAFRAME:
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure with labeled axes (rows and columns). It’s a concept used in several programming
environments and libraries, most notably in Python’s pandas library.

Creating a DataFrame:
df = pd.DataFrame(data["data"], columns=data["feature_names"])
Explanation:
pd.DataFrame(...): This function from the pandas library creates a DataFrame, a 2-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled rows and columns.
data["data"]: The actual data of the Iris dataset, a 2D array with 150 rows and 4 columns containing the feature values for each sample.
columns=data["feature_names"]: Specifies the column names for the DataFrame; data["feature_names"] is a list containing the names of the features.

Summary of the DataFrame:


df.info()
Explanation:
The df.info() method in pandas provides a concise summary of a DataFrame, including the index range,
column names, non-null counts, data types, and memory usage. This summary helps understand the
DataFrame’s structure and data completeness at a glance. This information is essential for understanding
the dataset before performing further analysis.
Output: the summary shows a RangeIndex of 150 entries and four float64 columns, each with 150 non-null values.

Generating Summary Statistics:


df.describe()
The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains the following information for each column:
 count – the number of non-empty values
 mean – the average (mean) value
 std – the standard deviation
 min – the minimum value
 25% – the 25th percentile*
 50% – the 50th percentile*
 75% – the 75th percentile*
 max – the maximum value
*Percentile meaning: how many of the values are less than the given percentile.
Output: a table of these statistics for each of the four feature columns.

Visualizing Attributes Using Matplotlib:

The figure shows a 2x2 grid of line graphs representing the four features of the Iris dataset over 150 samples. Each subplot displays the variation of a specific feature: sepal length and sepal width fluctuate between 4.5-8 cm and 2.0-4.5 cm respectively, while petal length and petal width increase significantly around the 50th sample, highlighting patterns and differences among the samples.
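
A minimal sketch that could produce such a grid, assuming the DataFrame df created above:

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, column in zip(axes.flat, df.columns[:4]):
    ax.plot(df[column])        # one line per feature, indexed by sample number
    ax.set_title(column)
    ax.set_xlabel("sample")
plt.tight_layout()
plt.show()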
Histograms


A histogram is a chart that plots the distribution of a numeric variable’s values as a series of bars. Each
bar typically covers a range of numeric values called a bin or class; a bar’s height indicates the frequency
of data points with a value within the corresponding bin.

Observations from the histograms:


 Sepal length frequency is highest between 5.5 and 6.
 Sepal width frequency is highest around 3.0 and 3.5.
 Petal length frequency is highest between 1 and 2.
 Petal width frequency is highest between 0.0 and 0.5.
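
A sketch that could produce the 2x2 grid of histograms behind these observations:

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, column in zip(axes.flat, df.columns[:4]):
    ax.hist(df[column], bins=20)   # distribution of each feature
    ax.set_title(column)
plt.tight_layout()
plt.show()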

Add Target Column and Count Unique Values

df["target"] = data["target"]
df["target"].value_counts()
The code adds a "target" column to the DataFrame with species labels from the Iris dataset and then
counts the occurrences of each species.
Output: each of the three species (0, 1, 2) appears 50 times.

Visualizing the target column:


The bar chart shows an equal count of samples for each Iris species: Iris-setosa, Iris-versicolor, and Iris-
virginica, with each species having 50 samples.
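
A minimal sketch for this bar chart:

df["target"].value_counts().plot(kind="bar")
plt.xlabel("species (0 = setosa, 1 = versicolor, 2 = virginica)")
plt.ylabel("count")
plt.show()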
Relation between variables:

Observations:
 Setosa has smaller sepal lengths but larger sepal widths.
 Versicolor lies in the middle of the other two species in terms of sepal length and width.
 Virginica has larger sepal lengths but smaller sepal widths.
 Setosa has smaller petal lengths and widths.
 Versicolor lies in the middle of the other two species in terms of petal length and width.
 Virginica has the largest petal lengths and widths.
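
A sketch of the scatter plots these observations are drawn from (column names as in the DataFrame above):

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
for label, name in enumerate(data["target_names"]):
    subset = df[df["target"] == label]
    ax1.scatter(subset["sepal length (cm)"], subset["sepal width (cm)"], label=name)
    ax2.scatter(subset["petal length (cm)"], subset["petal width (cm)"], label=name)
ax1.set_xlabel("sepal length (cm)")
ax1.set_ylabel("sepal width (cm)")
ax2.set_xlabel("petal length (cm)")
ax2.set_ylabel("petal width (cm)")
ax1.legend()
plt.show()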

Summary of DataFrame Structure and Integrity:


The DataFrame consists of five columns: 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal
width (cm)', and 'target', all of which are numerical with no categorical columns present. The data types of
the columns are predominantly float64, except for the 'target' column which is int32. There are no missing


values in any of the columns. The first and last five rows of the DataFrame can be displayed using
df.head() and df.tail() respectively.
Handling Correlation:
The Pearson correlation method is used to calculate the correlation matrix of the DataFrame, revealing the
pairwise correlation between numeric attributes. The correlation values with respect to the target column
are extracted, and the top four correlations, both positive and negative, are identified. The features 'petal
width (cm)' and 'petal length (cm)' show the highest positive correlations with the target, at 0.9565 and
0.9490 respectively, indicating that these features are the most significant for predicting the class label.
Additionally, 'sepal length (cm)' has a positive correlation of 0.7826, while 'sepal width (cm)' has a
negative correlation of -0.4267 with the target.
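
A sketch of this correlation analysis; the commented values are those reported above:

corr = df.corr(method="pearson")              # pairwise Pearson correlation matrix
print(corr["target"].sort_values(ascending=False))
# target               1.0000
# petal width (cm)     0.9565
# petal length (cm)    0.9490
# sepal length (cm)    0.7826
# sepal width (cm)    -0.4267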

Notebook 3
Machine Learning with Iris Dataset
Machine learning (ML) is a field of study in artificial intelligence concerned with the development and
study of statistical algorithms that can learn from data and generalize to unseen data and thus perform
tasks without explicit instruction.
Steps:
To apply ML to the Iris dataset, follow these steps:


1. Import required libraries


2. Load the Dataset, check features, and targets.
3. Make Pandas DataFrame and visualise it using Matplotlib
4. Create a Training/Testing Dataset and Peek into it.
5. Scale features between the range [0,1] and Peek into it
6. Apply Machine Learning Algorithms on Scaled Dataset
7. Optimize the Results
8. Run Predictions on Optimized Model

Create a Training/Testing Dataset and Peek into it:


Prepare the dataset for training and testing by first creating X, a DataFrame containing the four features (obtained by removing the "target" column from the original DataFrame), and y, a Series representing the target class or label. Then split the data into training and testing sets using train_test_split, with 60% of the data for training and 40% for testing, ensuring reproducibility with a random seed of 42. Verify that the training set (y_train) contains 90 samples and the testing set (y_test) contains 60 samples. Finally, print the feature values of the training set (X_train) to peek into the training data.
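
A sketch of this split, assuming the DataFrame df built earlier:

from sklearn.model_selection import train_test_split

X = df.drop(columns=["target"])   # the four feature columns
y = df["target"]                  # species labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)
print(y_train.shape, y_test.shape)   # (90,) (60,)
print(X_train.head())                # peek into the training data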

Scale features between the range [0,1] and Peek into it

Feature Scaling:
“Feature scaling is a technique used to standardize the range of independent variables or features
in a dataset. It is also called data normalization and is usually done as part of data
preprocessing.”
We use the MinMaxScaler to scale the feature values to a range between 0 and 1, improving the
performance of many machine learning algorithms. First, we create an instance of the MinMaxScaler.
Then, we fit the scaler on the training data (X_train) and transform it to the specified range. Next, we use
the same scaling parameters to transform the testing data (X_test), ensuring consistent scaling.
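
A minimal sketch of this scaling step:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()                         # scales each feature to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same scaling parameters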


Apply Machine Learning Algorithms on Scaled Dataset


1. Logistic Regression:
“Logistic regression is a supervised machine learning algorithm that accomplishes binary classification
tasks by predicting the probability of an outcome, event, or observation. The model delivers a binary or
dichotomous outcome limited to two possible outcomes: yes/no, 0/1, or true/false.”
The logistic regression model is applied to the Iris dataset as follows:
1. Create Model Instance: An instance of the LogisticRegression class is created with the
multi_class parameter set to "multinomial" to handle multi-class classification problems.
2. Train Model: The model is trained using the training data (X_train and y_train), allowing it to
learn the relationship between features and target labels.
3. Make Predictions: The trained model predicts labels for the test data (X_test), generating
predictions for evaluation.
4. Calculate Accuracy: The accuracy of the model is calculated by comparing predicted labels
(pred) with actual labels (y_test), resulting in an accuracy score of 91.67%.
5. Confusion Matrix: A confusion matrix is generated to show counts of true positive, true
negative, false positive, and false negative predictions for each class, providing a detailed
performance breakdown.
6. Classification Report: A classification report is printed, including precision, recall, F1-score, and
support for each class, offering a comprehensive assessment of model performance.
Overall, the model achieves high accuracy and performs well across the three classes in the Iris dataset,
with detailed metrics showing excellent performance for Class 0, reasonable performance for Class 1, and
slightly less accuracy for Class 2.
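
A sketch of these six steps; the names X_train_scaled and X_test_scaled follow the scaling step above:

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

model = LogisticRegression(multi_class="multinomial")  # multi-class setting, as described above
model.fit(X_train_scaled, y_train)                     # train on the scaled training data
pred = model.predict(X_test_scaled)                    # predict labels for the test data
print(accuracy_score(y_test, pred))                    # 91.67% reported above
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))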


2. Support Vector Machine (SVM)


To evaluate a Support Vector Machine (SVM) for classifying the Iris dataset, we start by creating a Linear
Support Vector Classifier (LinearSVC) to find the best boundary separating different classes. We train this
model with our training data and then use it to predict labels for the test data. We measure the model's
performance by calculating and printing its accuracy, confusion matrix, and classification report. Initially,
the model achieved 90% accuracy, which improved to 98.33% after fine-tuning with Grid Search. The
SVM is particularly useful for complex datasets with many features.
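
A minimal sketch of this evaluation, reusing the metrics imports above:

from sklearn.svm import LinearSVC

svm = LinearSVC()                         # linear support vector classifier
svm.fit(X_train_scaled, y_train)
pred = svm.predict(X_test_scaled)
print(accuracy_score(y_test, pred))       # about 90% before tuning
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))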
3. Random Forest:
“Random forest is a machine learning algorithm that creates an ensemble of multiple decision trees to
reach a singular, more accurate prediction or result.”

 Creating the model using scikit-learn's RandomForestClassifier.


 Training it with data, where X_train holds features and y_train holds labels, to understand the link
between features and the target labels.
 Using the trained model to predict labels for new data (X_test).
 Evaluating the model's performance through accuracy (percentage of correct predictions), a
confusion matrix (detailing true/false positives/negatives), and a classification report (providing
precision, recall, and F1-score for each class).
NOTE:
 Achieved the highest accuracy of approximately 98.33%.
 An ensemble learning method using multiple decision trees to improve classification
performance.
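
A sketch of the steps listed above:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()             # an ensemble of decision trees
rf.fit(X_train_scaled, y_train)
pred = rf.predict(X_test_scaled)
print(accuracy_score(y_test, pred))       # approximately 98.33% reported above
print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))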
Results Optimization
Optimization was performed using Grid Search on the Linear SVC model, improving its accuracy from
90% to 98.33%.
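
A hedged sketch of this tuning step; the parameter grid below is an assumption, since the notebook does not list the exact values searched:

from sklearn.model_selection import GridSearchCV

param_grid = {"C": [0.01, 0.1, 1, 10, 100]}        # hypothetical candidate values
grid = GridSearchCV(LinearSVC(), param_grid, cv=5)  # 5-fold cross-validated search
grid.fit(X_train_scaled, y_train)
print(grid.best_params_)
print(grid.score(X_test_scaled, y_test))            # improved to 98.33% as reported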
Summary
Among the three machine learning techniques applied, Random Forest outperformed the others with the
highest accuracy, making it the most effective model for this dataset.
References:
https://en.wikipedia.org/wiki/Machine_learning
https://www.mygreatlearning.com/blog/what-is-machine-learning/
https://www.atoti.io/articles/when-to-perform-a-feature-scaling/
https://www.spiceworks.com/tech/artificial-intelligence/articles/what-is-logistic-regression/
https://www.spiceworks.com/tech/big-data/articles/what-is-support-vector-machine/
https://www.tableau.com/learn/articles/data-visualization
https://www.javatpoint.com/python-pandas-dataframe
https://www.w3schools.com/python/pandas/ref_df_describe.asp
https://www.atlassian.com/data/charts/histogram-complete-guide
https://www.geeksforgeeks.org/exploratory-data-analysis-on-iris-dataset/
https://encord.com/glossary/datasets-definition/

DEEP LEARNING
 History of AI
o Early attempts at AI aimed to achieve human-level intelligence but were limited by
computational resources.
o Neural networks, inspired by the human brain, started in the 1950s but were initially outperformed by conventional computers based on the Von Neumann architecture.
 Expert Systems
o These were complex systems built by many engineers, with rules programmed by
humans.
o They had limitations, as the computer could only do as much as a human could program.

o Deep learning uses large datasets to learn patterns on its own, similar to how children
learn.
 The Deep Learning Revolution


o Two major factors: availability of data and increased computing power (GPUs).

o Deep learning flips traditional programming by letting the model learn the rules instead
of being explicitly programmed.
 When to Choose Deep Learning
o Use traditional programming for clear and straightforward tasks.

o Use deep learning for complex tasks where rules are hard to define.

Applications of Deep Learning


 Computer Vision
o Object detection, self-driving cars, robotics, and manufacturing.

 Natural Language Processing


o Real-time translation, voice recognition, virtual assistants.

 Recommender Systems
o Content curation, targeted advertising, shopping recommendations.

 Reinforcement Learning
o AI beating human experts in games like Go and video games.

Review Questions:
1. What Python libraries are used in this notebook for data analysis and visualization?
2. How do you display the first five rows of a DataFrame?
3. What is the shape of the target array in the Iris dataset?
4. Write the code to create a Pandas DataFrame from the Iris dataset with feature names as column names.
5. What is the Pearson correlation coefficient, and how is it useful in this analysis?
6. How can you generate summary statistics for each numeric column in a DataFrame?
7. Write a code snippet to display a 2x2 grid of histograms for each of the four attributes of the Iris dataset.
8. Explain the significance of the correlation values between the features and the target in the Iris dataset.
9. How can you create a pairplot of all columns in the DataFrame, distinguishing different target classes using Seaborn?
10. Based on the scatterplots of Sepal Length vs. Sepal Width and Petal Length vs. Petal Width, what can you infer about the relationship between these features and the different species of the Iris dataset?
