Lab 3 ML
base_path = '/home/ec2-user/SageMaker/data/'
file_path = 'spambase.data'
Lab 3 explanation
The provided code snippet is a set of Python commands that accomplishes the
following tasks:
1. Update Boto3 and SageMaker Libraries:
The %pip install --upgrade boto3 sagemaker command is used to
upgrade the boto3 and sagemaker Python libraries to their latest
versions. These libraries are commonly used in AWS (Amazon Web
Services) for tasks like interacting with AWS services and working with
SageMaker, a service for building, training, and deploying machine
learning models on AWS.
2. Reset Python Environment:
The %reset -f command is used to reset the Python environment, clearing
all variables and namespaces.
3. Import Necessary Libraries:
The code imports necessary Python libraries, such as subprocess and
os, for interacting with the system and running shell commands.
4. Check File Existence and Download if Needed:
This part of the code checks whether a file named "spambase.data" exists
in a specified path (base_path). The file is expected to be located in the
SageMaker data directory.
If the file doesn't exist, it creates the directory specified by base_path
using the mkdir command.
It then uses the aws s3 cp command to download files recursively from an
S3 bucket ('s3://aws-tc-largeobjects/ILT-TF-200-MLDWTS/lab3/') and
stores them in the base_path.
The purpose of this section is to ensure that the required data file is
available locally for further processing.
5. Print Confirmation Message:
If the file already exists in the specified path, it prints a message indicating
that the file has already been downloaded.
In summary, this code snippet is setting up the Python environment for working with
AWS SageMaker and checking whether a specific data file ("spambase.data") exists
locally. If the file is not found, it downloads it from an S3 bucket. This is a common setup
when preparing data for machine learning experiments in SageMaker, ensuring that the
necessary data is available in the SageMaker environment.
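A minimal sketch of that setup cell is shown below. It assumes subprocess is used to run the shell commands; the lab notebook may use shell magics instead, and the bucket prefix and paths are taken from the description above.
%pip install --upgrade boto3 sagemaker
%reset -f

import os
import subprocess

base_path = '/home/ec2-user/SageMaker/data/'
file_path = 'spambase.data'

# Download the dataset from S3 only if it is not already present locally
if not os.path.isfile(base_path + file_path):
    subprocess.run(['mkdir', '-p', base_path])
    subprocess.run(['aws', 's3', 'cp', '--recursive',
                    's3://aws-tc-largeobjects/ILT-TF-200-MLDWTS/lab3/', base_path])
else:
    print('File already downloaded')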
import csv
import numpy as np
data = []
f = open('/home/ec2-user/SageMaker/data/spambase.data')
reader = csv.reader(f)
next(reader, None)
for row in reader:
    data.append(row)
f.close()
The provided code snippet is a Python script that reads data from a CSV (Comma-
Separated Values) file named "spambase.data" located at the specified path and stores
it in a list called "data." Here's a step-by-step explanation of what each part of the code
does:
1. Import Necessary Libraries:
The code begins by importing two Python libraries:
csv: This library provides functionality for reading and writing CSV
files.
numpy (imported as np): NumPy is a popular library for numerical
computations in Python. It's often used for working with arrays and
matrices.
2. Initialize an Empty List for Data:
An empty list named "data" is created. This list will be used to store the
data read from the CSV file.
3. Open and Read the CSV File:
The code opens the CSV file located at
/home/ec2-user/SageMaker/data/spambase.data using the open
function. It assigns the file object to the variable f.
A CSV reader is created by passing the file object f to csv.reader. This
reader will allow us to iterate through the rows of the CSV file.
The next(reader, None) command is used to skip the header row of the
CSV file, assuming that the first row contains column headers. This
ensures that the data starts from the second row.
4. Iterate Through CSV Rows and Store Data:
The code then enters a loop that iterates through each row in the CSV file
using for row in reader:.
For each row, it appends the row as a list to the "data" list. This effectively
collects all the rows of data from the CSV file into the "data" list.
5. Close the CSV File:
After all the data has been read and stored, the code closes the CSV file
using f.close() to release system resources associated with the file.
Once this code has been executed, the "data" list will contain the data from the CSV file,
with each row of data represented as a list of values. You can then proceed to analyze
or process this data as needed, potentially using libraries like NumPy for further
computations or machine learning tasks.
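For comparison, the same loading step could be written with pandas in a single call. This is only an illustrative sketch, not the lab's code, and, like the loop above, it treats the first line of the file as a header row and skips it.
import pandas as pd

# Read everything after the first line, mirroring the next(reader, None) skip above
raw_df = pd.read_csv('/home/ec2-user/SageMaker/data/spambase.data', header=0)
data = raw_df.values.tolist()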
import pandas as pd
df = pd.DataFrame(data=np.array(data).astype(np.float), columns=["word_freq_make",
"word_freq_address",
The provided code snippet is using the pandas library in Python to create a DataFrame
from the data that was previously read from the CSV file and stored in the "data" list.
This DataFrame is used to structure and organize the data in a tabular format. Let's
break down what this code is doing:
1. Import Necessary Library:
The code imports the pandas library and assigns it the alias pd. pandas
is a powerful library for data manipulation and analysis in Python.
2. Create a DataFrame:
The code creates a DataFrame by calling the pd.DataFrame()
constructor.
3. Data for DataFrame Initialization:
The data for initializing the DataFrame is provided within the constructor:
data=np.array(data).astype(np.float): This part converts the
"data" list, which contains rows of data as lists of strings, into a
NumPy array of floating-point numbers (np.float). This conversion
ensures that the data is in a numerical format suitable for
DataFrame creation.
4. Column Names:
The columns parameter is used to specify the names of the DataFrame
columns. In this case, column names are provided as a list of strings:
["word_freq_make", "word_freq_address", ...]: The list contains
column names that correspond to the data attributes in the
DataFrame.
5. DataFrame Structure:
The resulting DataFrame df will have the same number of columns as
there are column names provided in the "columns" parameter. Each
column will contain the data from the "data" list, and the column names
will match the names specified in the "columns" list.
6. DataFrame Usage:
Once the DataFrame is created, you can use various pandas methods
and operations to perform data analysis, exploration, and manipulation on
the structured data.
Overall, this code snippet is transforming the raw data from the CSV file into a
structured DataFrame, making it easier to work with and analyze the data using pandas
capabilities. The provided column names correspond to the attributes of the data,
allowing for more meaningful and organized data handling.
df.head()
The df.head() command is used to display the first few rows of a DataFrame in Python,
specifically the DataFrame named df that you created earlier using pandas. When you
execute df.head(), it returns a preview of the DataFrame with its top rows. By default, it
shows the first 5 rows, but you can specify a different number of rows to display by
passing an argument to the head() method.
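For example:
df.head(10)  # Show the first 10 rows instead of the default 5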
df.describe()
The df.describe() command generates summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each numeric column of the DataFrame, which is useful for a quick check of the feature distributions before training.
from sklearn.model_selection import train_test_split

# Get the feature values until the target column (not included)
X = df.values[:, :-1].astype(np.float32)
# Get the target labels from the last column
y = df.values[:, -1].astype(np.float32)

# Get 80% of the data for training; the remaining 20% will be for validation and test
train_features, test_features, train_labels, test_labels = train_test_split(X, y, test_size=0.2)

# Split the remaining 20% of data as 10% test and 10% validation
test_features, val_features, test_labels, val_labels = train_test_split(test_features, test_labels, test_size=0.5)
import sagemaker
The provided code is configuring and creating an Amazon SageMaker Linear
Learner model for binary classification. Let's break down what each part of the code
does:
1. Import SageMaker Library:
The code begins by importing the sagemaker library. Amazon SageMaker
is a cloud service that allows you to build, train, and deploy machine
learning models, and this library provides the necessary tools and
functions to work with SageMaker services in Python.
2. Create a Linear Learner Estimator:
The main part of the code involves creating a Linear Learner estimator
object. An estimator in SageMaker represents a machine learning model
that can be trained and deployed.
The sagemaker.LinearLearner class is used to create this estimator.
3. Configuration Parameters:
Several configuration parameters are passed to the
sagemaker.LinearLearner constructor to configure the estimator:
role: This parameter specifies the IAM (Identity and Access
Management) role that SageMaker should assume when
performing operations. It is obtained using
sagemaker.get_execution_role(), which retrieves the SageMaker
execution role associated with your environment.
instance_count: This parameter specifies the number of Amazon
EC2 instances to use for training the model. In this case, it's set to
1 instance.
instance_type: This parameter specifies the type of Amazon EC2
instance to use for training. Here, it's set to 'ml.m4.xlarge', which
represents a specific type of compute instance optimized for
machine learning workloads.
predictor_type: This parameter indicates that the Linear Learner
will be used as a binary classifier, making predictions for binary
classification tasks (e.g., spam detection, fraud detection).
4. Estimator Object:
The binary_estimator variable now holds the Linear Learner estimator
object, fully configured and ready to be trained.
In summary, this code sets up a SageMaker Linear Learner estimator for binary
classification. It defines various configuration parameters, such as the instance type,
instance count, and role, which are necessary for training and deploying the model on
Amazon SageMaker. Once configured, you can proceed to train the model using data
and deploy it for making predictions on new data.
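Based on the parameters described above, the estimator creation likely looks like the following sketch; the exact cell in the lab may differ in minor details.
import sagemaker

binary_estimator = sagemaker.LinearLearner(
    role=sagemaker.get_execution_role(),  # IAM role SageMaker assumes for the job
    instance_count=1,                     # number of EC2 instances used for training
    instance_type='ml.m4.xlarge',         # EC2 instance type for the training job
    predictor_type='binary_classifier'    # train a binary classification model
)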
The provided code snippet is preparing the data for training a machine learning model
using the Amazon SageMaker Linear Learner estimator. It involves converting the
training, validation, and testing datasets into a specific data format called "RecordSet"
that is compatible with SageMaker. Let's break down what each line of code does:
1. train_records Line:
train_records = binary_estimator.record_set(train_features,
train_labels, channel='train')
This line creates a "RecordSet" from the training data.
train_features and train_labels are the feature data (input) and target
labels (output) for training, respectively.
channel='train' specifies that this RecordSet is associated with the
training channel. In SageMaker, a "channel" is a named data source that
you can use to organize and manage data for different purposes, such as
training or validation.
2. val_records Line:
val_records = binary_estimator.record_set(val_features, val_labels,
channel='validation')
Similar to the previous line, this line creates a RecordSet, but for the
validation data.
val_features and val_labels represent the feature data and target labels
for validation.
channel='validation' specifies that this RecordSet is associated with the
validation channel.
3. test_records Line:
test_records = binary_estimator.record_set(test_features,
test_labels, channel='test')
This line creates a RecordSet for the test data.
test_features and test_labels contain the feature data and target labels
for testing.
channel='test' indicates that this RecordSet is associated with the test
channel.
In summary, these lines of code prepare the data in a format that SageMaker can work
with during model training. The data is organized into RecordSets, each associated with
a specific data channel (train, validation, or test). This separation helps SageMaker
manage and use the data appropriately during the training process, ensuring that the
model is trained on the training data, validated on the validation data, and tested on the
test data.
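Collected into a single cell, the three calls described above are:
# Wrap each data split in a RecordSet tagged with its channel name
train_records = binary_estimator.record_set(train_features, train_labels, channel='train')
val_records = binary_estimator.record_set(val_features, val_labels, channel='validation')
test_records = binary_estimator.record_set(test_features, test_labels, channel='test')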
The provided code is instructing Amazon SageMaker to fit (train) a machine learning
model using the binary_estimator on the prepared data, which includes training,
validation, and testing datasets in RecordSet format. Let's break down what this code
does:
1. binary_estimator.fit([...]) Line:
This line invokes the fit method on the binary_estimator object. The fit
method is used to train a machine learning model using SageMaker.
Inside the fit method, a list [train_records, val_records, test_records] is
passed as an argument. This list contains the RecordSets for the training,
validation, and testing data.
Here's what happens when you call binary_estimator.fit([...]):
SageMaker uses the Linear Learner estimator (binary_estimator) to train a
binary classification model.
During training, the model learns patterns and relationships in the training data to
make predictions on binary classification tasks.
The training process typically involves multiple iterations or epochs, where the
model adjusts its internal parameters to minimize a loss function and improve its
predictive accuracy.
The validation data (val_records) is used periodically during training to evaluate
the model's performance on data it hasn't seen during training. This helps in
monitoring the model's generalization and preventing overfitting.
The testing data (test_records) is not used during training but is kept separate
for evaluating the model's performance after training is complete. It provides an
unbiased assessment of how well the model will perform on new, unseen data.
In summary, the binary_estimator.fit([...]) line is the key step for training a binary
classification model using SageMaker. It uses the provided RecordSets for training,
validation, and testing to create and fine-tune the model, ensuring that it can make
accurate binary classification predictions.
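The training call itself is a single line:
# Launch the SageMaker training job on the three RecordSet channels
binary_estimator.fit([train_records, val_records, test_records])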
sagemaker.analytics.TrainingJobAnalytics(
    binary_estimator._current_job_name,
    metric_names=['test:binary_classification_accuracy',
                  'test:precision',
                  'test:recall']
).dataframe()
The provided code is used to obtain training job analytics data from an Amazon
SageMaker training job. It fetches specific metrics related to the performance of the
machine learning model trained using the binary_estimator. Let's break down what this
code does:
1. sagemaker.analytics.TrainingJobAnalytics(...) Line:
This line of code creates an instance of the TrainingJobAnalytics class
from SageMaker's analytics module. This class is used to retrieve
analytics data from a SageMaker training job.
The TrainingJobAnalytics constructor takes two main arguments:
binary_estimator._current_job_name: This represents the name
of the current SageMaker training job associated with the
binary_estimator. It's used to specify which training job's analytics
data you want to retrieve.
metric_names: This is a list of specific metrics that you want to
retrieve from the training job's analytics data. In this case, the code
is interested in three metrics: binary classification accuracy,
precision, and recall.
2. .dataframe() Method:
After creating the TrainingJobAnalytics instance, the .dataframe()
method is called on it.
This method retrieves the analytics data from the specified training job and
returns it as a Pandas DataFrame. The DataFrame will contain the
requested metrics along with other relevant information.
The purpose of this code is to fetch performance metrics from a SageMaker training job.
These metrics are essential for evaluating how well the machine learning model trained
by binary_estimator performs on the test data. The specific metrics chosen, such as
accuracy, precision, and recall, provide insights into the model's predictive performance,
especially in binary classification tasks.
Once you execute this code, you will have access to the training job analytics data in a
DataFrame format, allowing you to further analyze and visualize the model's
performance metrics.
binary_predictor = binary_estimator.deploy(initial_instance_count=1,
                                           instance_type='ml.m4.xlarge')
The provided code is used to deploy the trained Amazon SageMaker Linear Learner
model as an endpoint for making predictions on new data. Let's break down what each
part of the code does:
1. binary_predictor Variable Assignment:
The code assigns the result of the deployment operation to the
binary_predictor variable. This variable will be used to interact with the
deployed model for making predictions.
2. binary_estimator.deploy(...) Line:
This line invokes the deploy method on the binary_estimator object. The
deploy method is used to create an endpoint for serving the trained
machine learning model.
3. Arguments for Deployment:
The deploy method takes two important arguments:
initial_instance_count: This specifies the number of Amazon EC2
instances to launch for hosting the endpoint. In this case, it's set to
1, indicating that a single instance will be used initially.
instance_type: This specifies the type of Amazon EC2 instance to
use for hosting the model endpoint. Here, it's set to 'ml.m4.xlarge',
which represents a specific type of compute instance optimized for
machine learning workloads.
4. Deployment Process:
When you execute this line of code, SageMaker will perform the following
actions:
Create a new SageMaker endpoint using the trained Linear Learner
model.
Allocate the specified resources (e.g., EC2 instances) to host the
endpoint.
Load the model into the deployed endpoint, making it ready to
receive inference requests.
5. binary_predictor as the Endpoint Interface:
After deployment, the binary_predictor variable can be used as an
interface to the deployed endpoint. You can use it to send new data to the
endpoint and receive predictions from the trained model.
In summary, this code deploys the trained Linear Learner model as a SageMaker
endpoint, allowing you to use it for real-time predictions on new data. The
binary_predictor variable represents the deployed model, and you can interact with it
to obtain predictions from the model hosted on SageMaker.
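As a small illustration of the last point, the deployed predictor can be called directly on a NumPy array. This sketch assumes the Record response format returned by the Linear Learner predictor in the SageMaker Python SDK:
# Send the first five test rows to the endpoint and read back the predicted labels
results = binary_predictor.predict(test_features[:5])
predicted_labels = [r.label['predicted_label'].float32_tensor.values[0] for r in results]
print(predicted_labels)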
# Calculate precision
precision = (preds[preds == 1] == labels[preds == 1]).sum() / len(preds[preds == 1])
print(f'Precision: {precision}')
# Calculate recall
recall = (preds[preds == 1] == labels[preds == 1]).sum() / len(labels[labels == 1])
print(f'Recall: {recall}')
The provided code snippet defines a function called predict_batches that is used for
making predictions using a SageMaker endpoint and then calculating and visualizing
evaluation metrics. Let's break down what this code does step by step:
1. Import Libraries:
The code imports the matplotlib.pyplot and seaborn libraries for data
visualization.
2. Define predict_batches Function:
This function takes the following three arguments:
predictor: The SageMaker predictor (endpoint) that will be used to
make predictions.
features: The input features for which predictions will be made.
labels: The true labels corresponding to the input features.
3. Predict in Batches:
The function splits the features into batches of size 25 using
np.array_split.
It then iterates through these batches and uses the predictor.predict()
method to obtain predictions for each batch.
The predictions are collected and processed to extract the predicted
labels.
4. Calculate Evaluation Metrics:
The following evaluation metrics are calculated:
Accuracy: It measures the overall correctness of the predictions by
comparing them to the true labels.
Precision: It measures the proportion of true positive predictions
out of all positive predictions.
Recall: It measures the proportion of true positive predictions out of
all actual positive instances in the dataset.
These metrics are printed to the console.
5. Create and Display a Confusion Matrix:
A confusion matrix is generated using pd.crosstab to show the
distribution of predicted and true labels.
The sns.heatmap function from Seaborn is used to create a heatmap
visualization of the confusion matrix.
The heatmap provides a visual representation of the model's performance
in terms of true positives, true negatives, false positives, and false
negatives.
In summary, this code defines a function that takes a SageMaker predictor, input
features, and true labels, and then it uses the predictor to make predictions, calculates
evaluation metrics (accuracy, precision, recall), and visualizes the confusion matrix. This
can be helpful for assessing the performance of a binary classification model deployed
on SageMaker.
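The full function is not reproduced above, so the following is a sketch of what predict_batches likely looks like, reconstructed from the description: batches of about 25 rows, the precision and recall formulas shown earlier, and a Seaborn heatmap of the confusion matrix. The response parsing assumes the Linear Learner predictor's Record format.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

def predict_batches(predictor, features, labels):
    """Predict in batches, print accuracy/precision/recall, and plot a confusion matrix."""
    # Split the features into batches of roughly 25 rows each
    n_batches = int(np.ceil(len(features) / 25))
    preds = []
    for batch in np.array_split(features, n_batches):
        results = predictor.predict(batch)
        preds += [r.label['predicted_label'].float32_tensor.values[0] for r in results]
    preds = np.array(preds)

    # Accuracy: fraction of predictions that match the true labels
    accuracy = (preds == labels).sum() / len(labels)
    print(f'Accuracy: {accuracy}')

    # Precision: true positives out of all positive predictions
    precision = (preds[preds == 1] == labels[preds == 1]).sum() / len(preds[preds == 1])
    print(f'Precision: {precision}')

    # Recall: true positives out of all actual positive instances
    recall = (preds[preds == 1] == labels[preds == 1]).sum() / len(labels[labels == 1])
    print(f'Recall: {recall}')

    # Confusion matrix of true labels vs. predicted labels, shown as a heatmap
    confusion_matrix = pd.crosstab(index=labels, columns=preds,
                                   rownames=['True'], colnames=['Predicted'])
    sns.heatmap(confusion_matrix, annot=True, fmt='d', cmap='Blues')
    plt.show()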
The predict_batches function is being used to make predictions on the training data
using the binary_predictor, which is the SageMaker endpoint where the trained model
is deployed. Let's recap what happens when you call
predict_batches(binary_predictor, train_features, train_labels):
1. Predictions on Training Data:
The function uses the binary_predictor (the SageMaker endpoint) to
make predictions on the train_features, which are the input features of
the training data.
These predictions are generated in batches to efficiently process the data.
2. Evaluation Metrics Calculation:
After obtaining the predictions, the function calculates three evaluation
metrics:
Accuracy: This metric measures how accurately the model's
predictions match the true labels in the training data. It gives you an
overall sense of the model's performance.
Precision: Precision measures the proportion of true positive
predictions out of all positive predictions. It helps assess the
model's ability to correctly identify positive cases.
Recall: Recall measures the proportion of true positive predictions
out of all actual positive instances in the training data. It helps
assess the model's ability to capture all positive cases.
3. Confusion Matrix Visualization:
A confusion matrix is generated based on the true labels and predicted
labels. This matrix provides a detailed breakdown of true positives, true
negatives, false positives, and false negatives.
The confusion matrix is visualized as a heatmap using the Seaborn library.
It allows you to see how well the model performs in different prediction
categories.
4. Printing Metrics:
The calculated metrics (accuracy, precision, and recall) are printed to the
console.
By calling predict_batches(binary_predictor, train_features, train_labels), you are
essentially assessing how well the deployed machine learning model performs on the
training data. This can provide insights into the model's initial performance and its ability
to learn patterns from the training dataset.
The predict_batches function is being used to make predictions on the test data using
the binary_predictor, which is the SageMaker endpoint where the trained model is
deployed. This allows you to assess the model's performance on data it hasn't seen
during training. Here's what happens when you call predict_batches(binary_predictor,
test_features, test_labels):
1. Predictions on Test Data:
The function uses the binary_predictor (the SageMaker endpoint) to
make predictions on the test_features, which are the input features of the
test data.
These predictions are generated in batches to efficiently process the data.
2. Evaluation Metrics Calculation:
After obtaining the predictions, the function calculates three evaluation
metrics:
Accuracy: This metric measures how accurately the model's
predictions match the true labels in the test data. It gives you an
overall sense of the model's performance on unseen data.
Precision: Precision measures the proportion of true positive
predictions out of all positive predictions. It helps assess the
model's ability to correctly identify positive cases in the test data.
Recall: Recall measures the proportion of true positive predictions
out of all actual positive instances in the test data. It helps assess
the model's ability to capture all positive cases in the test data.
3. Confusion Matrix Visualization:
A confusion matrix is generated based on the true labels and predicted
labels for the test data. This matrix provides a detailed breakdown of true
positives, true negatives, false positives, and false negatives.
The confusion matrix is visualized as a heatmap using the Seaborn library.
This visualization allows you to see how well the model performs in
different prediction categories for the test data.
4. Printing Metrics:
The calculated metrics (accuracy, precision, and recall) are printed to the
console, providing a quantitative assessment of the model's performance
on the test data.
By calling predict_batches(binary_predictor, test_features, test_labels), you are
evaluating how well the deployed machine learning model generalizes to new, unseen
data. This assessment helps you understand how the model performs in a real-world
scenario and whether it maintains its predictive accuracy on data it hasn't encountered
during training.