Assignment 2

Uploaded by

lavanyagowdau


1. Dataset Selection.

We'll analyze the Titanic dataset, which lists passengers from the Titanic, including whether or
not they survived.

2. Data Loading and Cleaning

# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Titanic dataset

url = "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv"
df = pd.read_csv(url)

# Display the first few rows of the dataset

print(df.head())

# Clean the data: Handling missing values

df['Age'].fillna(df['Age'].median(), inplace=True)  # Filling missing Age with median
df['Embarked'].fillna(df['Embarked'].mode()[0], inplace=True)  # Filling missing Embarked with mode

# Dropping 'Cabin' since it's too sparse and 'Name' since it is not used as a model feature
df.drop(columns=['Cabin', 'Name'], inplace=True)

# Convert 'Sex' and 'Embarked' into numerical values

df['Sex'] = df['Sex'].map({'female': 0, 'male': 1})
df['Embarked'] = df['Embarked'].map({'C': 0, 'Q': 1, 'S': 2})

# Display cleaned data

print(df.info())

   PassengerId  Survived  Pclass  \
0            1         0       3
1            2         1       1
2            3         1       3
3            4         1       1
4            5         0       3

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1
2                             Heikkinen, Miss. Laina  female  26.0      0
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1
4                           Allen, Mr. William Henry    male  35.0      0

   Parch            Ticket     Fare Cabin Embarked
0      0         A/5 21171   7.2500   NaN        S
1      0          PC 17599  71.2833   C85        C
2      0  STON/O2. 3101282   7.9250   NaN        S
3      0            113803  53.1000  C123        S
4      0            373450   8.0500   NaN        S
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   PassengerId  891 non-null    int64
 1   Survived     891 non-null    int64
 2   Pclass       891 non-null    int64
 3   Sex          891 non-null    int64
 4   Age          891 non-null    float64
 5   SibSp        891 non-null    int64
 6   Parch        891 non-null    int64
 7   Ticket       891 non-null    object
 8   Fare         891 non-null    float64
 9   Embarked     891 non-null    int64
dtypes: float64(2), int64(7), object(1)
memory usage: 69.7+ KB
None

C:\Users\Dell\AppData\Local\Temp\ipykernel_8564\483482650.py:16: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df['Age'].fillna(df['Age'].median(), inplace=True) # Filling missing Age with median
C:\Users\Dell\AppData\Local\Temp\ipykernel_8564\483482650.py:17: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.

  df['Embarked'].fillna(df['Embarked'].mode()[0], inplace=True) # Filling missing Embarked with mode
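
The FutureWarning above is triggered by calling fillna with inplace=True on a column selection. A minimal sketch of the assignment form that the warning text itself suggests, shown on a small made-up frame (the `sample` name and its values are illustrative assumptions, not rows from the Titanic data):

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are made up, not taken from the dataset.
sample = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Embarked": ["S", "C", None, "S"],
})

# Assign the filled column back instead of using inplace=True on df[col].
sample["Age"] = sample["Age"].fillna(sample["Age"].median())
sample["Embarked"] = sample["Embarked"].fillna(sample["Embarked"].mode()[0])

print(sample)
```

Applied to the notebook, the same pattern would replace the two inplace calls and should produce the same filled columns without the warning.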

3. String Manipulation

# Example of string manipulation (if applicable)
# In this dataset, we did not keep the 'Name' column, but if we had, we could do:
# df['Title'] = df['Name'].str.extract(r' ([A-Za-z]+)\.', expand=False)  # Extracting titles like Mr, Mrs
# df['Title'] = df['Title'].str.lower()  # Convert to lowercase
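
Because the notebook drops 'Name' before any extraction, the commented-out lines never run. As a rough illustration of what they would do, here is the same pattern applied to three names copied from the df.head() output; the `names` and `titles` variables are introduced only for this sketch:

```python
import pandas as pd

# Three names copied from the df.head() output of the notebook.
names = pd.Series([
    "Braund, Mr. Owen Harris",
    "Heikkinen, Miss. Laina",
    "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
])

# The raw string keeps "\." a literal dot; expand=False returns a Series
# because the pattern has a single capture group.
titles = names.str.extract(r" ([A-Za-z]+)\.", expand=False).str.lower()

print(titles.tolist())  # ['mr', 'miss', 'mrs']
```

To use this in the notebook, the extraction would have to run before the df.drop call that removes 'Name'.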

4. Using NumPy for Basic Statistics

# Convert relevant columns to NumPy arrays
age_array = df['Age'].to_numpy()
fare_array = df['Fare'].to_numpy()

# Calculate basic statistics

print(f"Mean Age: {np.mean(age_array)}, Median Age: {np.median(age_array)}")
print(f"Mean Fare: {np.mean(fare_array)}, Median Fare: {np.median(fare_array)}")

Mean Age: 29.36158249158249, Median Age: 28.0

Mean Fare: 32.204207968574636, Median Fare: 14.4542
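
Note that these statistics are computed after the missing Age values were filled with the median, so they describe the imputed column rather than the observed ages alone. A small sketch of how the two can differ, using a made-up array (the `ages` values are illustrative assumptions, not Titanic data):

```python
import numpy as np

# Made-up ages with two missing entries.
ages = np.array([22.0, np.nan, 26.0, 35.0, np.nan, 54.0])

# NaN-aware statistics over the four observed values only.
observed_mean = np.nanmean(ages)      # 34.25
observed_median = np.nanmedian(ages)  # 30.5

# Median imputation, mirroring the cleaning step in the notebook.
imputed = np.where(np.isnan(ages), observed_median, ages)
imputed_mean = np.mean(imputed)       # 33.0

print(f"Observed mean: {observed_mean}, mean after imputation: {imputed_mean}")
```

Median imputation pulls the mean toward the median, which is why a mean computed before and after filling need not agree.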

5. Data Splitting
# Define features and target variable
X = df[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare', 'Embarked']]
y = df['Survived'] # Target variable

# Splitting the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

6. Build a Model

We'll use Logistic Regression since it works well with binary outcomes.

# Build the model

model = LogisticRegression()

# Train the model on the training set

model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
confusion = confusion_matrix(y_test, y_pred)

print(f"Accuracy of the model: {accuracy*100:.2f}%")

print("Confusion Matrix:")
print(confusion)

Accuracy of the model: 81.01%

Confusion Matrix:
[[90 15]
 [19 55]]
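
The confusion matrix carries more than the accuracy figure. As a quick worked check, the sketch below recomputes accuracy from the four reported counts and also derives precision and recall for the "survived" class; it assumes scikit-learn's layout of rows as true labels and columns as predicted labels:

```python
import numpy as np

# Confusion matrix as reported in the output above.
# Rows are true labels, columns are predictions (0 = did not survive, 1 = survived).
cm = np.array([[90, 15],
               [19, 55]])

# For a 2x2 matrix, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # 145 / 179
precision = tp / (tp + fp)       # 55 / 70
recall = tp / (tp + fn)          # 55 / 74

print(f"Accuracy:  {accuracy * 100:.2f}%")   # 81.01%
print(f"Precision: {precision * 100:.2f}%")  # 78.57%
print(f"Recall:    {recall * 100:.2f}%")     # 74.32%
```

The recall figure shows that 19 of the 74 actual survivors in the test set were missed, which the headline accuracy alone does not reveal.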

REPORT:

Title: Analysis of Titanic Dataset for Survival Prediction

Objective: The aim of this analysis was to predict passenger survival
from the Titanic dataset using machine learning techniques. The
primary focus was on data cleaning, string manipulation, and model
building.

Dataset: The Titanic dataset, widely known from Kaggle, was loaded from a public CSV copy hosted on GitHub. It contains information about passengers, including features like age, gender, ticket class, and whether they survived.

[1] Data Cleaning: Missing values were addressed by filling missing Age values with the median age and missing Embarked values with the mode. Preprocessing also included dropping the Cabin and Name columns and converting the categorical variables (Sex, Embarked) into numerical format.

[2] String Manipulation: Although the Name column was dropped, a typical string manipulation would be extracting titles (such as Mr or Mrs) for gender and class analysis.

[3] Statistical Analysis: Basic statistics were computed using NumPy. After imputation, the mean passenger age was approximately 29.4 years (median 28.0), while the mean fare was about 32.2 (median 14.45).

[4] Model Building: We employed logistic regression to model passenger survival. The dataset was split into training (80%) and testing (20%) sets.

[5] Results: The model achieved an accuracy of 81.01% on the test set, indicating a reasonably good prediction capability given the structured features. The confusion matrix shows 90 true negatives, 15 false positives, 19 false negatives, and 55 true positives.

[6] Conclusion: This analysis demonstrates how data preprocessing and machine learning techniques can be applied to derive insights from historical datasets. Future work could explore hyperparameter tuning and alternative models for better accuracy.
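
One way the hyperparameter tuning mentioned in the conclusion might look is a small grid search over the regularization strength C of logistic regression. This is only a sketch under stated assumptions: it runs on synthetic stand-in data from make_classification rather than the Titanic features, and the pipeline step names, the C grid, and max_iter=1000 are illustrative choices, not settings from the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); swap in the Titanic X and y to apply it.
X, y = make_classification(n_samples=400, n_features=7, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling inside the pipeline is refit on each cross-validation training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Candidate regularization strengths (illustrative grid).
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_c = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)
print(f"Best C: {best_c}, test accuracy: {test_accuracy * 100:.2f}%")
```

Keeping the scaler inside the pipeline avoids leaking test-fold statistics into training during cross-validation.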
