PRINCIPLES OF DATA SCIENCE Lab

The document outlines a series of experiments for a course on Data Science, focusing on Python programming, data manipulation, and analysis using packages like NumPy and Pandas. It includes detailed procedures for implementing basic data types, working with files, performing computations on arrays, and manipulating CSV files and images. Additionally, it provides example programs and outputs for various tasks, emphasizing practical applications in data science.

24MS104 PRINCIPLES OF DATA SCIENCE

List of Experiments

1. Implement basic data types and user defined functions using Python.

2. Develop Python programs for packages and working on files.

3. Perform basic computations and manipulation on arrays using the NumPy package.

4. Perform manipulations on the CSV files and images using the NumPy package.

5. Perform data manipulation on the input CSV file using the Pandas package.

6. Perform data preparation and visualization on the input CSV file using the Pandas and
matplotlib packages.

7. Perform Exploratory Data Analysis on a CSV file. Import any CSV file and
perform Exploratory Data Analysis: data summarization.

8. Compute the following for the given data set using the NumPy package: Mean,
Variance, Standard Deviation, Sampling, Covariance, Correlation.

9. Download data from the UCI Machine Learning Repository or Kaggle and perform
business analytics for the application “Predictive analytics in healthcare”.

10. Download data from the UCI Machine Learning Repository or Kaggle and perform
business analytics for the application “Weather predictions in agriculture sector”.


EX:NO:1

Implement basic data types and user defined functions using Python.

Aim:

To implement basic data types and user defined functions using Python.

Procedure:
1. Basic Data Types:
a. Integer, Float, String, Boolean, List, Tuple, Dictionary, Set
2. User-Defined Functions:
a. Functions with parameters
b. Functions with return values
c. Functions without return values

Program:
Basic Data Types in Python

# Integer
age = 25
print("Age:", age)

# Float
temperature = 25.6
print("Temperature:", temperature)

# String
name = "John Doe"
print("Name:", name)

# Boolean
is_student = True
print("Is student:", is_student)

# List (ordered, mutable collection of items)
fruits = ["apple", "banana", "cherry"]
print("Fruits:", fruits)

# Tuple (ordered, immutable collection of items)
coordinates = (10, 20, 30)
print("Coordinates:", coordinates)

# Dictionary (collection of key-value pairs)
student = {"name": "John", "age": 25, "is_student": True}
print("Student info:", student)

# Set (unordered collection of unique items)
numbers = {1, 2, 3, 4, 5}
print("Numbers:", numbers)

Output:
Age: 25
Temperature: 25.6
Name: John Doe
Is student: True
Fruits: ['apple', 'banana', 'cherry']
Coordinates: (10, 20, 30)
Student info: {'name': 'John', 'age': 25, 'is_student': True}
Numbers: {1, 2, 3, 4, 5}
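To check which type Python assigned to each value, the built-in type() function can be used. The sketch below (the names values, type_names, tuple_is_mutable, and unique_numbers are illustrative) also contrasts a list, which can be changed in place, with a tuple, which cannot, and shows a set discarding duplicates.

```python
# Inspect the type of each basic value with the built-in type() function
values = [25, 25.6, "John Doe", True, ["apple"], (10, 20), {"age": 25}, {1, 2}]
type_names = [type(value).__name__ for value in values]
print(type_names)  # ['int', 'float', 'str', 'bool', 'list', 'tuple', 'dict', 'set']

# Lists are mutable: items can be added in place
fruits = ["apple", "banana"]
fruits.append("cherry")

# Tuples are immutable: item assignment raises TypeError
coordinates = (10, 20, 30)
try:
    coordinates[0] = 99
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Sets keep only unique items
unique_numbers = set([1, 2, 2, 3, 3, 3])
print(fruits, tuple_is_mutable, unique_numbers)
```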

Function with Parameters:

This function accepts a parameter and uses it to perform a task.

# Function to greet a person by name
def greet(name):
    print(f"Hello, {name}!")

greet("Alice")  # Calling the function with the argument "Alice"; prints: Hello, Alice!

Function with Return Value:

This function computes a result and returns it to the caller.

# Function to add two numbers
def add_numbers(a, b):
    return a + b

result = add_numbers(5, 10)
print("Sum:", result)  # Sum: 15

Function without Return Value:

This function performs an action (here, printing a message) but does not return a result.

# Function to print a message
def print_message(message):
    print(message)

print_message("This is a simple message.")
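A function without a return statement still produces a value when called: Python returns None automatically. The short check below redefines print_message so it runs on its own and makes that implicit None visible (returned_value is an illustrative name).

```python
def print_message(message):
    # No return statement, so Python returns None implicitly
    print(message)

returned_value = print_message("This is a simple message.")
print(returned_value)          # None
print(returned_value is None)  # True
```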

Function with Default Parameters:

A function can give its parameters default values, so callers do not have to supply them
every time.

# Function to greet with a default message
def greet_with_default(name, greeting="Hello"):
    print(f"{greeting}, {name}!")

greet_with_default("Bob")  # Uses the default greeting: Hello, Bob!
greet_with_default("Charlie", "Good Morning")  # Uses a custom greeting: Good Morning, Charlie!
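Arguments can also be passed by keyword, in which case their order no longer matters. The sketch below uses a hypothetical make_greeting() variant that returns the greeting string instead of printing it, so the result can be stored, reused, or tested.

```python
def make_greeting(name, greeting="Hello"):
    """Return a greeting string; greeting defaults to "Hello"."""
    return f"{greeting}, {name}!"

print(make_greeting("Bob"))                       # Hello, Bob!
print(make_greeting("Charlie", "Good Morning"))   # Good Morning, Charlie!
print(make_greeting(greeting="Hi", name="Dana"))  # Hi, Dana!
```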

A More Complex Function

import math

# Function to calculate the area of a circle
def calculate_area(radius):
    # Reject non-positive radii by raising an error instead of returning a message string
    if radius <= 0:
        raise ValueError("Radius must be greater than zero")
    return math.pi * radius ** 2

radius = 5
area = calculate_area(radius)
print(f"Area of circle with radius {radius}: {area}")

Using Lists and Loops

A function that processes a list of numbers and returns their squares:

# Function to return the square of each number in a list
def square_numbers(numbers):
    squared = []
    for num in numbers:
        squared.append(num ** 2)
    return squared

numbers = [1, 2, 3, 4, 5]
result = square_numbers(numbers)
print("Squared numbers:", result)  # Squared numbers: [1, 4, 9, 16, 25]
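The same loop can be written more compactly as a list comprehension, which builds the new list in a single expression. This is offered as an equivalent alternative, not as a replacement for the loop version the exercise asks for.

```python
def square_numbers(numbers):
    """Return a new list with the square of each number (list comprehension version)."""
    return [num ** 2 for num in numbers]

result = square_numbers([1, 2, 3, 4, 5])
print("Squared numbers:", result)  # Squared numbers: [1, 4, 9, 16, 25]
```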

Functions can take parameters, return values, and allow you to organize and reuse code.
Functions can also have default parameters and can be used to process data like lists or
perform calculations.
EX:NO:2

Develop Python programs for packages and working on files.

Aim:

To develop Python programs for packages and working on files.

Procedure:
1. Working with Python Packages: importing and using external libraries.
• Import and use external libraries (such as NumPy) for tasks such as array
manipulation and mathematical operations.
2. Working with Files: reading from and writing to text files.
Perform file operations such as writing, reading, and appending to text files using
Python's built-in file handling functions.
• Writing to the file: The 'w' mode opens the file for writing; it creates the file
if it doesn't exist and overwrites it if it does.
• Reading from the file: The 'r' mode opens the file in read-only mode.
• Appending to the file: The 'a' mode adds new data at the end of the file
without overwriting the existing content.
3. Working with CSV Files using the csv Module
i. Writing to CSV: Open the file in write mode ('w') and use a csv.writer()
object to write rows to the CSV file.
ii. Reading from CSV: Open the file in read mode ('r') and use csv.reader() to
read the rows of the CSV file.

Working with Python Packages

Python has a vast range of external packages (libraries) that help you work efficiently in
various domains like data analysis, machine learning, file handling, etc. We will use the
NumPy package for array manipulation as an example.

Example 1: Using the NumPy Package

Install NumPy first (run this command in a terminal, not inside the Python interpreter):

pip install numpy

The following Python program uses NumPy for array manipulation.

# Import the numpy package
import numpy as np

# Create a 2D numpy array
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Print the array
print("Original Array:")
print(array_2d)

# Perform basic operations
sum_array = np.sum(array_2d)  # Sum of all elements
mean_array = np.mean(array_2d)  # Mean of all elements
transpose_array = np.transpose(array_2d)  # Transpose the array

print("\nSum of elements:", sum_array)
print("Mean of elements:", mean_array)
print("Transposed Array:")
print(transpose_array)

OUTPUT:

Original Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Sum of elements: 45
Mean of elements: 5.0
Transposed Array:
[[1 4 7]
 [2 5 8]
 [3 6 9]]

This program demonstrates how to use a package (NumPy) to create an array and perform
basic operations like sum, mean, and transpose.
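Most NumPy reductions also accept an axis argument, which applies the operation along one dimension instead of over the whole array. A short sketch using the same 3x3 array: axis=0 works down the columns and axis=1 works across the rows.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

column_sums = np.sum(array_2d, axis=0)  # sum down each column
row_means = np.mean(array_2d, axis=1)   # mean across each row
max_value = array_2d.max()              # largest element overall

print("Column sums:", column_sums)  # Column sums: [12 15 18]
print("Row means:", row_means)      # Row means: [2. 5. 8.]
print("Max value:", max_value)      # Max value: 9
```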

Working with Files

Python provides built-in support for working with files. You can read from and write to
files in various formats, such as text files (.txt), CSV files (.csv), and JSON files (.json).

Example 2: Creating, Writing, Reading, and Appending to a Text File

# Writing to a text file ('w' creates the file, or overwrites it if it exists)
with open('sample_file.txt', 'w') as file:
    file.write("This is the first line.\n")
    file.write("This is the second line.\n")

# Reading from the text file
with open('sample_file.txt', 'r') as file:
    content = file.read()
print("File Content (After Writing):")
print(content)

# Appending to the text file ('a' adds to the end without overwriting)
with open('sample_file.txt', 'a') as file:
    file.write("This is an appended line.\n")

# Reading from the file again after appending
with open('sample_file.txt', 'r') as file:
    updated_content = file.read()
print("File Content (After Appending):")
print(updated_content)

OUTPUT:

File Content (After Writing):
This is the first line.
This is the second line.

File Content (After Appending):
This is the first line.
This is the second line.
This is an appended line.
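Instead of reading the whole file with read(), a file object can be iterated line by line, which avoids loading a large file into memory at once. The sketch below writes to a temporary directory (an assumption, so it does not touch the sample_file.txt created above) and uses pathlib for the path; lines is an illustrative name.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample_file.txt"

    # 'w' creates or overwrites the file; writelines() writes each string as given
    with open(path, "w") as file:
        file.writelines(["This is the first line.\n", "This is the second line.\n"])

    # 'a' appends to the end of the existing content
    with open(path, "a") as file:
        file.write("This is an appended line.\n")

    # Iterate over the file object one line at a time, stripping the newline
    with open(path, "r") as file:
        lines = [line.rstrip("\n") for line in file]

for number, line in enumerate(lines, start=1):
    print(number, line)
```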

Example 3: Working with CSV Files Using the csv Module

Python also provides a built-in csv module for working with CSV (Comma-Separated Values)
files. Here's an example:

import csv

# Writing to a CSV file
header = ['Name', 'Age', 'City']
data = [['John', 28, 'New York'],
        ['Emma', 22, 'London'],
        ['Sophia', 25, 'Paris']]

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('people.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)  # Writing the header
    writer.writerows(data)  # Writing the data rows

# Reading from the CSV file (every field is read back as a string)
with open('people.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    print("CSV File Content:")
    for row in reader:
        print(row)

OUTPUT:

CSV File Content:
['Name', 'Age', 'City']
['John', '28', 'New York']
['Emma', '22', 'London']
['Sophia', '25', 'Paris']
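Because csv.reader() returns every field as a string, numeric columns must be converted before doing calculations. csv.DictReader, also part of the standard csv module, maps each row to a dictionary keyed by the header, which makes that conversion easier to read. A minimal sketch, assuming people.csv has the contents written above; it is recreated in a temporary directory here so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path

header = ['Name', 'Age', 'City']
data = [['John', 28, 'New York'], ['Emma', 22, 'London'], ['Sophia', 25, 'Paris']]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "people.csv"
    with open(path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(header)
        writer.writerows(data)

    # DictReader uses the header row as the keys of each row dictionary
    with open(path, "r", newline="") as file:
        rows = list(csv.DictReader(file))

ages = [int(row['Age']) for row in rows]  # fields arrive as strings, so convert
average_age = sum(ages) / len(ages)
print(rows[0]['Name'], rows[0]['City'])
print("Average age:", average_age)  # Average age: 25.0
```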

Ex:No:3

Perform basic computations and manipulation on arrays using the NumPy package.

Aim:

To write a Python program to perform basic computations and manipulation on arrays
using the NumPy package.

Algorithm:

1. Import the NumPy Package: Import the numpy library to use its functions.
2. Create Arrays: Initialize arrays using np.array().
a. We create two 1D arrays, array1 and array2.
3. Perform Basic Computations:
• Addition, Subtraction, Multiplication, Division: Apply these arithmetic
operations to the arrays.
• Element-wise Operations: NumPy applies these operators element by element
(vectorized), which is faster than looping over the elements in Python.
4. Manipulate Arrays:
• Reshaping: Use reshape() to change the shape of an array.
a. The reshape() method changes the shape of array1 to a 2x2 matrix.
• Slicing: Extract parts of an array using indexing or slicing.
a. The slice array1[1:3] extracts the second and third elements of array1.
• Concatenation: Combine arrays using np.concatenate().
a. The np.concatenate() function merges array1 and array2 into one
array.
5. Display Results: Output the results of the computations and manipulations.

Program

import numpy as np

# 1. Create arrays
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

# 2. Basic computations
addition = array1 + array2
subtraction = array1 - array2
multiplication = array1 * array2
division = array1 / array2

# 3. Array manipulation
reshaped_array = array1.reshape(2, 2) # Reshape to a 2x2 array
sliced_array = array1[1:3]  # Slice indices 1 and 2 (the stop index 3 is excluded)

# 4. Concatenate arrays
concatenated_array = np.concatenate((array1, array2))

# 5. Output results
print("Array 1:", array1)
print("Array 2:", array2)
print("Addition of arrays:", addition)
print("Subtraction of arrays:", subtraction)
print("Multiplication of arrays:", multiplication)
print("Division of arrays:", division)
print("Reshaped Array:", reshaped_array)
print("Sliced Array:", sliced_array)
print("Concatenated Array:", concatenated_array)

Output

Array 1: [1 2 3 4]
Array 2: [5 6 7 8]
Addition of arrays: [ 6  8 10 12]
Subtraction of arrays: [-4 -4 -4 -4]
Multiplication of arrays: [ 5 12 21 32]
Division of arrays: [0.2        0.33333333 0.42857143 0.5       ]
Reshaped Array: [[1 2]
 [3 4]]
Sliced Array: [2 3]
Concatenated Array: [1 2 3 4 5 6 7 8]
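NumPy also broadcasts operations between an array and a single number, and between arrays of different but compatible shapes. The sketch below (variable names are illustrative) scales array1 by a scalar, adds a 2x1 column to the 2x2 reshaped array so the column is stretched across each row, and stacks two arrays vertically with np.vstack().

```python
import numpy as np

array1 = np.array([1, 2, 3, 4])

scaled = array1 * 10                        # the scalar is applied to every element
matrix = array1.reshape(2, 2)               # [[1 2] [3 4]]
column = np.array([[10], [20]])             # shape (2, 1)
broadcast_sum = matrix + column             # column is stretched across each row
stacked = np.vstack((array1, array1 ** 2))  # shape (2, 4)

print("Scaled:", scaled)                    # Scaled: [10 20 30 40]
print("Broadcast sum:\n", broadcast_sum)
print("Stacked shape:", stacked.shape)      # Stacked shape: (2, 4)
```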

Result:

Ex:No:4

Perform manipulations on the CSV files and images using NumPy package.

Aim:
To write a Python program to perform manipulations on CSV files and images using the
NumPy package.

Algorithm:
1. CSV File Manipulation:
• Import the necessary libraries (numpy; pandas is optional).
• Load a CSV file using np.genfromtxt() or pandas.read_csv().
a. np.genfromtxt() loads the CSV file; dtype=None detects each column's data
type automatically, and names=True uses the first row as column names.
• Perform operations (e.g., basic arithmetic, filtering, slicing).
a. Here, we increase the "Salary" column values by 10% using simple
arithmetic operations.
• Save or export the manipulated data.
a. The manipulated data is saved to a new CSV file using np.savetxt().
2. Image Manipulation:
• Load an image using imageio.imread() or PIL.Image and convert it to a
NumPy array.
a. imageio.imread() reads the image file into a NumPy array.
b. Grayscale Conversion: The image is converted to grayscale by
averaging the RGB channels.
• Manipulate the image array (e.g., adjust brightness, rotate, crop).
a. The pixel values are multiplied by 1.5 (brightening the image), and
np.clip() keeps the values within the range [0, 255].
• Save the image using imageio.imwrite() or PIL.Image.save().

Program:

Manipulating CSV Files:

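import numpy as np

# Load CSV file
# Example CSV: data.csv with columns Name, Age, Salary
# autostrip=True removes spaces around values such as "Alice, 30, 50000"
csv_data = np.genfromtxt('data.csv', delimiter=',', dtype=None, names=True,
                         encoding='utf-8', autostrip=True)

# Display the CSV data
print("Original CSV Data:")
print(csv_data)

# genfromtxt() stores whole-number salaries as integers, so convert the Salary
# field to float first; otherwise the 10% increase would be truncated
float_dtype = [(name, 'f8') if name == 'Salary' else (name, csv_data.dtype[name])
               for name in csv_data.dtype.names]
csv_data = csv_data.astype(float_dtype)

# Perform a basic operation (increase salary by 10%)
csv_data['Salary'] = csv_data['Salary'] * 1.10

# Save the manipulated data back to CSV (one format per column: Name, Age, Salary)
np.savetxt('manipulated_data.csv', csv_data, delimiter=',', header="Name,Age,Salary",
           fmt=['%s', '%d', '%.1f'], comments='')

print("Manipulated Data saved to manipulated_data.csv")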

Manipulating Images:

import numpy as np
import imageio.v2 as imageio  # imageio's v2 API provides imread() and imwrite()

# Load an image as a NumPy array (assumes a colour image such as an RGB JPEG)
image_array = np.asarray(imageio.imread('image.jpg'))  # Replace with your image file path

# Display basic image information
print(f"Original Image Shape: {image_array.shape}")
print(f"Image Data Type: {image_array.dtype}")

# Example manipulation: convert to grayscale by averaging the R, G, B channels
# ([..., :3] ignores an alpha channel if the image has one)
gray_image = np.mean(image_array[..., :3], axis=2).astype(np.uint8)

# Example manipulation: increase brightness by a factor of 1.5, clipping to [0, 255]
bright_image = np.clip(image_array * 1.5, 0, 255).astype(np.uint8)

# Save the manipulated images
imageio.imwrite('gray_image.jpg', gray_image)
imageio.imwrite('bright_image.jpg', bright_image)

print("Manipulated Images saved as gray_image.jpg and bright_image.jpg")
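The algorithm also mentions rotating and cropping, which are plain NumPy operations on the image array. The sketch below uses a small synthetic RGB array in place of image.jpg (an assumption, so it runs without an image file); the same slicing applies to the array returned by imageio.imread().

```python
import numpy as np

# Synthetic 4x6 RGB "image" (height x width x channels) standing in for a real photo
image_array = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)

cropped = image_array[1:3, 2:5]      # rows 1-2, columns 2-4, all channels
rotated = np.rot90(image_array)      # rotate 90 degrees counter-clockwise
flipped = image_array[:, ::-1]       # mirror left to right
gray = np.mean(image_array, axis=2).astype(np.uint8)  # average the R, G, B channels

print("Cropped shape:", cropped.shape)   # Cropped shape: (2, 3, 3)
print("Rotated shape:", rotated.shape)   # Rotated shape: (6, 4, 3)
print("Gray shape:", gray.shape)         # Gray shape: (4, 6)
```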

Output

Assume the CSV data (data.csv):

Name,Age,Salary
Alice,30,50000
Bob,25,45000
Charlie,35,60000

After increasing each salary by 10%, the new manipulated_data.csv would be:

Name,Age,Salary
Alice,30,55000.0
Bob,25,49500.0
Charlie,35,66000.0
To create data.csv, type the data above into a text editor (e.g., Notepad or VS Code) and
save it as data.csv, or run the following Python script. The script writes the same three
rows plus two more (David and Eve), which the later experiments also use:

import csv

header = ['Name', 'Age', 'Salary']
rows = [
    ['Alice', 30, 50000],
    ['Bob', 25, 45000],
    ['Charlie', 35, 60000],
    ['David', 40, 75000],
    ['Eve', 29, 52000]
]

# Writing to a CSV file
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(header)  # Write the header
    writer.writerows(rows)  # Write the rows
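When only the numeric columns are needed, np.loadtxt() with usecols can load them straight into a float array, and a boolean mask then filters rows. A minimal sketch, assuming the three-row data.csv shown above; io.StringIO stands in for the file so the example runs on its own, and the 48000 threshold is an arbitrary illustrative value.

```python
import io
import numpy as np

# Stand-in for data.csv (assumption: same three rows as shown above)
csv_text = "Name,Age,Salary\nAlice,30,50000\nBob,25,45000\nCharlie,35,60000\n"

# skiprows=1 skips the header; usecols=(1, 2) keeps only the Age and Salary columns
numeric = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=(1, 2))
ages, salaries = numeric[:, 0], numeric[:, 1]

raised = salaries * 1.10                   # 10% increase, computed in floating point
high_earner_ages = ages[salaries > 48000]  # boolean mask selects matching rows

print("Mean age:", ages.mean())            # Mean age: 30.0
print("Raised salaries:", np.round(raised, 1))
print("Ages with salary > 48000:", high_earner_ages)
```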

Result:

Ex:No:5

Perform data manipulation on the input CSV file using the Pandas package.

Aim:
To write a Python program to perform data manipulation on the input CSV file using the
Pandas package.

Algorithm:
Pandas is a powerful library for data manipulation and analysis in Python. The steps below
show how to manipulate an input CSV file using the Pandas package.
1. Import the necessary library: Import pandas for data manipulation.
2. Load the CSV file: Use pandas.read_csv() to load the CSV file into a DataFrame.
3. Manipulate the data:
• Add or modify columns (basic arithmetic).
a. Add a 'Salary Increase' column holding each salary after a 10% increase:
df['Salary Increase'] = df['Salary'] * 1.10
• Filter rows based on conditions.
a. Keep the rows where Age is greater than 30 using df[df['Age'] > 30].
• Sort the data based on a column.
a. Sort the DataFrame by the 'Salary' column in descending order using
sort_values().
4. Save the manipulated data: The to_csv() method saves the modified DataFrame to a
new CSV file called 'manipulated_data.csv'.

Program

import pandas as pd

# Load CSV file into a DataFrame
# skipinitialspace=True ignores spaces after commas (e.g., "Name, Age, Salary")
df = pd.read_csv('data.csv', skipinitialspace=True)

# Display the original data
print("Original Data:")
print(df)

# 1. Add a new column 'Salary Increase' holding the salary after a 10% increase
df['Salary Increase'] = df['Salary'] * 1.10

# 2. Filter rows where Age is greater than 30 (after step 1, so the new column is included)
filtered_df = df[df['Age'] > 30]

# 3. Sort the data by Salary in descending order
sorted_df = df.sort_values(by='Salary', ascending=False)

# 4. Save the manipulated data to a new CSV file
df.to_csv('manipulated_data.csv', index=False)

# Display the results after manipulation
print("\nFiltered Data (Age > 30):")
print(filtered_df)
print("\nData with Salary Increase:")
print(df)
print("\nSorted Data by Salary:")
print(sorted_df)

Input CSV File (data.csv):

Name,Age,Salary
Alice,30,50000
Bob,25,45000
Charlie,35,60000
David,40,75000
Eve,29,52000

Output:

Filtered Data (Age > 30):
This displays the rows where Age is greater than 30.

      Name  Age  Salary  Salary Increase
2  Charlie   35   60000          66000.0
3    David   40   75000          82500.0

Data with Salary Increase (10% increase):
This shows the DataFrame with the added Salary Increase column.

      Name  Age  Salary  Salary Increase
0    Alice   30   50000          55000.0
1      Bob   25   45000          49500.0
2  Charlie   35   60000          66000.0
3    David   40   75000          82500.0
4      Eve   29   52000          57200.0

Sorted Data by Salary (Descending Order):
This shows the DataFrame sorted by salary in descending order.

      Name  Age  Salary  Salary Increase
3    David   40   75000          82500.0
2  Charlie   35   60000          66000.0
0    Alice   30   50000          55000.0
4      Eve   29   52000          57200.0
1      Bob   25   45000          49500.0
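Two further manipulations are common in practice: combining several conditions in one filter (each condition in parentheses, joined with &) and selecting specific columns with .loc. Note that the 'Salary Increase' column above holds the new salary, not the amount of the raise; the sketch adds a separate, hypothetically named 'Raise Amount' column for that. io.StringIO stands in for data.csv so the example runs on its own.

```python
import io
import pandas as pd

# Stand-in for data.csv (same five rows as the input file above)
csv_text = """Name,Age,Salary
Alice,30,50000
Bob,25,45000
Charlie,35,60000
David,40,75000
Eve,29,52000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Combine conditions with & and keep only the Name and Salary columns
selected = df.loc[(df['Age'] >= 29) & (df['Salary'] < 60000), ['Name', 'Salary']]

# Store the amount of the raise separately from the new salary
df['Raise Amount'] = df['Salary'] * 0.10
total_payroll = df['Salary'].sum()

print(selected)
print(df[['Name', 'Raise Amount']])
print("Total payroll:", total_payroll)
```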
Result:

Ex:No:6

Perform data preparation and visualization on the input CSV file using the Pandas and
Matplotlib packages.

Aim:
To write a Python program to perform data preparation and visualization on the
input CSV file using the Pandas and Matplotlib packages.

Algorithm:

In this task, we perform data preparation (cleaning, filtering, and transforming the data)
and visualization using Pandas and Matplotlib.

Steps:
1. Import the necessary libraries: Import pandas for data handling and matplotlib.pyplot
for visualization.
2. Load the CSV file: Use pandas.read_csv() to load the CSV data into a DataFrame.
• The CSV file (data.csv) is loaded into a DataFrame using pd.read_csv().
• The data is then printed for an initial view.
3. Prepare the data: Perform data cleaning, transformation, and filtering (e.g., handle
missing values, create new columns).
• Handling Missing Data: Any missing values (NaN) are replaced with 0 using
fillna(0).
• Adding a New Column: A new column, Salary Increase, is created by
multiplying the Salary column by 1.10 (a 10% increase).
• Filtering: The DataFrame is filtered to show individuals whose age is greater
than 30 using df[df['Age'] > 30].
4. Visualize the data: Create different plots (bar plot, pie chart, line plot) to
visualize the prepared data.
• Bar Plot: plt.bar() draws a bar chart comparing each individual's original
salary with the increased salary.
• Pie Chart: plt.pie() shows each individual's share of the total increased
salary.
• Line Plot: plt.plot() plots the increased salary against each individual's age,
showing the trend.
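fillna(0) is simple, but for a column such as Salary a filled-in 0 pulls averages down. A common alternative, shown here as a sketch rather than the lab's prescribed method, is to count missing values per column with isna() and fill each column with a statistic such as its median. The small CSV with a missing salary for Bob is made up for illustration.

```python
import io
import pandas as pd

# Illustrative data with one missing Salary value (Bob)
csv_text = "Name,Age,Salary\nAlice,30,50000\nBob,25,\nCharlie,35,60000\n"
df = pd.read_csv(io.StringIO(csv_text))

missing_per_column = df.isna().sum()   # number of NaN values in each column
print(missing_per_column)

# Fill missing salaries with the median of the known salaries instead of 0
median_salary = df['Salary'].median()  # median of 50000 and 60000
filled = df.fillna({'Salary': median_salary})
print(filled)
```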

Program:

import pandas as pd
import matplotlib.pyplot as plt

# Load CSV file into DataFrame (skipinitialspace=True ignores spaces after commas)
df = pd.read_csv('data.csv', skipinitialspace=True)

# Display the original data
print("Original Data:")
print(df)

# Data Preparation:
# 1. Handle missing values (if any)
df.fillna(0, inplace=True)  # Replace NaN with 0 (or any appropriate value)
# 2. Add a new column 'Salary Increase' holding the salary after a 10% increase
df['Salary Increase'] = df['Salary'] * 1.10

# 3. Filter data (e.g., Age > 30)
filtered_df = df[df['Age'] > 30]

# Display the manipulated data
print("\nFiltered Data (Age > 30):")
print(filtered_df)

# Visualization:
# 1. Bar Plot: original and increased salary per individual
#    (both bars sit at the same position; alpha=0.6 keeps both visible)
plt.figure(figsize=(10, 6))
plt.bar(df['Name'], df['Salary'], label='Original Salary', alpha=0.6)
plt.bar(df['Name'], df['Salary Increase'], label='Salary Increase', alpha=0.6)
plt.xlabel('Name')
plt.ylabel('Salary')
plt.title('Original Salary vs Salary Increase')
plt.legend()
plt.xticks(rotation=45)
plt.show()

# 2. Pie Chart: each individual's share of the total increased salary
salary_distribution = df['Salary Increase'].groupby(df['Name']).sum()
plt.figure(figsize=(8, 8))
plt.pie(salary_distribution, labels=salary_distribution.index, autopct='%1.1f%%',
        startangle=140)
plt.title('Salary Distribution by Name (After Increase)')
plt.show()

# 3. Line Plot: Salary Increase trend by Age
# Sort by Age first so the line connects the points from youngest to oldest
by_age = df.sort_values(by='Age')
plt.figure(figsize=(10, 6))
plt.plot(by_age['Age'], by_age['Salary Increase'], marker='o', linestyle='-', color='b')
plt.xlabel('Age')
plt.ylabel('Salary Increase')
plt.title('Salary Increase Trend by Age')
plt.grid(True)
plt.show()

Output:

Original Data (data.csv):

Name,Age,Salary
Alice,30,50000
Bob,25,45000
Charlie,35,60000
David,40,75000
Eve,29,52000

Filtered Data (Age > 30):

      Name  Age  Salary  Salary Increase
2  Charlie   35   60000          66000.0
3    David   40   75000          82500.0

Graph: (the program displays three figures: the bar chart, the pie chart, and the line plot)
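In the bar plot above both bars are drawn at the same x-position, so the taller increased-salary bar largely hides the original one. One way to show them side by side, sketched here as an alternative, is to offset each series by half a bar width using np.arange(). The Agg backend and the output file name salary_comparison.png are assumptions so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Salary": [50000, 45000, 60000, 75000, 52000],
})
df["Salary Increase"] = df["Salary"] * 1.10

x = np.arange(len(df))  # one slot per person
width = 0.4             # each bar takes 40% of a slot

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(x - width / 2, df["Salary"], width, label="Original Salary")
ax.bar(x + width / 2, df["Salary Increase"], width, label="Salary Increase")
ax.set_xticks(x)
ax.set_xticklabels(df["Name"], rotation=45)
ax.set_xlabel("Name")
ax.set_ylabel("Salary")
ax.set_title("Original Salary vs Salary Increase")
ax.legend()
fig.tight_layout()
fig.savefig("salary_comparison.png")
plt.close(fig)
```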

Result
