PRINCIPLES OF DATA SCIENCE Lab
List of Experiments
1. Implement basic data types and user defined functions using Python.
3. Perform basic computations and manipulation on arrays using the NumPy package.
4. Perform manipulations on the CSV files and images using NumPy package.
5. Perform data manipulation on the input CSV file using Pandas Package.
6. Perform data preparation and visualization on the input CSV file using Pandas and
Matplotlib packages.
7. Perform Exploratory Data Analysis on a CSV file. Import any CSV file and
8. Compute the following for the given data set using the NumPy package. Compute
9. Download data from UCI Machine Learning Repository or Kaggle and perform the
10. Download data from UCI Machine Learning Repository or Kaggle and perform the
Ex:No:1
Implement basic data types and user defined functions using Python.
Aim:
To implement basic data types and user defined functions using Python.
Procedure:
1. Basic Data Types:
a. Integer, Float, String, Boolean, List, Tuple, Dictionary, Set
2. User-Defined Functions:
a. Functions with parameters
b. Functions with return values
c. Functions without return values
Program:
Basic Data Types in Python
# Integer
age = 25
print("Age:", age)
# Float
temperature = 25.6
print("Temperature:", temperature)
# String
name = "John Doe"
print("Name:", name)
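# Boolean
is_student = True
print("Is student:", is_student)
# List (ordered, mutable)
fruits = ["apple", "banana", "cherry"]
print("Fruits:", fruits)
# Tuple (ordered, immutable)
coordinates = (10, 20, 30)
print("Coordinates:", coordinates)
# Dictionary (key-value pairs)
student_info = {"name": "John", "age": 25, "is_student": True}
print("Student info:", student_info)
# Set (unordered, unique elements)
numbers = {1, 2, 3, 4, 5}
print("Numbers:", numbers)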
Output:
Age: 25
Temperature: 25.6
Name: John Doe
Is student: True
Fruits: ['apple', 'banana', 'cherry']
Coordinates: (10, 20, 30)
Student info: {'name': 'John', 'age': 25, 'is_student': True}
Numbers: {1, 2, 3, 4, 5}
A function that validates its input and returns a computed value:
import math
# Function to calculate area of circle
def calculate_area(radius):
    # Reject non-positive radii before computing
    if radius <= 0:
        return "Radius must be greater than zero"
    area = math.pi * radius ** 2
    return area
radius = 5
area = calculate_area(radius)
print(f"Area of circle with radius {radius}: {area}")
A function that processes a list of numbers and returns their square values:
# Function to return square of each number in a list
def square_numbers(numbers):
    squared = []
    for num in numbers:
        squared.append(num ** 2)
    return squared
numbers = [1, 2, 3, 4, 5]
result = square_numbers(numbers)
print("Squared numbers:", result)
Functions can take parameters, return values, and allow you to organize and reuse code.
Functions can also have default parameters and can be used to process data like lists or
perform calculations.
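Default parameters and functions without return values (item 2c of the procedure) are mentioned but not demonstrated in the programs above. A minimal sketch follows; greet() and show_total() are hypothetical names chosen only for illustration.

```python
# Hypothetical helper: 'greeting' has a default value used when no argument is passed
def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

# Hypothetical helper with no return statement: it only prints, so calling it yields None
def show_total(numbers):
    print("Total:", sum(numbers))

print(greet("John"))             # uses the default greeting
print(greet("John", "Welcome"))  # overrides the default
result = show_total([1, 2, 3, 4, 5])
print("Returned value:", result)
```

A function that has no return statement still returns a value: Python's None.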
Ex:No:2
Aim:
Procedure:
1. Working with Python Packages: Importing and using external libraries.
Import and use external libraries (such as NumPy) for tasks such as
array manipulation and mathematical operations.
2. Working with Files: Reading from and writing to text files.
Perform file operations such as writing, reading, and appending to text files using
Python's built-in file handling functions.
Writing to the file: We use the 'w' mode to write to the file, which will create
the file if it doesn't exist and overwrite it if it does.
Reading from the file: The 'r' mode is used to open the file in read-only
mode.
Appending to the file: The 'a' mode allows appending new data at the end of
the file without overwriting the existing content.
3. Working with CSV Files using the csv Package
i. Writing to CSV: We open the file in write mode ('w') and use the
csv.writer() object to write rows to the CSV file.
ii. Reading from CSV: We open the file in read mode ('r') and use
csv.reader() to read the rows of the CSV file.
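The write, read and append modes described in item 2 can be sketched with the built-in open() function inside with statements, which close the file automatically. The file name sample.txt and the text written to it are assumptions for illustration.

```python
# Hypothetical file name used only for this illustration
filename = "sample.txt"

# Write: 'w' creates the file, or overwrites it if it already exists
with open(filename, "w") as f:
    f.write("Hello, Python!\n")
    f.write("This is a text file.\n")

# Read: 'r' opens the file in read-only mode
with open(filename, "r") as f:
    print("After writing:")
    print(f.read())

# Append: 'a' adds new data at the end without erasing existing content
with open(filename, "a") as f:
    f.write("This line was appended.\n")

# Read again to confirm the appended line follows the original content
with open(filename, "r") as f:
    lines = f.readlines()
print("After appending:", lines)
```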
Python has a vast range of external packages (libraries) that help you work efficiently in
various domains like data analysis, machine learning, file handling, etc. We will use the
NumPy package for array manipulation as an example.
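import numpy as np
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # create a 2D (3x3) array
print("Original Array:", array_2d, sep="\n")
print("Sum of elements:", array_2d.sum())
print("Mean of elements:", array_2d.mean())
print("Transposed Array:", array_2d.T, sep="\n")  # .T swaps rows and columns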
OUTPUT:
Original Array:
[[1 2 3]
[4 5 6]
[7 8 9]]
Sum of elements: 45
Mean of elements: 5.0
Transposed Array:
[[1 4 7]
[2 5 8]
[3 6 9]]
This program demonstrates how to use a package (NumPy) to create an array and perform
basic operations like sum, mean, and transpose.
OUTPUT:
import csv
OUTPUT:
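The CSV steps in item 3 of the procedure can be sketched with the standard csv module: csv.writer() writes a list of rows, and csv.reader() returns each row as a list of strings. The file name students.csv and the rows are hypothetical.

```python
import csv

# Hypothetical rows for illustration: a header row followed by data rows
rows = [
    ["Name", "Age", "City"],
    ["John", 25, "New York"],
    ["Alice", 30, "London"],
]

# Writing: open in 'w' mode; the csv docs recommend newline="" so
# line endings are written exactly once by the csv module
with open("students.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# Reading: open in 'r' mode; every field comes back as a string
with open("students.csv", "r", newline="") as f:
    reader = csv.reader(f)
    read_rows = list(reader)

for row in read_rows:
    print(row)
```

Note that the ages written as integers are read back as the strings "25" and "30"; converting them is left to the program.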
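Ex:No:3
Perform basic computations and manipulation on arrays using the NumPy package.
Aim:
To write a Python program to perform basic computations and manipulation on arrays using the NumPy package.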
Algorithm:
1. Import the NumPy Package: Import the numpy library to use its functions.
2. Create Arrays: Initialize arrays using np.array().
a. We create two 1D arrays array1 and array2.
3. Perform Basic Computations:
Addition, Subtraction, Multiplication, Division: Apply these arithmetic
operations on arrays.
Element-wise Operations: Use NumPy’s vectorized operations for efficient
computation.
4. Manipulate Arrays:
Reshaping: Use reshape() to change the shape of an array.
a. The reshape() method changes the shape of array1 to a 2x2 matrix.
Slicing: Extract parts of an array using indexing or slicing.
a. The slice array1[1:3] extracts the second and third elements of array1.
Concatenation: Combine arrays using np.concatenate().
a. The np.concatenate() function merges array1 and array2 into one
array.
5. Display Results: Output the results of computations and manipulations.
Program
import numpy as np
# 1. Create arrays
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
# 2. Basic computations
addition = array1 + array2
subtraction = array1 - array2
multiplication = array1 * array2
division = array1 / array2
# 3. Array manipulation
reshaped_array = array1.reshape(2, 2) # Reshape to a 2x2 array
sliced_array = array1[1:3] # Slice indices 1 and 2 (the stop index 3 is excluded)
# 4. Concatenate arrays
concatenated_array = np.concatenate((array1, array2))
# 5. Output results
print("Array 1:", array1)
print("Array 2:", array2)
print("Addition of arrays:", addition)
print("Subtraction of arrays:", subtraction)
print("Multiplication of arrays:", multiplication)
print("Division of arrays:", division)
print("Reshaped Array:", reshaped_array)
print("Sliced Array:", sliced_array)
print("Concatenated Array:", concatenated_array)
Output
Array 1: [1 2 3 4]
Array 2: [5 6 7 8]
Addition of arrays: [ 6 8 10 12]
Subtraction of arrays: [-4 -4 -4 -4]
Multiplication of arrays: [ 5 12 21 32]
Division of arrays: [0.2 0.33333333 0.42857143 0.5]
Reshaped Array: [[1 2]
[3 4]]
Sliced Array: [2 3]
Concatenated Array: [1 2 3 4 5 6 7 8]
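To make "element-wise" concrete, the vectorized addition can be compared with an explicit Python loop over the same arrays. This sketch only checks that both approaches give the same values; it does not measure speed.

```python
import numpy as np

array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

# Explicit Python loop: adds each pair of elements one at a time
loop_sum = np.empty_like(array1)
for i in range(len(array1)):
    loop_sum[i] = array1[i] + array2[i]

# Vectorized: the same element-wise addition as one NumPy expression
vector_sum = array1 + array2

print("Loop result:      ", loop_sum)
print("Vectorized result:", vector_sum)
print("Same result:", np.array_equal(loop_sum, vector_sum))
```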
Result:
Ex:No:4
Perform manipulations on the CSV files and images using NumPy package.
Aim:
To write a Python program to perform manipulations on CSV files and images using the
NumPy package.
Algorithm:
1. CSV File Manipulation:
Import necessary libraries (numpy and pandas).
Load a CSV file using np.genfromtxt() or pandas.read_csv().
a. np.genfromtxt() loads a CSV file, where dtype=None allows automatic
detection of the data type, and names=True uses the first row as
column names.
Perform operations (e.g., basic arithmetic, filtering, slicing).
a. Here, we increased the "Salary" column values by 10% using simple
arithmetic operations.
Save or export the manipulated data.
a. The manipulated data is saved to a new CSV file using np.savetxt().
2. Image Manipulation:
Load an image using imageio.imread() or PIL.Image and convert it to a
NumPy array.
a. imageio.imread() reads the image file and converts it into a NumPy
array.
b. Grayscale Conversion: The image is converted to grayscale by
averaging the RGB channels.
Manipulate the image array (e.g., adjust brightness, rotate, crop).
a. The pixel values are multiplied by 1.5 (brightening the image), and
np.clip() ensures values stay within the range [0, 255].
Save the image using imageio.imwrite() or PIL.Image.save().
Program:
import numpy as np
import pandas as pd
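The CSV steps in item 1 of the algorithm can be sketched with np.genfromtxt() and np.savetxt() as follows. The input file sample_employees.csv and its rows are hypothetical and are written by the script so it runs on its own. Because genfromtxt() reads Salary as integers, the data is copied into a structured array with a float Salary field before the 10% increase is applied.

```python
import numpy as np

# Hypothetical input file: three employees with Name, Age and Salary columns
with open("sample_employees.csv", "w") as f:
    f.write("Name,Age,Salary\nAlice,25,50000\nBob,35,60000\nCharlie,45,70000\n")

# Load the CSV: dtype=None detects each column's type, names=True uses the header row
data = np.genfromtxt("sample_employees.csv", delimiter=",", dtype=None,
                     names=True, encoding="utf-8")

# Copy into a structured array whose Salary field is a float, then raise it by 10%
result = np.zeros(data.shape, dtype=[("Name", "U20"), ("Age", "i8"), ("Salary", "f8")])
result["Name"] = data["Name"]
result["Age"] = data["Age"]
result["Salary"] = data["Salary"] * 1.10

# Filtering example: keep only rows where Age is greater than 30
older = result[result["Age"] > 30]
print(older)

# Save with one format specifier per column; comments="" keeps the header unprefixed
np.savetxt("manipulated_data.csv", result, fmt="%s,%d,%.2f",
           header="Name,Age,Salary", comments="")
```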
Manipulating Images:
import numpy as np
import imageio
from PIL import Image
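The image steps in item 2 can be sketched using PIL.Image (one of the two options the algorithm lists) together with NumPy. A small synthetic RGB image is generated and saved as input.png so the example runs without an external file; the file names, the crop size and the 90-degree rotation are illustrative assumptions, not part of the original program.

```python
import numpy as np
from PIL import Image

# Hypothetical input: a 64x64 synthetic RGB image (red ramps left to right)
pixels = np.zeros((64, 64, 3), dtype=np.uint8)
pixels[..., 0] = np.arange(64, dtype=np.uint8) * 4
pixels[..., 1] = 128
pixels[..., 2] = 200
Image.fromarray(pixels).save("input.png")

# 1. Load the image as a float NumPy array of shape (height, width, 3)
img = np.asarray(Image.open("input.png").convert("RGB")).astype(np.float64)

# 2. Grayscale: average the R, G and B channels
gray = img.mean(axis=2)

# 3. Brighten by 50% and clip back into the valid range [0, 255]
bright = np.clip(img * 1.5, 0, 255)

# 4. Crop the top-left 32x32 region and rotate it by 90 degrees
cropped = img[:32, :32]
rotated = np.ascontiguousarray(np.rot90(cropped))

# 5. Convert back to 8-bit integers and save each result
Image.fromarray(gray.astype(np.uint8)).save("gray.png")
Image.fromarray(bright.astype(np.uint8)).save("bright.png")
Image.fromarray(rotated.astype(np.uint8)).save("rotated.png")
print("Original shape:", img.shape, "Grayscale shape:", gray.shape)
```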
Output
After applying the operation of increasing the salary by 10%, the new manipulated_data.csv
would be:
import csv
Result :
Ex:No:5
Perform data manipulation on the input CSV file using Pandas Package
Aim:
To write a Python program to perform data manipulation on the input CSV file using the Pandas
package.
Algorithm:
Pandas is a powerful library for data manipulation and analysis in Python. Below is an
example of how to manipulate an input CSV file using the Pandas package.
1. Import necessary libraries: Import pandas and numpy for data manipulation.
2. Load the CSV file: Use pandas.read_csv() to load the CSV file into a DataFrame.
a. The pd.read_csv() function is used to load the CSV file into a
DataFrame.
3. Manipulate the data:
Filter rows based on conditions.
a. We filter the rows where the age is greater than 30 using
df[df['Age'] > 30].
Add or modify columns.
a. A new column, 'Salary Increase', holds each salary increased by 10%.
Sort the data based on a column.
a. Sort the DataFrame by the 'Salary' column in descending order using
sort_values().
Perform basic arithmetic operations (e.g., increase salary by 10%).
a. df['Salary Increase'] = df['Salary'] * 1.10
4. Save the manipulated data: Save the modified DataFrame back to a CSV file using
to_csv().
The to_csv() method is used to save the modified DataFrame back into a new
CSV file called 'manipulated_data.csv'.
Program
import pandas as pd
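A minimal sketch of the algorithm's steps with Pandas. The input file sample_employees.csv and its four rows are hypothetical and are created by the script itself; with a real input file, only the read_csv() path would change.

```python
import pandas as pd

# Hypothetical sample input so the sketch is self-contained
pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 35, 45, 32],
    "Salary": [50000, 60000, 70000, 65000],
}).to_csv("sample_employees.csv", index=False)

# 1. Load the CSV file into a DataFrame
df = pd.read_csv("sample_employees.csv")

# 2. Filter rows where Age is greater than 30
older = df[df["Age"] > 30]
print(older)

# 3. Add a column holding each salary increased by 10%
df["Salary Increase"] = df["Salary"] * 1.10

# 4. Sort by Salary in descending order
sorted_df = df.sort_values(by="Salary", ascending=False)
print(sorted_df)

# 5. Save the manipulated data to a new CSV file (index=False drops the row index)
sorted_df.to_csv("manipulated_data.csv", index=False)
```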
Output:
Ex:No:6
Perform data preparation and visualization on the input CSV file using Pandas and
Matplotlib packages.
Aim:
To write a Python program to perform data preparation and visualization on the
input CSV file using the Pandas and Matplotlib packages.
Algorithm:
In this task, we will perform data preparation (such as cleaning, filtering, and transforming
the data) and visualization using Pandas and Matplotlib.
Steps:
1. Import necessary libraries: Import pandas for data handling and matplotlib.pyplot
for visualization.
2. Load the CSV file: Use pandas.read_csv() to load the CSV data into a DataFrame.
3. Prepare the data: Perform data cleaning, transformation, and filtering (e.g., handle
missing values, create new columns).
Handling Missing Data: If there are any missing values (NaN), we use
fillna(0) to replace them with 0.
Adding a New Column: A new column, Salary Increase, is created by
multiplying the Salary column by 1.10 (10% increase).
Filtering: The DataFrame is filtered to show individuals whose age is greater
than 30 using df[df['Age'] > 30].
4. Visualize the data: Create different plots (e.g., bar plot, line plot, pie chart) to
visualize the prepared data.
Bar Plot: We use plt.bar() to create a bar chart that compares the original
salary and the salary after the increase for each individual.
Pie Chart: We create a pie chart using plt.pie() to show the distribution of the
total salary increase across individuals.
Line Plot: We use plt.plot() to plot the salary increase against the age of each
individual, showing the trend.
Program:
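import pandas as pd
import matplotlib.pyplot as plt
# Load the input CSV file ('data.csv' is a placeholder for the actual file name)
df = pd.read_csv('data.csv')
# Data Preparation:
# 1. Handle missing values (if any)
df.fillna(0, inplace=True) # Replace NaN with 0 (or any appropriate value)
# 2. Add a new column 'Salary Increase' holding each salary after a 10% increase
df['Salary Increase'] = df['Salary'] * 1.10
# 3. Filter individuals whose age is greater than 30
older_than_30 = df[df['Age'] > 30]
print(older_than_30)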
# Visualization:
# 1. Bar Plot: Show salary and salary increase for individuals
plt.figure(figsize=(10, 6))
plt.bar(df['Name'], df['Salary'], label='Original Salary', alpha=0.6)
plt.bar(df['Name'], df['Salary Increase'], label='Salary Increase', alpha=0.6)
plt.xlabel('Name')
plt.ylabel('Salary')
plt.title('Original Salary vs Salary Increase')
plt.legend()
plt.xticks(rotation=45)
plt.show()
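The pie chart and line plot described in step 4 can be sketched as follows, on a small hypothetical DataFrame standing in for the prepared data. Because 'Salary Increase' holds Salary * 1.10, each pie slice has the same share it would have if the column held only the 10% increment, since both are proportional to Salary. The data is sorted by age before plotting so the line runs left to right; the PNG file names are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical prepared data standing in for the cleaned input CSV
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 35, 45, 32],
    "Salary": [50000, 60000, 70000, 65000],
})
df["Salary Increase"] = df["Salary"] * 1.10

# Pie chart: each individual's share of the total salary after the increase
plt.figure(figsize=(6, 6))
plt.pie(df["Salary Increase"], labels=df["Name"], autopct="%1.1f%%")
plt.title("Share of Total Salary After Increase")
plt.savefig("salary_pie.png")
plt.show()

# Line plot: salary after the increase against age, ordered by age
by_age = df.sort_values("Age")
plt.figure(figsize=(8, 5))
plt.plot(by_age["Age"], by_age["Salary Increase"], marker="o")
plt.xlabel("Age")
plt.ylabel("Salary After Increase")
plt.title("Salary After Increase vs Age")
plt.grid(True)
plt.savefig("salary_line.png")
plt.show()
```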
Output:
Graph:
Result