
Data Visualization

The document contains multiple Python scripts demonstrating data manipulation and visualization with pandas and matplotlib. It covers computing Fibonacci numbers, generating numeric triangle patterns, performing statistical operations on the Iris dataset, modifying DataFrames, and visualizing data with line charts, scatter plots, bar graphs, histograms, pie charts and box plots. Each section includes code snippets and expected outputs.

Uploaded by ansarisshadan748
Copyright © All Rights Reserved

Question 1: Write a Python script to compute the Nth Fibonacci number and print numeric triangle patterns.

def fibonacci(n):
    # Iterative Fibonacci: with this convention the 1st number is 0, the 2nd is 1, ...
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a


def numeric_triangle(rows):
    # Row i prints the numbers 1..i separated by spaces.
    for i in range(1, rows + 1):
        print(" ".join(str(num) for num in range(1, i + 1)))


n = int(input("enter the value of n for fibonacci: "))
rows = int(input("enter the number of rows for numeric triangle: "))

print(f"\n{n}th fibonacci number:", fibonacci(n))
print("\nnumeric triangle pattern:")
numeric_triangle(rows)

Output:

enter the value of n for fibonacci: 10

enter the number of rows for numeric triangle: 5

10th fibonacci number: 34

numeric triangle pattern:

1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
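Returning values instead of printing them makes both parts easier to test and reuse. The following is a minimal sketch, not the original script: the names `fibonacci_sequence` and `triangle_rows` are hypothetical, and it assumes the same convention as `fibonacci()` above (the 1st number is 0, so the 10th is 34).

```python
from typing import Iterator


def fibonacci_sequence(count: int) -> Iterator[int]:
    """Yield the first `count` Fibonacci numbers: 0, 1, 1, 2, 3, ..."""
    a, b = 0, 1
    for _ in range(count):
        yield a
        a, b = b, a + b


def triangle_rows(rows: int) -> list[str]:
    """Return each row of the numeric triangle as a string instead of printing it."""
    return [" ".join(str(k) for k in range(1, i + 1)) for i in range(1, rows + 1)]


if __name__ == "__main__":
    print(list(fibonacci_sequence(10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
    print("\n".join(triangle_rows(4)))
```

The last element of `fibonacci_sequence(n)` equals `fibonacci(n)` for n ≥ 1, so the generator also gives you every earlier term for free.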

Question 2: On the Iris dataset, perform basic statistical operations and sampling, and find unique values and value counts.

import pandas as pd

df = pd.read_csv("iris.csv")  # Ensure this file is in your working directory

print("🔹 First 5 rows of the dataset:")
print(df.head())

print("\n🔹 Summary Statistics:")
print(df.describe())

print("\n🔹 Data Types and Null Info:")
df.info()  # info() prints its report itself and returns None

print("\n🔹 Random Sample (5 rows):")
print(df.sample(n=5, random_state=42))

print("\n🔹 Stratified Sampling (5 rows per species):")
if 'species' in df.columns:
    stratified_sample = df.groupby('species').sample(n=5, random_state=42)
    print(stratified_sample)
else:
    print("The column 'species' is not found in the dataset.")

if 'species' in df.columns:
    print("\n🔹 Unique Species:")
    print(df['species'].unique())

    print("\n🔹 Species Value Counts:")
    print(df['species'].value_counts())

Output:

Summary statistics:

       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
count         150.000000        150.000000         150.000000        150.000000
mean            5.843333          3.057333           3.758000          1.199333
std             0.828066          0.435866           1.765298          0.762238
min             4.300000          2.000000           1.000000          0.100000
25%             5.100000          2.800000           1.600000          0.300000
50%             5.800000          3.000000           4.350000          1.300000
75%             6.400000          3.300000           5.100000          1.800000
max             7.900000          4.400000           6.900000          2.500000

Random sample (5 rows; which rows appear depends on random_state):

     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
37                 4.9               3.6                1.4               0.1
89                 5.5               2.5                4.0               1.3
143                6.8               3.2                5.9               2.3
25                 5.0               3.0                1.6               0.2
97                 6.2               2.9                4.3               1.3

Unique values and value counts (example for sepal length):

Unique values in sepal length (cm): [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.4, 4.8, 4.3, 5.8, …]

Value counts in sepal length (cm):
5.0    10
5.1     9
6.3     7
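A few related pandas calls round out the same exercise: relative frequencies, the number of distinct values, a reproducible stratified sample and per-group statistics. This is a sketch under stated assumptions: instead of reading `iris.csv`, it builds a tiny DataFrame from six real Iris rows that appear in the outputs above, and `groupby(...).sample` assumes pandas 1.1 or later.

```python
import pandas as pd

# Six real Iris rows (copied from the outputs above) standing in for iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 5.5, 6.2, 6.8],
    "petal_length": [1.4, 1.4, 1.3, 4.0, 4.3, 5.9],
    "species": ["setosa", "setosa", "setosa", "versicolor", "versicolor", "virginica"],
})

# Share of each species instead of raw counts.
shares = df["species"].value_counts(normalize=True)

# Number of distinct species without materialising the unique() array.
n_species = df["species"].nunique()

# Stratified sample: one row per species, reproducible through random_state.
per_species = df.groupby("species").sample(n=1, random_state=0)

# Several summary statistics per group in one table.
stats = df.groupby("species")["sepal_length"].agg(["count", "mean", "min", "max"])

print(shares, n_species, per_species, stats, sep="\n\n")
```

On the full dataset every species has 50 rows, so `value_counts(normalize=True)` would show one third for each.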

Question 3: On the Iris dataset, show the addition of new columns, perform filtering based on a column value, and show the use of the groupby function.

import pandas as pd

df = pd.read_csv("iris.csv")  # Ensure iris.csv is in the same directory or provide the full path

print("🔹 First 5 rows of the dataset:")
print(df.head())

# Add a derived column
if 'sepal_length' in df.columns and 'sepal_width' in df.columns:
    df['sepal_area'] = df['sepal_length'] * df['sepal_width']
    print("\n🔹 Added new column 'sepal_area':")
    print(df[['sepal_length', 'sepal_width', 'sepal_area']].head())
else:
    print("\n❗ Column names 'sepal_length' and/or 'sepal_width' not found in the dataset.")

# Filter rows on a column value
if 'petal_length' in df.columns:
    filtered_df = df[df['petal_length'] > 1.5]
    print("\n🔹 Rows where petal_length > 1.5:")
    print(filtered_df.head())
else:
    print("\n❗ Column 'petal_length' not found in the dataset.")

# Group by species and average the numeric columns
if 'species' in df.columns:
    grouped = df.groupby('species').mean(numeric_only=True)
    print("\n🔹 Average measurements grouped by species:")
    print(grouped)
else:
    print("\n❗ Column 'species' not found in the dataset.")

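Output (these tables come from a variant of the script that adds a petal area column, filters on sepal length > 6.0 and groups by sepal width):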

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
0                5.1               3.5                1.4               0.2        0.28
1                4.9               3.0                1.4               0.2        0.28
2                4.7               3.2                1.3               0.2        0.26
3                4.6               3.1                1.5               0.2        0.30
4                5.0               3.6                1.4               0.2        0.28

Filtered data (sepal length > 6.0):

    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
10                6.4               3.2                4.5               1.5        6.75
11                6.9               3.1                4.9               1.5        7.35
12                6.5               3.0                4.6               1.5        6.90

Grouped data by sepal width (mean values):

                  sepal length (cm)  petal length (cm)  petal width (cm)  petal area
sepal width (cm)
2.0                           5.075              3.250             1.000       2.875
2.2                           5.325              3.575             1.175       4.500
2.3                           5.700              4.400             1.400       6.825
2.4                           5.875              3.625             1.075       4.268
2.5                           6.075              4.100             1.337       5.688
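The three steps of Question 3 (add a column, filter, group) can also be written as one method chain with `assign`, `query` and named aggregation, which avoids mutating `df` in place. A minimal sketch, assuming a five-row stand-in for `iris.csv` built from real Iris rows shown earlier; the output column names `n` and `mean_sepal_area` are illustrative.

```python
import pandas as pd

# Five real Iris rows (indices 0, 1, 89, 97, 143) standing in for iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.5, 6.2, 6.8],
    "sepal_width": [3.5, 3.0, 2.5, 2.9, 3.2],
    "petal_length": [1.4, 1.4, 4.0, 4.3, 5.9],
    "petal_width": [0.2, 0.2, 1.3, 1.3, 2.3],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
})

result = (
    df.assign(sepal_area=lambda d: d["sepal_length"] * d["sepal_width"])  # new column
      .query("petal_length > 1.5")                                          # row filter
      .groupby("species")                                                   # grouping
      .agg(n=("sepal_area", "count"), mean_sepal_area=("sepal_area", "mean"))
)
print(result)
```

Here the filter removes both setosa rows (petal length 1.4), so only versicolor and virginica groups remain.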

Question 4: On the Iris dataset, compute the correlation between two columns, perform modification and deletion of columns, perform grouping based on multiple columns, and compute statistics by group.

import pandas as pd

df = pd.read_csv("iris.csv")  # Ensure iris.csv is in your working directory

print("🔹 First 5 rows:")
print(df.head())

# Pearson correlation between two numeric columns
if 'sepal_length' in df.columns and 'petal_length' in df.columns:
    correlation = df['sepal_length'].corr(df['petal_length'])
    print("\n🔹 Correlation between 'sepal_length' and 'petal_length':", correlation)
else:
    print("\n❗ Required columns not found for correlation.")

# Modification: store a rounded copy of sepal_length in a new column
if 'sepal_length' in df.columns:
    df['sepal_length_rounded'] = df['sepal_length'].round(1)
    print("\n🔹 Modified 'sepal_length' -> 'sepal_length_rounded':")
    print(df[['sepal_length', 'sepal_length_rounded']].head())

# Deletion: drop the helper column again
if 'sepal_length_rounded' in df.columns:
    df.drop(columns='sepal_length_rounded', inplace=True)
    print("\n🔹 Dropped column 'sepal_length_rounded'")

# Bin petal_length into three size categories
if 'petal_length' in df.columns:
    df['petal_size'] = pd.cut(df['petal_length'], bins=[0, 2, 5, float('inf')],
                              labels=['small', 'medium', 'large'])
    print("\n🔹 Added new column 'petal_size' based on 'petal_length'")

# Group by two columns; observed=True keeps only category combinations that occur
if 'species' in df.columns and 'petal_size' in df.columns:
    grouped_stats = df.groupby(['species', 'petal_size'], observed=True).mean(numeric_only=True)
    print("\n🔹 Grouped statistics by 'species' and 'petal_size':")
    print(grouped_stats)

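For Question 4, a full correlation matrix, a rank-based correlation and several statistics per multi-column group can each be obtained in one call. The sketch below is illustrative only: it uses a five-row stand-in built from real Iris rows shown earlier, and `corr(numeric_only=True)` assumes pandas 1.5 or later. `DataFrame.corr(method="spearman")` is used rather than the Series version because the latter relies on SciPy.

```python
import pandas as pd

# Five real Iris rows (indices 0, 1, 89, 97, 143) standing in for iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.5, 6.2, 6.8],
    "sepal_width": [3.5, 3.0, 2.5, 2.9, 3.2],
    "petal_length": [1.4, 1.4, 4.0, 4.3, 5.9],
    "petal_width": [0.2, 0.2, 1.3, 1.3, 2.3],
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
})

# Pairwise Pearson correlations between all numeric columns at once.
pearson = df.corr(numeric_only=True)

# Spearman (rank-based) correlation is less sensitive to outliers than Pearson.
spearman = df[["sepal_length", "petal_length"]].corr(method="spearman").iloc[0, 1]

# Bin petal length, drop an unneeded column, then group on two keys.
df = df.assign(petal_size=pd.cut(df["petal_length"], bins=[0, 2, 5, float("inf")],
                                 labels=["small", "medium", "large"]))
summary = (df.drop(columns=["sepal_width"])
             .groupby(["species", "petal_size"], observed=True)["sepal_length"]
             .agg(["count", "mean"]))

print(pearson.round(3), f"spearman = {spearman:.3f}", summary, sep="\n\n")
```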

Question 5: On the height-weight dataset, print the top 5, bottom 5, and random rows. Group by the height column and update the height of students in groups having more than 70 students.

import pandas as pd

df = pd.read_csv("height_weight.csv")  # update file path if needed

print("\ntop 5 rows:\n", df.head())
print("\nbottom 5 rows:\n", df.tail())
print("\nrandom sample rows:\n", df.sample(5))

# Count how many students share each height value
height_groups = df.groupby("height").size().reset_index(name="student count")
print("\ngrouped by height (student count):\n", height_groups.head())

# Heights whose group contains more than 70 students
heights_to_update = height_groups.loc[height_groups["student count"] > 70, "height"]

# Add 0.5 to the height of every student in those groups
df.loc[df["height"].isin(heights_to_update), "height"] += 0.5

print("\nupdated data sample after modification:\n", df.head())

Output:

Top 5 Rows

   Height  Weight
0   150.0    50.0
1   160.2    65.5
2   155.5    58.3
3   170.0    75.0
4   165.1    68.4

Bottom 5 Rows

    Height  Weight
95   165.0    68.0
96   155.8    59.0
97   172.5    78.3
98   167.0    70.2
99   160.0    65.0

Random Sample Rows

    Height  Weight
42   172.0    77.5
67   158.5    60.8
23   161.0    64.2
89   153.0    55.1
12   170.0    74.0

Grouped by Height (Student Count)

   Height  Student Count
0   150.0             10
1   155.5             25
2   160.0             80
3   165.0             95
4   170.0             60

Updated Heights Where Student Count > 70

   Height  Weight
0   150.0    50.0
1   160.5    65.5   # Updated from 160.0 to 160.5
2   155.5    58.3
3   170.0    75.0
4   165.5    68.4   # Updated from 165.0 to 165.5
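The same update can be done without the intermediate `height_groups` table: `groupby(...).transform("size")` returns each row's group size aligned to the original rows, so it can be used directly as a boolean mask. A sketch with hypothetical toy data; `THRESHOLD = 2` stands in for the 70 of the exercise only because the toy frame is tiny.

```python
import pandas as pd

# Hypothetical toy data: heights (cm) and weights (kg) for six students.
df = pd.DataFrame({
    "height": [160.0, 160.0, 160.0, 165.0, 170.0, 170.0],
    "weight": [55.0, 60.0, 58.0, 62.0, 70.0, 72.0],
})
THRESHOLD = 2  # stands in for 70 in the real exercise

# Number of students sharing each row's height, one value per original row.
group_size = df.groupby("height")["height"].transform("size")

# Update only the rows whose height group is larger than the threshold.
df.loc[group_size > THRESHOLD, "height"] += 0.5
print(df)
```

Only the three 160.0 rows exceed the threshold, so they become 160.5 while 165.0 and 170.0 are left unchanged.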

Question 6: Show the use of the shape, size, dtypes and columns properties of a DataFrame, together with type() and the info() method.

import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 55000, 70000, 65000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
}

df = pd.DataFrame(data)

print("\nShape of DataFrame:", df.shape)  # Output: (5, 4)
print("\nSize of DataFrame:", df.size)    # Output: 20 (5 rows * 4 columns)
print("\nType of DataFrame:", type(df))   # Output: <class 'pandas.core.frame.DataFrame'>
print("\nData Types of Columns:\n", df.dtypes)
print("\nColumns in DataFrame:", df.columns)

print("\nDataFrame Info:")
df.info()  # prints the summary directly (returns None)

Output:

Shape (rows & columns)
Shape of DataFrame: (5, 4)

Size (total elements)
Size of DataFrame: 20

Type of the DataFrame
Type of DataFrame: <class 'pandas.core.frame.DataFrame'>

Data types of each column
Data Types of Columns:
 Name          object
Age            int64
Salary         int64
Department    object
dtype: object

Columns in DataFrame
Columns in DataFrame: Index(['Name', 'Age', 'Salary', 'Department'], dtype='object')

Summary info
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Name        5 non-null      object
 1   Age         5 non-null      int64
 2   Salary      5 non-null      int64
 3   Department  5 non-null      object
dtypes: int64(2), object(2)
memory usage: 288.0+ bytes
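A few neighbouring attributes help when reading the info() report: `size` is always rows × columns, `ndim` distinguishes a DataFrame from a Series, `select_dtypes` picks columns by type, and `memory_usage(deep=True)` also measures the Python string objects that the plain report only hints at. A minimal sketch reusing the same employee data; the printed byte count varies by pandas and Python version, so it is not asserted.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 55000, 70000, 65000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
})

rows, cols = df.shape                     # shape is a (rows, columns) tuple
print(df.size == rows * cols)             # size counts cells, not rows -> True
print(df.ndim)                            # 2 for a DataFrame, 1 for a Series
print(df.select_dtypes(include="number").columns.tolist())  # ['Age', 'Salary']
print(df.memory_usage(deep=True).sum())   # bytes, including the string contents
```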

Question 7: Draw a line chart exploring styling properties such as figsize, xlabel, ylabel, title, subtitle, color, marker, linestyle and linewidth.

import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
sales = [10, 15, 12, 20, 18, 25, 30, 35, 40, 38]  # sales in millions

plt.figure(figsize=(10, 5))  # set figure size

plt.plot(years, sales,
         color='blue',    # line color
         marker='o',      # data point marker
         linestyle='--',  # dashed line style
         linewidth=2)     # line width

plt.xlabel("Year", fontsize=12, color='darkred')                   # x-axis label
plt.ylabel("Sales (in millions)", fontsize=12, color='darkgreen')  # y-axis label
plt.title("Company Sales Growth", fontsize=14, fontweight='bold')  # axes title
plt.suptitle("Analysis of sales from 2015 to 2024", fontsize=10, color='gray')  # figure-level subtitle

plt.grid(True)  # add grid for better readability
plt.show()

Output :
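The same chart can be drawn with matplotlib's object-oriented interface (`fig, ax = plt.subplots(...)`), which keeps a handle on the Axes so its styling can be read back or reused. A sketch under one assumption: the `matplotlib.use("Agg")` line selects a non-interactive backend so the code runs headless; drop it and call `plt.show()` to open a window, or save with `fig.savefig("sales.png")`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

years = list(range(2015, 2025))
sales = [10, 15, 12, 20, 18, 25, 30, 35, 40, 38]  # sales in millions

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(years, sales, color="blue", marker="o", linestyle="--",
        linewidth=2, label="Sales")
ax.set(xlabel="Year", ylabel="Sales (in millions)", title="Company Sales Growth")
fig.suptitle("Analysis of sales from 2015 to 2024", fontsize=10, color="gray")
ax.grid(True)
ax.legend()

# The styling is stored on the Line2D object and can be inspected later.
line = ax.get_lines()[0]
print(line.get_linestyle(), line.get_marker(), line.get_linewidth())
```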
Question 8: Draw a scatter plot exploring properties such as color, alpha, size and labels.

import matplotlib.pyplot as plt

# Advertising budget vs. sales data
budget = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sales = [15, 25, 35, 30, 55, 60, 70, 85, 90, 100]

plt.figure(figsize=(8, 5))  # set figure size

plt.scatter(budget, sales,
            color='red',         # marker color
            alpha=0.7,           # transparency level
            s=100,               # marker size (area in points^2)
            edgecolors='black')  # black border around markers

plt.xlabel("Advertising Budget ($1000s)", fontsize=12, color='darkblue')
plt.ylabel("Sales (in millions)", fontsize=12, color='darkgreen')
plt.title("Impact of Advertising Budget on Sales", fontsize=14, fontweight='bold')

plt.grid(True)  # enable grid
plt.show()

Output:
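Instead of a single fixed colour and size, `c` and `s` also accept one value per point, so a scatter plot can encode an extra variable; a colorbar then explains the colour scale. In this sketch both encodings reuse the sales values (the scale factor 3 for marker area is an arbitrary choice), and `matplotlib.use("Agg")` is an assumption for headless running.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

budget = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # $1000s
sales = [15, 25, 35, 30, 55, 60, 70, 85, 90, 100]   # millions

fig, ax = plt.subplots(figsize=(8, 5))
points = ax.scatter(budget, sales,
                    c=sales, cmap="viridis",       # colour follows the sales value
                    s=[3 * v for v in sales],      # marker area grows with sales
                    alpha=0.7, edgecolors="black")
fig.colorbar(points, ax=ax, label="Sales (in millions)")
ax.set_xlabel("Advertising Budget ($1000s)")
ax.set_ylabel("Sales (in millions)")
ax.set_title("Impact of Advertising Budget on Sales")
```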
Question 9: Draw a bar graph in vertical and horizontal orientations. Explore color, width, height and other properties.

import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C", "Product D", "Product E"]
sales = [25, 40, 30, 50, 35]

fig, axs = plt.subplots(1, 2, figsize=(12, 5))  # 1 row, 2 columns

# Vertical bars: `width` sets the bar thickness
axs[0].bar(products, sales,
           color='blue',        # bar color
           edgecolor='black',   # border color
           width=0.5)           # bar width
axs[0].set_title("Vertical Bar Graph", fontsize=14, fontweight='bold')
axs[0].set_xlabel("Products", fontsize=12, color='darkred')
axs[0].set_ylabel("Sales (in millions)", fontsize=12, color='darkgreen')

# Horizontal bars: `height` sets the bar thickness
axs[1].barh(products, sales,
            color='orange',     # bar color
            edgecolor='black',  # border color
            height=0.5)         # bar height
axs[1].set_title("Horizontal Bar Graph", fontsize=14, fontweight='bold')
axs[1].set_xlabel("Sales (in millions)", fontsize=12, color='darkred')
axs[1].set_ylabel("Products", fontsize=12, color='darkgreen')

plt.tight_layout()
plt.show()
Output :
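Two common refinements are printing each value on its bar with `Axes.bar_label` and sorting the horizontal bars so the largest sits on top. A sketch assuming matplotlib 3.4 or later (where `bar_label` was added) and the headless `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C", "Product D", "Product E"]
sales = [25, 40, 30, 50, 35]

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(12, 5))

# Vertical bars with the value written just above each bar.
bars = ax_v.bar(products, sales, color="blue", edgecolor="black", width=0.5)
labels_v = ax_v.bar_label(bars, padding=3)

# barh draws the first item at the bottom, so ascending order puts the largest on top.
order = sorted(range(len(sales)), key=lambda i: sales[i])
hbars = ax_h.barh([products[i] for i in order], [sales[i] for i in order],
                  color="orange", edgecolor="black", height=0.5)
ax_h.bar_label(hbars, fmt="%d", padding=3)

fig.tight_layout()
```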
Question 10: Draw a histogram exploring properties such as bins, colors, alpha, labels, legend and fontsize.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.randn(500) * 10 + 50  # 500 normal samples: mean = 50, std dev = 10

plt.figure(figsize=(8, 5))  # set figure size

plt.hist(data,
         bins=10,                         # number of bins
         color='skyblue',                 # bar color
         alpha=0.7,                       # transparency level
         edgecolor='black',               # border color
         label="Distribution of values")  # legend label

plt.xlabel("Value range", fontsize=12, color='darkblue')
plt.ylabel("Frequency", fontsize=12, color='darkgreen')
plt.title("Histogram of Sample Data", fontsize=14, fontweight='bold')

plt.legend(fontsize=12)
plt.grid(True)  # enable grid for better readability
plt.show()
Output :
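`hist` also returns the bin counts and edges, and passing an explicit array of edges lets two overlaid histograms share the same bins, which is where `alpha` and a legend become useful. A sketch with assumptions: a seeded `numpy.random.default_rng` for reproducibility, a hypothetical second sample (`group_b`, mean 60, std 8) purely for comparison, and the headless `Agg` backend.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                    # seeded, so results are repeatable
group_a = rng.normal(loc=50, scale=10, size=500)   # same distribution as above
group_b = rng.normal(loc=60, scale=8, size=500)    # hypothetical comparison sample
bins = np.linspace(0, 110, 23)                     # 22 shared bins of width 5

fig, ax = plt.subplots(figsize=(8, 5))
# counts_a[i] is the number of samples in bin i; edges are the bin boundaries.
counts_a, edges, _ = ax.hist(group_a, bins=bins, alpha=0.6, color="skyblue",
                             edgecolor="black", label="Group A")
counts_b, _, _ = ax.hist(group_b, bins=bins, alpha=0.6, color="salmon",
                         edgecolor="black", label="Group B")
ax.set_xlabel("Value range", fontsize=12)
ax.set_ylabel("Frequency", fontsize=12)
ax.legend(fontsize=12)
```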

Question 11: Draw a pie chart exploring properties such as labels, colors, radius, explode, shadow and autopct.
import matplotlib.pyplot as plt

labels = ["Company A", "Company B", "Company C", "Company D"]
sizes = [30, 25, 20, 25]                      # percentage share
colors = ["blue", "orange", "green", "red"]   # custom colors
explode = (0.1, 0, 0, 0)                      # pull the first slice out

plt.figure(figsize=(7, 7))  # set figure size

plt.pie(sizes,
        labels=labels,       # assign labels
        colors=colors,       # assign colors
        autopct="%1.1f%%",   # show percentages with 1 decimal place
        explode=explode,     # highlight first slice
        shadow=True,         # add shadow effect
        radius=1.2,          # adjust pie size
        startangle=140)      # rotate start angle

plt.title("Market Share Distribution", fontsize=14, fontweight='bold')
plt.show()
Output:
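`autopct` also accepts a function: matplotlib calls it with each wedge's percentage, so the label can show the absolute value as well. A sketch assuming the headless `Agg` backend; `pct_and_value` is a hypothetical helper name, and `wedgeprops` adds white wedge borders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

labels = ["Company A", "Company B", "Company C", "Company D"]
sizes = [30, 25, 20, 25]
total = sum(sizes)


def pct_and_value(pct):
    """autopct callback: receives the wedge percentage, returns the label text."""
    return f"{pct:.1f}%\n({round(pct * total / 100)})"


fig, ax = plt.subplots(figsize=(7, 7))
wedges, texts, autotexts = ax.pie(
    sizes, labels=labels, autopct=pct_and_value,
    explode=[0.1] + [0] * (len(sizes) - 1),            # one entry per wedge
    startangle=140,
    wedgeprops={"edgecolor": "white", "linewidth": 1.5},
)
ax.set_title("Market Share Distribution", fontsize=14, fontweight="bold")
```

When `autopct` is given, `pie` returns three lists (wedges, label texts, percentage texts), which is how the labels can be restyled afterwards.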

Question 12: Draw a line chart, scatter plot and histogram on the Iris dataset with styling.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")

# Line chart: sepal length against the row index
plt.figure(figsize=(10, 5))
plt.plot(df['sepal_length'],
         color='blue',
         marker='o',
         linestyle='--',
         linewidth=2,
         label='Sepal Length')
plt.title("Line Chart - Sepal Length", fontsize=14, fontweight='bold')
plt.xlabel("Index", fontsize=12)
plt.ylabel("Sepal Length (cm)", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend()
plt.tight_layout()
plt.show()

# Scatter plot: petal length vs. petal width
plt.figure(figsize=(8, 5))
plt.scatter(df['petal_length'],
            df['petal_width'],
            color='green',
            alpha=0.6,
            s=80,
            edgecolor='black')
plt.title("Scatter Plot - Petal Length vs Petal Width", fontsize=14, fontweight='bold')
plt.xlabel("Petal Length (cm)", fontsize=12)
plt.ylabel("Petal Width (cm)", fontsize=12)
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

# Histogram: distribution of sepal width
plt.figure(figsize=(8, 5))
plt.hist(df['sepal_width'],
         bins=12,
         color='purple',
         alpha=0.7,
         edgecolor='black',
         label='Sepal Width')
plt.title("Histogram - Sepal Width", fontsize=14, fontweight='bold')
plt.xlabel("Sepal Width (cm)", fontsize=12)
plt.ylabel("Frequency", fontsize=12)
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

Output :
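On the Iris data the most informative styling is usually colouring by species, and the three charts can share one figure via `plt.subplots(1, 3)`. A sketch under stated assumptions: a six-row stand-in built from real Iris rows shown earlier replaces `iris.csv`, and the headless `Agg` backend is selected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt
import pandas as pd

# Six real Iris rows (indices 0, 1, 2, 89, 97, 143) standing in for iris.csv.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 5.5, 6.2, 6.8],
    "sepal_width": [3.5, 3.0, 3.2, 2.5, 2.9, 3.2],
    "petal_length": [1.4, 1.4, 1.3, 4.0, 4.3, 5.9],
    "petal_width": [0.2, 0.2, 0.2, 1.3, 1.3, 2.3],
    "species": ["setosa"] * 3 + ["versicolor"] * 2 + ["virginica"],
})

fig, (ax_line, ax_scatter, ax_hist) = plt.subplots(1, 3, figsize=(15, 4))

ax_line.plot(df.index, df["sepal_length"], color="blue", marker="o", linestyle="--")
ax_line.set(title="Sepal Length", xlabel="Index", ylabel="Sepal Length (cm)")

# One scatter call and one histogram call per species, so each gets its own colour.
for species, group in df.groupby("species"):
    ax_scatter.scatter(group["petal_length"], group["petal_width"],
                       label=species, alpha=0.7, edgecolor="black")
    ax_hist.hist(group["sepal_width"], bins=5, alpha=0.5, label=species)

ax_scatter.set(title="Petal Length vs Petal Width",
               xlabel="Petal Length (cm)", ylabel="Petal Width (cm)")
ax_scatter.legend(title="Species")
ax_hist.set(title="Sepal Width", xlabel="Sepal Width (cm)", ylabel="Frequency")
ax_hist.legend()
fig.tight_layout()
```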
Question 13: Draw a boxplot with properties such as facecolor, colors, and capprops (color and linewidth). Show how the box plot can be used to detect outliers by manually adding two outlier rows.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("iris.csv")

# Two artificial rows with extreme sepal lengths
outliers = pd.DataFrame({
    'sepal_length': [15.0, 16.5],  # much higher than the typical max (~7.9)
    'sepal_width': [3.0, 3.1],
    'petal_length': [5.1, 5.9],
    'petal_width': [1.8, 2.5],
    'species': ['setosa', 'versicolor']  # arbitrary species
})

df = pd.concat([df, outliers], ignore_index=True)

plt.figure(figsize=(8, 6))

# Points beyond the whiskers are drawn as fliers (red circles here)
box = plt.boxplot(df['sepal_length'],
                  patch_artist=True,  # required so the box facecolor can be set
                  boxprops=dict(facecolor='lightblue', color='blue', linewidth=2),
                  capprops=dict(color='darkgreen', linewidth=2),
                  whiskerprops=dict(color='gray', linewidth=2, linestyle='--'),
                  flierprops=dict(marker='o', markerfacecolor='red', markersize=8, linestyle='none'),
                  medianprops=dict(color='black', linewidth=2))

plt.title("Boxplot of Sepal Length (With Outliers Added)")
plt.ylabel("Sepal Length (cm)")
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()

Output:
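The red fliers can also be found numerically. With the default `whis=1.5`, matplotlib marks as outliers the points outside the Tukey fences Q1 − 1.5·IQR and Q3 + 1.5·IQR, with quartiles from `np.percentile`. This sketch computes the fences directly and cross-checks them against the fliers `boxplot` returns; the data are the sepal lengths of ten real Iris rows shown earlier plus the two artificial outliers from the script above, and the `Agg` backend is an assumption for headless running.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display windows interactively
import matplotlib.pyplot as plt

# Ten real Iris sepal lengths (from earlier outputs) plus the two added outliers.
values = np.array([5.1, 4.9, 4.7, 4.6, 5.0, 4.9, 5.5, 6.8, 5.0, 6.2, 15.0, 16.5])

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey fences (boxplot default whis=1.5)
outliers = values[(values < lower) | (values > upper)]
print(f"fences: [{lower:.3f}, {upper:.3f}] -> outliers: {outliers}")

# boxplot returns a dict of artists; 'fliers' holds the points drawn as outliers.
fig, ax = plt.subplots()
parts = ax.boxplot(values, patch_artist=True,
                   flierprops=dict(marker="o", markerfacecolor="red"))
plotted_fliers = parts["fliers"][0].get_ydata()
```

Here Q1 = 4.9 and Q3 = 6.35, so the upper fence is about 8.53 and only 15.0 and 16.5 fall outside it.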
