Data Visualization

The document contains a series of Python scripts that demonstrate various data manipulation and visualization techniques using the Iris dataset and a height-weight dataset. It covers computing Fibonacci numbers, generating numeric triangle patterns, performing statistical operations, filtering and grouping data, and visualizing data with line charts, scatter plots, and bar graphs. Each section includes code snippets, expected outputs, and explanations of the operations performed.

Uploaded by

ansarisshadan748

Question 1: Write a Python script to compute the Nth Fibonacci number and print a numeric triangle pattern.

def fibonacci(n):
    # The sequence starts at 0, so fibonacci(1) == 0 and fibonacci(2) == 1
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def numeric_triangle(rows):
    # Row i holds the numbers 1..i separated by spaces
    for i in range(1, rows + 1):
        print(" ".join(str(num) for num in range(1, i + 1)))

n = int(input("Enter the value of N for Fibonacci: "))
rows = int(input("Enter the number of rows for Numeric Triangle: "))

print(f"\n{n}th Fibonacci Number:", fibonacci(n))
print("\nNumeric Triangle Pattern:")
numeric_triangle(rows)

Output:

Enter the value of N for Fibonacci: 10
Enter the number of rows for Numeric Triangle: 5

10th Fibonacci Number: 34

Numeric Triangle Pattern:
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
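
The script counts the sequence from 0, which is why the 10th number is 34 rather than 55. As a quick check of that convention, the sketch below lists the first ten values. The helper name `first_fibonacci` is hypothetical and not part of the original script.

```python
def first_fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0."""
    values = []
    a, b = 0, 1
    for _ in range(count):
        values.append(a)
        a, b = b, a + b
    return values

sequence = first_fibonacci(10)
print(sequence)      # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(sequence[-1])  # 34, the same value the script reports for N = 10
```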

Question 2: On the Iris data set, perform basic statistical operations and sampling, and find unique values and value counts.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

print("\nBasic Statistics:\n", df.describe())
print("\nRandom Sample:\n", df.sample(5))

for col in df.columns:
    print(f"\nUnique values in {col}: {df[col].unique()}")
    print(f"Value counts in {col}:\n{df[col].value_counts()}")

Output:

Basic Statistics:
       sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
count         150.000000        150.000000         150.000000        150.000000
mean            5.843333          3.057333           3.758000          1.199333
std             0.828066          0.435866           1.765298          0.762238
min             4.300000          2.000000           1.000000          0.100000
25%             5.100000          2.800000           1.600000          0.300000
50%             5.800000          3.000000           4.350000          1.300000
75%             6.400000          3.300000           5.100000          1.800000
max             7.900000          4.400000           6.900000          2.500000

Random Sample (5 rows; values will vary on each run):
     sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
37                 4.9               3.6                1.4               0.1
89                 5.5               2.5                4.0               1.3
143                6.8               3.2                5.9               2.3
25                 5.0               3.0                1.6               0.2
97                 6.2               2.9                4.3               1.3

Unique Values and Value Counts (example for sepal length):

Unique values in sepal length (cm): [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.4, 4.8, 4.3, 5.8, …]

Value counts in sepal length (cm):
5.0    10
5.1     9
6.3     7
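
Because `sample()` draws different rows on every run, it can help to fix the seed while checking results. The sketch below assumes a small hand-made Series in place of an Iris column (the values are hypothetical) and also shows `nunique()` and the `normalize` option of `value_counts()`.

```python
import pandas as pd

# Hypothetical toy column standing in for a single Iris feature
lengths = pd.Series([5.0, 5.1, 5.0, 6.3, 5.1, 5.0], name="sepal length (cm)")

# A fixed random_state makes sample() return the same rows on every run
sample_a = lengths.sample(3, random_state=42)
sample_b = lengths.sample(3, random_state=42)
print(sample_a.equals(sample_b))  # True

# nunique() counts distinct values; value_counts() tallies each one
print(lengths.nunique())          # 3
counts = lengths.value_counts()
print(counts.loc[5.0])            # 3

# normalize=True reports proportions instead of raw counts
shares = lengths.value_counts(normalize=True)
print(shares.loc[5.0])            # 0.5
```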

Question 3: On the Iris data set, show the addition of new columns, perform filtering based on a column value, and show the use of the group by function.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add a new column derived from two existing ones
df["petal area"] = df["petal length (cm)"] * df["petal width (cm)"]
print("\nData with New Column (Petal Area):\n", df.head())

# Keep only the rows whose sepal length exceeds 6.0
filtered_df = df[df["sepal length (cm)"] > 6.0]
print("\nFiltered Data (Sepal Length > 6.0):\n", filtered_df.head())

# Average every other column within each distinct sepal width
grouped_df = df.groupby("sepal width (cm)").mean()
print("\nGrouped Data by Sepal Width (Mean Values):\n", grouped_df.head())

Output:

Data with New Column (Petal Area):
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
0                5.1               3.5                1.4               0.2        0.28
1                4.9               3.0                1.4               0.2        0.28
2                4.7               3.2                1.3               0.2        0.26
3                4.6               3.1                1.5               0.2        0.30
4                5.0               3.6                1.4               0.2        0.28

Filtered Data (Sepal Length > 6.0), illustrative rows; an actual run prints different indices and values:
    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
10                6.4               3.2                4.5               1.5        6.75
11                6.9               3.1                4.9               1.5        7.35
12                6.5               3.0                4.6               1.5        6.90

Grouped Data by Sepal Width (Mean Values), illustrative values:
                  sepal length (cm)  petal length (cm)  petal width (cm)  petal area
sepal width (cm)
2.0                           5.075              3.250             1.000       2.875
2.2                           5.325              3.575             1.175       4.500
2.3                           5.700              4.400             1.400       6.825
2.4                           5.875              3.625             1.075       4.268
2.5                           6.075              4.100             1.337       5.688

Question 4: On the Iris data set, compute the correlation between two columns, perform modification and deletion of columns, perform grouping based on multiple columns, and compute statistics by group.

import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

df["petal area"] = df["petal length (cm)"] * df["petal width (cm)"]
print("\nData with New Column (Petal Area):\n", df.head())

filtered_df = df[df["sepal length (cm)"] > 6.0]
print("\nFiltered Data (Sepal Length > 6.0):\n", filtered_df.head())

grouped_df = df.groupby("sepal width (cm)").mean()
print("\nGrouped Data by Sepal Width (Mean Values):\n", grouped_df.head())

Output:

Data with New Column (Petal Area):
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
0                5.1               3.5                1.4               0.2        0.28
1                4.9               3.0                1.4               0.2        0.28
2                4.7               3.2                1.3               0.2        0.26
3                4.6               3.1                1.5               0.2        0.30
4                5.0               3.6                1.4               0.2        0.28

Filtered Data (Sepal Length > 6.0), illustrative rows; an actual run prints different indices and values:
    sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  petal area
10                6.4               3.2                4.5               1.5        6.75
11                6.9               3.1                4.9               1.5        7.35
12                6.5               3.0                4.6               1.5        6.90

Grouped Data by Sepal Width (Mean Values), illustrative values:
                  sepal length (cm)  petal length (cm)  petal width (cm)  petal area
sepal width (cm)
2.0                           5.075              3.250             1.000       2.875
2.2                           5.325              3.575             1.175       4.500
2.3                           5.700              4.400             1.400       6.825
2.4                           5.875              3.625             1.075       4.268
2.5                           6.075              4.100             1.337       5.688
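
The script above repeats the Question 3 operations and does not show correlation, column modification or deletion, or grouping on more than one column. A minimal sketch of those steps follows, assuming a small hand-made frame in place of the Iris data. The `species` and `size` columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; two column names mimic the Iris features
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "size": ["small", "large", "small", "large"],
    "petal length (cm)": [1.0, 2.0, 3.0, 4.0],
    "petal width (cm)": [0.5, 1.0, 1.5, 2.0],
})

# Correlation between two columns (Pearson by default)
corr = df["petal length (cm)"].corr(df["petal width (cm)"])
print(corr)  # the toy data is perfectly linear, so this is 1.0 up to rounding

# Modify a column: convert centimetres to millimetres and rename it
df["petal length (cm)"] = df["petal length (cm)"] * 10
df = df.rename(columns={"petal length (cm)": "petal length (mm)"})

# Delete a column
df = df.drop(columns=["petal width (cm)"])

# Group by multiple columns and compute statistics per group
stats = df.groupby(["species", "size"])["petal length (mm)"].agg(["mean", "max"])
print(stats)
```

Passing a list to `groupby` produces one group per combination of values, and `agg` with a list of names returns one column per statistic.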

Question 5: On the height-weight data set, print the top 5, bottom 5, and random rows. Group by the height column and update the height of students in groups having more than 70 students.

import pandas as pd

df = pd.read_csv("height_weight.csv")  # Update file path if needed

print("\nTop 5 Rows:\n", df.head())
print("\nBottom 5 Rows:\n", df.tail())
print("\nRandom Sample Rows:\n", df.sample(5))

# Count the students at each height
height_groups = df.groupby("Height").size().reset_index(name="Student Count")
print("\nGrouped by Height (Student Count):\n", height_groups.head())

# Heights shared by more than 70 students get 0.5 added
heights_to_update = height_groups[height_groups["Student Count"] > 70]["Height"]
df.loc[df["Height"].isin(heights_to_update), "Height"] += 0.5
print("\nUpdated Data Sample After Modification:\n", df.head())

Output:

Top 5 Rows
   Height  Weight
0   150.0    50.0
1   160.2    65.5
2   155.5    58.3
3   170.0    75.0
4   165.1    68.4

Bottom 5 Rows
    Height  Weight
95   165.0    68.0
96   155.8    59.0
97   172.5    78.3
98   167.0    70.2
99   160.0    65.0

Random Sample Rows
    Height  Weight
42   172.0    77.5
67   158.5    60.8
23   161.0    64.2
89   153.0    55.1
12   170.0    74.0

Grouped by Height (Student Count)
   Height  Student Count
0   150.0             10
1   155.5             25
2   160.0             80
3   165.0             95
4   170.0             60

Updated Heights Where Student Count > 70
   Height  Weight
0   150.0    50.0
1   160.5    65.5  # Updated from 160.0 to 160.5
2   155.5    58.3
3   170.0    75.0
4   165.5    68.4  # Updated from 165.0 to 165.5
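
The same update can be written without the intermediate count table. The sketch below assumes a four-row toy frame (hypothetical values) and a threshold of 2 standing in for 70, and uses `transform("size")` to attach each group's row count to every row.

```python
import pandas as pd

# Hypothetical toy data: three students at 160.0, one at 170.0
df = pd.DataFrame({
    "Height": [160.0, 160.0, 170.0, 160.0],
    "Weight": [60.0, 62.5, 75.0, 58.0],
})

# transform("size") broadcasts each group's row count back to its rows
group_size = df.groupby("Height")["Height"].transform("size")

# A threshold of 2 stands in for the 70 used in the script above
threshold = 2
df.loc[group_size > threshold, "Height"] += 0.5

print(df["Height"].tolist())  # [160.5, 160.5, 170.0, 160.5]
```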

Question 6: Show the use of shape, size, type, dtypes, columns and info
properties of a DataFrame.

import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 22, 35, 28],
    "Salary": [50000, 60000, 55000, 70000, 65000],
    "Department": ["HR", "IT", "Finance", "IT", "HR"],
}

df = pd.DataFrame(data)

print("\nShape of DataFrame:", df.shape)  # Output: (5, 4)
print("\nSize of DataFrame:", df.size)    # Output: 20 (5 rows * 4 columns)
print("\nType of DataFrame:", type(df))   # Output: <class 'pandas.core.frame.DataFrame'>
print("\nData Types of Columns:\n", df.dtypes)
print("\nColumns in DataFrame:", df.columns)
print("\nDataFrame Info:")
df.info()

Output:

Shape (Rows & Columns)
Shape of DataFrame: (5, 4)

Size (Total Elements)
Size of DataFrame: 20

Type of the DataFrame
Type of DataFrame: <class 'pandas.core.frame.DataFrame'>

Data Types of Each Column
Data Types of Columns:
Name          object
Age            int64
Salary         int64
Department    object
dtype: object

Columns in DataFrame
Columns in DataFrame: Index(['Name', 'Age', 'Salary', 'Department'], dtype='object')

Summary Info
DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Name        5 non-null      object
 1   Age         5 non-null      int64
 2   Salary      5 non-null      int64
 3   Department  5 non-null      object
dtypes: int64(2), object(2)
memory usage: 288.0 bytes
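
One detail worth noting: `shape`, `size`, `dtypes` and `columns` are attributes, while `info()` is a method that prints its summary and returns `None`. The sketch below, assuming a three-row toy frame, captures that summary in a string through the `buf` parameter.

```python
import io
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 22], "Name": ["Alice", "Bob", "Charlie"]})

# shape and size are attributes: no parentheses
rows, cols = df.shape
print(rows * cols == df.size)  # True

# info() is a method; it writes to stdout (or to buf) and returns None
buffer = io.StringIO()
result = df.info(buf=buffer)
summary = buffer.getvalue()

print(result is None)                      # True
print("RangeIndex: 3 entries" in summary)  # True
```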

Question 7: Draw a line chart exploring its styling properties like figsize, xlabel, ylabel, title, subtitle, color, marker, linestyle, linewidth.

import matplotlib.pyplot as plt

years = [2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]
sales = [10, 15, 12, 20, 18, 25, 30, 35, 40, 38]  # Sales in millions

plt.figure(figsize=(10, 5))  # Set figure size

plt.plot(years, sales,
         color='blue',    # Line color
         marker='o',      # Data point marker
         linestyle='--',  # Dashed line style
         linewidth=2)     # Line width

plt.xlabel("Year", fontsize=12, color='darkred')  # X-axis label
plt.ylabel("Sales (in millions)", fontsize=12, color='darkgreen')  # Y-axis label
plt.title("Company Sales Growth", fontsize=14, fontweight='bold')  # Main title
plt.suptitle("Analysis of Sales from 2015 to 2024", fontsize=10, color='gray')  # Subtitle

plt.grid(True)  # Add grid for better readability
plt.show()

Output: [figure not reproduced: line chart "Company Sales Growth"]
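
The color, linestyle and marker keywords can also be combined in a single format string. The sketch below assumes a three-point subset of the data and the non-interactive `Agg` backend so it runs without a display; it checks that `"b--o"` sets the same three properties.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no window is needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

years = [2015, 2016, 2017]  # hypothetical subset of the data
sales = [10, 15, 12]

fig, ax = plt.subplots(figsize=(6, 3))

# "b--o" is shorthand for color='blue', linestyle='--', marker='o'
(line,) = ax.plot(years, sales, "b--o", linewidth=2)

print(line.get_linestyle())                          # --
print(line.get_marker())                             # o
print(to_rgba(line.get_color()) == to_rgba("blue"))  # True
```

The keyword form used in the script is easier to read; the format string is mainly useful for quick interactive plots.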

Question 8: Draw a scatter plot exploring its properties like color, alpha, size,
labels.

import matplotlib.pyplot as plt

budget = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # Advertising budget ($1000s)
sales = [15, 25, 35, 30, 55, 60, 70, 85, 90, 100]

plt.figure(figsize=(8, 5))  # Set figure size

plt.scatter(budget, sales,
            color='red',         # Set marker color
            alpha=0.7,           # Set transparency level
            s=100,               # Set marker size
            edgecolors='black')  # Add black border to markers

plt.xlabel("Advertising Budget ($1000s)", fontsize=12, color='darkblue')
plt.ylabel("Sales (in millions)", fontsize=12, color='darkgreen')
plt.title("Impact of Advertising Budget on Sales", fontsize=14, fontweight='bold')

plt.grid(True)  # Enable grid
plt.show()

Output: [figure not reproduced: scatter plot "Impact of Advertising Budget on Sales"]
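
The `s` and `c` parameters also accept one value per point, which lets size and color carry extra information. A minimal sketch, assuming a four-point subset of the data and the `Agg` backend; the scaling factor of 4 is an arbitrary choice for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no window is needed
import matplotlib.pyplot as plt

budget = [10, 20, 30, 40]  # hypothetical subset of the data
sales = [15, 25, 35, 30]

fig, ax = plt.subplots()

# One size and one color value per point:
# marker area scales with sales, color follows budget through a colormap
points = ax.scatter(budget, sales,
                    s=[value * 4 for value in sales],
                    c=budget, cmap="viridis",
                    alpha=0.7, edgecolors="black")
fig.colorbar(points, ax=ax, label="Advertising Budget ($1000s)")

print(len(points.get_sizes()))  # 4
```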

Question 9: Draw a bar graph with vertical and horizontal orientations. Explore color, width, height and other properties.

import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C", "Product D", "Product E"]
sales = [25, 40, 30, 50, 35]

fig, axs = plt.subplots(1, 2, figsize=(12, 5))  # 1 row, 2 columns

# Vertical bars: width controls bar thickness
axs[0].bar(products, sales,
           color='blue',       # Bar color
           edgecolor='black',  # Border color
           width=0.5)          # Bar width
axs[0].set_title("Vertical Bar Graph", fontsize=14, fontweight='bold')
axs[0].set_xlabel("Products", fontsize=12, color='darkred')
axs[0].set_ylabel("Sales (in millions)", fontsize=12, color='darkgreen')

# Horizontal bars: height controls bar thickness
axs[1].barh(products, sales,
            color='orange',     # Bar color
            edgecolor='black',  # Border color
            height=0.5)         # Bar height
axs[1].set_title("Horizontal Bar Graph", fontsize=14, fontweight='bold')
axs[1].set_xlabel("Sales (in millions)", fontsize=12, color='darkred')
axs[1].set_ylabel("Products", fontsize=12, color='darkgreen')

plt.tight_layout()
plt.show()

Output: [figure not reproduced: vertical and horizontal bar graphs side by side]
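
Among the "other properties" a bar graph offers, value labels are often the most useful. The sketch below assumes matplotlib 3.4 or newer (where `Axes.bar_label` is available), a three-product subset of the data, and the `Agg` backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no window is needed
import matplotlib.pyplot as plt

products = ["Product A", "Product B", "Product C"]  # hypothetical subset
sales = [25, 40, 30]

fig, ax = plt.subplots()
bars = ax.bar(products, sales, color="blue", edgecolor="black", width=0.5)

# bar_label() writes each bar's height just above it
value_labels = ax.bar_label(bars, padding=3)

print([label.get_text() for label in value_labels])  # ['25', '40', '30']
```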

Question 10: Draw a histogram exploring properties like bins, colors, alpha, labels, legend and fontsize.

# Import necessary libraries
import matplotlib.pyplot as plt
import numpy as np

# Generate sample data (random, normally distributed)
data = np.random.randn(500) * 10 + 50  # Mean = 50, Std Dev = 10

# Create the histogram
plt.figure(figsize=(8, 5))  # Set figure size

plt.hist(data,
         bins=10,                         # Number of bins
         color='skyblue',                 # Bar color
         alpha=0.7,                       # Transparency level
         edgecolor='black',               # Border color
         label="Distribution of Values")  # Legend label

# Labels and title
plt.xlabel("Value Range", fontsize=12, color='darkblue')
plt.ylabel("Frequency", fontsize=12, color='darkgreen')
plt.title("Histogram of Sample Data", fontsize=14, fontweight='bold')

# Add legend
plt.legend(fontsize=12)

# Show the plot
plt.grid(True)  # Enable grid for better readability
plt.show()

Output: [figure not reproduced: histogram "Histogram of Sample Data"]
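
To see what the `bins` setting does to the data, the counts can be computed directly with `numpy.histogram`, which is the function `plt.hist` relies on for its binning. The sketch below assumes a seeded generator so the run is repeatable; the seed value is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the run is repeatable
data = rng.normal(loc=50, scale=10, size=500)

# 10 equal-width bins spanning the data range give 10 counts and 11 edges
counts, edges = np.histogram(data, bins=10)

print(len(counts), len(edges))  # 10 11
print(counts.sum())             # 500
```

Every sample falls in exactly one bin, so the counts always add up to the sample size.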

Question 11: Draw a pie chart exploring its properties like labels, colors, radius, explode, shadow, autopct.

# Import necessary libraries
import matplotlib.pyplot as plt

# Sample data (market share of different companies)
labels = ["Company A", "Company B", "Company C", "Company D"]
sizes = [30, 25, 20, 25]                    # Percentage share
colors = ["blue", "orange", "green", "red"] # Custom colors
explode = (0.1, 0, 0, 0)                    # Explode first slice

# Create the pie chart
plt.figure(figsize=(7, 7))  # Set figure size

plt.pie(sizes,
        labels=labels,       # Assign labels
        colors=colors,       # Assign colors
        autopct="%1.1f%%",   # Show percentages (1 decimal place)
        explode=explode,     # Highlight first slice
        shadow=True,         # Add shadow effect
        radius=1.2,          # Adjust pie size
        startangle=140)      # Rotate start angle

# Add title
plt.title("Market Share Distribution", fontsize=14, fontweight='bold')

# Show the plot
plt.show()

Output: [figure not reproduced: pie chart "Market Share Distribution"]
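
The values passed to `pie()` do not have to be percentages: they are normalized by their sum, and `autopct` formats each slice's share. A minimal sketch, assuming hypothetical raw counts that add up to 200 and the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed so no window is needed
import matplotlib.pyplot as plt

labels = ["Company A", "Company B", "Company C", "Company D"]
units_sold = [60, 50, 40, 50]  # hypothetical raw counts summing to 200

fig, ax = plt.subplots()

# With autopct set, pie() returns the wedges, the label texts and the percentage texts
wedges, texts, autotexts = ax.pie(units_sold, labels=labels, autopct="%1.1f%%")

print([t.get_text() for t in autotexts])  # ['30.0%', '25.0%', '20.0%', '25.0%']
```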

Question 12: Draw a line chart, scatter plot and histogram on the Iris data set with styling.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Create a figure with 3 subplots
fig, axes = plt.subplots(1, 3, figsize=(18, 5))  # 1 row, 3 columns

# 1. LINE CHART - Sepal Length Trend
axes[0].plot(df.index, df["sepal length (cm)"],
             color='blue', linestyle='--', linewidth=2, marker='o',
             markerfacecolor='red', label="Sepal Length")
axes[0].set_title("Sepal Length Trend", fontsize=14, fontweight='bold')
axes[0].set_xlabel("Sample Index", fontsize=12, color='darkred')
axes[0].set_ylabel("Sepal Length (cm)", fontsize=12, color='darkgreen')
axes[0].legend(fontsize=12)
axes[0].grid(True)

# 2. SCATTER PLOT - Sepal Length vs Sepal Width
axes[1].scatter(df["sepal length (cm)"], df["sepal width (cm)"],
                color='purple', alpha=0.6, s=80, edgecolors='black')
axes[1].set_title("Sepal Length vs Width", fontsize=14, fontweight='bold')
axes[1].set_xlabel("Sepal Length (cm)", fontsize=12, color='darkblue')
axes[1].set_ylabel("Sepal Width (cm)", fontsize=12, color='darkgreen')
axes[1].grid(True)

# 3. HISTOGRAM - Petal Length Distribution
axes[2].hist(df["petal length (cm)"], bins=10, color='orange',
             alpha=0.7, edgecolor='black', label="Petal Length")
axes[2].set_title("Petal Length Distribution", fontsize=14, fontweight='bold')
axes[2].set_xlabel("Petal Length (cm)", fontsize=12, color='darkred')
axes[2].set_ylabel("Frequency", fontsize=12, color='darkgreen')
axes[2].legend(fontsize=12)
axes[2].grid(True)

# Adjust layout and show the plots
plt.tight_layout()
plt.show()

Output: [figure not reproduced: line chart, scatter plot and histogram of the Iris data]

Question 13: Draw a boxplot with properties like facecolor, colors, and capprops like color and linewidth. Show how the box plot can be used to detect outliers. Add two outlier rows manually.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Manually adding two outlier rows
outlier_1 = pd.DataFrame([[15, 4, 8, 2]], columns=df.columns)  # Extreme high values
outlier_2 = pd.DataFrame([[16, 5, 9, 3]], columns=df.columns)  # Another extreme high
df = pd.concat([df, outlier_1, outlier_2], ignore_index=True)  # Append outliers

# Create a figure for the boxplot
plt.figure(figsize=(10, 6))

# Boxplot with styling
box = sns.boxplot(data=df,
                  palette="pastel",  # Soft colors for boxes
                  linewidth=2,       # Border thickness
                  flierprops={"marker": "o", "markerfacecolor": "red", "markersize": 8},  # Outliers
                  boxprops={"facecolor": "lightblue", "edgecolor": "black"},  # Box facecolor
                  capprops={"color": "black", "linewidth": 2},      # Cap properties
                  whiskerprops={"color": "black", "linewidth": 2},  # Whisker properties
                  medianprops={"color": "red", "linewidth": 2})     # Median line

# Titles and labels
plt.title("Boxplot of Iris Dataset (Detecting Outliers)", fontsize=14, fontweight='bold')
plt.xlabel("Features", fontsize=12, color='darkblue')
plt.ylabel("Values", fontsize=12, color='darkgreen')

# Show the plot
plt.grid(True)
plt.show()
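
The red fliers in the boxplot are the points that fall beyond the whiskers, which by default extend at most 1.5 times the interquartile range (IQR) past the quartiles. The sketch below applies that rule numerically, assuming a hypothetical ten-value column whose last two entries play the role of the manually added outliers.

```python
import pandas as pd

# Hypothetical column: eight ordinary values plus two manual outliers
values = pd.Series([4.8, 5.0, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 15.0, 16.0])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Points outside these fences are drawn as fliers by a default boxplot
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())  # [15.0, 16.0]
```

Applied to the sepal length column after the two rows are appended, the same test would be expected to flag 15 and 16, since both lie far above the rest of that column.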
