Hello

The document outlines an experiment conducted by Kumar Prathmesh, which involves analyzing a dataset of real estate properties using Python in Google Colab. It includes loading data from an Excel file, visualizing the relationship between area and price with a scatter plot, checking for missing values, and calculating statistical measures such as mean, median, and standard deviation. Additionally, it provides a summary of the dataset's statistics and structure.

Name: Kumar Prathmesh, Roll No.: 2412res69, Experiment 1

from google.colab import drive

drive.mount('/content/gdrive')

Mounted at /content/gdrive

import pandas as pd

data = pd.read_excel('/content/gdrive/My Drive/Practice.xlsx')

# read_excel already returns a DataFrame; display() renders it as a table in Colab/Jupyter
display(data)

    Number of Rooms  Area (sq ft)  Price ($)
0                 1           500      50000
1                 2           800      80000
2                 3          1200     120000
3                 4          1500     150000
4                 2           750      75000
5                 3          1100     110000
6                 1           450      45000
7                 4          1600     160000
8                 2           700      70000
9                 3          1050     105000
10                4          1550     155000
11                1           480      48000
12                2           760      76000
13                3          1150     115000
14                4          1480     148000
15                2           730      73000
16                3          1080     108000
17                1           490      49000
18                4          1520     152000
19                2           740      74000
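The document does not include Practice.xlsx. The sketch below therefore rebuilds the same 20-row table in code, using the values printed above. This lets the later questions run outside Colab without mounting Google Drive. The variable name `df` is an assumption, chosen to match Questions 3 to 5. Reading the real .xlsx file with `pd.read_excel` also requires an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

# Values copied row by row from the table printed above (20 properties).
rooms = [1, 2, 3, 4, 2, 3, 1, 4, 2, 3, 4, 1, 2, 3, 4, 2, 3, 1, 4, 2]
area = [500, 800, 1200, 1500, 750, 1100, 450, 1600, 700, 1050,
        1550, 480, 760, 1150, 1480, 730, 1080, 490, 1520, 740]
price = [50000, 80000, 120000, 150000, 75000, 110000, 45000, 160000, 70000, 105000,
         155000, 48000, 76000, 115000, 148000, 73000, 108000, 49000, 152000, 74000]

# Same column names as in the Excel sheet, so the later code can be reused unchanged.
df = pd.DataFrame({
    "Number of Rooms": rooms,
    "Area (sq ft)": area,
    "Price ($)": price,
})

print(df.shape)  # (20, 3)
```

With `df` built this way, the code from Questions 3 to 5 can be run after removing its `read_excel` line.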
 

Question 2: Write the Python code to draw any one plot for the dataset.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_excel(r'/content/gdrive/My Drive/Practice.xlsx')

plt.figure(figsize=(8, 6))
plt.scatter(data["Area (sq ft)"], data["Price ($)"], color='blue', alpha=0.7)
plt.title("Area vs Price graph made by prathmesh")
plt.xlabel("Area (sq ft)")
plt.ylabel("Price ($)")
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()
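The scatter plot suggests a straight-line relationship between area and price. The sketch below checks this numerically, as a hedged example. It fits Price = slope × Area + intercept by least squares with `numpy.polyfit` and computes the Pearson correlation with `numpy.corrcoef`. The values are copied inline from the table so that the block runs on its own. In this dataset every price is exactly 100 times the area, so the fit should give a slope of about 100, an intercept near 0 (up to floating-point error), and r close to 1.

```python
import numpy as np

# Area and price columns copied from the table printed earlier (20 properties).
area = np.array([500, 800, 1200, 1500, 750, 1100, 450, 1600, 700, 1050,
                 1550, 480, 760, 1150, 1480, 730, 1080, 490, 1520, 740], dtype=float)
price = np.array([50000, 80000, 120000, 150000, 75000, 110000, 45000, 160000, 70000, 105000,
                  155000, 48000, 76000, 115000, 148000, 73000, 108000, 49000, 152000, 74000],
                 dtype=float)

# Degree-1 least-squares fit; polyfit returns the highest power first: [slope, intercept].
slope, intercept = np.polyfit(area, price, 1)

# Pearson correlation coefficient between area and price.
r = np.corrcoef(area, price)[0, 1]

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.4f}")
```

To draw the fitted line over the scatter plot, `plt.plot(area, slope * area + intercept)` could be called before `plt.show()`.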
 

Question 3: Write the Python code to show the number of missing values.

import pandas as pd

my_filelocation = '/content/gdrive/My Drive/Practice.xlsx'


df = pd.read_excel(my_filelocation)

missing_values = df.isnull().sum()

print(missing_values)

Number of Rooms    0
Area (sq ft)       0
Price ($)          0
dtype: int64
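This dataset has no missing values, so every count above is zero. The sketch below shows what the same `isnull().sum()` call reports when values are missing. It uses a small hypothetical three-row frame with two blanks inserted by hand; these rows and NaN positions are invented for illustration and are not part of Practice.xlsx. It then shows two common follow-ups: `dropna()` to discard incomplete rows, and `fillna()` to replace each blank with its column's median.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample; np.nan marks a missing cell.
sample = pd.DataFrame({
    "Number of Rooms": [1, 2, np.nan],
    "Area (sq ft)": [500, np.nan, 1200],
    "Price ($)": [50000, 80000, 120000],
})

# isnull() marks each missing cell as True; sum() counts the True values per column.
print(sample.isnull().sum())

# Option 1: keep only the rows that have no missing values.
complete_rows = sample.dropna()

# Option 2: replace each column's missing values with that column's median.
filled = sample.fillna(sample.median())
print(filled)
```

Which option fits better depends on how many rows are affected. Dropping rows is simplest but discards data, while median filling keeps every row at the cost of inventing values.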

Question 4: Show the mean, median, and standard deviation for each column.

import pandas as pd

my_filelocation = '/content/gdrive/My Drive/Practice.xlsx'


df = pd.read_excel(my_filelocation)

mean_values = df.mean()
median_values = df.median()
std_dev_values = df.std()

print("Mean values:")
print(mean_values)
print("\nMedian values:")
print(median_values)
print("\nStandard Deviation values:")
print(std_dev_values)

Mean values:
Number of Rooms        2.55
Area (sq ft)         981.50
Price ($)          98150.00
dtype: float64

Median values:
Number of Rooms        2.5
Area (sq ft)         925.0
Price ($)          92500.0
dtype: float64

Standard Deviation values:
Number of Rooms        1.099043
Area (sq ft)         394.798750
Price ($)          39479.874953
dtype: float64
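By default, pandas' `std()` computes the sample standard deviation (`ddof=1`), dividing the sum of squared deviations by n - 1. This is why Number of Rooms shows 1.099043. The sketch below reproduces that number by hand from the room counts in the table. It also computes the population version (`ddof=0`, dividing by n), which is what `numpy.std` returns by default.

```python
import math

import pandas as pd

# Room counts copied from the table printed earlier.
rooms = pd.Series([1, 2, 3, 4, 2, 3, 1, 4, 2, 3, 4, 1, 2, 3, 4, 2, 3, 1, 4, 2])

n = len(rooms)
mean = rooms.sum() / n                      # 51 / 20 = 2.55
squared_dev = ((rooms - mean) ** 2).sum()   # sum of squared deviations, 22.95

sample_std = math.sqrt(squared_dev / (n - 1))   # same rule as df.std() (ddof=1)
population_std = math.sqrt(squared_dev / n)     # same rule as df.std(ddof=0)

print(round(sample_std, 6), round(rooms.std(), 6))
print(round(population_std, 6), round(rooms.std(ddof=0), 6))
```

With only 20 rows, the two versions differ noticeably (about 1.099 versus 1.071). It is therefore worth stating which one a report uses.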

Question 5: Show the full information about the dataset.

import pandas as pd

my_filelocation = '/content/gdrive/My Drive/Practice.xlsx'


df = pd.read_excel(my_filelocation)

print("\nSummary Statistics:")
print(df.describe())

missing_values = df.isnull().sum()
print("\nMissing Values Count:")
print(missing_values)

print("\nDataset Information:")
df.info()  # info() prints its report itself and returns None, so it is not wrapped in print()

Summary Statistics:
       Number of Rooms  Area (sq ft)      Price ($)
count        20.000000      20.00000      20.000000
mean          2.550000     981.50000   98150.000000
std           1.099043     394.79875   39479.874953
min           1.000000     450.00000   45000.000000
25%           2.000000     722.50000   72250.000000
50%           2.500000     925.00000   92500.000000
75%           3.250000    1270.00000  127000.000000
max           4.000000    1600.00000  160000.000000

Missing Values Count:
Number of Rooms    0
Area (sq ft)       0
Price ($)          0
dtype: int64

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Number of Rooms  20 non-null     int64
 1   Area (sq ft)     20 non-null     int64
 2   Price ($)        20 non-null     int64
dtypes: int64(3)
memory usage: 612.0 bytes
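The 25%, 50% and 75% rows of `describe()` are percentiles computed by linear interpolation between sorted values, which is the default rule of `Series.quantile`. As a worked check of the 25% figure for Area (sq ft): with n = 20 sorted values, the target position is (n - 1) × 0.25 = 4.75. That is three quarters of the way from the 5th value (700) to the 6th (730), giving 700 + 0.75 × 30 = 722.5. The sketch below repeats this arithmetic in code and compares it with pandas. The helper `linear_quantile` is a hypothetical name written for illustration.

```python
import pandas as pd

# Area values copied from the table printed earlier.
area = pd.Series([500, 800, 1200, 1500, 750, 1100, 450, 1600, 700, 1050,
                  1550, 480, 760, 1150, 1480, 730, 1080, 490, 1520, 740])


def linear_quantile(values, q):
    """Hypothetical helper: quantile by linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q              # fractional 0-based position
    lower = int(pos)                          # index just below that position
    upper = min(lower + 1, len(ordered) - 1)  # index just above (clamped at the end)
    frac = pos - lower                        # how far to move from lower toward upper
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


for q in (0.25, 0.5, 0.75):
    print(q, linear_quantile(area, q), area.quantile(q))
```

The same rule explains the 75% row: position 14.25 lies a quarter of the way from 1200 to 1480, which gives 1270.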
