Aim:
To generate a sales report by cleaning missing data, computing total sales, and identifying
top-performing products.
Algorithm:
- Fill missing values in the 'Price' column with the average price of the respective product.
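The imputation step above can be sketched in isolation as follows. This is a minimal illustration, assuming the same sample products and prices used in this experiment; the helper name `fill_price_with_product_mean` is hypothetical and not part of the lab code.

```python
import pandas as pd

def fill_price_with_product_mean(df):
    """Return a copy of df with missing 'Price' values replaced by the
    mean price of the same 'Product' (hypothetical helper for illustration)."""
    out = df.copy()
    # transform('mean') returns a Series aligned to the original rows,
    # holding each row's per-product mean price (missing values are skipped).
    product_mean = out.groupby('Product')['Price'].transform('mean')
    # Only the missing entries are replaced; known prices are left untouched.
    out['Price'] = out['Price'].fillna(product_mean)
    return out

sample = pd.DataFrame({
    'Product': ['Pen', 'Pencil', 'Notebook', 'Pen', 'Pencil', 'Notebook'],
    'Price': [5, None, 20, 5, 3, None],
})
filled = fill_price_with_product_mean(sample)
print(filled)
```

With this data, the missing Pencil price becomes 3 and the missing Notebook price becomes 20, because each product has exactly one known price to average. Note that a product with no known price at all would stay missing under this approach.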
Procedure:
- Open Google Colab and upload the dataset or use sample data.
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Product': ['Pen', 'Pencil', 'Notebook', 'Pen', 'Pencil', 'Notebook'],
'Quantity': [10, 15, 5, 12, 18, 7],
'Price': [5, None, 20, 5, 3, None]
}
df = pd.DataFrame(data)
df['Price'] = df.groupby('Product')['Price'].transform(lambda x: x.fillna(x.mean()))
df['Total_Sales'] = df['Quantity'] * df['Price']
sales_by_product = df.groupby('Product')['Total_Sales'].sum()
top_product = sales_by_product.idxmax()
sales_by_product.plot(kind='bar', title='Total Sales by Product')
plt.ylabel('Total Sales')
plt.show()
print("Product with highest total sales:", top_product)
Sample Output:
Result:
The program successfully computes and visualizes product-wise sales and identifies the top-selling
item.
Experiment 2: Daily Temperature Tracker
Aim:
To process temperature data, handle missing values, and visualize average temperature trends over
time.
Algorithm:
- Fill missing values in Min_Temp and Max_Temp with their column means.
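The column-mean fill described above can also be done for both columns in one step. The following is a minimal sketch, assuming the same five sample readings as this experiment; the variable names `temps`, `cols`, and `column_means` are chosen only for illustration.

```python
import pandas as pd

temps = pd.DataFrame({
    'Min_Temp': [21, 23, None, 22, 25],
    'Max_Temp': [30, None, 35, 31, 34],
})
cols = ['Min_Temp', 'Max_Temp']

# DataFrame.mean() returns one mean per column and skips missing values.
column_means = temps[cols].mean()
# Passing that Series to fillna fills each column with its own mean.
temps[cols] = temps[cols].fillna(column_means)

print(column_means)
print(temps)
```

Here the mean of the four known minimum temperatures is 22.75 and the mean of the four known maximum temperatures is 32.5, so those are the values written into the two gaps.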
Procedure:
- Load the dataset with dates, min temp, and max temp.
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Date': pd.date_range(start='2023-01-01', periods=5),
'Min_Temp': [21, 23, None, 22, 25],
'Max_Temp': [30, None, 35, 31, 34]
}
df = pd.DataFrame(data)
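# Assign the filled columns back to df; fillna(..., inplace=True) on a selected
# column may leave df unchanged in recent pandas versions.
df['Min_Temp'] = df['Min_Temp'].fillna(df['Min_Temp'].mean())
df['Max_Temp'] = df['Max_Temp'].fillna(df['Max_Temp'].mean())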
df['Average_Temp'] = (df['Min_Temp'] + df['Max_Temp']) / 2
hottest_day = df.loc[df['Average_Temp'].idxmax(), 'Date']
plt.plot(df['Date'], df['Average_Temp'], marker='o')
plt.title("Average Temperature Over Time")
plt.xlabel("Date")
plt.ylabel("Average Temp")
plt.grid(True)
plt.show()
print("Date with highest average temperature:", hottest_day.date())
Sample Output:
Result:
The trend line provides a visual representation of temperature changes, and the hottest day is
identified.
Google Cloud Data Analytics Lab Experiments
Aim:
To analyze COVID-19 daily case data by cleaning missing values and visualizing trends.
Algorithm:
Procedure:
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Date': pd.date_range(start='2023-01-01', periods=5),
'Cases': [100, None, 250, 400, None]
}
df = pd.DataFrame(data)
# Treat missing daily counts as 0 and assign the result back to the column.
df['Cases'] = df['Cases'].fillna(0)
total_cases = df['Cases'].sum()
average_cases = df['Cases'].mean()
peak_day = df.loc[df['Cases'].idxmax(), 'Date']
plt.plot(df['Date'], df['Cases'], marker='o')
plt.title("COVID-19 Daily Cases")
plt.xlabel("Date")
plt.ylabel("Cases")
plt.grid(True)
plt.show()
print("Total cases:", total_cases)
print("Average daily cases:", average_cases)
print("Date with highest number of cases:", peak_day.date())
Sample Output:
Result:
The program plots the daily case trend and identifies the date with the highest number of cases.
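Because the code fills missing counts with 0, the two missing days are counted as days with no cases, which lowers the reported average. The sketch below compares the two averages on the same five sample values; it is only an illustration of that effect, and the variable names are assumptions, not part of the lab code.

```python
import pandas as pd

cases = pd.Series([100, None, 250, 400, None])

# Zero fill (as in this experiment): missing days count as days with 0 cases.
mean_zero_filled = cases.fillna(0).mean()    # 750 / 5
# Gaps left in place: Series.mean() skips missing values by default.
mean_skipping_missing = cases.mean()         # 750 / 3

print("Average with zero fill:", mean_zero_filled)
print("Average over reported days only:", mean_skipping_missing)
```

The total (750) and the peak day are the same either way; only the average changes, from 150 with zero fill to 250 over the three reported days.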
Experiment 4: Movie Ratings Dataset
Aim:
To analyze movie ratings and identify top movies based on viewer feedback.
Algorithm:
Procedure:
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Movie_Name': ['Movie A', 'Movie B', 'Movie C', 'Movie D', 'Movie E', 'Movie F'],
'Viewer_Rating': [4.5, 4.8, None, 4.2, 4.9, 4.3]
}
df = pd.DataFrame(data)
df.dropna(inplace=True)
average_rating = df['Viewer_Rating'].mean()
top_movies = df.nlargest(3, 'Viewer_Rating')
top_5 = df.nlargest(5, 'Viewer_Rating')
plt.bar(top_5['Movie_Name'], top_5['Viewer_Rating'], color='skyblue')
plt.title("Top 5 Movie Ratings")
plt.ylabel("Rating")
plt.xticks(rotation=45)
plt.show()
print("Average Rating:", average_rating)
print("Top 3 Movies:")
print(top_movies[['Movie_Name', 'Viewer_Rating']])
Sample Output:
Top 3 Movies:
  Movie_Name  Viewer_Rating
4    Movie E            4.9
1    Movie B            4.8
0    Movie A            4.5
Result:
The program identifies and displays the top 3 movies, supported by a bar chart of the highest-rated movies.
Experiment 5: Online Course Completion Data
Aim:
Algorithm:
Procedure:
Code:
import pandas as pd
import matplotlib.pyplot as plt
data = {
'Student_ID': [101, 102, 103, 104, 105],
'Completion_Status': ['Yes', None, 'No', 'Yes', None]
}
df = pd.DataFrame(data)
# Treat a missing status as "No" and assign the result back to the column.
df['Completion_Status'] = df['Completion_Status'].fillna("No")
completion_count = df['Completion_Status'].value_counts()
plt.pie(completion_count, labels=completion_count.index, autopct='%1.1f%%',
startangle=140)
plt.title("Course Completion vs Non-Completion")
plt.axis('equal')
plt.show()
print("Course Completion Counts:")
print(completion_count)
Sample Output:
Course Completion Counts:
No 3
Yes 2
Result: