Datascience Lab 1-2
A study was conducted to understand the effect of the number of hours students spent studying on
their performance in the final exams. Write code to plot a line chart with the number of hours spent
studying on the x-axis and the score in the final exam on the y-axis. Use a red ‘*’ as the point character,
label the axes, and give the plot a title.
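import matplotlib.pyplot as plt
hours = [10, 9, 2, 15, 10, 16, 11, 16]
score = [95, 80, 10, 50, 45, 98, 38, 93]
plt.plot(hours, score, color='red', marker='*')  # red '*' markers joined by a line
plt.xlabel('Hours spent studying')
plt.ylabel('Score in final exam')
plt.title('Hours spent studying vs. score in final exam')
plt.grid(True)
plt.show()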
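The hours list is not in ascending order, so a line drawn through the points in list order doubles back on itself. One possible fix, sketched here, is to sort the (hours, score) pairs before plotting. The helper names sort_by_hours and plot_sorted are illustrative, not part of the lab sheet, and the axis labels and title are assumed wording.

```python
hours = [10, 9, 2, 15, 10, 16, 11, 16]
score = [95, 80, 10, 50, 45, 98, 38, 93]


def sort_by_hours(hours, score):
    """Return both lists reordered so that hours is ascending (ties broken by score)."""
    pairs = sorted(zip(hours, score))
    sorted_hours = [h for h, _ in pairs]
    sorted_score = [s for _, s in pairs]
    return sorted_hours, sorted_score


def plot_sorted(hours, score):
    """Draw the line chart with the points ordered left to right."""
    # Imported here so that sort_by_hours needs only the standard library.
    import matplotlib.pyplot as plt

    x, y = sort_by_hours(hours, score)
    plt.plot(x, y, color='red', marker='*')
    plt.xlabel('Hours spent studying')
    plt.ylabel('Score in final exam')
    plt.title('Hours spent studying vs. score in final exam')
    plt.grid(True)
    plt.show()
```

Calling plot_sorted(hours, score) displays the chart. Students with the same number of hours (10 and 16 each appear twice) still produce vertical segments, which is expected for this data.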
For the given dataset mtcars.csv (www.kaggle.com/ruiromanini/mtcars), plot a histogram to check
the frequency distribution of the variable ‘mpg’ (miles per gallon).
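import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('mtcars.csv')  # assumes the file is in the working directory
plt.hist(df['mpg'])  # frequency of cars in each mpg bin
plt.xlabel('Miles per gallon (mpg)')
plt.ylabel('Frequency')
plt.show()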
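To see what a histogram counts, the binning can be done by hand on a few numbers. This is a minimal sketch in plain Python: bin_counts is a hypothetical helper, the sample values are illustrative and are not read from mtcars.csv, and the bin width of 5 starting at 10 is an arbitrary choice.

```python
def bin_counts(values, bin_width, start):
    """Count values per half-open bin [left, left + bin_width), keyed by left edge."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin that v falls into
        left = start + k * bin_width       # left edge of that bin
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


# Illustrative mpg-like values only; replace with df['mpg'] values from the real file.
sample_mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4]
print(bin_counts(sample_mpg, bin_width=5, start=10))
```

Each dictionary key is the left edge of a bin and each value is the height of the corresponding bar. By default plt.hist picks its own bin edges from the data range, so its bars will generally not match these hand-chosen bins.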
MODULE-2
import pandas as pd
import numpy as np
# Import the data into a DataFrame
df = pd.read_csv('BL-Flickr-Images-Book.csv')
print("Original DataFrame:")
print(df.head())
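# Find and drop the columns which are irrelevant for the book information
# (the column names below are assumed; check df.columns and adjust the list to your copy of the file)
irrelevant_columns = ['Edition Statement', 'Corporate Author', 'Corporate Contributors', 'Former owner', 'Engraver', 'Contributors', 'Issuance type', 'Shelfmarks']
df.drop(columns=irrelevant_columns, inplace=True)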
df.set_index('Identifier', inplace=True)
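# Tidy up fields in the data, such as the date of publication, with a simple regular expression
# (assumes a 'Date of Publication' column; keeps the first four-digit year found in each entry)
year = df['Date of Publication'].str.extract(r'(\d{4})', expand=False)
df['Date of Publication'] = pd.to_numeric(year)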
print("\nCleaned DataFrame:")
print(df.head())
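The regular-expression step can be tried on plain strings before applying it to the DataFrame. This is a minimal sketch using the standard-library re module; extract_year is an illustrative helper name, and the sample strings are hypothetical stand-ins for messy date values, not rows taken from the CSV.

```python
import re


def extract_year(raw):
    """Return the first run of four digits in raw as an int, or None if there is none."""
    match = re.search(r'\d{4}', str(raw))
    return int(match.group()) if match else None


# Hypothetical examples of untidy publication dates.
samples = ['1879 [1878]', '1868', '[1897?]', '1839, 38-54', 'n.d.']
print([extract_year(s) for s in samples])
```

Entries with no four-digit year come back as None, so they can be reviewed separately instead of being silently converted to a wrong value.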