Datascience Lab 1-2

DATA SCIENCE

Uploaded by

Geetha A L

MODULE-1

A study was conducted to understand the effect of the number of hours students spent studying on
their performance in the final exams. Write code to plot a line chart with the number of hours spent
studying on the x-axis and the score in the final exam on the y-axis. Use a red ‘*’ as the point character,
label the axes, and give the plot a title.

import matplotlib.pyplot as plt

# Hours spent studying and the corresponding final exam scores
hours = [10, 9, 2, 15, 10, 16, 11, 16]
score = [95, 80, 10, 50, 45, 98, 38, 93]

# Plot the line chart with a red '*' at each data point
plt.plot(hours, score, marker='*', color='red', linestyle='-')

# Add axis labels and a title
plt.xlabel('Number of Hours Studied')
plt.ylabel('Score in Final Exam')
plt.title('Effect of Hours Studied on Exam Score')

# Display the plot with a grid
plt.grid(True)
plt.show()
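
Because plt.plot joins the points in the order they appear in the lists, the unsorted hours values make the line double back on itself. One possible refinement, not required by the exercise, is to sort the observations by hours before plotting. The sketch below uses only the standard library; the helper name sort_by_hours is an assumption introduced for illustration.

```python
# Observations from the exercise above: hours studied and exam scores.
hours = [10, 9, 2, 15, 10, 16, 11, 16]
score = [95, 80, 10, 50, 45, 98, 38, 93]


def sort_by_hours(hours, score):
    """Return both lists reordered so that hours is ascending.

    sort_by_hours is a hypothetical helper, not part of matplotlib.
    Ties in hours keep their original relative order because sorted() is stable.
    """
    pairs = sorted(zip(hours, score), key=lambda pair: pair[0])
    sorted_hours = [h for h, _ in pairs]
    sorted_score = [s for _, s in pairs]
    return sorted_hours, sorted_score


sorted_hours, sorted_score = sort_by_hours(hours, score)
print(sorted_hours)  # [2, 9, 10, 10, 11, 15, 16, 16]
print(sorted_score)  # [10, 80, 95, 45, 38, 50, 98, 93]
```

Passing sorted_hours and sorted_score to plt.plot in place of the original lists draws the line from left to right.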

For the given dataset mtcars.csv (www.kaggle.com/ruiromanini/mtcars), plot a histogram to check
the frequency distribution of the variable ‘mpg’ (miles per gallon).

import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset (replace 'mtcars.csv' with the actual path to your mtcars.csv file)
mtcars = pd.read_csv('mtcars.csv')

# Plot the histogram of mpg using 10 bins
plt.hist(mtcars['mpg'], bins=10, color='skyblue', edgecolor='black')

# Add axis labels and a title
plt.xlabel('Miles per gallon (mpg)')
plt.ylabel('Frequency')
plt.title('Histogram of Miles per gallon (mpg)')

# Display the plot
plt.show()
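
The bins=10 argument asks plt.hist to split the range between the smallest and largest value into ten equal-width intervals and count the values that fall in each. The sketch below reproduces that counting in plain Python, purely as an illustration. equal_width_histogram is a hypothetical helper, and the sample values are made up rather than taken from mtcars.csv.

```python
def equal_width_histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Hypothetical helper for illustration only. Every interval is closed on
    the left and open on the right, except the last one, which also includes
    the maximum value. Assumes at least two distinct values.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The maximum would land in bin index `bins`, so clamp it to the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return edges, counts


# Made-up sample values (not from mtcars.csv), split into two bins.
sample_values = [10.0, 12.0, 14.0, 20.0, 30.0]
edges, counts = equal_width_histogram(sample_values, bins=2)
print(edges)   # [10.0, 20.0, 30.0]
print(counts)  # [3, 2]
```

The counts always add up to the number of values, which is a quick sanity check when experimenting with different bin settings.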

MODULE-2

Consider the books dataset BL-Flickr-Images-Book.csv from Kaggle
(https://www.kaggle.com/adeyoyintemidayo/publication-of-books), which contains information
about books.

Write a program to demonstrate the following:

• Import the data into a DataFrame.
• Find and drop the columns that are irrelevant to the book information.
• Change the index of the DataFrame.
• Tidy up fields in the data, such as the date of publication, with the help of a simple regular expression.
• Combine str methods with NumPy to clean columns.

import pandas as pd
import numpy as np

# Import the data into a DataFrame
df = pd.read_csv('BL-Flickr-Images-Book.csv')

# Display the first few rows of the DataFrame
print("Original DataFrame:")
print(df.head())

# Find and drop the columns which are irrelevant for the book information
irrelevant_columns = ['Edition Statement', 'Corporate Author', 'Corporate Contributors',
                      'Former owner', 'Engraver', 'Contributors', 'Issuance type', 'Shelfmarks']
df.drop(columns=irrelevant_columns, inplace=True)

# Change the index of the DataFrame to the 'Identifier' column
df.set_index('Identifier', inplace=True)

# Tidy up the date of publication with a simple regular expression:
# keep only a four-digit year at the start of the field
# (entries that do not start with four digits become NaN)
df['Date of Publication'] = df['Date of Publication'].str.extract(r'^(\d{4})', expand=False)

# Combine str methods with NumPy to clean columns:
# any place containing 'London' becomes 'London'; otherwise hyphens are replaced with spaces
place = df['Place of Publication']
df['Place of Publication'] = np.where(place.str.contains('London'), 'London',
                                      place.str.replace('-', ' '))

# Display the cleaned DataFrame
print("\nCleaned DataFrame:")
print(df.head())
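
The two cleaning rules above can be checked on a few strings without loading the dataset. The following is a plain-Python sketch of the same logic, assuming made-up sample strings rather than rows from BL-Flickr-Images-Book.csv. leading_year and clean_place are hypothetical helpers that mirror the str.extract pattern and the np.where expression one value at a time.

```python
import re

# Same pattern as in the script above: four digits at the very start of the field.
YEAR_AT_START = re.compile(r'^(\d{4})')


def leading_year(text):
    """Return the four-digit year at the start of text, or None if absent.

    Hypothetical helper; pandas str.extract yields NaN where this returns None.
    """
    match = YEAR_AT_START.search(text)
    return match.group(1) if match else None


def clean_place(place):
    """Mirror the np.where rule for a single string (hypothetical helper)."""
    if 'London' in place:
        return 'London'
    return place.replace('-', ' ')


# Made-up strings for illustration only, not rows from the CSV file.
sample_dates = ['1879 [1878]', '1860-63', '[1897?]']
sample_places = ['London; Virtue & Yorston', 'Newcastle-upon-Tyne', 'Oxford']

years = [leading_year(d) for d in sample_dates]
places = [clean_place(p) for p in sample_places]
print(years)   # ['1879', '1860', None]
print(places)  # ['London', 'Newcastle upon Tyne', 'Oxford']
```

The third date shows a limitation of the simple pattern: a year wrapped in brackets does not start with a digit, so it is not extracted.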
