Data Visualization_Lab_Manual_2024
Tech
Data Visualization [1 0 3 2]
LABORATORY MANUAL
JANUARY 2025
Table of Contents
1. Course Outcomes (COs)
2. Assessment Plan
3. General Instructions
4. Software and Tools
1. COURSE OUTCOMES (COS)
At the end of this course, the student should be able to: | No. of Contact Hours | Marks
2. ASSESSMENT PLAN
Components | Continuous Evaluation | End Semester Examination
Duration | 3 hours per week | 120 minutes
3. GENERAL INSTRUCTIONS
Step 1: Listen to the faculty demonstrations and instructions on the exercise and datasets.
Step 2: Download the weekly exercise and data sets (if any) from LMS.
Step 3: Create a Python notebook with the following documentation and coding conventions for the
exercise.
1. Write the documentation first: the experiment's description, the student's name, the register
number, and the date.
2. Put imports at the top of the file.
3. Use four spaces per indentation level.
4. Limit all lines to a maximum of 79 characters.
5. Use blank lines to separate logical sections.
6. Add sufficient comments written in complete sentences. Indent block comments to the same
level as the code they describe, and add inline comments wherever necessary.
7. Use meaningful names for variables, functions and constants.
8. Use snake case for variable and function names, capitalize the first letter of each word
in class names, and write constants in upper case with underscores between words.
Step 4: On completion, submit the Python notebook on the LMS. Ensure that you submit well within the
deadline.
Step 5: Show the Python notebooks and results to the instructors during the program execution check.
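The conventions listed in Step 3 can be sketched in a single notebook cell. This is only an illustration: the experiment title, the MarksSummary class, and the sample marks are hypothetical, and the header fields are placeholders to be filled in by the student.

```python
"""
Experiment: Summary statistics for a list of marks (hypothetical).
Name: <student name>
Register number: <register number>
Date: <date>
"""

# Imports go at the top of the file.
import statistics

# Constants are written in upper case with underscores.
PASS_MARK = 40


class MarksSummary:
    """Hold a set of marks and report simple summaries."""

    def __init__(self, marks):
        self.marks = list(marks)

    def average(self):
        """Return the arithmetic mean of the marks."""
        return statistics.mean(self.marks)

    def count_failures(self):
        """Return how many marks fall below PASS_MARK."""
        return sum(1 for mark in self.marks if mark < PASS_MARK)


# Variables and functions use snake case.
sample_marks = [35, 62, 48, 91]
marks_summary = MarksSummary(sample_marks)

print(marks_summary.average())
print(marks_summary.count_failures())  # Counts marks below PASS_MARK.
```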
• If a student misses a lab class, he/she must ensure the experiment is completed.
• Questions for lab tests and examinations are not necessarily limited to the questions in the manual
but may involve some variations and/or combinations of the questions.
• Since this is an introductory course on Python programming and data visualization, students
must not use AI tools to generate code.
• Please do not copy code from others.
4. Software and Tools
Programming language: Python
IDE: Anaconda distribution with Jupyter Notebook
Pre-installed in Anaconda:
NumPy: For numerical computations.
Pandas: For data manipulation and analysis.
Matplotlib: For plotting and visualization.
Additional Package: Seaborn: For advanced visualization
Anaconda is an open-source distribution of Python and R aimed at data science. It can be used
to create isolated environments for data-intensive applications. It simplifies the process of
managing packages and dependencies. Anaconda has over 1,500 packages, including essential tools
like Jupyter Notebook, Pandas, NumPy, and Matplotlib, making it a one-stop solution for developers
and researchers. Its package manager, Conda, allows seamless installation, updating, and management
of software, ensuring compatibility and reducing conflicts.
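One way to confirm that the packages listed above are available in the current environment is sketched below. The helper name find_missing_packages is hypothetical; it relies only on the standard library's importlib, so it runs even when a package is absent.

```python
from importlib import util

# Import names of the packages listed in this section.
REQUIRED_PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn"]


def find_missing_packages(package_names):
    """Return the names that cannot be found in this environment."""
    return [
        name for name in package_names
        if util.find_spec(name) is None
    ]


missing_packages = find_missing_packages(REQUIRED_PACKAGES)
if missing_packages:
    print("Missing packages:", ", ".join(missing_packages))
else:
    print("All required packages are available.")
```

A package reported as missing can usually be installed from the Anaconda Prompt with Conda, for example `conda install seaborn`.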
5.3. Getting started with Anaconda and Jupyter Notebook on Windows
Step 1: Type “Jupyter Notebook” in the search bar as shown in Figure 1 and click on “Jupyter Notebook
(Anaconda3)”.
Step 2: Creating a new folder. Change the drive if required, then click on New -> Folder as shown in Figure 2.
Step 3: Renaming the new folder. The new folder will be created with the default name “Untitled
folder”. Click on rename and rename the folder appropriately as shown in Figure 3.
Step 4: Creating a new notebook inside the newly created folder. Click on New -> Python 3
(ipykernel) as shown in Figure 4.
Step 5: Renaming the Python notebook. Initially, the notebook will be created with the default
name “Untitled” as shown in Figure 5. Click on the name to rename the notebook appropriately.
Step 6: Running Python code in notebook cells. After renaming the notebook, we
can start coding. A sample print statement is displayed in Figure 6. After typing the code, press the Run
button to run the selected cell. The output is displayed below the cell.
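A first cell along the lines of Step 6 might look like the sketch below. The variable names and message text are illustrative and are not taken from Figure 6.

```python
# A first notebook cell: build a message and print it.
course_name = "Data Visualization"
greeting = f"Welcome to the {course_name} lab"
print(greeting)
```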
6. Python Basics
Python is a high-level, interpreted programming language with simple and readable syntax. Python is
dynamically typed, and variable types are determined at runtime. Its numerous third-party packages
make it suitable for various applications, such as web development, data analysis, artificial intelligence,
and scientific computing. For data manipulation, libraries like Pandas and NumPy provide efficient
tools for handling structured and unstructured data, performing calculations, and managing large
datasets. Python also offers various data visualization libraries, such as Matplotlib and Seaborn, to
create a wide range of charts and graphs. Machine learning and statistical modeling can be integrated
seamlessly using libraries like scikit-learn and statsmodels. Jupyter Notebook enhances the
workflow by allowing live coding, visualization, and storytelling in a single environment. Python's
scalability, active community support, and compatibility with big data frameworks (e.g., PySpark) make
it a go-to language for data-driven projects across industries. Important features of Python are as
follows:
• Python uses whitespace (tabs or spaces) to structure code.
• To add comments to code use the hash mark (pound sign) #.
• Variables in Python have no inherent type associated with them; a variable can refer to a
different type of object simply by doing an assignment.
• Every number, string, data structure, function, class, module etc. exists in the Python interpreter
as a Python object. Each object has an associated type (e.g., integer, string, or function) and
internal data.
• Objects in Python typically have both attributes (other Python objects stored “inside” the
object) and methods (functions associated with an object that can have access to the object’s
internal data).
• In Python, a module is a file with the .py extension containing Python code.
• Python’s built-in data structures, such as tuples, lists, dictionaries, and sets, are important
aspects of Python programming.
• Vectorization in Python refers to performing operations on entire arrays or data sequences
without iterating through individual elements using explicit loops. Vectorized operations are
supported in packages like NumPy and Pandas, which run them in optimized compiled code, so
they are typically much faster than equivalent Python loops.
• Python provides an array of specialized libraries for creating a variety of data visualizations.
The most important packages are Matplotlib and Seaborn.
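A few of the features above (dynamic typing, attributes and methods, and vectorization) can be illustrated with a short sketch. The variable names and values are made up for this example, and NumPy is assumed to be installed.

```python
import numpy as np

# Dynamic typing: the same name can refer to objects of different types.
value = 10
first_type = type(value).__name__
value = "ten"
second_type = type(value).__name__
print(first_type, second_type)

# Objects carry attributes and methods.
readings = np.array([1.0, 2.0, 3.0, 4.0])
print(readings.shape)   # shape is an attribute of the array.
print(readings.mean())  # mean() is a method of the array.

# Loop version: square each element one at a time.
squares_loop = []
for reading in readings:
    squares_loop.append(reading ** 2)

# Vectorized version: one expression applied to the whole array.
squares_vectorized = readings ** 2
print(squares_vectorized)
```

Both versions produce the same values; the vectorized form is shorter and, on large arrays, usually much faster.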
7. List of Experiments
Week No | Topics | Course Outcome Addressed
2. Create two (3 x 3) matrices using NumPy and print them. Perform
and print the results of the following linear algebra operations:
a. Matrix addition
b. Matrix subtraction
c. Matrix multiplication (element-wise and dot product)
d. Transpose of a matrix
e. Determinant and inverse (if applicable)
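As a starting point for this exercise, the operations can be sketched with NumPy as follows. The matrix values are arbitrary examples, and the variable names are assumptions chosen for readability.

```python
import numpy as np

# Two arbitrary 3 x 3 matrices used only for illustration.
matrix_a = np.array([[2, 0, 1], [1, 3, 2], [0, 1, 4]])
matrix_b = np.array([[1, 2, 0], [0, 1, 1], [3, 0, 2]])

matrix_sum = matrix_a + matrix_b
matrix_difference = matrix_a - matrix_b

# The * operator multiplies element by element; @ is the matrix product.
elementwise_product = matrix_a * matrix_b
dot_product = matrix_a @ matrix_b

transpose_a = matrix_a.T
determinant_a = np.linalg.det(matrix_a)

# The inverse exists only when the determinant is non-zero.
if not np.isclose(determinant_a, 0.0):
    inverse_a = np.linalg.inv(matrix_a)
else:
    inverse_a = None

print(dot_product)
print(determinant_a)
print(inverse_a)
```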
Week 4 Demo 4: Pandas, Data loading, Storage and File formats CO2
Exercises
1. Create a Series from a list of integers representing daily
temperatures (in Celsius) over a week. Assign index labels as day
of the week.
a. Find and print the average (mean) temperature for the week.
b. Identify and print the maximum and minimum temperatures
and their respective days.
c. Display the temperatures greater than a specific value.
d. Convert all temperatures to Fahrenheit.
e. Print the days that had temperatures above the average.
2. Create a data frame with details of 10 students and columns as
Roll Number, Name, Gender, Marks1, Marks2, Marks3.
a. Create a new column with total marks
b. Find the lowest marks in Marks1
c. Find the highest marks in Marks2
d. Find the average marks in Marks3
e. Find the name of the student with the highest average
f. Find how many students failed in Marks2 (<40)
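The kind of Series and DataFrame operations these exercises call for can be sketched as below. All temperatures, names, and marks are made up, and only three students are shown to keep the example short; the exercises themselves ask for fuller data.

```python
import pandas as pd

# A Series indexed by day of the week (made-up temperatures in Celsius).
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
temperatures = pd.Series([28, 31, 30, 27, 33, 29, 32], index=days)

mean_temperature = temperatures.mean()
hottest_day = temperatures.idxmax()
coldest_day = temperatures.idxmin()
above_average = temperatures[temperatures > mean_temperature]
fahrenheit = temperatures * 9 / 5 + 32

# A small DataFrame of hypothetical students.
students = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Meena"],
    "Marks1": [55, 38, 72],
    "Marks2": [35, 64, 80],
    "Marks3": [60, 45, 90],
})
mark_columns = ["Marks1", "Marks2", "Marks3"]

# Row-wise total and average across the three mark columns.
students["Total"] = students[mark_columns].sum(axis=1)
student_averages = students[mark_columns].mean(axis=1)
top_student = students.loc[student_averages.idxmax(), "Name"]

# A boolean mask summed as integers counts the matching rows.
failed_in_marks2 = int((students["Marks2"] < 40).sum())

print(above_average)
print(students)
print(top_student, failed_in_marks2)
```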
Week 5 Demo 5: Data Cleaning and Preparation CO3
1. Create a CSV file called “Movies.csv” with details of 10 movies:
Movie Name, Language, Genre, Rating, Review.
a. Read the CSV file into a dataframe and find the movie with the
highest rating.
b. Write the details of all Hindi movies into a file
“HindiMovies.csv”.
2. For the CEREALS dataset, perform data preprocessing and
answer the following questions.
a. Create a table with the five-number summary of all the numeric
attributes.
b. For each of the numeric attributes (proteins up to vitamins),
identify and replace all missing data (indicated with -1) with
the arithmetic mean of the attribute.
c. Create a table with the five-number summary of all the numeric
attributes after treating missing values. Do you think the
strategy used in dealing with missing values was effective?
d. For each numeric attribute (proteins up to vitamins), identify
and replace all noisy data with the median of the attribute.
e. Create a table with the five-number summary of all the numeric
attributes after treating noisy values. Do you think the
strategy used in dealing with noisy values was effective?
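The preprocessing steps above can be sketched on a tiny made-up table. The column name protein and its values are hypothetical stand-ins for the CEREALS data, and treating values outside 1.5 times the interquartile range as noisy is an assumption of this sketch; the manual does not say how noise should be detected.

```python
import numpy as np
import pandas as pd

# Made-up data: -1 marks a missing value and 12 is an obvious outlier.
cereals = pd.DataFrame({
    "protein": [4, 3, -1, 2, 3, 4, 3, 2, 3, 12],
})


def five_number_summary(frame):
    """Return min, quartiles, and max for every numeric column."""
    return frame.describe().loc[["min", "25%", "50%", "75%", "max"]]


def replace_outliers_with_median(column):
    """Replace values outside 1.5 * IQR (assumed rule) with the median."""
    q1 = column.quantile(0.25)
    q3 = column.quantile(0.75)
    iqr = q3 - q1
    is_noisy = (column < q1 - 1.5 * iqr) | (column > q3 + 1.5 * iqr)
    return column.mask(is_noisy, column.median())


# Mark -1 as missing, then fill each column with its own mean.
cleaned = cereals.replace(-1, np.nan)
cleaned = cleaned.fillna(cleaned.mean())

# Treat noisy values column by column.
denoised = cleaned.apply(replace_outliers_with_median)

print(five_number_summary(cereals))
print(five_number_summary(cleaned))
print(five_number_summary(denoised))
```

Comparing the three summaries shows how each treatment shifts the minimum, quartiles, and maximum.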
Week 6 Demo 6: Data Visualization: context, effective visuals and CO3, CO4
storytelling
Exercise
1. For the MTCARS dataset, answer the specified questions with
summarization and effective visuals.
2. For the CEREALS dataset, answer the specified questions with
summarization and effective visuals.
Week 7 Demo 7: Plotting and Visualization using Matplotlib & Seaborn CO4
Exercise
1. For the IPL dataset, answer the specified questions with
summarization and effective visuals using the Matplotlib and Seaborn
libraries.
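A minimal Matplotlib sketch of a labelled bar chart is shown below. The team names and win counts are made up and are not taken from the IPL dataset; Seaborn is left out to keep the example short.

```python
import matplotlib.pyplot as plt

# Made-up values used only to illustrate the plotting calls.
team_names = ["Team A", "Team B", "Team C"]
wins = [9, 7, 5]

figure, axes = plt.subplots(figsize=(6, 4))
axes.bar(team_names, wins, color="steelblue")

# A title and axis labels give the chart its context.
axes.set_title("Wins per team (made-up data)")
axes.set_xlabel("Team")
axes.set_ylabel("Wins")
figure.tight_layout()
```

In a Jupyter notebook the figure is normally rendered below the cell once the cell finishes running.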
Week 8 Demo 8: Data Aggregation and Group Operations CO3
Exercise
1. For the NORTHWIND dataset, answer the specified questions
with summarization and effective visuals.
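Group operations of the kind covered in this demo can be sketched with pandas as follows. The orders table, its column names, and its values are hypothetical and only loosely modelled on an order dataset.

```python
import pandas as pd

# Hypothetical order lines; not taken from the NORTHWIND dataset.
orders = pd.DataFrame({
    "category": ["Beverages", "Beverages", "Dairy", "Dairy", "Dairy"],
    "quantity": [10, 5, 8, 2, 4],
    "unit_price": [18.0, 19.0, 21.0, 34.0, 12.5],
})
orders["revenue"] = orders["quantity"] * orders["unit_price"]

# Named aggregation: one output column per (input column, function) pair.
category_summary = orders.groupby("category").agg(
    total_quantity=("quantity", "sum"),
    total_revenue=("revenue", "sum"),
    average_price=("unit_price", "mean"),
)

print(category_summary)
```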
Week 9 Demo 9: String Manipulation and Data Wrangling CO3
Exercise
1. For the SENTIMENT dataset, answer the specified questions with
string operations and effective visuals.
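Typical string operations on a text column can be sketched with the pandas .str accessor. The review texts and column names below are made up and are not from the SENTIMENT dataset.

```python
import pandas as pd

# Made-up review texts used only for illustration.
reviews = pd.DataFrame({
    "text": [
        "  Great movie!! ",
        "terrible PLOT",
        "Good acting, great music",
    ],
})

# Trim whitespace, lower-case, and drop everything except letters/spaces.
reviews["clean_text"] = (
    reviews["text"]
    .str.strip()
    .str.lower()
    .str.replace(r"[^a-z\s]", "", regex=True)
)

# Derived features: word count and a simple keyword flag.
reviews["word_count"] = reviews["clean_text"].str.split().str.len()
reviews["mentions_great"] = reviews["clean_text"].str.contains("great")

print(reviews)
```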
Week 10 Discussion of case study and data set. CO5
1. For the case study given, answer the questions in a report
that combines a story, visuals, and data summaries.
8. References:
Sl. No. | Reference
1 Textbook: Wes McKinney, Python for Data Analysis: Data Wrangling with pandas,
NumPy & Jupyter, 3rd edition, O’Reilly Media, 2022.
2 Cole Nussbaumer Knaflic, Storytelling with Data: A Data Visualization Guide for
Business Professionals, John Wiley and Sons, 2015.
3 Jake VanderPlas, Python Data Science Handbook, O’Reilly Media, 2016.
4 Alberto Boschetti and Luca Massaron, Python Data Science Essentials, 3rd edition, Packt
Publishing Ltd., 2018.
5 Manaranjan Pradhan and U Dinesh Kumar, Machine Learning using Python, Wiley India,
2019.
6 Python documentation: https://docs.python.org/3/