
Expt-1 Dav

The document outlines a lab course on Data Analytics and Visualization, focusing on libraries in Python and R. It details key libraries such as NumPy, Pandas, and TensorFlow for Python, and dplyr, ggplot2, and Shiny for R, highlighting their features and applications. The document also includes theory questions aimed at understanding the differences between Python and R in data analytics.

Uploaded by prabhugaurav54
Copyright © All Rights Reserved

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

Lab Code: Course Code: 24AIMLPCC601 Class: T.E./AIML/DS


Lab Name: Data Analytics and Visualisation

AIM: Getting introduced to data analytics libraries in Python and R.


Theory Questions:
1. What are the key differences between Python and R for data analytics?
2. What are the most popular libraries for data analytics in both languages?
3. What is the difference between NumPy arrays and pandas DataFrames?
4. How do I read and write data in R?

Data analytics libraries in Python:


1. NumPy

NumPy is a free, open-source Python library for numerical computing on large, multi-dimensional arrays
and matrices. Its main object is the N-dimensional array (ndarray); each dimension of an array is called
an axis, and the number of axes is called its rank (available as the ndim attribute).
Key Features:

• N-dimensional array objects

• Broadcasting functions

• Linear algebra, Fourier transforms, and random number capabilities
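The features listed above can be seen in a few lines. This is a minimal sketch, assuming NumPy is installed; the arrays and numbers are arbitrary illustrations, not taken from the lab manual.

```python
import numpy as np

# A 2-D array (matrix): it has two axes, so its rank (ndim) is 2
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.ndim, a.shape)

# Broadcasting: the 1-D array b is added to every row of a
b = np.array([10.0, 20.0])
print(a + b)

# Linear algebra: solve the system a @ x = rhs
rhs = np.array([5.0, 11.0])
x = np.linalg.solve(a, rhs)
print(x)

# Random numbers and a Fourier transform of them
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=8)
spectrum = np.fft.fft(samples)
print(spectrum.shape)
```

Broadcasting avoids writing an explicit loop over rows: NumPy matches the trailing axis of b (length 2) with the trailing axis of a and repeats b along the other axis.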

2. Pandas

Pandas is a free, open-source Python library for data analysis and data handling. It is well suited to quick
data manipulation and aggregation, reading and writing data in formats such as CSV, and basic plotting
through its Matplotlib integration.

Key Features:
• DataFrame manipulation

• Grouping, joining, and merging datasets

• Time series data handling

• Data cleaning and wrangling
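The sketch below touches each of these features on a tiny table. It assumes pandas is installed; the sales and store data, and the region names, are hypothetical values made up for illustration.

```python
import pandas as pd

# Hypothetical sales records; store B has a missing value on the first day
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-03"]),
    "store": ["A", "B", "A", "B"],
    "units": [10, None, 7, 5],
})
# Hypothetical lookup table mapping each store to a region
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Data cleaning: treat the missing units value as 0
sales["units"] = sales["units"].fillna(0)

# Joining/merging: attach the region of each store
merged = sales.merge(stores, on="store", how="left")

# Grouping: total units per region
per_region = merged.groupby("region")["units"].sum()
print(per_region)

# Time series: total units per calendar day
daily = merged.set_index("date")["units"].resample("D").sum()
print(daily)
```

resample("D") needs a DatetimeIndex, which is why the date column is moved into the index first.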

3. Seaborn

Seaborn is a powerful Python data visualization library built on top of Matplotlib, designed to make it
easier to create attractive and informative statistical graphics. Seaborn is widely used by data scientists
due to its ease of use, intuitive syntax, and integration with Pandas, which allows seamless plotting
directly from DataFrames.
Key Features:

• High-level interface for drawing statistical plots

• Supports themes for better aesthetics

• Integrates with Pandas DataFrames
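A minimal sketch of plotting straight from a DataFrame with a theme applied, assuming seaborn 0.11 or later (for set_theme) and Matplotlib are installed. The study-hours data is hypothetical, and the file name scatter.png is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: exam score against hours studied, in two groups
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 74],
    "group": ["x", "x", "x", "y", "y", "y"],
})

sns.set_theme(style="whitegrid")  # theme for better aesthetics
# Column names are passed directly; hue colours the points by group
ax = sns.scatterplot(data=df, x="hours", y="score", hue="group")
ax.set_title("Score vs. study hours")
plt.savefig("scatter.png")
```

Seaborn labels the axes from the column names, so no extra labelling code is needed.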

4. TensorFlow

TensorFlow is a free, open-source, end-to-end platform with a wide range of tools, libraries, and
resources for machine learning and artificial intelligence. You can build and train models with
high-level APIs such as Keras, and it offers several levels of abstraction so you can choose the one
your model needs.
Key Features:

• Support for distributed training

• High-level APIs (Keras) for quick prototyping

• Deployable on multiple platforms, including mobile and cloud
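A minimal sketch of the Keras high-level API, assuming TensorFlow 2.x is installed. It fits a single dense unit to toy data generated from y = 2x + 1; the data, learning rate, and epoch count are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy data from y = 2x + 1 (illustrative, no noise)
x = np.arange(-5, 5, 0.5, dtype="float32").reshape(-1, 1)
y = 2 * x + 1

# One input feature feeding one dense unit: a linear model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.1), loss="mse")

# Train quietly and keep the per-epoch loss history
history = model.fit(x, y, epochs=200, verbose=0)

# Predict for an unseen input; should approach 2 * 10 + 1
print(model.predict(np.array([[10.0]], dtype="float32"), verbose=0))
```

compile() and fit() hide the training loop; lower-level TensorFlow APIs are available when a custom loop is needed.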

5. PyTorch

PyTorch is an open-source deep learning framework that has gained immense popularity among
researchers and developers due to its flexibility and speed. PyTorch offers an intuitive interface and
dynamic computation capabilities, making it a go-to choice for many machine learning practitioners.
Key Features:

• Dynamic computational graph

• Strong community support and active development

• Great for research and production-level applications
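"Dynamic computational graph" means the graph is recorded while ordinary Python code runs, so control flow can depend on the data. A minimal sketch, assuming PyTorch is installed; the function f is a hypothetical example.

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    # Plain Python branching: the graph records only the branch that executes
    if x.sum() > 0:
        return (x ** 2).sum()
    return (x ** 3).sum()

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = f(x)
loss.backward()   # backpropagate through the recorded graph
print(x.grad)     # tensor([2., 4., 6.]), i.e. d/dx of sum(x^2) = 2x
```

Because the graph is rebuilt on every call, an input whose sum is negative would take the other branch and yield the gradient 3x² instead.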

6. Scikit-learn

Scikit-learn is a free, open-source machine learning library for Python. It is written mainly in Python,
with some core algorithms implemented in Cython to improve performance.
Key Features:
• Implements regression, classification, clustering, and more

• Cross-validation, hyperparameter tuning, and pipeline building

• Easy integration with NumPy and Pandas.
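The sketch below combines a pipeline, cross-validation, and hyperparameter tuning, assuming scikit-learn is installed. It uses the iris dataset bundled with scikit-learn; the choice of logistic regression and the grid of C values are illustrative, not prescribed by the lab.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Features and labels as NumPy arrays
X, y = load_iris(return_X_y=True)

# Pipeline: scaling is refit inside each fold, avoiding data leakage
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# 5-fold cross-validation of the whole pipeline
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

# Hyperparameter tuning: "clf__C" targets the C parameter of the "clf" step
search = GridSearchCV(pipe, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)
print(search.best_params_)
```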

Data analytics libraries in R:


1. dplyr

dplyr is one of the most widely used R libraries for data manipulation. It streamlines work with data
frames through a small set of core functions that make data wrangling faster and more intuitive. These
functions can also be combined with group_by() to perform operations on grouped data.
Key Features of dplyr:

• mutate(): Adds new columns based on existing data, allowing for easy feature engineering.

• select(): Picks specific columns by name, making it easy to focus on the most relevant data.

• filter(): Filters rows based on logical conditions, enabling you to subset your data quickly.

• summarise(): Reduces a dataset to summary statistics, great for aggregation and descriptive analysis.

• arrange(): Orders rows based on column values, simplifying sorting.

Best for: Data wrangling, filtering, and summarization
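dplyr itself is R-only. For comparing the two languages (Theory Question 1), the sketch below maps each dplyr verb to a rough pandas counterpart in one method chain. The mapping is approximate, and the employee table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical employee table
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "dept": ["x", "x", "y", "y"],
    "salary": [50, 60, 55, 70],
})

result = (
    df
    .assign(bonus=lambda d: d["salary"] * 0.1)     # like mutate()
    .loc[lambda d: d["salary"] > 52]                 # like filter()
    [["dept", "salary", "bonus"]]                    # like select()
    .groupby("dept", as_index=False)                 # like group_by()
    .agg(mean_salary=("salary", "mean"),             # like summarise()
         max_bonus=("bonus", "max"),
         n=("salary", "size"))
    .sort_values("mean_salary", ascending=False)     # like arrange(desc())
)
print(result)
```

One naming trap: pandas also has a DataFrame.filter method, but it selects columns by label, so row filtering is done with .loc or .query instead.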

2. ggplot2

ggplot2 is an R data visualization library based on the Grammar of Graphics. Through a high-level API it
can create bar charts, pie charts, histograms, scatter plots, error-bar charts, and more, and it lets you
combine several layers (for example, points plus a fitted line) in a single plot. Once you tell ggplot2
which variables map to which aesthetics, it handles the remaining details, so you spend less time
building plots and more time interpreting them.
Key Features:

• Easily combine different elements (geoms, stats, scales) in a single plot.

• ggplot2 provides a flexible framework for styling and customizing plots.

• Automatically maps data to visual properties like size, color, and shape.

• Easily create multiple plots based on a factor variable, making it simple to visualize subgroup differences.

Best for: Creating complex, customizable plots

3. Esquisse

Esquisse is a data visualization tool in R (an RStudio add-in) that builds detailed visualizations with the
ggplot2 package. You can create scatter plots, histograms, line charts, bar charts, box plots, and more,
then export the graphs or copy the ggplot2 code that produces them. Its drag-and-drop interface makes
it easy to use and popular even among beginners.
Key Features:

• Drag-and-drop functionality for easy chart creation

• Supports multiple chart types (scatter, bar, line, etc.)

• Export visualizations and view underlying code

Best for: Easy and quick visualizations for beginners

4. Shiny

Shiny is an R package for building interactive web applications directly from R, combining R with the
modern web. You can create web applications without special web development skills. With Shiny you
can embed apps in R documents (such as R Markdown), host standalone apps on a webpage, or build
interactive visualization dashboards. Shiny apps can be deployed to the cloud or to your own servers
under an open-source or commercial license.
Key Features:

• Build interactive web apps easily

• Embed apps in R documents or host on the web

• Extend functionality with HTML, CSS, and JavaScript

Best for: Building interactive dashboards and web apps

5. mlr3

mlr3 is an R package created specifically for machine learning. With mlr3 you can implement a variety of
supervised and unsupervised models, such as classification, regression, support vector machines,
random forests, nearest neighbors, naive Bayes, decision trees, and clustering. It also connects to the
OpenML R package, which gives access to the OpenML online platform for sharing machine learning
datasets and experiments.
Key Features:

• Supports a wide range of machine learning models

• Integration with OpenML for online resources

• Improved functionality over its predecessor, mlr

Best for: Implementing machine learning algorithms with hyperparameter tuning

6. Lubridate

Lubridate is an R library focused on making date-time data easy to handle. Working with date-times in
base R can be frustrating because the commands are unintuitive and can change depending on the type
of date-time object. Lubridate also adds time-span classes (durations, periods, and intervals) that make
arithmetic on dates straightforward.
Key Features:

• Simplifies date-time manipulation with intuitive functions

• Handles components like seconds, minutes, and years easily

• Offers time span classes for mathematical operations

Best for: Parsing, manipulating, and converting date-time formats
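Lubridate is R-only; as a cross-language comparison, the sketch below shows rough pandas analogues of the same ideas, assuming pandas is installed. The pairing with lubridate concepts is approximate: pd.DateOffset behaves like a calendar-aware period, while pd.Timedelta behaves like a fixed-length duration. The dates are arbitrary examples.

```python
import pandas as pd

# Parsing with an explicit day/month/year format (compare lubridate's dmy())
t = pd.to_datetime("15/03/2024 14:30", format="%d/%m/%Y %H:%M")
print(t.year, t.month, t.day, t.hour, t.minute)  # 2024 3 15 14 30

start = pd.Timestamp("2024-01-31")

# Period-like step: one calendar month, clipped to the end of February
print(start + pd.DateOffset(months=1))  # 2024-02-29 00:00:00

# Duration-like step: exactly 30 days of elapsed time
print(start + pd.Timedelta(days=30))    # 2024-03-01 00:00:00
```

The two results differ because "one month" and "30 days" are different kinds of time span, which is the distinction lubridate's period and duration classes make explicit.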
