
WEEK FIVE-SEVEN

3.1 FUNCTIONS OF STATISTICAL SOFTWARE


THE EVOLUTION AND SIGNIFICANCE OF R IN DATA ANALYTICS
R is a powerful open-source programming language and software environment primarily used for
statistical computing, data analysis, and graphical representation. Ross Ihaka and Robert
Gentleman developed it at the University of Auckland in the mid-1990s, and it is an
implementation of the S programming language developed at Bell Laboratories.
3.1.1 Historical Evolution of R
 1970s: Origins in the S Language: R is rooted in the S programming language, created
by John Chambers and others at Bell Labs. S was designed to make data analysis more
interactive and efficient.
 1993–1995: Birth of R: Ross Ihaka and Robert Gentleman began developing R in 1993,
and the project was released to the public in 1995 as free software under the GNU
General Public License.
 2000s: Community Expansion and CRAN: The development of the Comprehensive R
Archive Network (CRAN) significantly enhanced R’s accessibility. CRAN enabled users
to share packages, fostering rapid development in specialized domains like time series,
genetics, and finance.
 2010s: Rise in Popularity: With the explosion of big data and machine learning, R
gained traction in academia, healthcare, marketing, and finance. The RStudio IDE,
launched in 2011, made R more accessible to users from non-programming backgrounds.
 2020s and Beyond: R in the Era of Data Science: R continues to evolve with robust
packages for machine learning (e.g., caret, mlr3), deep learning (keras, tensorflow), and
big data (sparklyr). The tidyverse collection of packages, including ggplot2, dplyr, and tidyr,
has streamlined data science workflows, making R more user-friendly and visually
intuitive.
3.1.2 Features of R in Data Analytics
 Statistical and Mathematical Modeling: R is purpose-built for advanced statistical
procedures, including linear and nonlinear modeling, time-series analysis, classification,
clustering, and more.
 Extensive Visualization Capabilities: Tools like ggplot2, lattice, and plotly allow users
to create highly customizable and publication-quality graphs.
 Community-Driven Package Ecosystem: With over 19,000 packages on CRAN, R
supports a wide variety of analytics applications—from bioinformatics and social science
to finance and climatology.
 Reproducible Research: Tools like knitr, rmarkdown, and Shiny allow users to produce
dynamic, reproducible documents and interactive dashboards.
 Interoperability: R can interface with other programming languages like Python, C++,
and Java, and connect to databases and big data frameworks like Hadoop and Spark.
3.1.3 Significance of R in Modern Data Analytics
 Academia and Research: R is a standard tool in academic research due to its open-
source nature, flexibility, and high-quality statistical libraries. Many published papers
include R code to ensure reproducibility.
 Data Science and Machine Learning: R supports machine learning workflows through
packages like caret, xgboost, and randomForest, making it competitive with Python in
certain analytical tasks.
 Open Source and Cost-Efficiency: Organizations adopt R to reduce licensing costs without
sacrificing analytical power.
3.1.4 Industry Applications
i. Healthcare: Predictive modeling for patient outcomes and clinical trials.
ii. Finance: Risk modeling, time-series forecasting, and portfolio optimization.
iii. Marketing: Customer segmentation, churn prediction, and campaign analytics.
iv. Environmental Science: Climate modeling and ecological data analysis.
3.1.5 Challenges and Limitations
i. Speed: R can be slower than languages like C++ or Python for certain operations,
especially with very large datasets.
ii. Memory Usage: R processes everything in memory, which can be limiting for big data.
iii. Learning Curve: Although packages like tidyverse ease usability, R's syntax and
concepts (e.g., functional programming) can be challenging for beginners.
3.1.6 Functions of R
1. Data Handling and Storage
i. Supports a wide variety of data types: vectors, matrices, arrays, data frames, lists.
ii. Efficient manipulation of large and complex datasets.
iii. Functions like read.csv(), read.table(), readxl::read_excel() for importing data.
iv. Interfaces with databases using packages like DBI, RSQLite, RODBC.
2. Statistical Analysis
Built-in functions for:
 Descriptive statistics: mean(), sd(), summary().
 Inferential statistics: t.test(), chisq.test(), anova().
 Regression analysis: lm() for linear, glm() for generalized linear models.
 Time series: ts(), arima(), and forecast() (from the forecast package).
3. Data Visualization: Base R plotting functions: plot(), hist(), boxplot().
Advanced plotting with:
 ggplot2: Elegant and layered visualizations.
 lattice: Trellis graphics for multivariate data.
 plotly and highcharter: Interactive web-based visualizations.
4. Programming Features: R is a full-fledged programming language:
 Control structures (if, for, while, repeat).
 User-defined functions (function()).
 Functional programming with apply, lapply, mapply, etc.
 Object-oriented programming (S3, S4, and R6 classes).
5. Machine Learning and Data Mining: Rich ecosystem for ML:
 caret: Unified interface to many algorithms.
 randomForest, xgboost, e1071 (SVM), nnet (neural networks).
 mlr3, tidymodels: Modern, modular machine learning frameworks.
6. Text Mining and Natural Language Processing (NLP): Packages like tm, text2vec,
quanteda for:
 Tokenization
 Term frequency–inverse document frequency (TF-IDF)
 Topic modeling
 Sentiment analysis
7. Time Series Analysis
 Classes like ts (built-in), zoo, and xts for time series data.
 Packages like forecast, tseries, prophet (from Facebook) for:
 Forecasting
 Seasonal decomposition
 Stationarity testing
8. Spatial and Geographic Data Analysis
 GIS functionalities using sf, sp, raster, tmap, leaflet.
 Plotting maps and analyzing spatial patterns and geostatistics.
9. Reproducible Research and Reporting
 R Markdown (rmarkdown): Combine code, output, and narrative in a single document.
 knitr: Dynamic report generation in HTML, PDF, Word.
 Shiny: Build interactive web apps from R scripts.
 Quarto: Next-gen scientific and technical publishing.
10. Integration and Interoperability
R integrates with:
 Python: using reticulate.
 C/C++: via .Call() or Rcpp.
 Java: using rJava.
Connects to big data platforms:
 Apache Spark: sparklyr
 Hadoop and Hive: RHadoop, RHive
11. Package Development
Create and share your own R packages using tools like devtools, usethis, and roxygen2.
Below are R code examples demonstrating each of the main functionalities. These are short,
practical snippets designed to show how each feature works.
1. Data Handling and Storage
# Load data
data <- read.csv("data.csv")
# View structure
str(data)
# Create a data frame (a few observations per group so the tests below can run)
df <- data.frame(Name = rep(c("A", "B"), each = 3),
                 Score = c(90, 85, 88, 70, 75, 72))
2. Statistical Analysis
# Descriptive statistics
mean(df$Score)
sd(df$Score)
# T-test
t.test(Score ~ Name, data = df)
# Linear regression
model <- lm(Score ~ Name, data = df)
summary(model)
3. Data Visualization
# Base R
hist(df$Score)
# ggplot2
library(ggplot2)
ggplot(df, aes(x = Name, y = Score)) +
geom_bar(stat = "identity", fill = "steelblue")
4. Programming Features
# Custom function
square <- function(x) { return(x^2) }
square(5)
# Loop
for (i in 1:3) {
  print(i^2)
}
5. Machine Learning
# Load caret
library(caret)
data(iris)
# Train-test split
set.seed(123)
trainIndex <- createDataPartition(iris$Species, p = .7, list = FALSE)
train <- iris[trainIndex, ]
test <- iris[-trainIndex, ]
# Train model
model <- train(Species ~ ., data = train, method = "rf")
predictions <- predict(model, test)
confusionMatrix(predictions, test$Species)
6. Text Mining (NLP)
library(tm)
texts <- Corpus(VectorSource(c("This is text mining", "Mining text data")))
texts <- tm_map(texts, content_transformer(tolower))
texts <- tm_map(texts, removePunctuation)
dtm <- DocumentTermMatrix(texts)
inspect(dtm)
7. Time Series Analysis
# Time series object
ts_data <- ts(c(100, 110, 105, 120, 130), start = c(2020, 1), frequency = 12)
# Plot
plot(ts_data)
# Forecasting
library(forecast)
fit <- auto.arima(ts_data)
forecast(fit, h = 3)
8. Spatial and Geographic Data
library(sf)
nc <- st_read(system.file("shape/nc.shp", package = "sf"))
plot(nc["BIR74"])
9. Reproducible Research (R Markdown / Shiny)
R Markdown example (in a .Rmd file):
---
title: "My Report"
output: html_document
---
```{r}
summary(cars)
plot(cars)
```
Shiny app example:
library(shiny)
ui <- fluidPage(
  sliderInput("num", "Choose a number", 1, 100, 50),
  plotOutput("hist")
)
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$num))
  })
}
shinyApp(ui = ui, server = server)
10. Integration with Python
library(reticulate)
py_run_string("x = 5 + 3")
py$x # Output: 8
11. Package Development
# Create a package structure
usethis::create_package("myPackage")
# Add a function
usethis::use_r("myFunction")
3.1.7 EVOLUTION AND SIGNIFICANCE OF PYTHON IN DATA ANALYTICS
Python is a high-level, interpreted, general-purpose programming language created by Guido
van Rossum and first released in 1991. Its emphasis on code readability, simple syntax, and
extensive libraries has made it a favorite among software developers, researchers, and data
analysts worldwide.
3.1.8 Evolution of Python in Data Analytics
Early Years (1990s – early 2000s)
 Initially designed for general-purpose programming and scripting.
 Gained popularity for its clean syntax and ease of learning.
 Limited adoption in scientific and data-related work during this phase.
Scientific Computing Era (2006–2012)
 Development of NumPy (2006) and SciPy led Python into scientific computing.
 These packages provided high-performance array operations, statistical tools, and
numerical methods.
 Python began competing with R and MATLAB in academic and research environments.
Rise of Data Science (2012–2016)
 Emergence of pandas, a powerful data manipulation library, revolutionized data
wrangling.
 Growth of scikit-learn for machine learning made Python a strong choice for predictive
analytics.
 Matplotlib and seaborn brought high-quality data visualization capabilities.
 Python's flexibility in scripting, data cleaning, and model building made it the language
of choice for data scientists.
Modern Era (2016–Present)
 Explosion of data science, AI, and machine learning boosted Python’s popularity.
 Integration with big data platforms like Spark (PySpark), Hadoop.
 Deep learning frameworks such as TensorFlow, PyTorch, and Keras expanded Python’s
reach.
 Python now supports full pipelines from data collection to deployment, including
dashboard creation (e.g., Streamlit, Dash).
3.1.9 Significance of Python in Data Analytics
Open Source and Community Support
 Free and open-source.
 Backed by a massive global community that continuously develops and maintains
powerful libraries and tools.
Ease of Learning and Use
 Simple, readable syntax that lowers the barrier for entry.
 Ideal for both beginners and experienced analysts.
Rich Ecosystem of Libraries

Library | Purpose
NumPy | Numerical operations, arrays
pandas | Data manipulation and analysis
matplotlib, seaborn | Visualization
scikit-learn | Machine learning
statsmodels | Statistical modeling and testing
TensorFlow, PyTorch | Deep learning and AI
OpenCV, NLTK, spaCy | Image & text analytics

Versatility Across Domains


 Used in finance, healthcare, manufacturing, education, e-commerce, and more.
 Powers data pipelines, APIs, and even full-stack web applications.
Integration Capabilities
 Easily integrates with SQL, Excel, R, C++, and cloud platforms.
 Can read/write files in multiple formats: CSV, Excel, JSON, Parquet, etc.
Deployment and Visualization
 Allows quick development of interactive dashboards using tools like:
o Streamlit

o Dash

o Voila

 Python models can be deployed as REST APIs with Flask or FastAPI.
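As an illustration of that last point, here is a minimal sketch of serving a "model" behind a REST endpoint with Flask; the route name, port, and scoring logic are placeholders rather than a production setup:
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Placeholder scoring: a real service would load a trained model and call model.predict()
    payload = request.get_json()
    score = sum(payload.get("features", []))
    return jsonify({"prediction": score})

if __name__ == "__main__":
    app.run(port=5000)  # then POST JSON such as {"features": [1, 2, 3]} to /predict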


3.1.10 Applications of Python in Data Analytics
 Healthcare: Predicting patient readmission, disease classification.
 Finance: Fraud detection, stock price forecasting, risk modeling.
 Retail: Customer segmentation, recommendation engines.
 Government: Policy impact analysis, public health monitoring.
 Agriculture: Yield prediction, climate data analysis.
3.1.11 Functionalities of Python
1. Data Handling and Manipulation
 Efficient handling of structured data using pandas (DataFrames).
 Supports multiple data formats: CSV, Excel, JSON, SQL, Parquet.
 Filtering, grouping, merging, reshaping, and time series operations.
Example:
import pandas as pd
df = pd.read_csv('data.csv')
df.groupby('Category')['Value'].mean()
2. Numerical and Scientific Computation
 NumPy: Fast numerical arrays and matrix operations.
 SciPy: Advanced scientific functions like integration, optimization, signal processing,
and linear algebra.
Example:
import numpy as np
a = np.array([1, 2, 3])
np.mean(a)
3. Data Visualization
 Powerful libraries like:
o matplotlib for custom plots

o seaborn for statistical charts

o plotly and bokeh for interactive visuals

Example:
import seaborn as sns
sns.boxplot(x='Category', y='Value', data=df)
4. Statistical Analysis
 statsmodels for linear models, hypothesis testing, ANOVA, time series analysis.
 Also supports probabilistic models and regression diagnostics.
Example:
import statsmodels.api as sm
model = sm.OLS(df['Y'], sm.add_constant(df['X'])).fit()
model.summary()
5. Machine Learning and AI
 scikit-learn: For classification, regression, clustering, etc.
 TensorFlow, Keras, PyTorch: For deep learning.
 Model evaluation, feature engineering, and pipeline tools.
Example:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
6. Text and Natural Language Processing (NLP)
 Libraries: NLTK, spaCy, TextBlob, transformers
 Text cleaning, tokenization, named entity recognition, sentiment analysis.
Example:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Python is great for data analytics.")
print([token.text for token in doc])
7. Time Series Analysis
 Built-in support in pandas for datetime indexes and resampling.
 Advanced modeling via statsmodels or prophet (formerly fbprophet).
Example:
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date').resample('M').mean()
8. Web Scraping and APIs
 Libraries like requests, BeautifulSoup, Scrapy, and Selenium.
 Extract data from websites and APIs.
Example:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://example.com")
soup = BeautifulSoup(r.text, "html.parser")
9. Big Data and Distributed Computing
 Tools like PySpark, Dask, and Vaex to work with large datasets.
 Supports parallel and distributed data processing.
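A minimal sketch with Dask, assuming a folder of CSV files too large to load at once (the path and column names are hypothetical):
import dask.dataframe as dd

df = dd.read_csv("logs/*.csv")             # lazily builds a task graph over many files
daily = df.groupby("date")["bytes"].sum()  # still lazy, nothing has run yet
print(daily.compute())                     # compute() triggers the parallel execution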
10. Dashboarding and Web Applications
 Create interactive dashboards using:
o Dash (by Plotly)

o Streamlit

o Panel
 Build full web apps with Flask or FastAPI.
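A minimal Streamlit sketch (the file and column names are hypothetical); save it as app.py and launch it with "streamlit run app.py":
import pandas as pd
import streamlit as st

st.title("Sales Dashboard")
df = pd.read_csv("sales.csv")
month = st.selectbox("Month", df["Month"].unique())
st.bar_chart(df[df["Month"] == month].set_index("Product")["Revenue"])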
11. Automation and Scripting
 Write scripts to automate data cleaning, reporting, file management, etc.
 Schedule tasks using cron or schedule.
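A small sketch using the third-party schedule package (the job itself is only a placeholder):
import time
import schedule

def refresh_report():
    print("Cleaning data and rebuilding the daily report...")  # placeholder task

schedule.every().day.at("07:00").do(refresh_report)

while True:
    schedule.run_pending()  # run any job whose scheduled time has passed
    time.sleep(60)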
12. Database Connectivity
 Connects with SQL, NoSQL, and cloud databases using:
o sqlite3, SQLAlchemy, PyMySQL, psycopg2, MongoDB (via pymongo)

Example:
import sqlite3
import pandas as pd
conn = sqlite3.connect('mydb.sqlite')
pd.read_sql("SELECT * FROM table_name", conn)
13. Object-Oriented Programming (OOP)
 Define classes and reusable objects.
 Supports inheritance, encapsulation, and polymorphism.
Example:
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}")
14. Modular and Package Development
 You can create reusable modules and Python packages.
 Use pip, setuptools, and virtual environments for dependency management.
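A tiny sketch of a reusable module (the file and function names are hypothetical); saved as stats_utils.py, it can be imported from other scripts with "from stats_utils import zscore":
def zscore(values):
    """Return the z-score of each value in a list of numbers."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

if __name__ == "__main__":
    # Runs only when executed directly (python stats_utils.py), not when imported
    print(zscore([2, 4, 6, 8]))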
15. Cross-Platform and Cloud Integration
 Python scripts run on Windows, Linux, and MacOS.
 Connects with cloud platforms like AWS, GCP, Azure for ML deployment, data pipelines,
and storage.
3.1.12 EVOLUTION AND SIGNIFICANCE OF SQL IN DATA ANALYTICS
SQL (Structured Query Language) is a domain-specific language used for managing and
manipulating relational databases. It was developed in the 1970s at IBM by Donald D.
Chamberlin and Raymond F. Boyce, and later standardized by ANSI and ISO. SQL allows users
to query, insert, update, and delete data within relational database systems.
3.1.13 Early Development (1970s–1980s)
 Originated from the relational model proposed by E.F. Codd in 1970.
 IBM’s System R used an early version of SQL called SEQUEL.
 In 1979, Oracle released the first commercially available implementation of SQL.
Standardization and Commercial Adoption (1986–1990s)
 ANSI standardized SQL in 1986, followed by ISO in 1987.
 Became the standard query language for relational database systems.
 Widely adopted by Oracle, IBM DB2, Microsoft SQL Server, MySQL, and others.
Expansion with the Web (1990s–2000s)
 SQL became critical for dynamic websites and applications (via PHP, ASP, Java).
 Introduction of OLAP (Online Analytical Processing) for business intelligence.
 SQL was integrated with ETL tools and enterprise data warehouses.
Modern Era (2010s–Present)
 Rise of data analytics, data science, and cloud computing brought renewed focus to
SQL.
 Integration with big data tools like HiveQL (Hadoop) and Presto.
 Advent of cloud databases: Google BigQuery, Amazon Redshift, Snowflake.
 Support for semi-structured data (JSON, XML) and advanced analytics.
3.1.14 Functionalities of SQL in Data Analytics

Function | SQL Features
Data retrieval | SELECT, WHERE, JOIN, GROUP BY, ORDER BY
Data manipulation | INSERT, UPDATE, DELETE
Data aggregation | COUNT(), AVG(), SUM(), MAX(), MIN()
Data filtering | WHERE, HAVING, IN, LIKE, BETWEEN
Data modeling | CREATE TABLE, ALTER, CONSTRAINTS
Subqueries & nesting | Nested SELECT, EXISTS, ANY, ALL
Views and stored procedures | CREATE VIEW, PROCEDURE, FUNCTION
Data security | GRANT, REVOKE, roles, permissions

Example:
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01'
GROUP BY department
ORDER BY avg_salary DESC;
3.1.15 Significance of SQL in Data Analytics
Data Access and Exploration
 SQL allows direct access to databases for exploratory data analysis (EDA).
 Analysts can summarize, aggregate, and filter large datasets efficiently.
Universality Across Tools
 SQL is supported in nearly all data platforms: MySQL, PostgreSQL, Oracle, SQL Server,
Snowflake, etc.
 Tools like Tableau, Power BI, R, Python, and Excel connect seamlessly with SQL
databases.
Foundation for Data Warehousing and BI
 SQL powers ETL pipelines, data marts, and data warehouses.
 Commonly used in tools like Apache Hive, AWS Redshift, Google BigQuery, and
Databricks SQL.
Efficient Handling of Large Datasets
 SQL engines are optimized for high-speed querying over millions of records.
 Often used for querying “cold” data stored in data lakes and warehouses.
Reproducibility and Automation
 SQL scripts ensure consistent, auditable, and reproducible analyses.
 Can be scheduled as part of ETL or dashboard refresh workflows.
Data Governance and Compliance
 SQL enables fine-grained access control and auditing, which is critical for regulatory
compliance (GDPR, HIPAA, etc.).
3.1.16 Applications of SQL in Analytics
 Marketing Analytics: Customer segmentation, campaign effectiveness.
 Finance: Credit risk analysis, budget monitoring, fraud detection.
 Healthcare: Patient data retrieval, treatment outcome summaries.
 E-commerce: Product performance tracking, recommendation systems.
 Telecom: Churn prediction, usage analysis.
3.2 USES OF LIBRARIES IN STATISTICAL SOFTWARE
1. Pandas (Python Library)
Pandas is an open-source Python library primarily used for data manipulation, data cleaning, and
data analysis. It provides two main data structures:
 Series (1D)
 DataFrame (2D, like a table)
3.2.1 Uses of Pandas

Features | Description
Data loading | Read/write data from CSV, Excel, SQL, JSON, Parquet, etc.
Data inspection | Quick exploration: .head(), .info(), .describe(), .shape, .columns
Data cleaning | Handle missing values (.isnull(), .fillna()), duplicates, outliers
Data transformation | Filtering, sorting, grouping, reshaping, pivot tables
Aggregation | Grouping and summarizing data using .groupby()
Merging & joining | Combine datasets using merge(), concat(), join()
Time series analysis | Date parsing, rolling statistics, resampling
Data exporting | Save cleaned data back to CSV, Excel, JSON, etc.
Integration with other tools | Works well with NumPy, Matplotlib, Seaborn, Scikit-learn

Example:
import pandas as pd
df = pd.read_csv('sales.csv')
monthly_sales = df.groupby('Month')['Revenue'].sum()
3.2.2 Matplotlib (Python Library)
Matplotlib is a comprehensive Python plotting library used for creating static, animated, and
interactive visualizations.
3.2.3 Uses of Matplotlib

Visualization Type | Description
Line plots | Time series, trends over time
Bar charts | Category-wise comparisons
Histograms | Distribution of continuous variables
Scatter plots | Relationship between two variables
Pie charts | Percentage distribution
Custom plots | Fully customizable charts (color, size, labels, legends, etc.)
Subplots & grids | Multiple plots in a single figure
Saving figures | Exporting visualizations to PNG, JPG, PDF, etc.
Animation support | Creating animated plots using FuncAnimation
3D plotting | With mpl_toolkits.mplot3d for 3D data visualization

Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('Simple Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
3.2.4 ggplot2 (R Library)
ggplot2 is a powerful R library for data visualization built on the grammar of graphics concept.
It’s known for its elegant, layered, and customizable plots.
3.2.5 Uses of ggplot2

Feature | Description
Grammar of graphics | Build plots layer-by-layer (data → aesthetics → geometries → themes)
Bar, line, scatter plots | Standard chart types made easy with geom_bar(), geom_line(), etc.
Statistical visualizations | Smooth lines, box plots, violin plots, histograms, and density plots
Faceting | Create subplots for different categories using facet_wrap() or facet_grid()
Customization | Titles, labels, colors, shapes, themes, legends, scales
Theming | Predefined themes: theme_minimal(), theme_classic(), etc.
Coordinate systems | Transformations like flip (coord_flip()), polar, map projections
Integration with tidyverse | Works seamlessly with dplyr, tidyr, and other tidyverse packages

Example:
library(ggplot2)
ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Displacement vs. Highway MPG")
3.2.6 Summary Comparison

Features | Pandas | Matplotlib | ggplot2
Language | Python | Python | R
Main purpose | Data manipulation | Data visualization | Data visualization
Strength | Tabular data processing | Fully custom plots | Elegant statistical plots
Learning curve | Moderate | Moderate | Beginner-friendly in R
Integration | NumPy, Scikit-learn, etc. | Pandas, Seaborn, etc. | dplyr, tidyr, tidyverse
Output type | DataFrames | Charts and figures | Charts and figures
3.3 DATA VISUALIZATION
Data visualization is the graphical representation of information and data using visual elements
like charts, graphs, and maps. It transforms raw data into a visual context to help people
understand trends, outliers, patterns, and insights more easily.
3.3.1 The Process of Data Visualization
1. Data Collection
 Gather data from various sources: databases, spreadsheets, APIs, surveys, or sensors.
 Ensure data relevance and integrity.
2. Data Cleaning and Preparation
 Handle missing values, duplicates, and outliers.
 Convert data types, create calculated fields, normalize or aggregate values.
3. Define the Objective
 What do you want to reveal?
o Trends over time?

o Comparisons between groups?

o Distribution of data?

o Relationships between variables?

4. Choose the Right Visualization Technique


 The choice depends on data type and analytical goals (e.g., bar charts for comparison,
scatter plots for correlation).
5. Use a Visualization Tool or Library
 Tools: Tableau, Power BI, Excel
 Libraries: Matplotlib, Seaborn (Python), ggplot2 (R), D3.js (JavaScript)
6. Design and Customize the Visualization
 Add titles, labels, legends, and colors for clarity.
 Follow best practices: avoid clutter, use readable fonts, and maintain proper scales.
7. Interpret and Communicate Insights
 Analyze the visual outputs to derive meaningful insights.
 Present findings to stakeholders through dashboards or reports.
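A minimal Python sketch tying the steps above together (the file and column names are hypothetical, and a real project would involve far more cleaning):
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")                    # 1. collect the data
df = df.dropna(subset=["Month", "Revenue"])      # 2. clean it
monthly = df.groupby("Month")["Revenue"].sum()   # 3-4. objective: trend over time -> line chart
monthly.plot(kind="line")                        # 5. plot with pandas/Matplotlib
plt.title("Monthly Revenue")                     # 6. design: titles and labels
plt.xlabel("Month")
plt.ylabel("Revenue")
plt.show()                                       # 7. interpret and communicate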
3.3.2 Relevance of Data Visualization in Decision Making

 Quick Pattern Detection: Helps detect trends, anomalies, and correlations in large datasets.
 Improves Understanding: Converts complex data into easily digestible visuals.
 Supports Evidence-Based Decisions: Empowers managers to make informed choices based on data, not intuition.
 Enhances Communication: Facilitates storytelling and communication with non-technical stakeholders.
 Aids in Monitoring KPIs: Enables tracking and comparison of key performance indicators over time.
 Identifies Problems Early: Highlights negative trends or outliers that need attention.

3.3.3 Common Data Visualization Techniques and Their Uses

Technique | Description | Appropriate Use Case
Bar Chart | Rectangular bars to represent categorical data. | Comparing sales by product, revenue by region.
Line Chart | Connects data points with lines, showing trends over time. | Stock prices over months, temperature changes daily.
Pie Chart | Divides a circle to show proportions. | Market share by brand, budget allocation.
Histogram | Shows frequency distribution of continuous data. | Distribution of exam scores, income levels.
Scatter Plot | Dots represent two variables' values to show correlation. | Relationship between height and weight, sales vs. ad spend.
Box Plot | Displays median, quartiles, and outliers in the data. | Comparing salary distribution across departments.
Heatmap | Uses color intensity to show value magnitude in a matrix. | Correlation matrices, website activity patterns.
Bubble Chart | Extension of scatter plot with an extra dimension shown by bubble size. | Revenue (size), by product (x), and profit margin (y).
Area Chart | Like line charts, but filled under the curve to show volume. | Cumulative sales, population growth.
Tree Map | Nested rectangles are sized and colored by data values. | Hierarchical data, like product categories and subcategories.
Dashboard (combo) | Interactive visualization containing multiple charts. | Executive summaries, financial dashboards, business intelligence.

3.3.4 Best Practices for Effective Data Visualization


 Choose the right chart for your data.
 Avoid misleading scales or visual distortion.
 Use consistent color schemes.
 Label axes, legends, and data.
 Provide context and summary to guide interpretation.
