Note 5-7
Library | Purpose
Dash | Building interactive analytical dashboards in pure Python
Voila | Turning Jupyter notebooks into standalone web applications
Example:
import seaborn as sns
sns.boxplot(x='Category', y='Value', data=df)
4. Statistical Analysis
statsmodels for linear models, hypothesis testing, ANOVA, time series analysis.
Also supports probabilistic models and regression diagnostics.
Example:
import statsmodels.api as sm
model = sm.OLS(df['Y'], sm.add_constant(df['X'])).fit()
print(model.summary())
5. Machine Learning and AI
scikit-learn: For classification, regression, clustering, etc.
TensorFlow, Keras, PyTorch: For deep learning.
Model evaluation, feature engineering, and pipeline tools.
Example:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
6. Text and Natural Language Processing (NLP)
Libraries: NLTK, spaCy, TextBlob, transformers
Text cleaning, tokenization, named entity recognition, sentiment analysis.
Example:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Python is great for data analytics.")
print([token.text for token in doc])
7. Time Series Analysis
Built-in support in pandas for datetime indexes and resampling.
Advanced modeling via statsmodels or Prophet (formerly fbprophet).
Example:
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
monthly_avg = df.set_index('Date').resample('M').mean()
8. Web Scraping and APIs
Libraries like requests, BeautifulSoup, Scrapy, and Selenium.
Extract data from websites and APIs.
Example:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://example.com")
soup = BeautifulSoup(r.text, "html.parser")
9. Big Data and Distributed Computing
Tools like PySpark, Dask, and Vaex to work with large datasets.
Supports parallel and distributed data processing.
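As a minimal sketch, Dask mirrors the familiar pandas API while processing data in parallel (the file sales.csv and its columns are hypothetical):
Example:
import dask.dataframe as dd
# Read lazily; Dask splits the file into partitions that can be processed in parallel
df = dd.read_csv('sales.csv')
# Nothing is executed until .compute() is called
monthly = df.groupby('Month')['Revenue'].sum().compute()
print(monthly)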
10. Dashboarding and Web Applications
Create interactive dashboards using:
o Dash (by Plotly)
o Streamlit
o Panel
Build full web apps with Flask or FastAPI.
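For illustration, a minimal Streamlit dashboard might look like the sketch below (file and column names are hypothetical):
Example:
# app.py -- run with: streamlit run app.py
import pandas as pd
import streamlit as st

st.title("Sales Dashboard")
df = pd.read_csv("sales.csv")                           # hypothetical data file
month = st.selectbox("Month", df["Month"].unique())     # interactive filter widget
filtered = df[df["Month"] == month]
st.bar_chart(filtered.set_index("Product")["Revenue"])  # chart updates with the filter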
11. Automation and Scripting
Write scripts to automate data cleaning, reporting, file management, etc.
Schedule tasks using cron or schedule.
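A small sketch using the schedule library (the job itself is only a placeholder):
Example:
import schedule
import time

def generate_report():
    # placeholder for a real task, e.g. cleaning data and exporting a report
    print("Report generated")

schedule.every().day.at("08:00").do(generate_report)  # run daily at 08:00

while True:
    schedule.run_pending()   # execute any jobs that are due
    time.sleep(60)           # check once per minute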
12. Database Connectivity
Connects with SQL, NoSQL, and cloud databases using:
o sqlite3, SQLAlchemy, PyMySQL, psycopg2, MongoDB (via pymongo)
Example:
import sqlite3
import pandas as pd
conn = sqlite3.connect('mydb.sqlite')
df = pd.read_sql("SELECT * FROM table_name", conn)
13. Object-Oriented Programming (OOP)
Define classes and reusable objects.
Supports inheritance, encapsulation, and polymorphism.
Example:
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self):
        print(f"Hello, {self.name}")
14. Modular and Package Development
You can create reusable modules and Python packages.
Use pip, setuptools, and virtual environments for dependency management.
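For instance, a small reusable module (a hypothetical cleaning.py) can be imported like any installed library:
Example:
# cleaning.py -- reusable helper module (hypothetical)
def drop_missing(df, column):
    """Return only the rows where the given column is not null."""
    return df[df[column].notnull()]

# another script in the same project
# from cleaning import drop_missing
# clean_df = drop_missing(df, 'Revenue')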
15. Cross-Platform and Cloud Integration
Python scripts run on Windows, Linux, and macOS.
Connects with cloud platforms like AWS, GCP, Azure for ML deployment, data pipelines,
and storage.
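As an illustrative sketch, uploading a results file to Amazon S3 with boto3 (bucket and file names are hypothetical, and AWS credentials must already be configured):
Example:
import boto3

s3 = boto3.client("s3")   # reads credentials from the environment or AWS config
s3.upload_file("report.csv", "my-analytics-bucket", "reports/report.csv")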
3.1.12 EVOLUTION AND SIGNIFICANCE OF SQL IN DATA ANALYTICS
SQL (Structured Query Language) is a domain-specific language used for managing and
manipulating relational databases. It was developed in the 1970s at IBM by Donald D.
Chamberlin and Raymond F. Boyce, and later standardized by ANSI and ISO. SQL allows users
to query, insert, update, and delete data within relational database systems.
3.1.13 Early Development (1970s–1980s)
Originated from the relational model proposed by E.F. Codd in 1970.
IBM’s System R used an early version of SQL called SEQUEL.
In 1979, Oracle released the first commercially available implementation of SQL.
Standardization and Commercial Adoption (1986–1990s)
ANSI standardized SQL in 1986, followed by ISO in 1987.
Became the standard query language for relational database systems.
Widely adopted by Oracle, IBM DB2, Microsoft SQL Server, MySQL, and others.
Expansion with the Web (1990s–2000s)
SQL became critical for dynamic websites and applications (via PHP, ASP, Java).
Introduction of OLAP (Online Analytical Processing) for business intelligence.
SQL was integrated with ETL tools and enterprise data warehouses.
Modern Era (2010s–Present)
Rise of data analytics, data science, and cloud computing brought renewed focus to
SQL.
Integration with big data tools like HiveQL (Hadoop) and Presto.
Advent of cloud databases: Google BigQuery, Amazon Redshift, Snowflake.
Support for semi-structured data (JSON, XML) and advanced analytics.
3.1.14 Functionalities of SQL in Data Analytics
In analytics work, SQL is used to select and filter data (SELECT, WHERE), join related tables, aggregate and summarize records (GROUP BY with functions such as SUM, AVG, COUNT), sort results (ORDER BY), and, in modern dialects, compute window functions for rankings and running totals.
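A brief sketch of SQL-based aggregation run from Python with sqlite3 (the sales table and its columns are hypothetical):
Example:
import sqlite3

conn = sqlite3.connect('mydb.sqlite')
# Aggregate revenue per region directly in SQL
cur = conn.execute(
    "SELECT region, SUM(revenue) AS total_revenue "
    "FROM sales GROUP BY region ORDER BY total_revenue DESC"
)
for region, total in cur.fetchall():
    print(region, total)
conn.close()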
3.2.1 Pandas (Python Library)
pandas is the core Python library for loading, cleaning, transforming, and analyzing tabular data. Its main features are summarized below.
Feature | Description
Data loading | Read/write data from CSV, Excel, SQL, JSON, Parquet, etc.
Data inspection | Quick exploration: .head(), .info(), .describe(), .shape, .columns
Data cleaning | Handle missing values (.isnull(), .fillna()), duplicates, outliers
Data transformation | Filtering, sorting, grouping, reshaping, pivot tables
Aggregation | Grouping and summarizing data using .groupby()
Merging & joining | Combine datasets using merge(), concat(), join()
Time series analysis | Date parsing, rolling statistics, resampling
Data exporting | Save cleaned data back to CSV, Excel, JSON, etc.
Integration with other tools | Works well with NumPy, Matplotlib, Seaborn, Scikit-learn
Example:
import pandas as pd
df = pd.read_csv('sales.csv')
monthly_sales = df.groupby('Month')['Revenue'].sum()
3.2.2 Matplotlib (Python Library)
Matplotlib is a comprehensive Python plotting library used for creating static, animated, and
interactive visualizations.
3.2.3 Uses of Matplotlib
o Line, bar, scatter, and pie charts, histograms, and box plots
o Customizing titles, axis labels, legends, colors, and plot styles
o Combining multiple plots using subplots and figures
o Exporting figures to PNG, SVG, or PDF for reports
Example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('Simple Line Plot')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.show()
3.2.4 ggplot2 (R Library)
ggplot2 is a powerful R library for data visualization built on the grammar of graphics concept.
It’s known for its elegant, layered, and customizable plots.
3.2.5 Uses of ggplot2
Feature | Description
Grammar of graphics | Build plots layer-by-layer (data → aesthetics → geometries → themes)
Bar, line, scatter plots | Standard chart types made easy with geom_bar(), geom_line(), etc.
Statistical visualizations | Smooth lines, box plots, violin plots, histograms, and density plots
Faceting | Create subplots for different categories using facet_wrap() or facet_grid()
Customization | Titles, labels, colors, shapes, themes, legends, scales
Theming | Predefined themes: theme_minimal(), theme_classic(), etc.
Coordinate systems | Transformations like flip (coord_flip()), polar, map projections
Integration with tidyverse | Works seamlessly with dplyr, tidyr, and other tidyverse packages
Example:
library(ggplot2)
ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Displacement vs. Highway MPG")
3.2.6 Summary Comparison
Technique | Description | Appropriate Use Case
Line Chart | Connects data points with lines, showing trends over time. | Stock prices over months, daily temperature changes.
Bubble Chart | Extension of a scatter plot with an extra dimension shown by bubble size. | Revenue (size), by product (x), and profit margin (y).
Tree Map | Nested rectangles are sized and colored by data values. | Hierarchical data, like product categories and subcategories.
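To make one row of the table concrete, a bubble chart sketch in Matplotlib, where marker size encodes a third variable (all values are hypothetical):
Example:
import matplotlib.pyplot as plt

products = ['A', 'B', 'C', 'D']
profit_margin = [12, 18, 9, 25]     # y-axis
revenue = [300, 900, 450, 1200]     # encoded as bubble size
x = range(len(products))            # one x position per product

plt.scatter(x, profit_margin, s=revenue)   # s= controls bubble size
plt.xticks(x, products)
plt.xlabel('Product')
plt.ylabel('Profit margin (%)')
plt.title('Bubble Chart')
plt.show()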