Data Analytics Curriculum

The document outlines a data analytics curriculum consisting of five modules: Basic Python, Data Cleaning with Pandas and NumPy, Exploratory Data Analysis with Visualization, SQL Database, and Power BI. The modules cover Python programming basics, data cleaning techniques, exploratory data analysis methods, SQL queries and database management, and Power BI for data visualization and reporting. The curriculum is designed to equip learners with the skills needed for data analysis and visualization.

Uploaded by pra56.kum61
Copyright
© All Rights Reserved

Data Analytics Curriculum

Module 01: Basic Python


Python Programming
Introduction to Python
Overview of Python and its features
Installing Python
Python IDEs
Writing and executing Python programs
Understanding Python's interactive mode and script mode
Python Basics
Python syntax and indentation
Python variables and data types (int, float, str, bool)
Input/output functions (input(), print())
Comments in Python
Data Structures in Python
Lists:
Creating and manipulating lists
List functions (append(), insert(), remove(), slicing)
Tuples:
Difference between lists and tuples
Accessing elements in a tuple
Sets:
Creating sets, set operations
union(), intersection(), difference()
Dictionaries:
Key-value pairs
Accessing and updating dictionaries
Common dictionary methods (get(), items(), keys(), values())
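
As an illustration of the four data structures listed above, here is a minimal sketch in plain Python. The variable names and sample values (`scores`, `student`, and so on) are invented for the example and are not part of the curriculum.

```python
# Lists are ordered and mutable.
scores = [70, 85]
scores.append(90)        # add to the end
scores.insert(0, 60)     # insert at index 0
scores.remove(85)        # remove the first matching value
top_two = scores[-2:]    # slicing returns a new list

# Tuples are ordered but immutable; elements are read by index.
point = (3, 4)
x_coordinate = point[0]

# Sets hold unique items and support set algebra.
a = {1, 2, 3}
b = {3, 4}
both = a.intersection(b)
either = a.union(b)
only_a = a.difference(b)

# Dictionaries map keys to values.
student = {"name": "Asha", "age": 21}
student["age"] = 22                      # update an existing key
city = student.get("city", "unknown")    # get() with a fallback default
pairs = list(student.items())            # (key, value) tuples
```

Note that `get()` returns the fallback instead of raising `KeyError`, which is why it is often preferred when a key may be absent.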
Control Structures
Conditional statements (if, elif, else)
Loops:
for loop
while loop
break, continue, and pass statements
Logical and comparison operators
Functions
Defining functions in Python
Function arguments and return values
Default and keyword arguments
*args and **kwargs
Lambda functions
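
The function topics above can be sketched in a few lines. The function names (`greet`, `total`) and the `scale` keyword are hypothetical, chosen only to show default arguments, keyword arguments, `*args`/`**kwargs`, and a lambda side by side.

```python
def greet(name, greeting="Hello"):
    """Return a greeting; 'greeting' has a default value."""
    return f"{greeting}, {name}!"


def total(*args, **kwargs):
    """Sum any number of positional values, then apply an optional 'scale' keyword."""
    scale = kwargs.get("scale", 1)
    return sum(args) * scale


# A lambda is a small anonymous function defined in a single expression.
square = lambda n: n * n

default_greeting = greet("Ravi")                      # uses the default
keyword_greeting = greet(name="Ravi", greeting="Hi")  # keyword arguments
scaled_total = total(1, 2, 3, scale=10)               # args=(1, 2, 3), kwargs={"scale": 10}
squares = list(map(square, [1, 2, 3]))
```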
Exception Handling
Errors in Python (syntax and runtime errors)
try, except, finally blocks
Raising exceptions using raise
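
A short sketch of how `try`, `except`, `finally`, and `raise` fit together. The helper `safe_divide` and the `log` list are made up for this example; a real program would more likely use the `logging` module.

```python
log = []  # records that the finally block ran (illustrative only)


def safe_divide(numerator, denominator):
    """Divide two numbers, returning None on division by zero."""
    if not isinstance(denominator, (int, float)):
        # Raise explicitly when the input is of the wrong type.
        raise TypeError("denominator must be a number")
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Runtime error handled here instead of crashing the program.
        return None
    finally:
        # Runs whether or not an exception occurred inside the try block.
        log.append("safe_divide finished")


ok_result = safe_divide(6, 3)
zero_result = safe_divide(1, 0)
```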
Modules and Packages
Importing modules (import, from...import)
Standard Python libraries (math, random, os, sys, datetime)
Creating and using your own modules
File Handling
Reading from and writing to files (open(), read(), write())
Working with file modes (r, w, a, rb, wb)
Handling file exceptions
Object-Oriented Programming (OOP) Basics
Introduction to classes and objects
Defining a class and object
Introduction to Python Libraries
Using Python libraries such as NumPy, Matplotlib, and Pandas
Simple data analysis examples using these libraries

Module 02: Data Cleaning with Pandas and NumPy


Introduction to Data Cleaning
Importance of Data Cleaning in Data Science
Common Data Quality Issues:
Missing Data
Duplicate Data
Incorrect Data Types
Inconsistent Data
Overview of Pandas and NumPy for Data Cleaning
Introduction to Pandas and NumPy
Installing and Setting Up Pandas and NumPy
Overview of Pandas DataFrames and Series
Overview of NumPy Arrays and Basic Operations
Importing Data using Pandas:
CSV, Excel, and JSON files
Data Inspection:
head(), info(), describe(), shape, and dtypes
Handling Missing Data
Identifying Missing Data:
isnull(), notnull(), isna(), sum()
Filling Missing Values:
Using fillna(), ffill(), and bfill()
Filling with Mean, Median, Mode
Interpolation Techniques
Dropping Missing Values:
dropna() function and its parameters
Replacing Values using replace()
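
The missing-data steps above might look like the following in Pandas. The DataFrame and its `city`/`temp` columns are invented sample data; filling with the mean is only one of the strategies listed, and the right choice depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", None, "Pune"],
    "temp": [30.0, np.nan, 28.0, 32.0],
})

# Identify: count missing values in each column.
missing_per_column = df.isna().sum()

# Fill: numeric column with its mean, text column by carrying the last value forward.
filled = df.copy()
filled["temp"] = filled["temp"].fillna(filled["temp"].mean())
filled["city"] = filled["city"].ffill()

# Drop: keep only rows with no missing values at all.
dropped = df.dropna()
```

`dropna()` is the simplest option but discards whole rows, so it is usually worth checking `missing_per_column` first to see how much data would be lost.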
Handling Duplicate Data
Identifying Duplicate Rows:
duplicated(), drop_duplicates()
Removing Duplicate Rows and Columns
Dealing with Duplicate Values based on Conditions
Data Type Conversion
Checking Data Types with dtypes
Converting Data Types using astype():
Converting between integers, floats, and strings
Handling Date and Time Data with Pandas:
Converting to datetime using to_datetime()
Extracting date and time components (day, month, year, hour, minute)
Handling Inconsistent Data
String Manipulation:
Cleaning text data using str methods
Case conversion (lower(), upper())
Removing whitespace and special characters
Replacing substrings in text data
Dealing with Inconsistent Labels:
Renaming columns with rename()
Standardizing labels
Handling Outliers
Identifying Outliers:
Using statistical techniques (IQR, Z-score)
Visualizing outliers with Boxplots and Histograms
Treating Outliers:
Capping, Flooring, and Winsorization
Removing or transforming outliers
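
As a rough sketch of the IQR technique and capping mentioned above: values further than 1.5 times the interquartile range from the quartiles are flagged, then clipped to the fences. The sample series is invented, and the 1.5 multiplier is the common convention rather than a fixed rule.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier.
values = pd.Series([10, 12, 11, 13, 12, 95])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Fences: anything outside [lower, upper] is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
is_outlier = (values < lower) | (values > upper)

# Capping/flooring: pull outliers back to the nearest fence.
capped = values.clip(lower=lower, upper=upper)

# Removal: keep only the rows inside the fences.
without_outliers = values[~is_outlier]
```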
Data Transformation with Pandas
Applying Functions to Data using apply(), map(), and applymap()
Lambda Functions for Custom Operations
Creating New Columns from Existing Data
Grouping and Aggregating Data:
groupby(), aggregate(), transform()
Pivoting and Unpivoting Data with pivot(), melt()
Working with Large Datasets
Handling Large Data with NumPy:
Efficient data storage with NumPy arrays
Loading and Manipulating Large Files with Pandas:
Chunking large datasets
Memory optimization techniques (downcasting)
Using Dask for large-scale DataFrames
Merging and Joining DataFrames
Concatenating DataFrames with concat()
Merging DataFrames with merge():
Types of Joins (Inner, Outer, Left, Right)
Combining DataFrames using join()
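
To make the join types concrete, here is a small sketch with two invented tables, `customers` and `orders`. Customer 2 and 3 have no orders, and one order refers to a customer (4) that does not exist, so each join type returns a different number of rows.

```python
import pandas as pd

# Hypothetical sample tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"customer_id": [1, 1, 4], "amount": [250, 100, 75]})

# Inner: only keys present in both tables.
inner = customers.merge(orders, on="customer_id", how="inner")

# Left: every customer, with NaN amounts where no order matches.
left = customers.merge(orders, on="customer_id", how="left")

# Outer: every key from either table.
outer = customers.merge(orders, on="customer_id", how="outer")

# concat() stacks frames row-wise instead of matching on a key.
stacked = pd.concat([customers, customers], ignore_index=True)
```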
Data Cleaning with NumPy
Introduction to NumPy Arrays for Data Cleaning
Element-wise Operations on Arrays
Handling Missing Values in NumPy:
np.nan, np.isnan(), and np.nan_to_num()
Using np.where() for Conditional Data Cleaning
Efficient Data Filtering using Boolean Indexing
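
The NumPy cleaning functions named above can be combined as in this sketch. The `readings` array is invented, and treating negative values as invalid is an assumption made only for the example.

```python
import numpy as np

# Hypothetical sensor readings with one missing value.
readings = np.array([1.5, np.nan, -3.0, 4.0])

# np.isnan() builds a boolean mask of missing entries.
nan_mask = np.isnan(readings)

# np.nan_to_num() replaces NaN with a chosen value (0.0 here).
zero_filled = np.nan_to_num(readings, nan=0.0)

# np.where() applies a condition element-wise: negatives become 0.0.
non_negative = np.where(zero_filled < 0, 0.0, zero_filled)

# Boolean indexing keeps only the entries where the mask is False.
valid = readings[~nan_mask]
```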

Module 03: Exploratory Data Analysis with Visualization


Introduction to Exploratory Data Analysis (EDA)
Importance of EDA in Data Science
Goals of EDA: Detecting patterns, identifying anomalies, and hypothesis formulation
Overview of Tools for EDA:
Pandas for data manipulation
Matplotlib, Seaborn, and Plotly for data visualization
Introduction to Data Visualization
Overview of Visualization Tools: Matplotlib, Seaborn, and Plotly
Basic Plot Types:
Line Plot, Bar Plot, Scatter Plot
Univariate Analysis with Matplotlib & Seaborn
Visualization of Single Variables
Numerical Data:
Histograms, Boxplots, Violin Plots
Categorical Data:
Bar Charts, Count Plots
Kernel Density Estimation (KDE) Plots
Bivariate Analysis with Matplotlib & Seaborn
Exploring Relationships between Two Variables
Scatter Plots for Numerical Data
Boxplots and Violin Plots for Categorical vs. Numerical Data
Multivariate Analysis with Matplotlib & Seaborn
Visualizing Multiple Variables
Pair Plots, Joint Plots, and Facet Grids
3D Scatter Plots and Bubble Charts
Correlation Matrices and Heatmaps
Customizing with Matplotlib
Customizing Plots: Titles, Labels, Legends, Grids
Subplots and Figure Layouts
Saving and Exporting Figures
Time Series Visualization
Line Plots for Time Series Data
Rolling Statistics and Moving Averages
Time Series Decomposition for Trend and Seasonality
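
For the rolling statistics topic in the time series section, a minimal sketch of a moving average in Pandas follows. The daily `sales` figures and the 3-day window are invented for illustration; plotting is left out so the example needs only Pandas.

```python
import pandas as pd

# Hypothetical daily sales indexed by date.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.Series([10, 12, 14, 13, 15, 20], index=dates)

# 3-day moving average: each value is the mean of that day and the two before it.
# The first two entries are NaN because the window is not yet full.
rolling_mean = sales.rolling(window=3).mean()
```

Calling `rolling_mean.plot()` alongside `sales.plot()` (with Matplotlib installed) would overlay the smoothed line on the raw series.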

Module 04: SQL Database


Introduction to SQL
History and evolution of SQL
SQL vs NoSQL
Types of databases (RDBMS, column-based, key-value, etc.)
Database concepts: Tables, Rows, Columns, Relationships
SQL Data Types
Numeric types (INT, FLOAT, DECIMAL)
Character types (CHAR, VARCHAR, TEXT)
Date and time types (DATE, TIME, TIMESTAMP)
Boolean types
BLOB (Binary Large Object)
Database Design
Normalization (1NF, 2NF, 3NF, BCNF)
Denormalization
Primary keys, foreign keys, and unique keys
Indexing
Constraints (NOT NULL, DEFAULT, UNIQUE, CHECK)
Basic SQL Queries
SELECT statement
WHERE clause and logical operators (AND, OR, NOT)
ORDER BY clause
LIMIT and OFFSET clauses
DISTINCT keyword
SQL Functions
Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
Scalar functions (UPPER, LOWER, LENGTH, ROUND)
Date functions (NOW, CURDATE, DATE_ADD, DATE_SUB)
Joins in SQL
INNER JOIN
LEFT JOIN (or LEFT OUTER JOIN)
RIGHT JOIN (or RIGHT OUTER JOIN)
FULL OUTER JOIN
CROSS JOIN
Self joins
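
The difference between INNER JOIN and LEFT JOIN can be tried out without a database server by using Python's built-in `sqlite3` module. This is a sketch under that assumption: the `customers` and `orders` tables are invented, and other database systems may differ slightly in syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    amount REAL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN with aggregates: every customer, including those with no orders.
left_rows = conn.execute("""
    SELECT c.name, COUNT(o.id), COALESCE(SUM(o.amount), 0)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

conn.close()
```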
Subqueries and Nested Queries
Single-row subqueries
Multi-row subqueries
Correlated subqueries
EXISTS and NOT EXISTS clauses
Set Operations
UNION and UNION ALL
INTERSECT
EXCEPT (or MINUS)
Data Manipulation Language (DML)
INSERT statement
UPDATE statement
ADD clause (used with ALTER TABLE; strictly part of DDL)
DELETE statement
TRUNCATE statement
Data Definition Language (DDL)
CREATE TABLE
ALTER TABLE (add, modify, drop columns)
DROP TABLE
CREATE VIEW, DROP VIEW
Constraints in SQL
PRIMARY KEY constraint
FOREIGN KEY constraint
UNIQUE constraint
CHECK constraint
DEFAULT constraint
Transactions in SQL
ACID properties (Atomicity, Consistency, Isolation, Durability)
COMMIT and ROLLBACK
SAVEPOINT
Transaction isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)
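
Atomicity, COMMIT, and ROLLBACK can be demonstrated with Python's built-in `sqlite3` module. This is a minimal sketch, not a production pattern: the `accounts` table, the `transfer` helper, and the balances are all hypothetical, and the CHECK constraint stands in for a business rule that forbids negative balances.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move 'amount' from src to dst; both updates succeed or neither does."""
    try:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()      # make both updates permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # undo the credit that was already applied
        return False


succeeded = transfer(conn, "alice", "bob", 30)    # allowed
rejected = transfer(conn, "alice", "bob", 500)    # would make alice negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
conn.close()
```

The second transfer credits `bob` before the debit fails, so without the rollback the data would be left inconsistent.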
Indexes in SQL
Purpose of indexes
Types of indexes (single-column, multi-column)
Unique and non-unique indexes
Full-text index
Index performance considerations
SQL Views
Creating views
Updating views
Dropping views
Advantages and limitations of views
SQL Triggers
Stored Procedures and Functions
Creating stored procedures
IN, OUT, and INOUT parameters
Creating user-defined functions
Differences between stored procedures and functions

Module 05: Power BI


Introduction to Power BI
Overview of Business Intelligence and Data Visualization
Importance of Power BI in Data Science
Components of Power BI:
Power BI Desktop, Power BI Service, Power BI Mobile
Installation and Setup of Power BI Desktop
Getting Started with Power BI Desktop
Interface Overview: Ribbon, Fields Pane, Visualizations Pane
Importing Data:
Connecting to various data sources (Excel, SQL, Web, etc.)
Understanding Data Types and Basic Data Profiling
Data Transformation with Power Query
Introduction to Power Query Editor
Data Cleaning Techniques:
Removing duplicates, filtering rows, and changing data types
Merging and Appending Queries
Creating Custom Columns and Calculated Fields
Handling Missing Values
Data Modeling in Power BI
Understanding Relationships: One-to-One, One-to-Many, Many-to-Many
Creating and Managing Relationships between Tables
Using Star Schema and Snowflake Schema for Data Models
Introduction to Data Hierarchies
DAX (Data Analysis Expressions) Basics
Introduction to DAX: What It Is and Why It’s Important
Creating Calculated Columns and Measures
Basic DAX Functions:
SUM, AVERAGE, COUNT, DISTINCTCOUNT
Time Intelligence Functions:
YTD, QTD, MTD calculations
Data Visualization Techniques
Creating Basic Visualizations:
Stacked Column charts, Line charts, Pie charts, Donut charts, Ribbon charts, Tables, and Matrix
Advanced Visualizations:
Treemaps, Waterfall charts, Scatter plots, Maps
Custom Visualizations from Power BI Marketplace
Best Practices for Data Visualization Design
Creating Interactive Reports and Dashboards
Designing Interactive Reports:
Using slicers, filters, and drill-through functionality
Creating Bookmarks and Buttons for Navigation
Tips for Effective Dashboard Design
Publishing Reports to Power BI Service
