Teks DATA SCIENCE Syllabus - QR
CURRICULUM
Installation of Anaconda
Overview of Jupyter Notebook
Python Basics
Python syntax and indentation
Python variables and data types (int, float, str, bool)
Input/output functions (input(), print())
Comments in Python
Control Structures
Conditional statements (if, elif, else)
Loops:
for loop
while loop
break, continue, and pass statements
Logical and comparison operators
Functions
Defining functions in Python
Function arguments and return values
Default and keyword arguments
*args and **kwargs
Lambda functions
Exception Handling
Errors in Python (syntax and runtime errors)
try, except, finally blocks
Raising exceptions using raise
Descriptive Statistics
Measures of Central Tendency:
Mean, Median, Mode
Measures of Dispersion:
Range, Variance, Standard Deviation
Interquartile Range (IQR)
Shape of Data Distribution:
Skewness and Kurtosis
Probability Basics
Definition of Probability
Types of Probability:
Classical, Empirical, and Subjective Probability
Probability Rules:
Addition and Multiplication rules
Conditional Probability
Independent and Dependent Events
Bayes’ Theorem and Applications
Linear Regression
Introduction to Regression Problems
Simple Linear Regression
Multiple Linear Regression
Assumptions of Linear Regression
Evaluating Regression Models:
Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-Squared Value
Logistic Regression
Introduction to Classification Problems
Logistic Regression for Binary Classification
Sigmoid Function and Decision Boundaries
Evaluating Classification Models:
Confusion Matrix, Precision, Recall, F1-Score, ROC Curve, AUC
Bagging Techniques
Introduction to Bagging:
Definition and Concept
How Bagging Reduces Variance
Decision Tree Ensembles:
Random Forest Algorithm
Building and Tuning Random Forest Models
Out-of-Bag Error Estimation
Boosting Techniques
Introduction to Boosting:
Definition and Concept
How Boosting Reduces Bias
Common Boosting Algorithms:
AdaBoost
Gradient Boosting Machines (GBM)
XGBoost: Features and Implementation
LightGBM and CatBoost: Differences and Use Cases
k-Means Clustering
Introduction to Clustering
How the k-Means Algorithm Works
Evaluating Clusters:
Elbow Method to Find Optimal Number of Clusters
Silhouette Score
Hierarchical Clustering
Introduction to Hierarchical Clustering
Agglomerative vs. Divisive Clustering
Dendrograms and Linkage Methods
Pros and Cons of Hierarchical Clustering
Anomaly Detection
Introduction to Anomaly Detection
Techniques for Identifying Outliers
Isolation Forest, One-Class SVM
Applications of Anomaly Detection in Fraud Detection and Network Security
Database Design
Normalization (1NF, 2NF, 3NF, BCNF)
Denormalization
Primary keys, foreign keys, and unique keys
Indexing
Constraints (NOT NULL, DEFAULT, UNIQUE, CHECK)
SQL Functions
Aggregate functions (COUNT, SUM, AVG, MIN, MAX)
Scalar functions (UPPER, LOWER, LENGTH, ROUND)
Date functions (NOW, CURDATE, DATE_ADD, DATE_SUB)
Joins in SQL
INNER JOIN
LEFT JOIN (or LEFT OUTER JOIN)
RIGHT JOIN (or RIGHT OUTER JOIN)
FULL OUTER JOIN
CROSS JOIN
Self joins
Set Operations
UNION and UNION ALL
INTERSECT
EXCEPT (or MINUS)
Constraints in SQL
PRIMARY KEY constraint
FOREIGN KEY constraint
UNIQUE constraint
CHECK constraint
DEFAULT constraint
Transactions in SQL
ACID properties (Atomicity, Consistency, Isolation, Durability)
COMMIT and ROLLBACK
SAVEPOINT
Transaction isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, SERIALIZABLE)
Indexes in SQL
Purpose of indexes
Types of indexes (single-column, multi-column)
Unique and non-unique indexes
Full-text index
Index performance considerations