DS Cheatsheet

The document is a comprehensive cheatsheet for data science, covering core concepts in mathematics, probability, statistics, and machine learning techniques including supervised and unsupervised learning. It also includes sections on deep learning, programming languages, data handling, visualization, and ethical considerations in data science. Key topics include linear algebra, calculus, various statistical methods, optimization techniques, and model evaluation metrics.

Uploaded by Mirjan Ali Sha
Copyright © All Rights Reserved

Data Science Cheatsheet

I. Core Concepts & Mathematics

Linear Algebra:

• Vectors: Operations (addition, scalar multiplication, dot product, cross product), Norms (L1, L2), Linear Independence, Span, Basis.
• Matrices: Operations (addition, multiplication, transpose, inverse), Determinant, Trace, Rank, Eigenvalues, Eigenvectors, Matrix Decompositions (SVD, PCA; brief mention here, full details in Machine Learning).
• Systems of Linear Equations: Gaussian Elimination, LU Decomposition (briefly).
• Vector Spaces and Subspaces.

Calculus:

• Derivatives: Rules (power, product, quotient, chain), Partial Derivatives, Gradients, Jacobian, Hessian. Understanding optimization (finding minima/maxima).
• Integrals: Basic integration rules, Definite vs. Indefinite integrals (briefly; less critical for daily DS work than derivatives).
• Limits and Continuity (fundamental concepts, but rarely used explicitly in daily DS work).
• Taylor Series (important for understanding some model approximations).
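The linear-algebra and calculus items above map directly onto NumPy's `numpy.linalg` module. Below is a minimal sketch using an arbitrary 2×2 symmetric matrix chosen only for illustration. It solves a linear system, checks an eigen-decomposition and an SVD reconstruction, and compares the analytic gradient of a quadratic against a central-difference estimate. The names `A`, `b`, `f` and `numerical_gradient` are illustrative assumptions; they are not part of the cheatsheet.

```python
import numpy as np

# Illustrative symmetric positive-definite matrix and right-hand side.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

# Systems of linear equations: solve A x = b (LAPACK LU factorization internally).
x = np.linalg.solve(A, b)

# Scalar summaries of a matrix.
det_A = np.linalg.det(A)
trace_A = np.trace(A)
rank_A = np.linalg.matrix_rank(A)

# Eigen-decomposition for symmetric matrices: A v = lambda v (eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(A)

# Singular value decomposition: A = U diag(s) V^T.
U, s, Vt = np.linalg.svd(A)
A_rebuilt = U @ np.diag(s) @ Vt

# Calculus: f(v) = 0.5 v^T A v - b^T v has gradient A v - b (A symmetric).
def f(v):
    return 0.5 * v @ A @ v - b @ v

def numerical_gradient(func, v, h=1e-6):
    """Central-difference estimate of the gradient of func at v."""
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (func(v + step) - func(v - step)) / (2 * h)
    return grad

point = np.array([0.5, -1.0])
analytic_grad = A @ point - b
numeric_grad = numerical_gradient(f, point)
```

For a symmetric positive-definite `A` like this one, the gradient `A v - b` is zero exactly at the solution of `A x = b`. Minimizing the quadratic and solving the linear system are therefore the same problem, which is one reason linear algebra and optimization tend to appear together.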

Probability & Statistics:


Basic Probability: Definitions (sample space, events, probability axioms), Conditional Probability, Independence, Bayes' Theorem.
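Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E), is easiest to internalize with numbers. The sketch below uses hypothetical screening-test rates (1% prevalence, 95% sensitivity, 5% false-positive rate); none of these come from the cheatsheet. The denominator P(E) is expanded with the law of total probability.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false-positive rate.
posterior = bayes_posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {posterior:.3f}")  # prints 0.161
```

Even with a fairly accurate test, a positive result here only raises the probability to about 16%. The reason is that the true positives (0.0095) are outnumbered by false positives drawn from the much larger healthy group (0.0495).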

Random Variables: Discrete vs. Continuous, Probability Distributions (PDF, PMF, CDF), Expectation, Variance, Standard Deviation, Covariance, Correlation.
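For a discrete random variable, expectation and variance are probability-weighted sums over the PMF, and the CDF is the running total of the PMF. The NumPy sketch below applies this to a fair die, then computes sample covariance and correlation on made-up paired data (`x` and `y` are illustrative assumptions).

```python
import numpy as np

# Fair six-sided die: support and PMF.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

expectation = np.sum(values * pmf)                     # E[X] = sum of x * p(x)
variance = np.sum((values - expectation) ** 2 * pmf)    # Var[X] = E[(X - E[X])^2]
std_dev = np.sqrt(variance)
cdf = np.cumsum(pmf)                                    # F(x) = P(X <= x)

# Sample covariance and Pearson correlation on illustrative paired data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0                                       # exactly linear in x
cov_xy = np.cov(x, y)[0, 1]                             # np.cov divides by n - 1 by default
corr_xy = np.corrcoef(x, y)[0, 1]                       # +1 for a perfect increasing line
```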

Common Distributions:
Discrete: Bernoulli, Binomial, Poisson, Geometric.

Continuous: Uniform, Normal (Gaussian), Exponential, Chi-squared, t-distribution, F-distribution.
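Several of the listed distributions have closed-form PMFs, PDFs and CDFs that need only Python's standard `math` module. The helper names below are hypothetical; in practice a library such as `scipy.stats` provides tested versions. The geometric PMF here counts trials up to and including the first success. Some texts instead count the failures before it, which shifts k by one.

```python
import math

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # P(X = k) = lam^k e^(-lam) / k!
    return lam**k * math.exp(-lam) / math.factorial(k)

def geometric_pmf(k, p):
    # Trials until the first success: P(X = k) = (1 - p)^(k - 1) p, for k >= 1
    return (1 - p) ** (k - 1) * p

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def exponential_cdf(x, rate):
    # F(x) = 1 - e^(-rate * x) for x >= 0, and 0 otherwise
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0
```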

Descriptive Statistics: Measures of Central Tendency (Mean, Median, Mode), Measures of Dispersion (Variance, Standard Deviation, Range, IQR), Quantiles, Percentiles.
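The sketch below computes descriptive statistics for a small hypothetical sample with one extreme value, using NumPy and the standard `statistics` module. By default, `numpy.percentile` interpolates linearly between order statistics (controlled by the `method` argument in recent NumPy versions), so quartiles from other tools can differ slightly. The last line flags outliers with Tukey's 1.5×IQR fences, which is a common convention rather than something the cheatsheet prescribes.

```python
import statistics
import numpy as np

# Hypothetical sample containing one extreme value (40).
data = np.array([2, 4, 4, 5, 7, 9, 10, 12, 40])

# Central tendency
mean = data.mean()
median = np.median(data)
mode = statistics.mode(data.tolist())

# Dispersion (ddof=1 gives the sample versions, dividing by n - 1)
sample_var = data.var(ddof=1)
sample_std = data.std(ddof=1)
value_range = data.max() - data.min()

# Quantiles / percentiles (NumPy's default is linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
p90 = np.percentile(data, 90)

# Tukey's fences: points more than 1.5 * IQR beyond the quartiles are flagged.
outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
```

The single value 40 pulls the mean up to about 10.3, while the median stays at 7. This is why the median and IQR are usually preferred for skewed data.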

Inferential Statistics:

• Hypothesis Testing: Null and Alternative Hypotheses, p-value, Significance Level, Type I and Type II Errors, t-tests, ANOVA, Chi-squared tests.
• Confidence Intervals: Interpretation and calculation.
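The sketch below runs a two-sample test and builds a confidence interval in plain NumPy on hypothetical measurements. It computes Welch's t statistic. An exact t-test p-value, however, needs the t distribution's CDF (for example, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test). The p-value here therefore comes from a permutation test instead: a swapped-in technique that tests the same null hypothesis of no difference in means. The confidence interval is for group A's mean. Its critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded for n = 10.

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se2_a = a.var(ddof=1) / a.size
    se2_b = b.var(ddof=1) / b.size
    return (a.mean() - b.mean()) / np.sqrt(se2_a + se2_b)

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Under H0 the group labels are exchangeable, so shuffle them and count how
    often the shuffled |difference in means| is at least the observed one.
    """
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[:a.size].mean() - pooled[a.size:].mean())
        hits += int(diff >= observed)
    # The +1 terms count the observed labelling, so the p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)

def mean_confidence_interval(sample, t_crit):
    """Interval x_bar +/- t_crit * s / sqrt(n) for the population mean."""
    half_width = t_crit * sample.std(ddof=1) / np.sqrt(sample.size)
    return sample.mean() - half_width, sample.mean() + half_width

# Hypothetical measurements for two groups of 10.
group_a = np.array([12.1, 11.8, 12.5, 12.9, 12.2, 11.9, 12.7, 12.4, 12.0, 12.6])
group_b = np.array([11.2, 11.5, 11.9, 11.0, 11.6, 11.4, 11.8, 11.3, 11.7, 11.1])

t_stat = welch_t(group_a, group_b)
p_value = permutation_p_value(group_a, group_b)
alpha = 0.05                                  # significance level
reject_null = p_value < alpha

# 2.262: two-sided 95% critical value of the t distribution with n - 1 = 9 df.
ci_low, ci_high = mean_confidence_interval(group_a, t_crit=2.262)
```

The p-value is the probability, under the null hypothesis, of a mean difference at least as large as the one observed. Rejecting whenever p < α caps the Type I error rate at α. The 95% interval means the procedure captures the true mean in about 95% of repeated samples. It does not mean this particular interval has a 95% chance of containing it.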

Sampling: Simple Random Sampling, Stratified Sampling, etc. (brief overview).


Central Limit Theorem (crucial for understanding sampling distributions).
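The Central Limit Theorem can be checked by simulation. Draw many samples from a skewed population, take each sample's mean, and compare the spread and shape of those means with the predicted Normal(μ, σ/√n). The NumPy sketch below uses an exponential population (mean 1, standard deviation 1, skewness 2). The sample size, number of repetitions, seed and the `skewness` helper are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Skewed population: exponential with mean 1 and standard deviation 1.
n = 50              # size of each sample
n_samples = 20_000  # number of repeated samples

samples = rng.exponential(scale=1.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# CLT prediction: the means are approximately Normal(mu, sigma / sqrt(n)).
predicted_se = 1.0 / np.sqrt(n)
observed_se = sample_means.std(ddof=1)

def skewness(v):
    """Sample skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    z = (v - v.mean()) / v.std()
    return np.mean(z ** 3)

pop_skew = skewness(samples.ravel())   # close to 2 for the exponential
means_skew = skewness(sample_means)    # shrinks roughly like 2 / sqrt(n)
```

The standard deviation of the means should land close to 1/√50 ≈ 0.141. Their skewness should land close to 2/√50 ≈ 0.28, far below the population's 2, as the distribution of the mean becomes more normal.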

Maximum Likelihood Estimation (MLE)

• Bayesian Statistics (basics): Prior, Likelihood, Posterior, MAP estimation.
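MLE and MAP can be contrasted on a coin's heads probability p. The sketch below uses hypothetical data (7 heads in 10 flips) and an assumed Beta(2, 2) prior. The grid search only confirms the closed-form MLE k/n. The Beta prior is chosen because it is conjugate to the Bernoulli likelihood, so the posterior is again a Beta distribution and its mode (the MAP estimate) has a closed form.

```python
import numpy as np

# Hypothetical data: 7 heads in 10 flips of a coin with unknown P(heads) = p.
heads, n_flips = 7, 10

def log_likelihood(p):
    """Bernoulli log-likelihood of the observed flips (binomial constant dropped)."""
    return heads * np.log(p) + (n_flips - heads) * np.log(1 - p)

# MLE: closed form is heads / n_flips; a brute-force grid search agrees.
grid = np.linspace(0.001, 0.999, 999)
p_mle_grid = grid[np.argmax(log_likelihood(grid))]
p_mle = heads / n_flips

# Bayesian update with an assumed Beta(2, 2) prior, conjugate to the Bernoulli likelihood:
# posterior = Beta(alpha + heads, beta + tails).
prior_alpha, prior_beta = 2.0, 2.0
post_alpha = prior_alpha + heads
post_beta = prior_beta + (n_flips - heads)
posterior_mean = post_alpha / (post_alpha + post_beta)

# MAP estimate = mode of the posterior Beta (formula valid when both parameters > 1).
p_map = (post_alpha - 1) / (post_alpha + post_beta - 2)
```

The MAP estimate (8/12 ≈ 0.667) sits between the prior's mode (0.5) and the MLE (0.7). As more flips are observed, the likelihood dominates the prior and the two estimates converge.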

Optimization

• Gradient Descent: Variants (Batch, Stochastic, Mini-batch), Learning Rate, Momentum, Adam, RMSprop.
• Convex Optimization: (brief overview - knowing when a problem is convex is helpful).

• Regularization Concepts: L1 (Lasso), L2 (Ridge) (can also be listed in Machine Learning).

• Constrained Optimization (Lagrange Multipliers).
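The sketch below applies gradient descent with (heavy-ball) momentum to L2-regularized (ridge) least squares, assuming the objective (1/(2n))·||Xw - y||² + (λ/2)·||w||². Setting `batch_size` to the number of rows gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch. Because this objective is convex, the result can be checked against the closed-form ridge solution. The data, learning rate and momentum values are illustrative assumptions. Adam and RMSprop, which replace the fixed learning rate with per-parameter adaptive steps, are not shown.

```python
import numpy as np

def ridge_grad(w, X, y, lam):
    """Gradient of (1/(2m)) * ||X w - y||^2 + (lam/2) * ||w||^2 over the m rows given."""
    m = X.shape[0]
    return X.T @ (X @ w - y) / m + lam * w

def minibatch_gd(X, y, lam, lr=0.05, momentum=0.9, batch_size=32, epochs=50, seed=0):
    """Mini-batch gradient descent with momentum.

    batch_size = len(X) gives batch gradient descent; batch_size = 1 gives SGD.
    """
    rng = np.random.default_rng(seed)
    n_rows, d = X.shape
    w = np.zeros(d)
    velocity = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n_rows)       # reshuffle every epoch
        for start in range(0, n_rows, batch_size):
            idx = order[start:start + batch_size]
            grad = ridge_grad(w, X[idx], y[idx], lam)
            velocity = momentum * velocity - lr * grad
            w = w + velocity
    return w

# Hypothetical synthetic data: y = X w_true + noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)
lam = 0.1

# Closed-form minimizer of the same objective: (X^T X + n * lam * I) w = X^T y.
n = X.shape[0]
w_closed = np.linalg.solve(X.T @ X + n * lam * np.eye(3), X.T @ y)

w_batch = minibatch_gd(X, y, lam, batch_size=n, epochs=500)   # full-batch GD
w_mini = minibatch_gd(X, y, lam, batch_size=20, epochs=50)     # mini-batch GD
```

Full-batch descent reaches the closed-form answer to high precision. The mini-batch run ends close to it but keeps jittering, because each step uses a noisy gradient estimate with a fixed learning rate; decaying the learning rate is the usual remedy.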

Machine Learning

Supervised Learning:

• Regression: Linear Regression (Simple, Multiple), Polynomial Regression, Regularization (Lasso, Ridge, Elastic Net). Evaluation Metrics: MSE, RMSE, MAE, R-squared, Adjusted R-squared.
• Classification: k-Nearest Neighbors (k-NN), Logistic Regression, Support Vector Machines (SVM): Kernels (Linear, Polynomial, RBF), Decision Trees, Random Forests, Gradient Boosting Machines (GBM): XGBoost, LightGBM, CatBoost, Naive Bayes. Evaluation Metrics: Accuracy, Precision, Recall, F1-score, ROC AUC, PR AUC, Confusion Matrix.

Unsupervised Learning:

• Clustering: k-Means Clustering, Hierarchical Clustering, DBSCAN. Evaluation Metrics: Silhouette Score, Davies-Bouldin Index.
• Dimensionality Reduction: Principal Component Analysis (PCA), t-distributed Stochastic Neighbor Embedding (t-SNE), Linear Discriminant Analysis (LDA), Autoencoders (brief mention, more detail in the Deep Learning section).

Model Selection & Evaluation:

• Cross-Validation: k-fold, Leave-One-Out, Stratified k-fold.
• Hyperparameter Tuning: Grid Search, Random Search, Bayesian Optimization.
• Regularization (mentioned within specific algorithms, but worth reviewing as a general concept).
• Overfitting and Underfitting.

Feature Engineering:

• Scaling: Standardization, Normalization.
• Feature Selection: Filter, Wrapper, Embedded.

Time Series:

• Stationarity, ARIMA, Exponential Smoothing.
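Most of the evaluation metrics listed above reduce to a few lines of NumPy. The sketch below applies them to small hypothetical label and score arrays; scikit-learn's `sklearn.metrics` module offers tested versions such as `mean_squared_error`, `r2_score`, `f1_score`, `confusion_matrix` and `roc_auc_score`. Here, ROC AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative, which is equivalent to the area under the ROC curve. The helper names are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred, n_features):
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    n = y_true.size
    # Adjusted R^2 penalizes extra features: 1 - (1 - R^2)(n - 1)/(n - p - 1)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"MSE": mse, "RMSE": np.sqrt(mse), "MAE": np.mean(np.abs(residuals)),
            "R2": r2, "AdjR2": adj_r2}

def classification_metrics(y_true, y_pred):
    """Binary metrics with labels 0/1, where 1 is the positive class."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / y_true.size, "precision": precision,
            "recall": recall, "f1": f1,
            # rows = actual class 0/1, columns = predicted class 0/1
            "confusion": np.array([[tn, fp], [fn, tp]])}

def roc_auc(y_true, scores):
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical regression outputs (one feature).
reg = regression_metrics(np.array([3.0, 5.0, 7.0, 9.0]),
                         np.array([2.5, 5.5, 7.0, 8.0]), n_features=1)

# Hypothetical classifier scores, thresholded at 0.5 for hard predictions.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
scores = np.array([0.9, 0.2, 0.4, 0.8, 0.3, 0.7, 0.6, 0.1])
y_pred = (scores >= 0.5).astype(int)
clf = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)  # 15 of 16 positive/negative pairs ranked correctly: 0.9375
```

Adjusted R-squared (0.8875) comes out below R-squared (0.925) because it charges for each feature relative to the number of observations. With only four points, even one feature costs noticeably.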

DEEP LEARNING

• Neural Networks: Perceptron, MLP, Activation Functions (ReLU, Sigmoid, Softmax), Backpropagation, Loss Functions.
• CNNs: Convolutional Layers, Pooling.
• RNNs: LSTM, GRU.
• Other: Batch Normalization, Dropout, Transfer Learning.

Data Handling

• Wrangling: Missing Values (Imputation), Outliers, Data Cleaning.
• Formats: CSV, JSON, SQL.
• SQL: SELECT, WHERE, GROUP BY, JOIN.
• Visualization: Histograms, Scatter Plots, Box Plots, Heatmaps.
• Visualization Libraries: Matplotlib, Seaborn, Plotly.

PROGRAMMING LANGUAGE & TOOLS

Python:
• Data Structures: Lists, Dictionaries.
• Control Flow: Loops, Conditionals.
• Functions, Lambdas.

Libraries:
• NumPy: Array operations.
• Pandas: DataFrames.
• Scikit-learn: Machine learning.
• Matplotlib/Seaborn: Visualization.

• Git: add, commit, push, pull, branch, merge.
• Environments: Conda, Virtualenv.

COMMUNICATION, DEPLOYMENT & ETHICS

Communication & Deployment
• Storytelling, Presentation, Reports.
• Deployment: Model serialization, REST APIs, Cloud.

Ethics
• Bias, Fairness, Privacy, Transparency.

By Shailesh, @beginnersblog.org
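Backpropagation is the chain rule applied layer by layer. The NumPy sketch below builds a one-hidden-layer MLP with a ReLU activation and a softmax/cross-entropy loss, then verifies the backprop gradients against central-difference estimates (gradient checking, a standard debugging step not listed on the cheatsheet). The layer sizes, initialization scale and random data are arbitrary assumptions, and the function names are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward_backward(params, X, y):
    """One-hidden-layer MLP: ReLU hidden layer, softmax output, mean cross-entropy loss.

    Returns the loss and the gradient of the loss for every parameter (backpropagation).
    """
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    rows = np.arange(X.shape[0])
    # Forward pass
    z1 = X @ W1 + b1
    h = relu(z1)
    probs = softmax(h @ W2 + b2)
    loss = -np.mean(np.log(probs[rows, y]))
    # Backward pass: chain rule from the loss back through each layer
    d_logits = probs.copy()
    d_logits[rows, y] -= 1.0                  # softmax + cross-entropy: p - one_hot(y)
    d_logits /= X.shape[0]                    # the loss is a mean over the rows
    grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0)}
    d_z1 = (d_logits @ W2.T) * (z1 > 0)       # ReLU passes gradient only where z1 > 0
    grads["W1"] = X.T @ d_z1
    grads["b1"] = d_z1.sum(axis=0)
    return loss, grads

def numerical_grad(params, X, y, name, eps=1e-6):
    """Central-difference estimate of d(loss)/d(params[name]), for gradient checking."""
    p = params[name]
    g = np.zeros_like(p)
    for i in np.ndindex(p.shape):
        original = p[i]
        p[i] = original + eps
        loss_plus, _ = forward_backward(params, X, y)
        p[i] = original - eps
        loss_minus, _ = forward_backward(params, X, y)
        p[i] = original                       # restore the parameter
        g[i] = (loss_plus - loss_minus) / (2 * eps)
    return g

# Arbitrary toy problem: 8 samples, 4 features, 5 hidden units, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = rng.integers(0, 3, size=8)
params = {
    "W1": 0.5 * rng.normal(size=(4, 5)), "b1": np.zeros(5),
    "W2": 0.5 * rng.normal(size=(5, 3)), "b2": np.zeros(3),
}

loss, grads = forward_backward(params, X, y)
max_err = max(np.max(np.abs(grads[k] - numerical_grad(params, X, y, k))) for k in params)
```

If `max_err` is not tiny, the backward pass has a bug. Frameworks such as PyTorch or TensorFlow compute these gradients automatically, but the layer-by-layer structure is the same.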
