Data Science Toc Srinivas

The document outlines the prerequisites, lab requirements, and agenda for a Python for Data Science training program. The program covers Python fundamentals, classes and exceptions, relational database interaction, the Python data ecosystem including NumPy and Pandas, machine learning techniques like classification and clustering, natural language processing, and concludes with a wrap-up discussion. Participants will get hands-on experience through exercises requiring a local Python installation and modules like NumPy, Pandas, and Matplotlib.

Uploaded by

muthukumar550
Python for Data Science


Training Program

Pre-requisites
1. Intermediate level expertise with Python
2. Basic idea of the Python data ecosystem
3. Some background on file formats and Relational Databases

Lab requirements
1. 1:1 or 2:1 participant-machine ratio for hands-on work and exercises
2. Local installation of Anaconda distribution for Python 3.6
-or-
3. Local installation of ActivePython 3.6+ community edition + additional modules as needed
4. The following Python modules need to be installed: numpy, pandas, jupyter, matplotlib, flask

Agenda

Python refresher
• The Python interpreter
• Python Data Types
• Data and type introspection basics
• Control structures
• Functions
• Classes
• Errors and exceptions
• Regular expressions

Class basics
• __init__
• self
• private vs public convention
• magic (dunder) methods
• object creation
• type of objects
• inheritance, multiple inheritance
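
The class topics above can be tried with a small sketch like the following. The class names (Shape, Rectangle, Square, JsonMixin) are hypothetical and chosen only for illustration; they are not part of the course material.

```python
import json


class Shape:
    """Illustrative base class (hypothetical, not from the course)."""

    def __init__(self, name):
        # self is the instance being initialised
        self.name = name      # public attribute
        self._scale = 1.0     # leading underscore: private by convention only

    def area(self):
        raise NotImplementedError

    # Magic (dunder) methods hook into built-in behaviour
    def __repr__(self):
        return f"{type(self).__name__}(name={self.name!r})"

    def __lt__(self, other):
        return self.area() < other.area()


class JsonMixin:
    """Small mixin used to show multiple inheritance."""

    def to_json(self):
        return json.dumps(vars(self))


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


class Square(Rectangle, JsonMixin):
    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


sq = Square(2)                      # object creation calls __init__
print(type(sq), isinstance(sq, Shape))
print([cls.__name__ for cls in Square.__mro__])  # method resolution order
```

The leading underscore in `_scale` is only a naming convention; Python does not enforce access control on it.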

Errors & exceptions


• Standard exception hierarchy
• exception payloads
• defining new exceptions
• chaining exceptions
• traceback objects
• Assertions
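
A minimal sketch of the exception topics above, using only the standard library. ConfigError, read_port and describe_failure are hypothetical names introduced here for illustration.

```python
import traceback


class ConfigError(Exception):
    """Hypothetical user-defined exception carrying an extra payload field."""

    def __init__(self, message, key):
        super().__init__(message)   # message ends up in self.args
        self.key = key              # extra payload


def read_port(settings):
    # Assertions document assumptions; they are skipped when Python runs with -O
    assert isinstance(settings, dict), "settings must be a dict"
    try:
        return int(settings["port"])
    except KeyError as exc:
        # "raise ... from ..." chains the original exception as __cause__
        raise ConfigError("missing setting", key="port") from exc
    except ValueError as exc:
        raise ConfigError("port is not an integer", key="port") from exc


def describe_failure(settings):
    try:
        read_port(settings)
    except ConfigError as err:
        # err.__traceback__ is the traceback object; extract_tb lists its frames
        frames = traceback.extract_tb(err.__traceback__)
        return err.args[0], err.key, type(err.__cause__).__name__, frames[-1].name
    return None


print(describe_failure({}))
print(describe_failure({"port": "abc"}))
```

Both built-in exceptions caught here (KeyError, ValueError) sit under Exception in the standard hierarchy, as does the new class.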

Relational Database Interaction


• CRUD operations
• SQL
• Python DB API 2.0
• sqlite3
• MySQLdb module
    ◦ connect()
    ◦ Connection objects
    ◦ Cursor objects
    ◦ execute()
    ◦ fetch*()
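
The DB API 2.0 flow listed above (connect, cursor, execute, fetch) can be sketched with sqlite3 from the standard library, which needs no database server. The table and data are invented for illustration. MySQLdb follows the same connect/cursor/execute/fetch pattern, but takes connection parameters and uses `%s` placeholders instead of `?`.

```python
import sqlite3


def demo_crud():
    """Run create, insert, update, delete and select against an in-memory DB."""
    conn = sqlite3.connect(":memory:")      # Connection object
    try:
        cur = conn.cursor()                 # Cursor object
        cur.execute(
            "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
        )
        # Create: parameters are passed separately, never formatted into the SQL
        cur.executemany(
            "INSERT INTO people (name, age) VALUES (?, ?)",
            [("Asha", 31), ("Ravi", 27), ("Meena", 45)],
        )
        # Update and delete
        cur.execute("UPDATE people SET age = ? WHERE name = ?", (28, "Ravi"))
        cur.execute("DELETE FROM people WHERE name = ?", ("Asha",))
        conn.commit()
        # Read: fetchone() returns one row, fetchall() the remaining rows
        cur.execute("SELECT name, age FROM people ORDER BY id")
        first = cur.fetchone()
        rest = cur.fetchall()
        return first, rest
    finally:
        conn.close()


print(demo_crud())
```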

Data Ecosystem in Python


• SciPy
• NumPy
• Pandas
• Matplotlib
• IPython
• Jupyter

NumPy
• Why NumPy?
• Comparison of memory use and run time with native lists
• NumPy arrays
• Multi-dimensional arrays
• Mapped operations on NumPy arrays
• Filtering
• Filtering
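
A short sketch of the array operations above, assuming numpy is installed as listed in the lab requirements. The sample values are invented.

```python
import numpy as np

values = np.array([3, 8, 1, 10, 5])

# Mapped (element-wise) operation: no explicit Python loop
doubled = values * 2

# Filtering with a boolean mask
mask = values > 4
filtered = values[mask]

# Multi-dimensional array: 2 rows x 3 columns
grid = np.arange(6).reshape(2, 3)
col_sums = grid.sum(axis=0)     # sum down each column

# Memory used by the array data (bytes per element * number of elements)
print(values.itemsize, values.nbytes)
print(doubled, filtered, col_sums)
```

For the run-time comparison with native lists, `timeit` on a larger array is a reasonable starting point; the exact speed-up depends on the machine and array size.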

Pandas
• DataFrames
• Series
• Indexes
• Inherited operations from NumPy arrays
• read_* functions for reading file formats (e.g. read_csv())
• Selecting columns with [] and . (attribute access)
• Filtering
• value_counts()
• groupby() and aggregation functions
• sort_index() and sort_values() to speed up lookups
• pivoting/unstacking
• Merging dataframes
• Appending
• .loc[] and .iloc[] based lookup
• Working with dates
• Timeseries
• Real examples to try all these operations
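
Several of the operations above can be tried on a small invented DataFrame, as sketched below, assuming pandas is installed. The column names and values are made up. Appending is shown with `pd.concat`, since `DataFrame.append` is no longer available in pandas 2.x.

```python
import pandas as pd

sales = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Chennai", "Delhi"],
    "units": [10, 4, 7, 3, 6],
    "day": pd.to_datetime(
        ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]
    ),
})

units = sales["units"]                      # same column as sales.units
big = sales[sales["units"] > 5]             # filtering with a boolean mask
counts = sales["city"].value_counts()       # rows per city
totals = sales.groupby("city")["units"].sum()
by_units = sales.sort_values("units", ascending=False)

first_row = sales.iloc[0]                                  # position-based
pune_units = sales.loc[sales["city"] == "Pune", "units"]   # label/mask-based

# Merging with a lookup table
regions = pd.DataFrame({
    "city": ["Pune", "Delhi", "Chennai"],
    "region": ["West", "North", "South"],
})
merged = sales.merge(regions, on="city", how="left")

# Appending rows
extra = pd.DataFrame({
    "city": ["Pune"], "units": [2], "day": pd.to_datetime(["2020-01-04"]),
})
appended = pd.concat([sales, extra], ignore_index=True)

# Pivoting via unstack, and a daily time series
wide = sales.groupby(["day", "city"])["units"].sum().unstack(fill_value=0)
daily = sales.set_index("day")["units"].resample("D").sum()

print(totals)
print(wide)
```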

Machine Learning: Basics


• Algorithmic logic vs ML logic
• Supervised vs Unsupervised
• Training Data and Test Data
• Classification
• Regression
• Clustering

Supervised Learning: Classification


• The Classification Problem
• Bayes Theorem
• Conditional Probability
• Probabilistic classifier: Naive Bayes Classifier
• Non-probabilistic classifier: k-nearest neighbours (kNN)
• k in kNN
• Kinds of problem instances in kNN
• Distance
• Differences between Naive Bayes and kNN
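
To make the role of k and of the distance measure concrete, here is a minimal kNN sketch in plain Python. It assumes Euclidean distance and majority voting; the function names and the toy data are invented, and a library implementation would normally be used in practice.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    distances = sorted(
        (euclidean(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


# Toy training data: two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
```

Unlike Naive Bayes, this approach builds no probability model: all training points are kept and the work happens at prediction time.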

Support Vector Machines (SVM)


• Formal definition
• The intuition
• SVM classes in sklearn
• SVM kernels
• RBF and Linear kernels

Unsupervised learning in Python


• Need for dimensionality reduction
• Principal Component Analysis (PCA)
• Difference between principal components and latent factors
• Factor Analysis
• Hierarchical, K-means & DBSCAN Clustering, Gaussian Mixture Models
• SVD
• Clustering Use Cases

Generalized Linear Models in Python
• Linear Regression
• Regularization of Generalized Linear Models
• Ridge and Lasso Regression
• Logistic Regression
• Methods of threshold determination and performance measures for classification score models

Basics of Natural Language Processing

• Text Processing
• Lemmatization
• Part-of-Speech Tagging
• Named Entity Recognition
• Word Embeddings
• N-grams
• TF-IDF
• Text Classification
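
N-grams and TF-IDF from the list above can be sketched in plain Python. This assumes one common TF-IDF variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries often use smoothed variants, so their numbers will differ. The function names and the tiny corpus are invented.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token tuples from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute per-document TF-IDF scores for a list of tokenised documents."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    ["data", "science", "python"],
    ["python", "pandas"],
    ["python", "numpy", "numpy"],
]

print(ngrams(docs[0], 2))
print(tf_idf(docs)[2])
```

A term that occurs in every document ("python" here) scores zero under this variant, which is the behaviour that makes TF-IDF useful for down-weighting common words.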

Wrap-up, Discussion and Q&A
