Data Science Course Content
Download & Python Installation Process in Windows, Unix, Linux and Mac
Online Python IDLE
Python Real-time IDEs like Spyder, Jupyter Notebook, PyCharm, Rodeo, Visual Studio Code, ATOM, PyDev etc.
First Python Application
Comments in Python
Python file extensions
Setting Path in Windows
Edit and Run Python program without IDE
Edit and Run Python program using IDEs
INSIDE PYTHON
Python Language Fundamentals
Programmers View of Interpreter
Inside INTERPRETER
Python Implementation
What is Byte Code in PYTHON?
Alternatives/Flavors
Python Debugger
Keywords
Identifiers
Python Variables
Constants / Literals
Data types
Python VS JAVA
Python Syntax
Different Modes of Python
Introduction
Interactive Mode
Scripting Mode
Programming Elements
Structure of Python program
bytes Data Type
bytearray
String Formatting in Python
Math, Random, Secrets Modules
Initialization of variables
Local variables
Global variables
‘global’ keyword
Input and Output operations
Data conversion functions – int(), float(), complex(), str(), chr(), ord()
Operators
Arithmetic Operators
Comparison Operators
Python Assignment Operators
Logical Operators
Bitwise Operators
Shift operators
Membership Operators
Identity Operators
Ternary Operator
Operator precedence
Difference between “is” vs “==”
Input & Output Operators
Print
Input
Command-line arguments
Control Statements
Conditional control statements
If
If-else
If-elif-else
Nested-if
Loop control statements
for
while
Nested loops
Branching statements
Break
Continue
Pass
Return
Case studies
Data Structures or Collections
Introduction
Importance of Data structures
Applications of Data structures
Types of Collections
Sequence
Strings, List, Tuple, range
Non sequence
Set, Frozen set, Dictionary
Strings
What is string
Representation of Strings
Processing elements using indexing
Processing elements using Iterators
Manipulation of String using Indexing and Slicing
String operators
Methods of String object
String Formatting
String functions
String Immutability
Case studies
List Collection
What is List
Need of List collection
Different ways of creating List
List comprehension
List indices
Processing elements of List through Indexing and Slicing
List object methods
List is Mutable
Mutable and Immutable elements of List
Nested Lists
List_of_lists
Hard copy, shallow copy and deep copy
zip() in Python
How to unzip?
Python Arrays
Case studies
Tuple Collection
What is tuple?
Different ways of creating Tuple
Methods of Tuple object
Tuple is Immutable
Mutable and Immutable elements of Tuple
Process tuple through Indexing and Slicing
List v/s Tuple
Case studies
Set Collection
What is set?
Different ways of creating set
Difference between list and set
Iteration Over Sets
Accessing elements of set
Python Set Methods
Python Set Operations
Union of sets
Functions and methods of set
Python Frozen set
Difference between set and frozenset?
Case study
Dictionary Collection
What is dictionary?
Difference between list, set and dictionary
How to create a dictionary?
PYTHON HASHING?
Accessing values of dictionary
Python Dictionary Methods
Copying dictionary
Updating Dictionary
Reading keys from Dictionary
Reading values from Dictionary
Reading items from Dictionary
Delete Keys from the dictionary
Sorting the Dictionary
Python Dictionary Functions and methods
Dictionary comprehension
Functions
What is Function?
Advantages of functions
Syntax and Writing function
Calling or Invoking function
Classification of Functions
No arguments and No return values
With arguments and No return values
With arguments and With return values
No arguments and With return values
Recursion
Python argument type functions:
Default argument functions
Required (Positional) arguments function
Keyword arguments function
Variable arguments functions
‘pass’ keyword in functions
Lambda functions/Anonymous functions
map()
filter()
reduce()
Nested functions
Non local variables, global variables
Closures
Decorators
Generators
Iterators
Monkey patching
Advanced Python
Python Modules
Importance of modular programming
What is module
Types of Modules – Pre defined, User defined.
User defined modules creation
Functions based modules
Class based modules
Connecting modules
Import module
From … import
Module alias / Renaming module
Built In properties of module
Packages
Organizing python project into packages
Types of packages – pre defined, user defined.
Package v/s Folder
py file
Importing package
PIP
Introduction to PIP
Installing PIP
Installing Python packages
Uninstalling Python packages
OOPs
Procedural v/s Object oriented programming
Principles of OOP – Encapsulation, Abstraction (Data Hiding)
Classes and Objects
How to define class in python
Types of variables – instance variables, class variables.
Types of methods – instance methods, class method, static method
Object initialization
‘self’ reference variable
‘cls’ reference variable
Access modifiers – private(__), protected(_), public
@property class
property() object
Creating object properties using setattr, getattr functions
Encapsulation (Data Binding)
What is polymorphism?
Overriding
i) Method overriding
ii) Constructor overriding
Overloading
i) Method Overloading
ii) Constructor Overloading
iii) Operator Overloading
Class re-usability
Composition
Aggregation
Inheritance – single, multi level, multiple, hierarchical and hybrid inheritance and Diamond inheritance
Constructors in inheritance
Object class
super()
Runtime polymorphism
Method overriding
Method resolution order (MRO)
Method overriding in Multiple inheritance and Hybrid Inheritance
Duck typing
Concrete Methods in Abstract Base Classes
Difference between Abstraction & Encapsulation
Inner classes
Introduction
Writing inner class
Accessing class level members of inner class
Accessing object level members of inner class
Local inner classes
Complex inner classes
Case studies
Try with multi except
Handling multiple exceptions with single except block
Finally block
Try-except-finally
Try with finally
Case study of finally block
Raise keyword
Custom exceptions / User defined exceptions
Need for custom exceptions
Case studies
Regular expressions
Expressions using operators and symbols
Simple character matches
Special characters
Character classes
Mobile number extraction
Mail extraction
Different Mail ID patterns
Data extraction
Password extraction
URL extraction
Vehicle number extraction
Case study
File & Directory handling
Introduction to files
Opening file
File modes
Reading data from file
Writing data into file
Appending data into file
Line count in File
CSV module
Creating CSV file
Reading from CSV file
Writing into CSV file
Object serialization – pickle module
XML parsing
JSON parsing
Python Logging
Logging Levels
Implement Logging
Configure Log File in Overwriting Mode
Timestamp in the Log Messages
Python Program Exceptions to the Log File
Requirement of Our Own Customized Logger
Features of Customized Logger
Date & Time module
How to use Date & Date Time class
How to use Time Delta object
Formatting Date and Time
Calendar module
Text calendar
HTML calendar
OS module
Shell script commands
Various OS operations in Python
Python file system shell methods
Creating files and directories
Removing files and directories
Shutdown and Restart system
Renaming files and directories
Executing system commands
Multi-threading & Multi Processing
Introduction
Multi tasking v/s Multi threading
Threading module
Creating thread – inheriting Thread class, Using callable object
Life cycle of thread
Single threaded application
Multi threaded application
Can we call run() directly?
Need for start() method
sleep()
join()
Synchronization – Lock class – acquire(), release() functions
Case studies
Garbage collection
Introduction
Importance of Manual garbage collection
Self reference objects garbage collection
‘gc’ module
collect() method
Threshold function
Case studies
Python Data Base Communications (PDBC)
Introduction to DBMS applications
File system v/s DBMS
Communicating with MySQL
Python – MySQL connector
connector module
connect() method
Oracle Database
Install cx_Oracle
Cursor Object methods
execute() method
executemany() method
fetchone()
fetchmany()
fetchall()
Static queries v/s Dynamic queries
Transaction management
Case studies
Python – Network Programming
Introduction
What is Sockets?
What is Socket Programming?
The socket Module
Server Socket Methods
Connecting to a server
A simple server-client program
Server
Client
Tkinter & Turtle
Introduction to GUI programming
Tkinter module
Tk class
Components / Widgets
Label, Entry, Button, Combo, Radio
Types of Layouts
Handling events
Widgets properties
Case studies
Data analytics modules
Numpy
Scipy
Introduction
Arrays
Datatypes
Matrices
N dimension arrays
Indexing and Slicing
Pandas
Introduction
Data Frames
Merge, Join, Concat
MatPlotLib introduction
Drawing plots
Introduction to Machine learning
Types of Machine Learning?
Introduction to Data science
DJANGO
Introduction to PYTHON Django
What is Web framework?
Why Frameworks?
Define MVT Design Pattern
Difference between MVC and MVT
PANDAS
Pandas – Introduction
Pandas – Environment Setup
Pandas – Introduction to Data Structures
Dimension & Description
Series
DataFrame
Data Type of Columns
Panel
Pandas – Series
Series
Create an Empty Series
Create a Series from ndarray
Create a Series from dict
Create a Series from Scalar
Accessing Data from Series with Position
Retrieve Data Using Label (Index)
Pandas – DataFrame
DataFrame
Create DataFrame
Create an Empty DataFrame
Create a DataFrame from Lists
Create a DataFrame from Dict of ndarrays / Lists
Create a DataFrame from List of Dicts
Create a DataFrame from Dict of Series
Column Selection
Column Addition
Column Deletion
Row Selection, Addition, and Deletion
Pandas – Panel
Panel()
Create Panel
Selecting the Data from Panel
Pandas – Basic Functionality
Reindex to Align with Other Objects
Filling while ReIndexing
Limits on Filling while Reindexing
Renaming
Pandas – Iteration
Iterating a DataFrame
iteritems()
iterrows()
itertuples()
Pandas – Indexing and Selecting Data
.loc()
.iloc()
.ix()
Use of Notations
Percent_change
Covariance
Correlation
Data Ranking
.rolling() Function
.expanding() Function
.ewm() Function
Pandas – Aggregations
Applying Aggregations on
DataFrame
Replace NaN with a Scalar Value
Fill NA Forward and Backward
Drop Missing Values
Replace Missing (or) Generic Values
Pandas – GroupBy
Filtration
Pandas – Merging/Joining
Merge Using ‘how’ Argument
Pandas – Concatenation
Concatenating Objects
Time Series
EXCEL MODULE
Getting Started
Starting Excel
Opening a Workbook
Understanding the Display Screen
Entering Data
Moving the Cell Pointer
Selecting a Range of Cells
Creating a New Workbook
Inserting, Renaming, and Deleting Worksheets
Entering Constant Values
Using Auto Fill to Enter Data
Saving a Workbook
Editing Cell Contents
Clearing Cell Contents
Working with Undo and Redo
Closing a Workbook
Using Formulas
Entering Formulas
Using Auto Fill with Formulas
Using the SUM Function
Summing Columns or Rows Automatically
Using Statistical Functions
Working with the Range Finder
Using Formula Error Checking
Working with Constant Values and Formulas
Copying and Pasting Constant Values and Formulas
Cutting and Pasting Constant Values and Formulas
Using Collect and Paste
Applying Cell Styles
Modifying Columns and Rows
Changing Column Width
Changing Row Height
Inserting and Deleting Columns or Rows
Hiding Columns or Rows
Editing Workbooks
Working with AutoCorrect
Checking Spelling
Using Find and Replace
Printing Worksheets
Using Print Preview
Working with Print Settings
Using Page Setup Tools
Working in Page Layout View
MYSQL MODULE
Workbench and other development tools
or more tables
Chapter 5 How to code summary queries
Chapter 8 How to work with data types
Chapter 10 How to design a database
Chapter 11 How to create databases, tables, and indexes
Chapter 12 How to create views
Chapter 14 How to use transactions and locking
Section 5 Database administration
Chapter 19 How to back up and restore a
Treemaps
Connect to your data.
Scatter plots
Edit and save a data source.
Cross Tabs
Geographic maps
SPARK MODULE
HADOOP MODULE
Hadoop DFS The Command-Line Interface
Basic File System Operations
Anatomy of File Read, File Write
Block Placement Policy and Modes
More detailed explanation about Configuration files
Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
How to add New Data Node dynamically, decommission a Data Node dynamically (Without stopping cluster)
FSCK Utility. (Block report)
How to override default configuration at system level and Programming level
HDFS Federation
ZOOKEEPER Leader Election Algorithm
Exercise and small use case on HDFS
Map Reduce
Map Reduce Functional Programming Basics
Map and Reduce Basics
How Map Reduce Works
Anatomy of a Map Reduce Job Run
Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
Job Completion, Failures
Shuffling and Sorting
Splits, Record reader, Partition, Types of partitions & Combiner
Optimization Techniques -> Speculative Execution, JVM Reuse and No. Slots
Types of Schedulers and Counters
Comparisons between Old and New API at code and Architecture Level
Getting the data from RDBMS into HDFS using Custom data types
Distributed Cache and Hadoop Streaming (Python, Ruby and R)
YARN
Sequential Files and Map Files
Enabling Compression Codecs
Map side Join with distributed Cache
Types of I/O Formats: Multiple outputs, NLineInputFormat
Handling small files using CombineFileInputFormat
Map Reduce Programming – Java Programming
ACID in RDBMS and BASE in NoSQL
CAP Theorem and Types of Consistency
Types of NoSQL Databases in detail
Columnar Databases in Detail (HBASE and CASSANDRA)
TTL, Bloom Filters and Compensation
Client Side Buffering and Process 1 million records using Client side Buffering
HBase Counters
Enabling Replication and HBase RAW Scans
HBase Filters
Bulk Loading and Co processors (Endpoints and Observers with programs)
Real world use case consisting of HDFS, MR and HBASE
Hive
Hive Installation, Introduction and Architecture
Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
Meta store, Hive QL
OLTP vs. OLAP
Working with Tables
Primitive data types and complex data types
Working with Partitions
User Defined Functions
Hive Bucketed Tables and Sampling
External partitioned tables, Map the data to the partition in the table, Writing the output of one query to another table, Multiple inserts
Dynamic Partition
Differences between ORDER BY, DISTRIBUTE BY and SORT BY
Bucketing and Sorted Bucketing with Dynamic partition
RC File
INDEXES and VIEWS
MAPSIDE JOINS
Compression on hive tables and Migrating Hive tables
Dynamic substitution of Hive and Different ways of running Hive
How to enable Update in HIVE
Log Analysis on Hive
Access HBASE tables using Hive
Hands on Exercises
Pig
Pig Installation
Execution Types
Grunt Shell
Pig Latin
Data Processing
Schema on read
Primitive data types and complex data types
Tuple schema, BAG Schema and MAP Schema
Loading and Storing
Filtering, Grouping and Joining
Debugging commands (Illustrate and Explain)
Validations, Type casting in PIG
Working with Functions
User Defined Functions
Types of JOINS in pig and Replicated Join in detail
SPLITS and Multiquery execution
Error Handling, FLATTEN and ORDER BY
Parameter Substitution
Nested For Each
User Defined Functions, Dynamic Invokers and Macros
How to access HBASE using PIG, Load and Write JSON DATA using PIG
Piggy Bank
Hands on Exercises
SQOOP
Sqoop Installation
Import Data. (Full table, Only Subset, Target Directory, protecting Password, file format other than CSV, Compressing, Control Parallelism, All tables Import)
Incremental Import (Import only New data, Last Imported data, storing Password in Metastore, Sharing Metastore between Sqoop Clients)
Free Form Query Import
Export data to RDBMS, HIVE and HBASE
Hands on Exercises
More Ecosystems
HCatalog
Flume
Flume Installation
Introduction to Flume
Flume Agents: Sources, Channels and Sinks
Log User information using Java program into HDFS using LOG4J and Avro Source, Tail Source
Log User information using Java program into HBASE using LOG4J and Avro Source, Tail Source
Flume Commands
Use case of Flume: Flume the data from Twitter into HDFS and HBASE. Do some analysis using HIVE and PIG