Data Science Course Content

This document provides an overview of the Python programming language. It introduces Python concepts like data types, variables, operators, control flow statements, collections like lists, tuples, sets and dictionaries, and functions. The document also discusses Python fundamentals like syntax, modes, implementations and applications of Python in real-world industries.

Uploaded by sudhakar kethana
Copyright © All Rights Reserved
DATA SCIENCE COURSE CONTENT

PYTHON CONTENT

Introduction to Languages
• What is a Language?
• Types of languages
• Introduction to Translators
• Compiler
• Interpreter
• What is a Scripting Language?
• Types of Script
• Programming Languages v/s Scripting Languages
• Difference between Scripting and Programming languages
• What is a programming paradigm?
• Procedural programming paradigm
• Object Oriented Programming paradigm

Introduction to Python
• What is Python?
• Why Python?
• History
• Features – Dynamic, Interpreted, Object oriented, Embeddable, Extensible, Large standard libraries, Free and Open source
• Why is Python a General-Purpose Language?
• Limitations of Python
• What is PSF?
• Python implementations
• Python applications
• Python versions
• PYTHON IN REALTIME INDUSTRY
• Difference between Python 2.x and 3.x
• Difference between Python 3.7 and 3.8
• Software Development Architectures

Python Software
• Python Distributions
• Download & Python Installation Process in Windows, Unix, Linux and Mac
• Online Python IDLE
• Python Real-time IDEs like Spyder, Jupyter Notebook, PyCharm, Rodeo, Visual Studio Code, Atom, PyDev etc.

Python Language Fundamentals
• Python Implementation Alternatives/Flavors
• Keywords
• Identifiers
• Constants / Literals
• Data types
• Python vs Java
• Python Syntax

Different Modes of Python
• Interactive Mode
• Scripting Mode
• Programming Elements
• Structure of Python program
• First Python Application
• Comments in Python
• Python file extensions
• Setting Path in Windows
• Edit and Run python program without IDE
• Edit and Run python program using IDEs
• INSIDE PYTHON
• Programmer's View of Interpreter
• Inside INTERPRETER
• What is Byte Code in Python?
• Python Debugger
• bytes data type
• bytearray data type
• String Formatting in Python
• math, random, secrets modules

Python Variables
• Introduction
• Initialization of variables
• Local variables
• Global variables
• ‘global’ keyword
• Input and Output operations
• Data conversion functions – int(), float(), complex(), str(), chr(), ord()

Operators
• Arithmetic Operators
• Comparison Operators
• Python Assignment Operators
• Logical Operators
• Bitwise Operators
• Shift operators
• Membership Operators
• Identity Operators
• Ternary Operator
• Operator precedence
• Difference between “is” vs “==”

Input & Output Operations
• Print
• Input
• Command-line arguments

Control Statements
• Conditional control statements
  ◦ If
  ◦ If-else
  ◦ If-elif-else
  ◦ Nested-if
• Loop control statements
  ◦ for
  ◦ while
  ◦ Nested loops
• Branching statements
  ◦ Break
  ◦ Continue
  ◦ Pass
  ◦ Return
• Case studies

Data Structures or Collections
• Introduction
• Importance of Data structures
• Applications of Data structures
• Types of Collections
• Sequence
  ◦ Strings, List, Tuple, range
• Non sequence
  ◦ Set, Frozen set, Dictionary
 Strings  Mutable and Immutable elements of
 What is string List
 Representation of Strings  Nested Lists
 Processing elements using indexing  List_of_lists
 Processing elements using Iterators  Hardcopy, shallowCopy and
 Manipulation of String using DeepCopy
Indexing and Slicing  zip() in Python
 String operators  How to unzip?
 Methods of String object  Python Arrays:
 String Formatting  Case studies
 String functions
Tuple Collection
 String Immutability
 Case studies
 What is tuple?
 Different ways of creating Tuple
List Collection
 Method of Tuple object
 What is List  Tuple is Immutable
 Need of List collection  Mutable and Immutable elements of
 Different ways of creating List Tuple
 List comprehension  Process tuple through Indexing and
 List indices Slicing
 Processing elements of List through  List v/s Tuple
Indexing and Slicing  Case studies
 List object methods
Set Collection
 List is Mutable

4
• What is set?
• Different ways of creating set
• Difference between list and set
• Iteration Over Sets
• Accessing elements of set
• Python Set Methods
• Python Set Operations
• Union of sets
• Functions and methods of set
• Python Frozen set
• Difference between set and frozenset?
• Case study

Dictionary Collection
• What is dictionary?
• Difference between list, set and dictionary
• How to create a dictionary?
• Python Hashing
• Accessing values of dictionary
• Python Dictionary Methods
• Copying dictionary
• Updating Dictionary
• Reading keys from Dictionary
• Reading values from Dictionary
• Reading items from Dictionary
• Delete Keys from the dictionary
• Sorting the Dictionary
• Python Dictionary Functions and methods
• Dictionary comprehension

Functions
• What is a Function?
• Advantages of functions
• Syntax and Writing function
• Calling or Invoking function
• Classification of Functions
  ◦ No arguments and No return values
  ◦ With arguments and No return values
  ◦ With arguments and With return values
  ◦ No arguments and With return values
• Recursion
• Python argument type functions:
• Default argument functions
• Required (Positional) arguments function
• Keyword arguments function
• Variable arguments functions
• ‘pass’ keyword in functions
• Lambda functions/Anonymous functions
• map()
• filter()
• reduce()
• Nested functions
• Nonlocal variables, global variables
• Closures
• Decorators
• Generators
• Iterators
• Monkey patching

Advanced Python

Python Modules
• Importance of modular programming
• What is module
• Types of Modules – Pre defined, User defined
• User defined modules creation
• Functions based modules
• Class based modules
• Connecting modules
• Import module
• from … import
• Module alias / Renaming module
• Built In properties of module

Packages
• Organizing python project into packages
• Types of packages – pre defined, user defined
• Package v/s Folder
• py file
• Importing package

PIP
• Introduction to PIP
• Installing PIP
• Installing Python packages
• Uninstalling Python packages

OOPs
• Procedural v/s Object oriented programming
• Principles of OOP – Encapsulation, Abstraction (Data Hiding)
• Classes and Objects
• How to define class in python
• Types of variables – instance variables, class variables
• Types of methods – instance methods, class method, static method
• Object initialization
• ‘self’ reference variable
• ‘cls’ reference variable
• Access modifiers – private(__), protected(_), public
• AT property class
• property() object
• Creating object properties using setattr(), getattr() functions
• Encapsulation (Data Binding)
• What is polymorphism?
• Overriding
  i) Method overriding
  ii) Constructor overriding
• Overloading
  i) Method Overloading
  ii) Constructor Overloading
  iii) Operator Overloading
• Class re-usability
• Composition
• Aggregation
• Inheritance – single, multi level, multiple, hierarchical and hybrid inheritance and Diamond inheritance
• Constructors in inheritance
• Object class
• super()
• Runtime polymorphism
• Method overriding
• Method resolution order (MRO)
• Method overriding in Multiple inheritance and Hybrid Inheritance
• Duck typing
• Concrete Methods in Abstract Base Classes
• Difference between Abstraction & Encapsulation

Inner classes
• Introduction
• Writing inner class
• Accessing class level members of inner class
• Accessing object level members of inner class
• Local inner classes
• Complex inner classes
• Case studies

Exception Handling & Types of Errors
• What is Exception?
• Why exception handling?
• Syntax error v/s Runtime error
• Exception types – AttributeError, ValueError, IndexError, TypeError…
• Handling exception – try except block
• Try with multi except
• Handling multiple exceptions with single except block
• Finally block
• Try-except-finally
• Try with finally
• Case study of finally block
• raise keyword
• Custom exceptions / User defined exceptions
• Need for custom exceptions
• Case studies

Regular expressions
• Understanding regular expressions
• String v/s Regular expression string
• “re” module functions
  ◦ match()
  ◦ search()
  ◦ split()
  ◦ findall()
  ◦ compile()
  ◦ sub()
  ◦ subn()
• Expressions using operators and symbols
• Simple character matches
• Special characters
• Character classes
• Mobile number extraction
• Mail extraction
• Different Mail ID patterns
• Data extraction
• Password extraction
• URL extraction
• Vehicle number extraction
• Case study

File & Directory handling
• Introduction to files
• Opening file
• File modes
• Reading data from file
• Writing data into file
• Appending data into file
• Line count in File
• CSV module
• Creating CSV file
• Reading from CSV file
• Writing into CSV file
• Object serialization – pickle module
• XML parsing
• JSON parsing

Python Logging
• Logging Levels
• Implementing Logging
• Configuring the Log File in Overwrite Mode
• Timestamp in the Log Messages
• Writing Python Program Exceptions to the Log File
• Requirement of Our Own Customized Logger
• Features of Customized Logger

Date & Time module
• How to use the date & datetime classes
• How to use the timedelta object
• Formatting Date and Time
• Calendar module
• Text calendar
• HTML calendar

OS module
• Shell script commands
• Various OS operations in Python
• Python file system shell methods
• Creating files and directories
• Removing files and directories
• Shutdown and Restart system
• Renaming files and directories
• Executing system commands

Multi-threading & Multi Processing
• Introduction
• Multi tasking v/s Multi threading
• Threading module
• Creating thread – inheriting Thread class, Using callable object
• Life cycle of thread
• Single threaded application
• Multi threaded application
• Can we call run() directly?
• Need for the start() method
• sleep()
• join()
• Synchronization – Lock class – acquire(), release() functions
• Case studies

Garbage collection
• Introduction
• Importance of Manual garbage collection
• Self reference objects garbage collection
• ‘gc’ module
• collect() method
• Threshold function
• Case studies

Python Data Base Communications (PDBC)
• Introduction to DBMS applications
• File system v/s DBMS
• Communicating with MySQL
• Python – MySQL connector
• connector module
• connect() method
• Oracle Database
• Install cx_Oracle
• Cursor Object methods
• execute() method
• executemany() method
• fetchone()
• fetchmany()
• fetchall()
• Static queries v/s Dynamic queries
• Transaction management
• Case studies

Python – Network Programming
• What are Sockets?
• What is Socket Programming?
• The socket Module
• Server Socket Methods
• Connecting to a server
• A simple server-client program
  ◦ Server
  ◦ Client

Tkinter & Turtle
• Introduction to GUI programming
• Tkinter module
• Tk class
• Components / Widgets
• Label, Entry, Button, Combo, Radio
• Types of Layouts
• Handling events
• Widget properties
• Case studies

Data analytics modules
• NumPy
• Introduction
• SciPy
• Introduction
• Arrays
• Datatypes
• Matrices
• N dimension arrays
• Indexing and Slicing
• Pandas
• Introduction
• Data Frames
• Merge, Join, Concat
• Matplotlib introduction
• Drawing plots
• Introduction to Machine learning
• Types of Machine Learning
• Introduction to Data science

DJANGO
• Introduction to Python Django
• What is a Web framework?
• Why Frameworks?
• Define MVT Design Pattern
• Difference between MVC and MVT

PANDAS
Pandas – Introduction
Pandas – Environment Setup
Pandas – Introduction to Data Structures
• Dimension & Description
• Series
• DataFrame
• Panel

Pandas – Series
• Series
• Create an Empty Series
• Create a Series from ndarray
• Create a Series from dict
• Create a Series from Scalar
• Accessing Data from Series with Position
• Retrieve Data Using Label (Index)

Pandas – DataFrame
• DataFrame
• Create DataFrame
• Create an Empty DataFrame
• Create a DataFrame from Lists
• Create a DataFrame from Dict of ndarrays / Lists
• Create a DataFrame from List of Dicts
• Create a DataFrame from Dict of Series
• Data Type of Columns
• Column Selection
• Column Addition
• Column Deletion
• Row Selection, Addition, and Deletion

Pandas – Panel
• Panel()
• Create Panel
• Selecting the Data from Panel

Pandas – Basic Functionality
• DataFrame Basic Functionality

Pandas – Descriptive Statistics
• Functions & Description
• Summarizing Data

Pandas – Function Application
• Table-wise Function Application
• Row or Column Wise Function Application
• Element Wise Function Application

Pandas – Reindexing
• Reindex to Align with Other Objects
• Filling while Reindexing
• Limits on Filling while Reindexing
• Renaming

Pandas – Iteration
• Iterating a DataFrame
• iteritems()
• iterrows()
• itertuples()

Pandas – Sorting
• By Label
• Sorting Algorithm

Pandas – Working with Text Data

Pandas – Options and Customization
• get_option(param)
• set_option(param, value)
• reset_option(param)
• describe_option(param)
• option_context()

Pandas – Indexing and Selecting Data
• .loc[]
• .iloc[]
• .ix[]
• Use of Notations

Pandas – Statistical Functions
• Percent change (pct_change())
• Covariance
• Correlation
• Data Ranking

Pandas – Window Functions
• .rolling() Function
• .expanding() Function
• .ewm() Function

Pandas – Aggregations
• Applying Aggregations on DataFrame

Pandas – Missing Data
• Cleaning / Filling Missing Data
• Replace NaN with a Scalar Value
• Fill NA Forward and Backward
• Drop Missing Values
• Replace Missing (or) Generic Values

Pandas – GroupBy
• Split Data into Groups
• View Groups
• Iterating through Groups
• Select a Group
• Aggregations
• Transformations
• Filtration

Pandas – Merging/Joining
• Merge Using ‘how’ Argument

Pandas – Concatenation
• Concatenating Objects
• Time Series

EXCEL MODULE

Getting Started
• Starting Excel
• Opening a Workbook
• Understanding the Display Screen
• Working with the Ribbon
• Exploring the File Tab
• Working with the Quick Access Toolbar
• Working with the Status Bar
• Switching Between Opened Workbooks
• Using Excel Help

Entering Data
• Moving the Cell Pointer
• Selecting a Range of Cells
• Creating a New Workbook
• Inserting, Renaming, and Deleting Worksheets
• Entering Constant Values
• Using Auto Fill to Enter Data
• Saving a Workbook
• Editing Cell Contents
• Clearing Cell Contents
• Working with Undo and Redo
• Closing a Workbook
• Applying Cell Styles

Using Formulas
• Entering Formulas
• Using Auto Fill with Formulas
• Using the SUM Function
• Summing Columns or Rows Automatically
• Using Statistical Functions
• Working with the Range Finder
• Using Formula Error Checking

Working with Constant Values and Formulas
• Copying and Pasting Constant Values and Formulas
• Cutting and Pasting Constant Values and Formulas
• Using Collect and Paste

Formatting Worksheets
• Formatting Numbers
• Changing the Font Format
• Aligning Cell Contents
• Merging Cells
• Adding Borders

Modifying Columns and Rows
• Changing Column Width
• Changing Row Height
• Inserting and Deleting Columns or Rows
• Hiding Columns or Rows

Editing Workbooks
• Working with AutoCorrect
• Checking Spelling
• Using Find and Replace

Printing Worksheets
• Using Print Preview
• Working with Print Settings
• Using Page Setup Tools
• Working in Page Layout View
• Creating a Header and Footer
• Using Page Break Preview
• Printing a Worksheet
• Exiting Excel

VBA

MACRO

MYSQL MODULE

Section 1 An introduction to MySQL
Chapter 1 An introduction to relational databases and SQL
Chapter 2 How to use MySQL Workbench and other development tools

Section 2 The essential SQL skills
Chapter 3 How to retrieve data from a single table
Chapter 4 How to retrieve data from two or more tables
Chapter 5 How to code summary queries
Chapter 6 How to code subqueries
Chapter 7 How to insert, update, and delete data
Chapter 8 How to work with data types
Chapter 9 How to use functions

Section 3 Database design and implementation
Chapter 10 How to design a database
Chapter 11 How to create databases, tables, and indexes
Chapter 12 How to create views

Section 4 Stored program development
Chapter 13 Language skills for writing stored programs
Chapter 14 How to use transactions and locking
Chapter 15 How to create stored procedures and functions
Chapter 16 How to create triggers and events

Section 5 Database administration
Chapter 17 An introduction to database administration
Chapter 18 How to secure a database
Chapter 19 How to back up and restore a database
TABLEAU MODULE  Highlight tables

 Treemaps
 Connect to your data.
 Scatter plots
 Edit and save a data source.

 Understand Tableau terminology.

 Use the Tableau interface/paradigm


to effectively create powerful
visualizations.

 Create basic calculations including


basic arithmetic calculations, custom
aggregations and ratios, date math,
and quick table calculations.

 Build dashboards to share


visualizations.

CHART TYPES COVERED:

 Cross Tabs

 Pie and bar charts

 Geographic maps

 Dual axis and combo charts with


different mark types

18

SPARK MODULE
• Apache Spark and Scala programming
• Difference between Apache Spark and Hadoop
• Scala and its programming implementation
• Implementing Spark on a cluster
• Writing Spark applications using Python, Java and Scala
• RDD and its operations, along with the implementation of Spark algorithms
• Defining and explaining Spark streaming
• Scala classes concept and executing pattern matching
• Scala–Java interoperability and other Scala operations
• Working on projects using Scala to run Spark applications

HADOOP MODULE

Introduction to Big Data
• What is Big data
• Big Data opportunities, challenges
• Characteristics of Big data

Introduction to Hadoop
• Hadoop Distributed File System
• Comparing Hadoop & SQL
• Industries using Hadoop
• Data Locality
• Hadoop Architecture
• Map Reduce & HDFS
• Using the Hadoop single node image (Clone)

Hadoop Distributed File System (HDFS)
• HDFS Design & Concepts
• Blocks, Name nodes and Data nodes
• HDFS High-Availability and HDFS Federation
• Hadoop DFS The Command-Line Interface
• Basic File System Operations
• Anatomy of File Read, File Write
• Block Placement Policy and Modes
• More detailed explanation about Configuration files
• Metadata, FS image, Edit log, Secondary Name Node and Safe Mode
• How to add a New Data Node dynamically, decommission a Data Node dynamically (Without stopping cluster)
• FSCK Utility (Block report)
• How to override default configuration at system level and Programming level
• HDFS Federation
• ZOOKEEPER Leader Election Algorithm
• Exercise and small use case on HDFS

Map Reduce
• Map Reduce Functional Programming Basics
• Map and Reduce Basics
• How Map Reduce Works
• Anatomy of a Map Reduce Job Run
• Legacy Architecture -> Job Submission, Job Initialization, Task Assignment, Task Execution, Progress and Status Updates
• Job Completion, Failures
• Shuffling and Sorting
• Splits, Record reader, Partition, Types of partitions & Combiner
• Optimization Techniques -> Speculative Execution, JVM Reuse and No. of Slots
• Types of Schedulers and Counters
• Comparisons between Old and New API at code and Architecture Level
• Getting the data from RDBMS into HDFS using Custom data types
• Distributed Cache and Hadoop Streaming (Python, Ruby and R)
• YARN
• Sequence Files and Map Files
• Enabling Compression Codecs
• Map side Join with distributed Cache
• Types of I/O Formats: Multiple outputs, NLineInputFormat
• Handling small files using CombineFileInputFormat

Map Reduce Programming – Java Programming
• Hands on “Word Count” in Map Reduce in standalone and Pseudo-distributed Mode
• Sorting files using Hadoop Configuration API discussion
• Emulating “grep” for searching inside a file in Hadoop
• DBInputFormat
• Job Dependency API discussion
• Input Format API discussion, Split API discussion
• Custom Data type creation in Hadoop

NOSQL
• ACID in RDBMS and BASE in NoSQL
• CAP Theorem and Types of Consistency
• Types of NoSQL Databases in detail
• Columnar Databases in Detail (HBASE and CASSANDRA)
• TTL, Bloom Filters and Compensation

HBase
• HBase Installation, Concepts
• HBase Data Model and Comparison between RDBMS and NOSQL
• Master & Region Servers
• HBase Operations (DDL and DML) through Shell and Programming and HBase Architecture
• Catalog Tables
• Block Cache and sharding
• SPLITS
• DATA Modeling (Sequential, Salted, Promoted and Random Keys)
• Java APIs and Rest Interface
• Client Side Buffering and Process 1 million records using Client side Buffering
• HBase Counters
• Enabling Replication and HBase RAW Scans
• HBase Filters
• Bulk Loading and Coprocessors (Endpoints and Observers with programs)
• Real world use case consisting of HDFS, MR and HBASE

Hive
• Hive Installation, Introduction and Architecture
• Hive Services, Hive Shell, Hive Server and Hive Web Interface (HWI)
• Metastore, HiveQL
• OLTP vs. OLAP
• Working with Tables
• Primitive data types and complex data types
• Working with Partitions
• User Defined Functions
• Hive Bucketed Tables and Sampling
• External partitioned tables, Map the data to the partition in the table, Writing the output of one query to another table, Multiple inserts
• Dynamic Partition
• Differences between ORDER BY, DISTRIBUTE BY and SORT BY
• Bucketing and Sorted Bucketing with Dynamic partition
• RC File
• INDEXES and VIEWS
• MAPSIDE JOINS
• Compression on hive tables and Migrating Hive tables
• Dynamic substitution of Hive and Different ways of running Hive
• How to enable Update in HIVE
• Log Analysis on Hive
• Access HBASE tables using Hive
• Hands on Exercises

Pig
• Pig Installation
• Execution Types
• Grunt Shell
• Pig Latin
• Data Processing
• Schema on read
• Primitive data types and complex data types
• Tuple schema, BAG Schema and MAP Schema
• Loading and Storing
• Filtering, Grouping and Joining
• Debugging commands (Illustrate and Explain)
• Validations, Type casting in PIG
• Working with Functions
• User Defined Functions
• Types of JOINS in pig and Replicated Join in detail
• SPLITS and Multiquery execution
• Error Handling, FLATTEN and ORDER BY
• Parameter Substitution
• Nested For Each
• User Defined Functions, Dynamic Invokers and Macros
• How to access HBASE using PIG, Load and Write JSON DATA using PIG
• Piggy Bank
• Hands on Exercises

SQOOP
• Sqoop Installation
• Import Data (Full table, Only Subset, Target Directory, protecting Password, file format other than CSV, Compressing, Control Parallelism, All tables Import)
• Incremental Import (Import only New data, Last Imported data, storing Password in Metastore, Sharing Metastore between Sqoop Clients)
• Free Form Query Import
• Export data to RDBMS, HIVE and HBASE
• Hands on Exercises

HCatalog
• HCatalog Installation
• Introduction to HCatalog
• About HCatalog with PIG, HIVE and MR
• Hands on Exercises

Flume
• Flume Installation
• Introduction to Flume
• Flume Agents: Sources, Channels and Sinks
• Log User information using Java program into HDFS using LOG4J and Avro Source, Tail Source
• Log User information using Java program into HBASE using LOG4J and Avro Source, Tail Source
• Flume Commands
• Use case of Flume: Flume the data from Twitter into HDFS and HBASE. Do some analysis using HIVE and PIG

More Ecosystems
• HUE (Hortonworks and Cloudera)
