
Python Course in Theory

This document provides an overview of Python programming concepts, including variables, data types, functions, conditionals, and loops. It also covers libraries like NumPy and Pandas for data manipulation and analysis, as well as tools for natural language processing (NLP). Additionally, it discusses file handling, regular expressions, and best practices for writing Python code.

Uploaded by

Tanvir Anjum

Python course in theory

Variables
In essence, variables are like labeled containers for storing data values in your Python programs.
They make your code more flexible and reusable by letting you reference these values using
descriptive names.

Naming Conventions:
● Must start with a letter or an underscore (lowercase is the convention).
● Can contain letters (uppercase/lowercase), numbers, and underscores.
● Be descriptive and meaningful.
● Avoid using reserved keywords like if, else, for, etc.
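A quick illustrative sketch of these conventions (the names and values are made up):

```python
# Descriptive, lowercase names with underscores
user_name = "Ada"        # a string value
user_age = 36            # an integer value
account_balance = 102.5  # a float value

# Referencing values by name keeps code readable and reusable
greeting = "Hello, " + user_name
print(greeting)  # Hello, Ada
```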

Simple Types: Integers, Strings, and Floats


These are the fundamental building blocks for representing data in Python.
● Integers (int): Whole numbers (positive, negative, or zero).
● Strings (str): Sequences of characters enclosed in single or double quotes.
● Floats (float): Numbers with decimal points.
List Types
Lists are versatile containers that can store ordered collections of items (of any data type) within
a single variable.
Operations:
● append(item): Add an item to the end.
● insert(index, item): Insert an item at a specific index.
● remove(item): Remove the first occurrence of an item.
● pop(index): Remove and return an item at a specific index (defaults to the last item).
● len(list): Get the number of items in the list.
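The operations above, sketched with a hypothetical shopping list:

```python
items = ["milk", "eggs"]
items.append("bread")      # add to the end
items.insert(1, "butter")  # insert at index 1
items.remove("eggs")       # remove the first occurrence
last = items.pop()         # remove and return the last item
print(items, last, len(items))  # ['milk', 'butter'] bread 2
```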

Type Attributes
Every object in Python has a type, which determines its behavior and the operations you can
perform on it. You can check an object's type using the type() function:
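For example:

```python
print(type(42))       # <class 'int'>
print(type("hello"))  # <class 'str'>
print(type(3.14))     # <class 'float'>
print(type([1, 2]))   # <class 'list'>
```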
Dictionary Types
Dictionaries store unordered collections of key-value pairs. Each key is unique and associated
with a value.
Tuple Types
Tuples are ordered collections of items, similar to lists, but they are immutable – once created,
their elements cannot be changed.
When to Use Tuples:
● When you want to ensure that data remains constant.
● To represent fixed groups of values (e.g., coordinates).
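A small sketch showing a coordinate tuple and what happens if you try to change it:

```python
point = (3, 4)   # a fixed pair of coordinates
x, y = point     # tuples support unpacking

try:
    point[0] = 10          # tuples are immutable
except TypeError as err:
    print("Cannot modify a tuple:", err)
```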

How to Use Data Types in the Real World


● Integers: Counting items, representing ages, storing numerical data.
● Strings: Storing names, addresses, text messages, any textual information.
● Floats: Representing prices, scientific measurements, financial data.
● Lists: Storing collections of items (e.g., shopping lists, search results).
● Dictionaries: Storing structured data with key-value relationships (e.g., contact
information, product catalogs).
● Tuples: Representing fixed data structures (e.g., coordinates, RGB colors).

More Operations with Lists


● Concatenation: Combine lists using the + operator.
● Repetition: Repeat a list using the * operator.
● Membership Testing: Check if an item is in a list using in or not in.
● Sorting: Use list.sort() to sort in place, or sorted(list) to create a new sorted list.
● Slicing (More on this below): Extract portions of lists.
Accessing List Items
● Indexing: Use square brackets [] to access items by their position (0-based).
● Negative Indexing: Access items from the end using negative indices (-1 is the last item).
Accessing List Slices
● Slicing: Use [start:stop:step] to extract a portion of a list.
Accessing Characters and Slices in Strings
Strings are sequences of characters, so you can access them like lists:
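Indexing and slicing, shown on a list and a string (illustrative values):

```python
nums = [10, 20, 30, 40, 50]
print(nums[1:4])   # [20, 30, 40]  (start:stop)
print(nums[::2])   # [10, 30, 50]  (every second item)
print(nums[-1])    # 50            (negative indexing)

text = "Python"
print(text[0])     # P
print(text[0:3])   # Pyt
print(text[::-1])  # nohtyP (reversed)
```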
Accessing Items in Dictionaries
● Key Lookup: Use square brackets with the key to get its associated value.
● get() Method: A safer way to access values, returning None if the key doesn't exist.
Tip: Converting Between Data Types
● Explicit Conversion (Casting): Convert values deliberately with functions such as int(), float(), and str().
● Implicit Conversion: Sometimes Python automatically converts types (e.g., in arithmetic operations). Be cautious with this!
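Both kinds of conversion in a short sketch:

```python
# Explicit conversion (casting)
age = int("25")        # str -> int
price = float("3.99")  # str -> float
label = str(42)        # int -> str

# Implicit conversion: mixing int and float yields a float
total = 1 + 2.5
print(age, price, label, total, type(total))
```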

Creating Your Own Function


Functions are reusable blocks of code that perform specific tasks. They help you organize your
code, avoid repetition, and make it more readable.

Print or Return
● print(): Displays output on the console. Primarily used for providing information to the
user.
● return: Sends a value back to the code that called the function. This value can be stored in
a variable, used in further calculations, etc.
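The difference between printing and returning, sketched with two hypothetical functions:

```python
def show_sum(a, b):
    print(a + b)   # displays the result; the function itself returns None

def add(a, b):
    return a + b   # sends the result back to the caller

result = add(2, 3)      # result is 5 and can be reused in later code
shown = show_sum(2, 3)  # prints 5, but shown is None
```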
Intro to Conditionals
Conditionals (like if, elif, else) let your program make decisions based on whether certain
conditions are true or false.

More on Conditionals
● Comparison Operators: == (equal), != (not equal), >, <, >=, <=.
● Logical Operators: and, or, not.
● Chained Conditionals (elif): Test multiple conditions in sequence.
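These pieces combined in a small sketch (the temperature value is made up):

```python
temperature = 18

if temperature > 25:
    verdict = "hot"
elif 15 <= temperature <= 25:  # chained comparison covers the middle range
    verdict = "mild"
else:
    verdict = "cold"

print(verdict)  # mild
```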

White Space
Python uses indentation (usually four spaces) to define code blocks. This improves readability
and is strictly enforced.
Functions and Conditionals
● Functions: Encapsulate reusable code, improving organization and reducing redundancy.
● Conditionals: Enable decision-making based on various conditions, making your
programs more dynamic and intelligent.
User Input
The input() function allows you to take input from the user through the keyboard.
String Formatting
String formatting allows you to create dynamic strings by inserting values into placeholders.
Using f-strings (Formatted String Literals): (Most recommended and easy way)
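A sketch of f-string formatting; input() is interactive, so the value is hard-coded here to keep the example runnable:

```python
name = "Ada"  # in an interactive script: name = input("Your name: ")
age = 36

message = f"{name} is {age} years old"
print(message)  # Ada is 36 years old
```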
For Loops: How and Why
For loops iterate over a sequence of items (like a list, string, or range of numbers) and execute a
block of code for each item.
Why:
● Automate repetitive tasks.
● Process each element in a collection.
● Create sequences of numbers easily.
Looping through a Dictionary
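A sketch of the two common ways to loop over a dictionary (the contact data is made up):

```python
contact = {"name": "Ada", "city": "London"}

for key in contact:          # iterating a dict yields its keys
    print(key, contact[key])

for key, value in contact.items():  # keys and values together
    print(f"{key}: {value}")
```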

While Loops: How and Why


While loops repeatedly execute a block of code as long as a given condition remains true.
Why:
● Repeat tasks an unknown number of times.
● Keep running a process until a specific event occurs.

While Loops with Break and Continue


● break: Immediately exits the loop, regardless of the condition.
● continue: Skips the rest of the current iteration and moves to the next one.
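Both keywords in one short sketch:

```python
found = None
for n in [3, 8, 12, 5]:
    if n % 2 != 0:
        continue  # skip odd numbers entirely
    if n > 10:
        found = n
        break     # stop at the first even number above 10
print(found)  # 12
```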
List Comprehensions
List comprehensions provide a concise way to create lists based on existing ones.
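For example:

```python
numbers = [1, 2, 3, 4, 5]
squares = [n ** 2 for n in numbers]         # [1, 4, 9, 16, 25]
evens = [n for n in numbers if n % 2 == 0]  # [2, 4]
```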

Python Shell and Terminal Tips


The Python shell provides an interactive environment to experiment with code. The terminal
allows you to execute Python scripts and interact with your operating system. Here are some tips:
● IPython/Jupyter: Consider using enhanced shells like IPython or Jupyter notebooks for
better code editing, autocompletion, and visualization.
● Tab Completion: Press Tab to autocomplete variable names, function names, and even
file paths.
● Up Arrow: Cycle through previously executed commands to save time.
● Help Function: Use help(object) to get documentation on functions, modules, or classes.

Python coding syntax in summary

Indentation: Unlike many languages that use braces {}, Python uses indentation (spaces or tabs)
to define code blocks.

Statements: Instructions executed line by line.


Data Types: Determine how values are stored and manipulated. Python has many built-in types,
and you can create custom ones.

Loops (for, while): Repeat code blocks.


Functions: Reusable blocks of code.
Lists: Ordered collections of items.
Dictionaries: Key-value pairs.
Sets: Unordered collections of unique items.
Tuples: Immutable ordered collections of items.
Modules and Libraries: Extend Python's functionality.
Readability: Use meaningful names for variables and functions, add comments, and follow
consistent formatting.
Exception Handling: try/except/finally: Handle errors gracefully.
Map, Filter, Reduce: Process iterables functionally.
NumPy: Numerical Python, the foundation for numerical operations. Provides powerful arrays
and mathematical functions.
Pandas: Data analysis workhorse. Offers DataFrames for tabular data manipulation and analysis.
Matplotlib: Versatile plotting library for visualizations.
Scikit-learn: Machine learning toolkit with a wide array of algorithms (regression, classification,
clustering) and model evaluation tools.
Seaborn: Statistical data visualization library built on top of Matplotlib. Offers visually appealing
and informative plots.
Statsmodels: Provides statistical models and tests for more in-depth analysis.
SciPy: Scientific computing library offering modules for optimization, linear algebra,
integration, and more.
Beautiful Soup: Web scraping library for extracting data from websites.
NLTK (Natural Language Toolkit): Library for working with human language data (text
analysis, sentiment analysis).
Jupyter Notebooks: Interactive environment ideal for data exploration, analysis, and
visualization.
Virtual Environments: Create isolated environments to manage dependencies for different
projects.
Version Control (Git): Track changes in your code and collaborate effectively.

Why Python for NLP?


Python is a popular programming language that's perfect for NLP because:
● Easy to Learn: Python's syntax is clear and straightforward, even for beginners.
● Powerful Libraries: Python has special tools (libraries) made specifically for NLP tasks.
Think of these libraries as handy toolboxes with pre-made solutions for common NLP
challenges.
Key Python Libraries for NLP:
● spaCy: A fast and efficient library for tasks like breaking down sentences, identifying
parts of speech (nouns, verbs, etc.), and understanding the relationships between words.
● NLTK (Natural Language Toolkit): A classic NLP library packed with tools for text
analysis, classification, and more.
● Scikit-Learn: This library is your go-to for machine learning tasks like training models to
predict outcomes or classify text.
● Deep Learning Libraries (TensorFlow, Keras, PyTorch): These libraries help you build
and train powerful neural networks, which are like supercharged brains for complex NLP
tasks.
How It Works (Simple Steps):
1. Get Your Text: Gather the text data you want to analyze (e.g., emails, reviews, articles).
2. Clean It Up: Remove unnecessary characters, punctuation, and other noise.
3. Break It Down: Divide the text into words, sentences, or even smaller chunks.
4. Understand the Meaning: Figure out parts of speech, identify important words (entities),
and analyze sentiment.
5. Do Cool Stuff: Here's where the fun begins! You can:
○ Classify Text: Decide if an email is spam or not, or figure out the topic of a news
article.
○ Generate Text: Create automated summaries of documents, write poems, or even
have your computer chat with you!
○ Translate Languages: Build your own language translation tool.

Example: Sentiment Analysis


Let's say you want to analyze movie reviews to see if people liked or disliked a movie. Here's a
simple example:
1. Get reviews: Collect a bunch of movie reviews from the web.
2. Clean: Remove extra stuff like punctuation and special characters.
3. Break down: Separate each review into individual words.
4. Analyze: Use a pre-trained model (like those in spaCy or NLTK) to figure out if each
word is generally positive or negative.
5. Decide: Count up the positive and negative words in each review to see if the overall
sentiment is positive or negative.

Working with Text Files in Python:


● Opening and Reading: Use the open() function to access text files. You can read them
line by line or all at once.
● Writing: Use the open() function with the 'w' mode to create or overwrite a file, or 'a'
mode to append to an existing file.
● Encoding: Be mindful of character encoding (e.g., UTF-8) to avoid garbled text.

Working with PDF Files in Python:


● PyPDF2: A popular library for extracting text from PDFs, splitting or merging them, and
even adding metadata.
● PDFMiner: Another library for parsing PDF files and extracting text content.
● OCR (Optical Character Recognition): If your PDF is an image-based scan, you might
need OCR tools like Tesseract to convert the images into text before using the libraries
above.
Introduction to Python Text Basics:
● Strings: The most fundamental way to represent text in Python. They are sequences of
characters enclosed in single or double quotes (e.g., 'hello', "world").
● String Operations: Python offers a wide range of built-in functions for manipulating
strings:
○ Concatenation: Joining strings together with the + operator.
○ Slicing: Extracting parts of a string using [] (e.g., text[0:5] gets the first five
characters).
○ Methods: Powerful actions like lower() (convert to lowercase), upper() (convert to
uppercase), replace() (swap out characters), split() (break into words), and many
more.
Working with Text Files with Python (Part One & Two):
● Opening Files: The open() function gives you access to a file. You need to specify:
○ File Path: Where the file is located on your computer.
○ Mode: How you want to interact with the file ('r' for reading, 'w' for writing, 'a' for
appending).
● Reading Files:
○ read(): Reads the entire file as one big string.
○ readline(): Reads a single line of text.
○ readlines(): Reads all lines and returns them as a list.
● Writing Files:
○ write(): Writes a string to the file.
○ writelines(): Writes a list of strings to the file.
● Closing Files: Always close your files using the close() method to free up resources.
Working with PDFs:
● Libraries:
○ PyPDF2: Great for basic tasks like extracting text, merging/splitting PDFs, and
adding metadata.
○ PDFMiner: More powerful but can be a bit trickier to use. Good for parsing PDFs
and extracting text.
● OCR: If your PDF is a scanned image, you'll need OCR tools like Tesseract to convert
the images into text that can be processed.
Regular Expressions (Part One & Two):
● Pattern Matching: Regex allows you to create patterns to find or replace specific text.
● Special Characters: Regex uses special characters to define patterns:
○ . matches any single character.
○ * matches zero or more repetitions of the preceding character.
○ + matches one or more repetitions of the preceding character.
○ ? matches zero or one occurrence of the preceding character.
○ [] defines a character class (e.g., [a-z] matches any lowercase letter).
● Libraries: Python's re module provides tools for working with regular expressions.
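A sketch using the re module; the email pattern is deliberately simplified for illustration, not a full validator:

```python
import re

text = "Contact: alice@example.com or bob@test.org"
emails = re.findall(r"[\w.]+@[\w.]+", text)  # find all email-like substrings
print(emails)  # ['alice@example.com', 'bob@test.org']

masked = re.sub(r"\d+", "#", "Room 42, floor 3")  # replace runs of digits
print(masked)  # Room #, floor #
```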

NumPy Introduction
NumPy, short for "Numerical Python," is the cornerstone of scientific computing in
Python. It provides:
● Powerful N-dimensional arrays: This is NumPy's core object, enabling
efficient storage and manipulation of numerical data.
● Broadcasting: A powerful mechanism that allows arithmetic operations
between arrays of different shapes.
● High-performance functions: NumPy includes functions for linear
algebra, Fourier transforms, random number generation, and more.

● NumPy array creation


● NumPy indexing and slicing
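A minimal sketch of array creation and indexing (values are illustrative):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # from a nested list
zeros = np.zeros((2, 2))                # 2x2 array of zeros
seq = np.arange(0, 10, 2)               # [0 2 4 6 8]

print(arr[0, 1])  # 2      (row 0, column 1)
print(arr[:, 2])  # [3 6]  (all rows, column 2)
```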

NaN and INF


● NaN (Not a Number): Represents undefined or unrepresentable values
(e.g., 0/0).
● INF (Infinity): Represents positive or negative infinity.
Statistical Operators
● Mean: np.mean(arr)
● Median: np.median(arr)
● Standard Deviation: np.std(arr)
Shape and Reshape
● Shape: Returns the dimensions of an array.

Reshape: Changes the shape of an array.

Ravel and Flatten


● Ravel: Returns a contiguous flattened array.
● Flatten: Similar to ravel, but returns a copy of the array.
Sequence, Repetitions, and Random Numbers
● Sequence: np.arange(1, 11, 2) # Output: [1 3 5 7 9]
● Repetitions: np.repeat([1, 2], 3) # Output: [1 1 1 2 2 2]
● Random Numbers: np.random.rand(3, 2) # 3x2 array of random numbers
where(), argmax(), argmin()
● where(): Returns the indices where a condition is true.
● argmax(): Returns the index of the maximum value.
● argmin(): Returns the index of the minimum value.
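For example:

```python
import numpy as np

arr = np.array([3, 7, 1, 9, 4])
print(np.where(arr > 4))  # (array([1, 3]),) -- indices where the condition holds
print(np.argmax(arr))     # 3  (index of the maximum value, 9)
print(np.argmin(arr))     # 2  (index of the minimum value, 1)
```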
File Read and Write
● Save: np.save('my_array.npy', arr)
● Load: loaded_arr = np.load('my_array.npy')
Concatenate and Sorting
● Concatenate: np.concatenate((arr, [6, 7]))
● Sorting: np.sort(arr)
Working with Dates
NumPy offers only a basic datetime64 type, so for richer date manipulation you would
typically use the datetime module or libraries like pandas in conjunction with NumPy.

Pandas: introduction, DataFrame and Series, file reading and writing, info, shape,
duplicated and drop, columns, NaN and null values, imputation, lambda functions.

Pandas Introduction
Pandas is built on top of NumPy and provides two key data structures:
1. Series: A one-dimensional labeled array that can hold any data type
(integers, strings, floating-point numbers, Python objects, etc.).
2. DataFrame: A two-dimensional labeled data structure with columns of
potentially different types. Think of it like a spreadsheet or a SQL table.
File Reading and Writing
Pandas offers versatile functions to read and write data from various formats:
● CSV: pd.read_csv('file.csv')
● Excel: pd.read_excel('file.xlsx')
● SQL: pd.read_sql_query('SELECT * FROM table', connection)
● Writing: df.to_csv('output.csv'), df.to_excel('output.xlsx')
DataFrame and Series
● Creation:
● Indexing and Slicing:
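A sketch of both bullets with made-up data:

```python
import pandas as pd

# Creation
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
df = pd.DataFrame({"Name": ["Ada", "Bob"], "Age": [36, 25]})

# Indexing and slicing
print(s["b"])            # 20
print(df["Name"])        # the 'Name' column as a Series
print(df.loc[0, "Age"])  # 36 (label-based lookup)
print(df.iloc[1])        # second row (position-based lookup)
```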
Info and Shape
● Info: Provides a summary of the DataFrame (columns, data types,
non-null values).
● Shape: Returns the dimensions (rows, columns) of the DataFrame.

Duplicates and Drop


● Duplicates: df.duplicated() returns a boolean Series indicating
duplicate rows.
● Drop: df.drop_duplicates() removes duplicate rows.
● Drop Columns: df.drop(columns=['Age']) removes the 'Age' column.
Columns
● Renaming: df.rename(columns={'Name': 'First Name'})
● Adding: df['Salary'] = [50000, 60000, 70000]
● Deleting: del df['Salary']
NaN and Null Values
● Checking: df.isnull() returns a boolean DataFrame indicating missing
values.
● Dropping: df.dropna() drops rows or columns with missing values.
Imputation
● Filling NaN: df.fillna(0) replaces NaN values with 0.
● Advanced Imputation: Using mean, median, mode, or even machine
learning models to fill missing values.
Lambda Functions
● Applying Functions: df['Age'].apply(lambda x: x + 1) applies a lambda
function to each element in the 'Age' column.

spaCy 3 Introduction
spaCy 3 is a cutting-edge, open-source Python library for NLP. It's designed for
production use, offering speed, accuracy, and scalability. spaCy's key features
include:

● Industrial-Strength NLP: Handles real-world text data efficiently and reliably.
● Pre-trained Models: Offers pre-trained models for various languages and
tasks, saving you time and resources.
● Customizable Pipelines: Lets you build tailored pipelines for your specific
NLP tasks.
● Modern Architecture: Employs transformer-based models, a revolution in
NLP performance.
spaCy 3 Tokenization
Tokenization is the process of breaking text down into individual words,
punctuation marks, or other meaningful units. spaCy 3 offers a sophisticated
tokenizer that handles various complexities in language:

● Customizable Rules: Allows you to tailor tokenization to your specific requirements.
● Rule-Based Matching: Uses patterns to identify and segment tokens
effectively.
● Statistical Models: Employs machine learning models to improve
tokenization accuracy.
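A minimal sketch, assuming spaCy is installed; a blank English pipeline is enough to see tokenization (no trained model download needed):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer only, no trained components
doc = nlp("Dr. Smith doesn't live in the U.K.")
tokens = [token.text for token in doc]
print(tokens)  # note "doesn't" splits into "does" and "n't"
```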
POS Tagging in spaCy 3
Part-of-speech (POS) tagging assigns grammatical tags (e.g., noun, verb, adjective)
to each token in a sentence. spaCy 3's POS tagger is highly accurate:

Visualizing Dependency Parsing with Displacy


Dependency parsing reveals the grammatical structure of a sentence, showing how
words relate to each other. spaCy's displacy tool visualizes dependency trees:

Stop Words in spaCy 3


Stop words are common words (e.g., "the," "and," "is") that are often removed from
text as they carry little meaning. spaCy provides a built-in list of stop words:

Lemmatization in spaCy 3
Lemmatization reduces words to their base or dictionary form (lemma). This helps
normalize text for analysis:
Stemming in NLTK - Lemmatization vs. Stemming
Both lemmatization and stemming reduce words to their roots, but they differ in
their approach:
● Lemmatization: Aims for the base dictionary form (lemma) of a word.
● Stemming: Employs heuristic rules to chop off affixes, sometimes resulting
in non-words.
Example:
● "better" (lemmatization) -> "good"
● "studies" (stemming) -> "studi" (a non-word stem)
Word Frequency Counter
A word frequency counter helps analyze the distribution of words in a text. Here's
a simple example using spaCy 3:
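The original example code isn't reproduced here, so this is a hedged sketch using a blank spaCy pipeline together with collections.Counter (the text is made up):

```python
import spacy
from collections import Counter

nlp = spacy.blank("en")  # tokenization alone is enough for counting words
text = "the cat sat on the mat and the cat slept"
doc = nlp(text)

words = [t.text.lower() for t in doc if t.is_alpha]  # keep alphabetic tokens
freq = Counter(words)
print(freq.most_common(2))  # [('the', 3), ('cat', 2)]
```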

Rule Based Matching in spaCy Part 1


spaCy's Matcher class is a core tool for rule-based matching. You create patterns
that describe the token attributes you want to find, and the Matcher identifies
matches in your text.

Rule Based Token Matching Examples Part 2


Here are more advanced token-matching patterns:
● Quantifiers:
○ {"OP": "?"}: Optional (zero or one time)
○ {"OP": "*"}: Zero or more times
○ {"OP": "+"}: One or more times
● Attributes:
○ IS_ALPHA, IS_DIGIT, LIKE_NUM, LIKE_EMAIL, etc.
● Custom Attributes:
○ Extend the Token class to add your own attributes.
Rule Based Phrase Matching in spaCy
The PhraseMatcher finds exact matches of multi-word phrases in your text. You
create a Doc object from your phrases and add it to the PhraseMatcher.
Rule Based Entity Matching in spaCy
The EntityRuler allows you to create custom entity recognizers based on patterns.
You define patterns that match specific entities, and spaCy will add them to the
Doc.ents property.
NER (Named Entity Recognition) in spaCy 3 Part 1 & 2
spaCy's built-in NER model is a powerful tool for identifying and classifying named
entities (people, organizations, locations, dates, etc.).

You can also train custom NER models with your own labeled data to improve
performance on specific domains.
Word to Vector (word2vec) and Sentence Similarity in spaCy
spaCy uses word embeddings (vector representations of words) to capture
semantic relationships between words. This enables tasks like:
● Semantic Similarity: Calculate how similar two words or sentences are in
meaning.
● Word Analogies: Find words that relate to each other in the same way as
other word pairs (e.g., "king" is to "man" as "queen" is to "woman").

Regular Expression Part 1 & 2


Regular expressions (regex) are a powerful tool for pattern matching in text. While
spaCy's Matcher is often easier to use for simple patterns, regex can be essential
for more complex tasks.

● Working with Text Files


● String Formatting
● Working with open() Files in write() Mode Part 1
● Working with open() Files in write() Mode Part 2
● Working with open() Files in write() Mode Part 3
● Read and Evaluate the Files
● Reading and Writing .CSV and .TSV Files with Pandas
● Reading and Writing .XLSX Files with Pandas
● Reading Files from URL Links
● Record the Audio and Convert to Text
● Convert Audio into Text Data
● Text to Speech Generation

String Formatting
String formatting is the process of inserting variables or expressions into strings to
create dynamic text. Python offers several formatting methods:
format() method:
% operator (older style):
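All three styles side by side, with made-up values:

```python
name, score = "Ada", 95.5

a = f"{name} scored {score}"              # f-string (recommended)
b = "{} scored {}".format(name, score)    # format() method
c = "%s scored %s" % (name, score)        # % operator (older style)
print(a)  # Ada scored 95.5
```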
Working with open() Files in write() Mode
The open() function provides a file object for reading, writing, or appending data to
files. In write() mode, the file is either created (if it doesn't exist) or overwritten.

The with statement ensures that the file is properly closed even if an error occurs.
Part 2: Appending
To add content without overwriting, use append mode ("a"):
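A sketch of writing then appending; the file name is illustrative and a temporary directory keeps the example self-contained:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "notes.txt")

with open(path, "w") as f:  # "w" creates or overwrites the file
    f.write("first line\n")

with open(path, "a") as f:  # "a" appends without overwriting
    f.write("second line\n")

with open(path) as f:       # default mode is "r"
    print(f.read())
```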

Part 3: Error Handling


You can catch potential errors during file operations using try-except blocks:
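For example (the file name is hypothetical and assumed not to exist):

```python
try:
    with open("no_such_file_12345.txt", "r") as f:
        data = f.read()
except FileNotFoundError:
    data = ""  # fall back to an empty default
    print("File not found; using an empty default.")
finally:
    print("Attempt finished.")  # runs whether or not an error occurred
```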

Read and Evaluate the Files


To read a text file, open it in read mode ("r") and use methods like read(),
readline(), or readlines().

Reading and Writing CSV and TSV Files with Pandas


Pandas provides convenient functions for reading and writing structured data in
CSV (comma-separated values) and TSV (tab-separated values) formats:

Reading and Writing XLSX Files with Pandas


Pandas uses the openpyxl engine to work with Excel files

Reading Files from URL Links


You can read files directly from URLs using libraries like requests

Record Audio and Convert to Text (Speech-to-Text)


You can use libraries like SpeechRecognition
