Learn Data Analytics Together

Python - Learn Data Analytics Together's Group

Disclaimer
Leveraging insights gained from a remarkable Python challenge hosted by Eric in the
Learn Data Analytics Together Group, I've compiled the following information. A special
thanks to Eric & oducal for their meticulous proofreading support.

Compiler: gento
Proofreaders: Eric and oducal

Full credit goes to Alex The Analyst and Corey Schafer, our dedicated instructors.

Self-Study Data

Introduction
Python is a high-level programming language known for its clear syntax, making it easy to learn
and widely used in fields like web development, data science, AI, and automation.

Key Features of Python:

Easy to Learn and Read: Python's syntax is very close to natural language, making it easy
for beginners to approach.
Cross-Platform: Python can run on various operating systems such as Windows, macOS,
and Linux.
Rich Library Ecosystem: Python has a rich library ecosystem, supporting almost any task,
from scientific computing to web development.
Open Source: Python is an open-source language with a large and strong development
community.

Basic Example: A classic first program prints "Hello, World!" on the screen in
Python.
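
A minimal sketch of that first program. The variable name message is only illustrative; calling print() with the text directly works just as well.

```python
# The classic first program: show a greeting on the console.
message = "Hello, World!"
print(message)
```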

Summary of Python Applications Across Various Fields:

1. Data Science:
Data Analysis: Widely used with libraries like Pandas, NumPy, and SciPy.
Data Visualization: Tools like Matplotlib, Seaborn, and Plotly for creating charts.
Machine Learning: Libraries such as Scikit-learn, TensorFlow, and PyTorch are
essential for building and deploying models.
2. Artificial Intelligence:
Natural Language Processing (NLP): Supports applications like chatbots with
libraries such as NLTK and spaCy.
Computer Vision: OpenCV and deep learning libraries help with image recognition
and facial recognition.
3. Web Development:
Frameworks: Django, Flask, and FastAPI enable rapid and secure web
development.
Backend Development: Commonly used for managing databases, handling HTTP
requests, and building APIs.
4. Automation:
Scripting: Used for automating tasks such as file management and software testing.
Web Scraping: Libraries like BeautifulSoup and Selenium allow data collection
from websites.
5. Game Development:
Pygame Library: Suitable for developing simple games and game development
tools.
Game Logic Development: Used for creating game logic, especially in indie or
educational games.
6. Finance:
Financial Analysis: Python is used to build financial models, forecasts, and risk
analysis.
Algorithmic Trading: Supports developing and testing automated trading
strategies.
7. Internet of Things (IoT):
Microcontroller Programming: MicroPython and CircuitPython are used for
programming microcontroller-based IoT devices such as the Raspberry Pi Pico.
8. Education:
Learning to Code: Popular in schools due to its easy syntax and abundant
resources.
Developing Educational Applications: Used to create learning apps and
educational games.
9. Software Development:
Scripting Language: Creates command-line tools, automation software, and other
development tools.
Project Management: Used to develop project management tools, bug trackers,
and CI/CD tools.
10. Research and Academia:
Data Analysis: Applied in scientific research for data analysis, simulation, and
visualization.
Research Tools: Supports developing tools for research in various fields like
biology, chemistry, and physics.

Summary: Python is a versatile tool with widespread applications across modern fields,
making it a top choice for many projects and applications due to its flexibility and strong
development community.

There are many ways to write Python programs using software such as:

1. Python IDE
2. Visual Studio Code
3. Jupyter notebook
4. ….


Online Compilers:

1. Datacamp
2. Google Colab
3. w3schools
4. …

Basic Concepts in Python


Python Variables

Python is fully object-oriented. Variables in Python do not need to be declared before use or
have their type specified. Every value is an object, and a variable is created by assigning a
value to a name with the = operator. Variable names must start with a letter (a-z, A-Z) or an
underscore (_), and subsequent characters can be letters, numbers (0-9), or underscores.

Best Practices for Naming Variables:

Use clear and descriptive variable names that reflect their purpose.
Use snake_case (lowercase letters with underscores) for variable names.
Avoid short, meaningless variable names, such as single letters, unless necessary.
Do not use Python keywords as variable names.
Use CapWords, also known as PascalCase, for class names (capitalize the first letter of each word).
Avoid special characters and numbers in variable names; a name can never start with a number.
Maintain consistency in naming conventions.
Avoid generic variable names that say nothing about the value they hold.
Use all-uppercase names with underscores for constants.

Slicing Variables
Slicing in Python lets you extract parts of a sequence, like strings or lists, using indices
without changing the original data.
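
A short sketch of slicing with the sequence[start:stop:step] form. The sample string and list are illustrative.

```python
text = "Python"
numbers = [10, 20, 30, 40, 50]

first_three = text[:3]       # indices 0, 1, 2 -> "Pyt"
last_two = numbers[-2:]      # the last two items -> [40, 50]
every_other = numbers[::2]   # step of 2 -> [10, 30, 50]
reversed_text = text[::-1]   # a negative step walks backwards -> "nohtyP"

# The originals are unchanged; each slice is a new object.
print(text, numbers)
```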


Data Types
Numeric Types:

Integer (int): Used to store whole numbers, without decimal points.
Floating point (float): Used to store real numbers, with decimal points.
Complex (complex): Used to store complex numbers, with both real and imaginary parts.

String Type (str):

A string is used to store sequences of characters, including letters, numbers, and symbols.
Strings are enclosed in either single quotes ' ' or double quotes " ".

Boolean Type (bool):

Logical values that can only be either True or False, commonly used in conditional
statements and loops.

List:

A list is a data structure that allows you to store an ordered collection of values that are
mutable (can be changed).

Tuple

A tuple is similar to a list, but it cannot be changed once created. Tuples are ordered and
can hold different types of values.

Dictionary

A dictionary is a data structure that holds key-value pairs. Since Python 3.7 it preserves
insertion order. Keys must be hashable (for example strings, numbers, or tuples), while
values can be of any data type.

Set

A set is an unordered data structure that does not contain duplicate elements and supports
set operations like union, intersection, and difference.

NoneType

NoneType represents the absence of a value. It is a special data type in Python, and it
has only one value: None.
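
The data types above, shown side by side in a small sketch. Every name and value is illustrative.

```python
count = 42                        # int
ratio = 3.14                      # float
z = 2 + 3j                        # complex: real part 2.0, imaginary part 3.0
greeting = 'hello'                # str
is_ready = True                   # bool

fruits = ["apple", "banana"]      # list: ordered and mutable
fruits.append("cherry")

point = (1, 2)                    # tuple: ordered, cannot be changed
student = {"name": "Alice", "age": 30}   # dict: key-value pairs
student["city"] = "New York"      # new keys keep insertion order
unique = {1, 2, 2, 3}             # set: the duplicate 2 is dropped
nothing = None                    # the single value of NoneType

for value in (count, ratio, z, greeting, is_ready, fruits, point, student, unique, nothing):
    print(type(value).__name__, value)
```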

Operators
In Python, operators are special symbols that perform operations on variables and values.
Python supports several types of operators:

Operator   Name               Example
+          Add                x + y
-          Minus              x - y
*          Multiply           x * y
/          Divide             x / y
%          Modulo             x % y
**         Exponentiation     x ** y
//         Integer division   x // y

Assignment Operators

Operator   Example    Meaning
=          x = 10     Assign the value 10 to the variable x.
+=         x += 3     x = x + 3
-=         x -= 3     x = x - 3
*=         x *= 3     x = x * 3
/=         x /= 3     x = x / 3
%=         x %= 3     x = x % 3
//=        x //= 3    x = x // 3
**=        x **= 3    x = x ** 3
&=         x &= 3     x = x & 3
^=         x ^= 3     x = x ^ 3
>>=        x >>= 3    x = x >> 3
<<=        x <<= 3    x = x << 3
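
A short sketch of the arithmetic and assignment operators in action. The starting values are arbitrary.

```python
x, y = 7, 2

print(x + y)    # 9
print(x - y)    # 5
print(x * y)    # 14
print(x / y)    # 3.5  (true division always returns a float)
print(x % y)    # 1    (remainder)
print(x ** y)   # 49
print(x // y)   # 3    (integer division rounds down)

total = 10
total += 3      # same as total = total + 3   -> 13
total *= 2      # -> 26
total //= 4     # -> 6
total **= 2     # -> 36
total %= 5      # -> 1
print(total)
```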

Bitwise Operators

Operator   Meaning       Explanation
&          AND           Bitwise AND operation
|          OR            Bitwise OR operation
^          XOR           Bitwise XOR operation
~          NOT           Bitwise NOT operation
<<         Left shift    Left shift, adding zeros to the right
>>         Right shift   Right shift, adding the value of the leftmost bit to the left
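
A small sketch of the bitwise operators on two sample integers, written in binary so the effect on each bit is visible. The values are arbitrary.

```python
a, b = 0b1100, 0b1010   # 12 and 10

print(bin(a & b))   # 0b1000  (bits set in both)
print(bin(a | b))   # 0b1110  (bits set in either)
print(bin(a ^ b))   # 0b110   (bits set in exactly one)
print(~a)           # -13     (for Python integers, ~n equals -n - 1)
print(a << 2)       # 48      (two zero bits added on the right)
print(a >> 2)       # 3
print(-8 >> 1)      # -4      (the sign is preserved when shifting right)
```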

Types of Loops

In Python, loops are used to repeat a block of code multiple times until a specific
condition is met. Conditional statements (if, elif, else) decide which block of code runs.

If-Else

The if statement runs a block of code only when its condition is true. The else statement
complements the if statement. An else statement contains a block of code that will be
executed if the if statement's condition is false.

The elif statement (short for "else if") is a combination of else and if. It allows you to check
multiple expressions to determine if they are true and execute a block of code as soon as one
of the conditions is evaluated as true.
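
A minimal sketch of an if / elif / else chain. The function name describe_temperature and its thresholds are made up for illustration.

```python
def describe_temperature(celsius):
    """Return a label for a temperature; the thresholds are arbitrary examples."""
    if celsius < 0:
        return "freezing"      # runs when the first condition is true
    elif celsius < 20:
        return "cool"          # checked only if the first condition was false
    else:
        return "warm"          # runs when no condition above was true


print(describe_temperature(-5))
print(describe_temperature(12))
print(describe_temperature(30))
```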


For

A for loop can iterate through each character in a string (for example, one named word) and
print each character.

Using for with range(): A for loop over range(5) will iterate through values from 0 to 4
(a total of 5 values) and print each one.

Tuple

A for loop can iterate through each element in a tuple (for example, one named numbers) and
print the value of each element.

Dictionary

When iterating through a dictionary, a for loop can go through each key-value pair (for
example, in a dictionary named student) and print both the key and the corresponding value.

Using for to calculate the sum of numbers in a list:
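
The for loops described above can be sketched as follows. The names word, numbers, and student come from the text; the values and the prices list are illustrative.

```python
word = "data"
for character in word:            # one character per iteration
    print(character)

for i in range(5):                # 0, 1, 2, 3, 4
    print(i)

numbers = (1, 2, 3)
for number in numbers:            # tuples are iterated like lists
    print(number)

student = {"name": "Alice", "age": 30}
for key, value in student.items():   # items() yields (key, value) pairs
    print(key, value)

prices = [10, 20, 30]
total = 0
for price in prices:              # accumulate a running sum
    total += price
print(total)                      # 60
```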


While

The while loop repeatedly executes code while the condition is true and stops when it's false.

Example: the program asks the user to enter a number. If it's even, it prints "This is an even
number" and continues. If it's odd, it prints "This is an odd number" and prompts for another
input. The program stops when the user enters 0.

A while True loop creates an infinite loop that continues until a break statement is encountered.

In such a loop, the code checks if the user enters 0; if so, the loop ends with break. If the
number is even (number % 2 == 0), it prints "is even" and starts a new iteration with continue.
If the number is odd, it prints "is odd."
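
A sketch of the while True pattern described above. To keep it runnable without typing anything, it reads numbers from a list instead of calling input(); the function name classify_until_zero is hypothetical.

```python
def classify_until_zero(numbers):
    """Label each number as even or odd, stopping at the first 0."""
    messages = []
    remaining = list(numbers)      # stands in for repeated input() calls
    while True:
        if not remaining:          # nothing left to read
            break
        number = remaining.pop(0)
        if number == 0:            # 0 ends the loop
            break
        if number % 2 == 0:
            messages.append(f"{number} is even")
            continue               # skip straight to the next iteration
        messages.append(f"{number} is odd")
    return messages


print(classify_until_zero([4, 7, 0, 8]))
```

In an interactive version, number = int(input("Enter a number: ")) would replace the list handling.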

List Comprehension
List comprehension is a concise way to create and transform lists in Python, allowing for filtering
and modifying elements from sequences or other iterables.

Example 1: Filter strings containing the word "Choco" in a list.

Example 2: Create a list of squares of even numbers in the range from 0 to 9.
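
Both examples can be sketched as follows. The desserts list is illustrative sample data.

```python
desserts = ["Choco Pie", "Vanilla Cake", "Choco Chip Cookie", "Apple Tart"]

# Example 1: keep only the strings that contain "Choco".
choco_items = [item for item in desserts if "Choco" in item]

# Example 2: squares of the even numbers from 0 to 9.
even_squares = [n ** 2 for n in range(10) if n % 2 == 0]

print(choco_items)
print(even_squares)
```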


Project 1: Unit of Measurement Converter


Your task in this project is to write Python code to convert between different units. The minimum
number of units you need to convert is 3, and the maximum is 6. For each additional unit beyond
3, you will earn 1 bonus point.

For example, converting between 4 units will earn you 2 bonus points, 5 units will earn you 3
bonus points, and 6 units will earn you 4 bonus points.

GUIDE:

1. Identify Units: Choose the units to convert between, such as meters, kilometers, and
centimeters.
2. Write Conversion Functions: Create functions for converting between each pair of units.
3. General Conversion Function: Combine individual functions into a general one that takes
the value, original unit, and target unit, and performs the conversion.
4. User Interface: Create an interface for users to input values, original units, and target units.
5. Run and Display Results: The program will convert the input value based on the specified
units and display the result.

Example: Input 1500 meters to kilometers returns 1.5 kilometers.


Summary:
The code converts between meters, kilometers, and centimeters using specific functions. It
uses a dictionary in the convert_units function to map conversions, making the code flexible
and easy to extend. The program interacts with users via input prompts and displays conversion
results. Use functions, dictionaries, and exception handling for flexible, extendable code.
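
The original program listing is not reproduced here, so the following is only a sketch of the design the summary describes. The name convert_units comes from the summary; the individual conversion functions and the error message are assumptions.

```python
def meters_to_kilometers(value):
    return value / 1000

def kilometers_to_meters(value):
    return value * 1000

def meters_to_centimeters(value):
    return value * 100

def centimeters_to_meters(value):
    return value / 100

def kilometers_to_centimeters(value):
    return value * 100_000

def centimeters_to_kilometers(value):
    return value / 100_000


def convert_units(value, from_unit, to_unit):
    """Look up the right conversion function in a dictionary and apply it."""
    conversions = {
        ("meters", "kilometers"): meters_to_kilometers,
        ("kilometers", "meters"): kilometers_to_meters,
        ("meters", "centimeters"): meters_to_centimeters,
        ("centimeters", "meters"): centimeters_to_meters,
        ("kilometers", "centimeters"): kilometers_to_centimeters,
        ("centimeters", "kilometers"): centimeters_to_kilometers,
    }
    if from_unit == to_unit:
        return value
    try:
        return conversions[(from_unit, to_unit)](value)
    except KeyError:
        # Unknown unit pair: report it instead of crashing with a KeyError.
        raise ValueError(f"Unsupported conversion: {from_unit} to {to_unit}") from None


print(convert_units(1500, "meters", "kilometers"))   # 1.5
```

Adding a unit means writing its conversion functions and adding the pairs to the dictionary; convert_units itself does not change. An interactive version would read the value and both unit names with input() and convert the value with float() before calling convert_units.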


Modules
In Python, built-in modules are libraries that are pre-installed, offering various useful functions
and utilities for common tasks without needing additional external packages. They help you
perform tasks such as data processing, file handling, time management, and more.

Some Popular Built-in Modules:


math

Provides basic mathematical functions such as calculating square roots, exponents, and
mathematical constants.

datetime

Provides classes for working with dates and times.

os

Provides functions for interacting with the operating system, such as working with files and
directories.

sys

Provides access to interpreter-specific variables and functions.

json

Provides functions for working with JSON data.

re

Provides functions for working with regular expressions.

random

Provides functions for generating random numbers.

collections

Provides additional data structures such as Counter, defaultdict, and namedtuple.

Built-in Modules: These are libraries that come pre-installed in Python, requiring no additional
installation.
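
One small call from each of the modules listed above, as a quick sketch. The sample values are illustrative.

```python
import math
import datetime
import os
import sys
import json
import re
import random
from collections import Counter

root = math.sqrt(16)                                              # 4.0
next_day = datetime.date(2024, 1, 31) + datetime.timedelta(days=1)
current_dir = os.getcwd()                                         # current working directory
major_version = sys.version_info.major                            # e.g. 3
as_json = json.dumps({"name": "Alice", "age": 30})
digits = re.findall(r"\d+", "3 apples and 12 pears")              # ['3', '12']
dice = random.randint(1, 6)                                       # random integer from 1 to 6
most_common = Counter("banana").most_common(1)                    # [('a', 3)]

print(root, math.pi, next_day, current_dir, major_version)
print(as_json, digits, dice, most_common)
```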

Import Modules and Exploring The Standard Library



Importing modules is the process of bringing libraries or collections of functions, classes, and
variables into your Python program to use their functionalities. This helps in organizing code and
reusing available libraries or custom modules.

Example: Using import with as


Example: Importing a Module and Using It

Example: Importing a Specific Part of a Module
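
The three import styles named in the examples above, as a brief sketch. The choice of modules and the alias dt are illustrative.

```python
import datetime as dt          # import with an alias
import math                    # import a whole module and use it via its name
from random import choice      # import one specific name from a module

today = dt.date.today()
rounded_down = math.floor(2.7)           # 2
colors = ["red", "green", "blue"]
picked = choice(colors)                  # one random element of the list

print(today.year, rounded_down, picked)
```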

Summary:

Importing Modules: Provides a way to use the functions and classes available in libraries
or modules.
Standard Library: A collection of built-in modules in Python that help perform many
common tasks.

OS Module - Use Underlying Operating System Functionality

The os module in Python allows interaction with the operating system, enabling tasks like file
and directory management, system information retrieval, and running OS commands from within
a Python program.

Some key functions of the os module:

getcwd(): Returns the current working directory path.
listdir(path): Lists all files and directories in the specified directory.
mkdir(path): Creates a new directory at the specified path.
rmdir(path): Removes an empty directory at the specified path.
remove(path): Deletes a file at the specified path.
rename(src, dst): Renames a file or directory from src to dst.
path.join(path, *paths): Joins multiple path components into a valid path.
path.exists(path): Checks if the specified path exists.
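
A sketch that exercises each of the functions listed above. It works inside a scratch directory created with tempfile so that no real files are touched; the directory and file names are illustrative.

```python
import os
import tempfile

base = tempfile.mkdtemp()                  # scratch directory for the demo
print(os.getcwd())                         # current working directory

new_dir = os.path.join(base, "reports")    # build a path in a portable way
os.mkdir(new_dir)

old_path = os.path.join(new_dir, "draft.txt")
with open(old_path, "w") as f:
    f.write("hello")

new_path = os.path.join(new_dir, "final.txt")
os.rename(old_path, new_path)
listing = os.listdir(new_dir)              # ['final.txt']

os.remove(new_path)                        # delete the file
os.rmdir(new_dir)                          # the directory is empty now, so rmdir works
still_exists = os.path.exists(new_dir)     # False
os.rmdir(base)

print(listing, still_exists)
```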


Creating a Module in Python


Step 1: Create a Module

Create a .py file containing functions, variables, or classes.
Example: mymodule.py

Step 2: Using the Module

Import the module into another file and use it:

Specific Import:

Import specific elements from the module:
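
A self-contained sketch of both steps. Normally mymodule.py is simply saved next to the script that imports it; here the file is written to a temporary folder so the example runs on its own. The contents of the module (PI, greet, circle_area) are assumptions for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Step 1: the contents of mymodule.py (hypothetical example module).
module_source = '''
PI = 3.14159

def greet(name):
    return f"Hello, {name}!"

def circle_area(radius):
    return PI * radius ** 2
'''

module_dir = Path(tempfile.mkdtemp())
(module_dir / "mymodule.py").write_text(module_source)
sys.path.insert(0, str(module_dir))      # make the folder importable
importlib.invalidate_caches()

# Step 2: import the whole module and use it through its name.
import mymodule
print(mymodule.greet("Alice"))

# Specific import: bring selected names directly into this file.
from mymodule import circle_area, PI
print(circle_area(2))
```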


Function in Python
In Python, a function is defined using the def keyword, followed by the function name and
parentheses containing input parameters. The function body is indented relative to the def
keyword.

Simple Function Without Parameters:

Function with Parameters:

Function with Multiple Parameters:

Function with Default Parameters:
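
The four forms listed above, sketched with hypothetical function names.

```python
def say_hello():
    """Simple function without parameters."""
    print("Hello!")


def greet(name):
    """Function with one parameter."""
    return f"Hello, {name}!"


def add(a, b):
    """Function with multiple parameters."""
    return a + b


def power(base, exponent=2):
    """Function with a default parameter: exponent is 2 unless given."""
    return base ** exponent


say_hello()
print(greet("Alice"))
print(add(2, 3))        # 5
print(power(4))         # 16, uses the default exponent
print(power(2, 3))      # 8
```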

Function with *args (Variable Arguments):

For example, a make_pizza function can take a size and an indefinite number of toppings
(*toppings). These toppings are gathered into a tuple.

Function with **kwargs (Variable Keyword Arguments):

For example, a build_profile function can take two required parameters, first and last, along
with an indefinite number of additional key-value pairs (**user_info). These key-value pairs
are gathered into a dictionary.
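
A sketch of the two functions described above. The names make_pizza, toppings, build_profile, first, last, and user_info come from the text; the function bodies and return values are assumptions.

```python
def make_pizza(size, *toppings):
    """Extra positional arguments are collected into the tuple `toppings`."""
    return f"{size}-inch pizza with: " + ", ".join(toppings)


def build_profile(first, last, **user_info):
    """Extra keyword arguments are collected into the dict `user_info`."""
    profile = {"first_name": first, "last_name": last}
    profile.update(user_info)
    return profile


print(make_pizza(12, "mushrooms", "olives"))
print(build_profile("Ada", "Lovelace", field="mathematics", country="UK"))
```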

Lambda Functions
Lambda functions are concise, anonymous functions used for simple and quick operations,
often when passing a function as an argument to other functions. They are defined using the
lambda keyword and can only contain a single expression.

Adding Two Numbers:

Using with map():

Using with filter():
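
The three uses named above, as a short sketch with illustrative data.

```python
# Adding two numbers with a lambda bound to a name.
add = lambda a, b: a + b

numbers = [1, 2, 3, 4, 5]

# map() applies the lambda to every element.
doubled = list(map(lambda n: n * 2, numbers))

# filter() keeps the elements for which the lambda returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

print(add(2, 3), doubled, evens)
```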

When to Use Lambda Functions:

When you need a concise, simple function for one-time use.
When you want to write quick code without defining a full function with def.

Print Statement

In Python, the print() function is used to output information to the screen. It's a fundamental
tool for displaying text or the value of an expression in the console. Its main purpose is to
provide information to the user or assist with debugging. When used inside a function, print only
shows output on the console and does not affect the program's flow or the function's return
value.

Return Statement
In Python, the return keyword is used in functions to end the function and return a value. It is
essential for determining the output of a function and controlling the flow of execution in a
program.

Sum Function:

Function Without a Return Value:

Function Returning Multiple Values:
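
A sketch of the three cases named above. The function names and sample values are hypothetical.

```python
def add_numbers(a, b):
    """Sum function: return ends the function and hands back the result."""
    return a + b


def log_message(message):
    """No return statement, so the call evaluates to None."""
    print(message)


def min_max(numbers):
    """Return two values at once; they travel together as a tuple."""
    return min(numbers), max(numbers)


total = add_numbers(2, 3)            # 5
nothing = log_message("saved")       # prints "saved", returns None
lowest, highest = min_max([3, 1, 4]) # unpack the tuple into two variables

print(total, nothing, lowest, highest)
```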

Summary:

return with an expression: Ends the function and returns the value of the expression.
return without an expression: Returns the default value None.
return multiple values: Can return multiple values separated by commas, which can be
assigned to multiple variables.

Return is crucial in function design and controlling execution flow in Python, enabling clear
definition of function outputs and efficient data handling.


Some built-in functions in Python

Note: append(), format(), strip(), replace(), and join() are methods of list and str objects
rather than standalone built-in functions; they are called on an object, for example
my_list.append(item).

input()

Used to receive input from the user as a string.

enumerate()

Adds an index to an iterable (such as a list) and returns an enumerate object.

append()

Used to add an element to the end of a list.

len()

Returns the length of an object (the number of elements in a list, string, dictionary, etc.).

range()

Returns a sequence of numbers, starting from 0 (by default) and ending before a specified
number.

slice()

Creates a slice object used to extract a portion of a list, string, or tuple.

round()

Rounds a number to the nearest integer or to a specified number of decimal places.

format()

Formats a string by inserting values into it.

strip()

Removes whitespace or specified characters from the beginning and end of a string.

replace()

Replaces parts of a string with another string.

join()

Joins the elements in an iterable into a single string, using a specified separator.
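
A compact sketch that calls each entry above once. The data is illustrative, and input() is left as a comment so the example runs without waiting for the keyboard.

```python
# name = input("Your name: ")   # would return whatever the user types, as a string

colors = ["red", "green"]
colors.append("blue")                            # list method: add to the end

indexed = list(enumerate(colors))                # [(0, 'red'), (1, 'green'), (2, 'blue')]
size = len(colors)                               # 3
first_five = list(range(5))                      # [0, 1, 2, 3, 4]

middle = slice(1, 3)                             # same as the [1:3] slice
picked = colors[middle]                          # ['green', 'blue']

rounded = round(3.14159, 2)                      # 3.14
nearest = round(7.6)                             # 8

sentence = "{} is {} years old".format("Alice", 30)
cleaned = "  hello  ".strip()                    # 'hello'
date_text = "2024-01-31".replace("-", "/")       # '2024/01/31'
joined = ", ".join(colors)                       # 'red, green, blue'

print(indexed, size, first_five, picked, rounded, nearest)
print(sentence, cleaned, date_text, joined)
```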


Variable Scope
Variable Scope in Python refers to the region of code where a variable can be accessed.
Python has a scope system to determine where a variable can be used. The variable scope can
be global or local. Common types of scope include:

1. Local Scope: Variables declared within a function can only be used inside that function.
2. Global Scope: Variables declared outside of all functions can be used anywhere in the
code.
3. Enclosing Scope: Variables in the scope of an outer function when there is a nested
function inside.
4. Built-in Scope: Contains names built into Python, such as the functions len() and print()
and exception types like ValueError.

Python follows the LEGB rule to determine the order of variable lookup:

Local: Local scope.
Enclosing: Enclosing scope (outer function).
Global: Global scope.
Built-in: Built-in scope.
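
A sketch of the LEGB lookup order, using the names outer_function, inner_function, and x that the explanation refers to. The original listing is not reproduced here, so treat the exact code as an assumption.

```python
x = "global"                       # global scope


def outer_function():
    x = "outer"                    # local to outer_function, enclosing for inner_function

    def inner_function():
        x = "inner"                # local to inner_function
        print("Inner function:", x)

    inner_function()
    print("Outer function:", x)    # the assignment in inner_function did not touch this x


outer_function()
print("Global scope:", x)          # the global x is unchanged as well
```

Each assignment creates a new variable in its own scope, which is why the three print calls show three different values.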

Explanation (for an example that assigns x in the global scope, in outer_function(), and in a
nested inner_function()):

x = "global": This is a global variable, accessible from anywhere in the code.
x = "outer": This is a local variable within outer_function(). It can only be accessed
inside outer_function() and any nested functions within it.
x = "inner": This is a local variable within inner_function(), accessible only inside
inner_function().

When running such code, the result will be:

Inner function: inner
Outer function: outer
Global scope: global


Project 2 - Calculator
Project #2 is to create a Calculator. Specific points for project 2 are as follows:

1 bonus point: Successfully create an input-output for the four basic operations: addition,
subtraction, multiplication, and division.

2 bonus points: Add the modulo (%) operation.

3 bonus points: Allow inputting calculations with 3 or more numbers, e.g., aa + bb + cc.

4 bonus points: Group operations together, e.g., (A + B) x C.

5 bonus points: Include all the above bonus requirements and add continuous calculation
functionality, e.g., A + B = AB, AB + C = ABC.

GUIDE: Function continuous_calculator()

Purpose: Allow users to continuously input mathematical expressions for calculation.
Steps:
    Start with a greeting: "Calculator!".
    Create a while True: loop to keep the program running until the user chooses to exit.
    Display a menu with 3 options for the user:
        Continue with the previous result.
        Enter a new expression.
        Exit the program.
    Handle the user's choice:
        If continuing with the previous result, concatenate the previous result with the
        new expression and compute.
        If entering a new expression, start fresh and compute.
        If exiting, break the loop and end the program.
    Check for and handle exit commands ("exit", "quit").
    Call the calculate(expression) function to calculate and print the result.
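
The project's own listing is not reproduced here. The project summary notes that eval() can execute arbitrary code, so the sketch below swaps in the standard-library ast module and only evaluates numbers, parentheses, and the operators +, -, *, /, and %. The name calculate comes from the guide; the helper names, the continue_from function, and the error messages are assumptions.

```python
import ast
import operator

# Operators the calculator accepts, mapped to the functions that perform them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}


def _evaluate(node):
    """Recursively evaluate a parsed expression, rejecting anything unexpected."""
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("Unsupported expression")


def calculate(expression):
    """Return the value of an arithmetic expression, or an error message."""
    try:
        tree = ast.parse(expression, mode="eval")
        return _evaluate(tree.body)
    except ZeroDivisionError:
        return "Error: division by zero"
    except (SyntaxError, ValueError):
        return "Error: invalid expression"


def continue_from(previous_result, rest):
    """Continuous calculation: prepend the previous result to the new input."""
    return calculate(f"({previous_result}) {rest}")


print(calculate("(2 + 3) * 4"))      # 20
print(continue_from(20, "% 6"))      # 2
print(calculate("1 / 0"))
```

A continuous_calculator() loop would wrap these calls in while True, read the menu choice and the expression with input(), store the last result in a variable, and break on "exit" or "quit".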

SUMMARY:

Using eval(): Be cautious when using eval() as it can execute any Python code, which
may lead to security issues if not properly controlled.
Checking for division by zero: The code handles division by zero simply, but it might
require more thorough checks for more complex expressions.
State management: The variable result is used to store the result of the previous
calculation, allowing the user to continue calculations easily.
Simple user interface: The program provides a simple command-line interface.

I/O file and exception


I/O file

open()

file_object = open('file_name', 'access_mode', buffering)

Mode   Description
r      Read-only mode.
r+     Read and write mode.
rb     Open a file in binary read mode. The cursor is positioned at the beginning of the file.
rb+    Open a file for reading and writing in binary mode. The cursor is positioned at the beginning of the file.
r+b    Open a file for reading and writing in binary mode. The cursor is positioned at the beginning of the file.
w      Open a file for writing. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
w+     Open a file for reading and writing. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
wb     Open a file for writing in binary mode. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
wb+    Open a file for reading and writing in binary mode. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
a      Open a file in append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
a+     Open a file in read and append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
ab     Open a file in binary append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
ab+    Open a file in read and append mode in binary format. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
x      Open a file in write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
x+     Open a file in read and write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
xb     Open a file in binary write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
xb+    Open a file in binary read and write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
b      Open a file in binary mode.
t      Open a file in text mode (default).

Output: file is opened successfully

close()

The close() method closes a file once all necessary operations have been completed.

After closing the file, the program cannot perform any operations on it. The file needs to be
properly closed. If an exception occurs while performing operations on the file, the
program may terminate without closing the file.

To address this, close the file in a finally block, or use the with statement described later:
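
One common way to guarantee the file is closed is a try...finally block. This is a sketch under that assumption, since the original listing is not reproduced here; the file name and the temporary folder are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")   # scratch file for the demo

file_object = open(path, "w")
try:
    file_object.write("first line\n")   # any error here still reaches finally
finally:
    file_object.close()                 # always runs, so the file is always closed

print(file_object.closed)               # True
```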


write()

To write some text to a file, we need to open the file using the open method with one of the
following access modes:

w: This will overwrite the file if any existing file is present. The cursor is positioned at the
beginning of the file.
a: This will append to the existing file. The cursor is positioned at the end of the file. It
creates a new file if no file exists.
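
A sketch contrasting the two modes. The file name file2.txt appears in the document's output; the temporary folder and the exact text written are assumptions.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file2.txt")

f = open(path, "w")                      # "w" creates the file or empties an existing one
f.write("Python is the modern day language.\n")
f.close()

f = open(path, "a")                      # "a" keeps the content and writes at the end
f.write("Python has an easy syntax.\n")
f.close()

f = open(path, "r")
content = f.read()
f.close()
print(content)
```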

Output: file2.txt

Python is the modern day language. It makes things so simple
It is the fastest growing programing language.

Output: file2.txt

Python is the modern day language. It makes things so simple
It is the fastest growing programing language Python has an easy syntax and user-friendly interaction.

We can see that the content of the file has been modified. We opened the file in append
mode (a), and it has appended content to the existing file file2.txt.

read()

To read a file using a Python script, Python provides the read() method. The read() method
reads a string from the file (or bytes, when the file is opened in binary mode).

read() accepts an optional count argument: the number of characters (bytes in binary mode) to
read, starting from the current position, which is the beginning of a newly opened file. If the
count is not specified, it will read the contents of the file until the end.
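
A sketch of read() with and without a count, followed by reading the same file line by line with a for loop. The sample file and its text are created here only for the demonstration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file2.txt")
f = open(path, "w")
f.write("Python is the modern day language.\nIt makes things so simple.\n")
f.close()

f = open(path, "r")
first_ten = f.read(10)     # the first 10 characters: "Python is "
rest = f.read()            # no count: everything from the current position to the end
f.close()

f = open(path, "r")
for line in f:             # a file object yields one line per iteration
    print(line, end="")    # each line already ends with a newline
f.close()
```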

Output:

Python is the modern day language. It makes things so simple
It is the fastest growing programing language. Python has an easy syntax and user-friendly interaction.

Reading a file using a for loop

Output: file2.txt

Python is the modern day language. It makes things so simple
It is the fastest growing programing language. Python has an easy syntax and user-friendly interaction.

Reading the lines of a file

Python facilitates reading individual lines of a file using the readline() method. The readline()
method reads lines of the file from the beginning; that is, if we use the readline() method
twice, we will get the first two lines of the file.

Output: file2.txt

Python is the modern day language. It makes things so simple
It is the fastest growing programing language. Python has an easy syntax and user-friendly interaction.

We called the readline() method twice, which is why it read two lines from the file.

Python also provides the readlines() method, which is used for reading lines. It returns a list of
lines until it reaches the end of the file (EOF).
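
A sketch of readline() and readlines() on a small sample file. The file name and its three lines are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
f = open(path, "w")
f.write("line one\nline two\nline three\n")
f.close()

f = open(path, "r")
line1 = f.readline()       # "line one\n"
line2 = f.readline()       # the next call continues where the last one stopped
f.close()

f = open(path, "r")
all_lines = f.readlines()  # every line, as a list of strings
f.close()

print(line1, line2, all_lines)
```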

Output: file2.txt

Python is the modern day language. It makes things so simple
It is the fastest growing programing language. Python has an easy syntax and user-friendly interaction.

with

The with statement in Python ensures that a file is opened and automatically closed after the
code block is executed, helping manage resources and preventing errors like forgetting to
close the file.
The advantage of using the with statement is that it guarantees the file will be closed regardless
of how the nested block exits.
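
A sketch showing that the file is closed both after a normal exit and after an exception inside the block. The file name and the simulated error are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:
    f.write("managed by with\n")
closed_after_block = f.closed          # True: closed as soon as the block ends

try:
    with open(path) as f:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
closed_after_error = f.closed          # True: closed even though the block raised

print(closed_after_block, closed_after_error)
```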

Creating a Custom Context Manager: You can create a custom context manager by using a
class with the __enter__() and __exit__() methods, or by using the @contextmanager
decorator from the contextlib module.

Explanation:

The __enter__() method is called at the start of the with block and can return a value (if
needed) to the variable used in the with statement.
The __exit__() method is called when the with block ends, and it handles cleanup
tasks. If an exception occurs, the parameters exc_type, exc_value, and traceback will
contain information about the exception.

contextlib.contextmanager

Explanation:

The @contextmanager decorator from the contextlib module provides a simpler way to
create a context manager by using a function with the yield keyword.
The code before yield is executed when the with block begins, and the code after yield is
executed when the with block ends.
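
Both approaches side by side, as a sketch. The names ManagedResource, managed_resource, and the events list are hypothetical; they only record the order in which things happen.

```python
from contextlib import contextmanager

events = []   # records what happens, in order


class ManagedResource:
    """Class-based context manager."""

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        events.append(f"enter {self.name}")
        return self                      # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        events.append(f"exit {self.name}")
        return False                     # do not suppress exceptions


@contextmanager
def managed_resource(name):
    """Generator-based context manager."""
    events.append(f"enter {name}")       # runs when the with block begins
    try:
        yield name                       # the value bound after "as"
    finally:
        events.append(f"exit {name}")    # runs when the with block ends


with ManagedResource("class-based") as resource:
    events.append(f"using {resource.name}")

with managed_resource("generator-based") as name:
    events.append(f"using {name}")

print(events)
```

Wrapping the yield in try...finally makes the cleanup run even if the with block raises.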

Exception

Error and exception handling is the process of managing unexpected situations that may
occur during program execution, such as division by zero, accessing a non-existent file, or
passing an invalid value to a function. Python provides mechanisms to detect and handle these
errors through try, except, else, and finally blocks.

try: This block contains code that might cause an error.
except: This block executes if an error occurs in the try block.
else: This block executes if no error occurs in the try block.
finally: This block always executes, regardless of whether an error occurs or not. It's
typically used for releasing resources like closing files or connections.
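
A sketch that shows which blocks run in each case. The function name safe_divide and the log list are hypothetical.

```python
def safe_divide(a, b):
    """Divide a by b and record which blocks were executed."""
    log = []
    try:
        result = a / b                 # may raise ZeroDivisionError
    except ZeroDivisionError:
        log.append("except")           # runs only when the error occurs
        result = None
    else:
        log.append("else")             # runs only when no error occurs
    finally:
        log.append("finally")          # runs in both cases
    return result, log


print(safe_divide(10, 2))
print(safe_divide(1, 0))
```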


Summary: This chapter covers file I/O operations and exception handling in Python. It explains
how to open, read, write, and close files using different access modes and highlights the
importance of closing files properly to avoid resource issues. The chapter also delves into
exception handling, teaching how to use try, except, else, and finally blocks to manage errors
effectively. This chapter equips learners with essential skills for working with files and handling
exceptions in Python, ensuring more robust and reliable code.

CSV Module
The csv module in Python provides tools for reading and writing files in Comma-Separated
Values (CSV) format. CSV is a simple text format used to store tabular data, where each line is
a record and fields are separated by commas or other characters like semicolons, tabs, etc.
The csv module supports various CSV formats by customizing parameters such as delimiter,
quotechar, and more. It provides utility classes and functions to work with CSV data efficiently
and flexibly.

Read a CSV file using csv.reader

Assume we have a file named data.csv with the following content:

Name,Age,City
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
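
A sketch of reading that file with csv.reader. The file is first written to a temporary folder so the example is self-contained; in practice data.csv would already exist.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="") as f:
    f.write("Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n")

with open(path, newline="") as f:
    reader = csv.reader(f)
    headers = next(reader)        # the first row holds the column names
    rows = list(reader)           # every remaining row is a list of strings

print("Headers:", headers)
for row in rows:
    print("Row:", row)
```

Every value comes back as a string, so numbers such as the age need int() before doing arithmetic.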
C
o n
th
Py
o
nt
ge

Output:

Headers: ['Name', 'Age', 'City']


Row: ['Alice', '30', 'New York']
Row: ['Bob', '25', 'Los Angeles']
Row: ['Charlie', '35', 'Chicago']
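
A minimal sketch of code that would produce output like the above. It assumes data.csv is comma-separated, and it writes the sample file into a temporary folder first so that it runs on its own.

```python
import csv
import os
import tempfile

# Recreate the sample file (comma-separated is an assumption).
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    f.write("Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n")

with open(path, newline="", encoding="utf-8") as file:
    reader = csv.reader(file)
    headers = next(reader)   # the first row holds the column names
    rows = list(reader)      # each remaining row is a list of strings

print("Headers:", headers)
for row in rows:
    print("Row:", row)
```

Every value comes back as a string, so numeric columns such as Age need an explicit int() conversion before doing arithmetic.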


Write a CSV file using csv.writer
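
A minimal sketch of writing rows with csv.writer. The rows and the file name output.csv are hypothetical, chosen only for illustration.

```python
import csv
import os
import tempfile

# Hypothetical data and file name.
header = ["Name", "Age", "City"]
rows = [["David", 28, "Boston"], ["Eva", 22, "Seattle"]]
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# newline="" lets the csv module control line endings itself.
with open(path, "w", newline="", encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(header)   # write a single row
    writer.writerows(rows)    # write several rows at once

# Read the file back to check what was written.
with open(path, newline="", encoding="utf-8") as file:
    written = list(csv.reader(file))
print(written)
```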


Read a CSV file using csv.DictReader


Python code to read a CSV file and convert each row into a dictionary:

{'Name': 'Alice', 'Age': '30', 'City': 'New York'},


{'Name': 'Bob', 'Age': '25', 'City': 'Los Angeles'},
{'Name': 'Charlie', 'Age': '35', 'City': 'Chicago'}

Explanation:

DictReader(file) reads the CSV file and uses the first row as keys for the dictionaries.
Each subsequent row is converted into a dictionary with the corresponding keys.
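
The behaviour described above can be sketched as follows. To keep the example free of file handling, io.StringIO stands in for an opened data.csv; that substitution is an assumption of this sketch.

```python
import csv
import io

sample = "Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n"

# io.StringIO behaves like a file opened with open("data.csv", newline="").
reader = csv.DictReader(io.StringIO(sample))
records = [dict(row) for row in reader]   # one dictionary per data row

for record in records:
    print(record)
```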

Access specific value:

Output:

Alice is 30 years old and lives in New York.
Bob is 25 years old and lives in Los Angeles.
Charlie is 35 years old and lives in Chicago.
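
Sentences like the ones above can be built by looking up each column by name. This is a sketch that again uses io.StringIO in place of the real file.

```python
import csv
import io

sample = "Name,Age,City\nAlice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n"

sentences = []
# io.StringIO stands in for open("data.csv", newline="") so the sketch needs no file.
for row in csv.DictReader(io.StringIO(sample)):
    # Keys are the header names taken from the first row.
    sentences.append(f"{row['Name']} is {row['Age']} years old and lives in {row['City']}.")

print("\n".join(sentences))
```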

Write a CSV file using csv.DictWriter

Python code to write data from dictionaries to a CSV file:


Output: The file dict_output.csv will contain the following content

Grace, 27, Houston


Henry, 31, Phoenix
Isabella, 24, Philadelphia

Explanation:

DictWriter(file, fieldnames=fieldnames) creates a writer object aware of the fields
(headers).
writeheader() writes the header row to the file.
writerow(row) writes each dictionary to the file as a CSV row.
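
The three calls above fit together roughly as in this sketch. The column names Name, Age, and City are an assumption, and the file is written to a temporary folder rather than the working directory.

```python
import csv
import os
import tempfile

# Assumed column names; the rows follow the sample output.
fieldnames = ["Name", "Age", "City"]
data = [
    {"Name": "Grace", "Age": 27, "City": "Houston"},
    {"Name": "Henry", "Age": 31, "City": "Phoenix"},
    {"Name": "Isabella", "Age": 24, "City": "Philadelphia"},
]
path = os.path.join(tempfile.mkdtemp(), "dict_output.csv")

with open(path, "w", newline="", encoding="utf-8") as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()          # header row built from fieldnames
    for row in data:
        writer.writerow(row)      # one CSV line per dictionary

with open(path, newline="", encoding="utf-8") as file:
    lines = file.read().splitlines()
print(lines)
```

Note that writeheader() adds a header line, so the file starts with the field names before the data rows.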

Use custom delimiter and quotechar

Read and write CSV files with custom delimiter and quotechar:

Python code:

Output: Content of the file custom_output.csv

Product;Price;Quantity
Apple;1.00;50
Banana;0.50;100
Cherry;2.00;200

Explanation:

When creating the writer, we specify delimiter=';' to use a semicolon as the delimiter
between fields.
quotechar='"' specifies using double quotes to enclose values when necessary.
quoting=csv.QUOTE_MINIMAL only encloses values with quotechar when necessary
(e.g., when a value contains the delimiter character).
When reading the file, we use the same configuration to ensure that the data is parsed
correctly.
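
A minimal sketch of the round trip described above, assuming the product rows from the sample output. The file goes into a temporary folder so the example is self-contained.

```python
import csv
import os
import tempfile

rows = [
    ["Product", "Price", "Quantity"],
    ["Apple", "1.00", "50"],
    ["Banana", "0.50", "100"],
    ["Cherry", "2.00", "200"],
]
path = os.path.join(tempfile.mkdtemp(), "custom_output.csv")

# The same settings are passed to both the writer and the reader.
options = {"delimiter": ";", "quotechar": '"', "quoting": csv.QUOTE_MINIMAL}

with open(path, "w", newline="", encoding="utf-8") as file:
    csv.writer(file, **options).writerows(rows)

with open(path, newline="", encoding="utf-8") as file:
    first_line = file.readline().rstrip("\r\n")
    file.seek(0)
    parsed = list(csv.reader(file, **options))

print(first_line)
print(parsed)
```

Keeping the options in one dictionary is a design choice of this sketch: it makes it hard for the reader and writer settings to drift apart.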

Handling Large CSV Files with a High Number of Rows

Read one row at a time to save memory:



Explanation:

Read one row at a time from the CSV file without loading the entire content into memory,
which is suitable for handling large files.
The process_row(row)
function represents any processing you want to perform on each data row.
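
The row-at-a-time approach can be sketched as below. The generated file, its two columns, and the body of process_row are all hypothetical stand-ins for real data and real processing.

```python
import csv
import os
import tempfile

# Build a hypothetical 1,000-row file to iterate over.
path = os.path.join(tempfile.mkdtemp(), "large.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "amount"])
    for i in range(1, 1001):
        writer.writerow([i, i * 2])


def process_row(row):
    """Hypothetical per-row work: convert the amount column to an integer."""
    return int(row[1])


total = 0
with open(path, newline="", encoding="utf-8") as file:
    reader = csv.reader(file)
    next(reader)              # skip the header row
    for row in reader:        # the reader yields one row at a time
        total += process_row(row)

print(total)
```

Because the loop never builds a list of all rows, memory use stays roughly constant regardless of file size.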


Handling Exceptions When Working with CSV Files


Python code:

Explanation:

Use a try-except block to catch and handle potential exceptions.
FileNotFoundError is caught when the file does not exist.
Exception is caught for errors that occur during the process of reading the CSV file.
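
A minimal sketch of the pattern described above. The function name read_csv_safely and the messages it prints are hypothetical.

```python
import csv


def read_csv_safely(path):
    """Return the rows of a CSV file, or None if it cannot be read."""
    try:
        with open(path, newline="", encoding="utf-8") as file:
            return list(csv.reader(file))
    except FileNotFoundError:
        # The most specific handler comes first.
        print(f"File not found: {path}")
    except Exception as error:
        # Any other problem while opening or parsing the file.
        print(f"Error while reading {path}: {error}")
    return None


print(read_csv_safely("no_such_file_for_this_example.csv"))
```

Listing FileNotFoundError before the general Exception handler matters, because Python uses the first matching except clause.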

Project 3 - Automatic File Sorter Project


Organize files into separate folders.

3 file types: 1 bonus point

4 file types: 2 bonus points

6 file types: 3 bonus points

In simpler terms:

You will earn bonus points based on the number of different file types you can correctly sort into
separate folders. The more file types you can handle, the higher your score.

If you can sort 3 different types of files: You'll earn 1 bonus point.

If you can sort 4 different types of files: You'll earn 2 bonus points.

If you can sort 6 different types of files: You'll earn 3 bonus points.

GUIDE

Define the source directory and destination directories

First, you need to specify the path to the directory containing the files to be sorted and create a
dictionary to map the destination directories for each file type:

source_directory
destination_directory


Define File Extensions

Next, you need to define file extensions for each file type:

file_extensions

Create Destination Directories if They Do Not Exist

Before moving files, you need to ensure that the destination directories are created:
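
One way to do this is sketched below. The folder names are hypothetical, and a temporary directory stands in for the real source directory so the sketch does not touch your own files.

```python
import os
import tempfile

# Hypothetical layout: a temporary folder plays the role of the source directory.
source_directory = tempfile.mkdtemp()
destination_directories = {
    "images": os.path.join(source_directory, "Images"),
    "documents": os.path.join(source_directory, "Documents"),
    "csv_files": os.path.join(source_directory, "CSV Files"),
}

for directory in destination_directories.values():
    # exist_ok=True makes the call harmless when the folder already exists.
    os.makedirs(directory, exist_ok=True)

print(sorted(os.listdir(source_directory)))
```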

Write the File Sorting Function

This function moves each file to a folder based on its extension:

The os.listdir(source_directory) function returns a list of names of items in the
source_directory .
Use a for loop to iterate through each file or directory name in this list.
The os.path.join(source_directory, file_name) function combines source_directory
with file_name to create the full path to the file.
The os.path.isfile(file_path) function returns True if file_path is an existing file, and
False otherwise (for example, if it is a directory).
The os.path.splitext(file_name) function splits the file name into a name and an
extension, returning a tuple.
The extension is extracted from the tuple and converted to lowercase using the lower()
method for easier comparison.
The file_extensions.items() method returns file type and extension list pairs from the
file_extensions dictionary.
The for loop iterates through each file type and its corresponding list of extensions.
Check if the extension is in the list of extensions for the current file type.
The destination directory corresponding to the file type is determined through the
destination_directories dictionary.
The shutil.move function moves the file from its current location to the destination
directory.
Print a message indicating that the file has been successfully moved.
Exit the loop once the appropriate file type is found to avoid checking additional types.
The final step is to call the function.
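
The steps above can be sketched as one small script. The folder names, the extension lists, and the sample files are hypothetical, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

# Hypothetical configuration.
source_directory = tempfile.mkdtemp()
destination_directories = {
    "images": os.path.join(source_directory, "Images"),
    "documents": os.path.join(source_directory, "Documents"),
    "csv_files": os.path.join(source_directory, "CSV Files"),
}
file_extensions = {
    "images": [".jpg", ".png"],
    "documents": [".pdf", ".txt"],
    "csv_files": [".csv"],
}

# Sample files to sort.
for name in ["photo.JPG", "notes.txt", "data.csv", "unknown.xyz"]:
    with open(os.path.join(source_directory, name), "w") as f:
        f.write("sample")


def sort_files():
    for directory in destination_directories.values():
        os.makedirs(directory, exist_ok=True)
    for file_name in os.listdir(source_directory):
        file_path = os.path.join(source_directory, file_name)
        if not os.path.isfile(file_path):
            continue  # skip sub-folders
        extension = os.path.splitext(file_name)[1].lower()
        for file_type, extensions in file_extensions.items():
            if extension in extensions:
                shutil.move(file_path, destination_directories[file_type])
                print(f"Moved {file_name} to {file_type}")
                break  # stop checking other file types


sort_files()
```

Files with an extension that is not listed, such as unknown.xyz here, are left where they are.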

SUMMARY:
This script automates file organization by categorizing files based on their extensions (e.g.,
images, documents, CSV files) and moving them to designated folders. It’s adaptable for
various scenarios like document management, media sorting, or preparing files for backup,
making it a useful tool for automating file organization tasks.

Virtual Environment
A Virtual Environment in Python is a tool that allows you to create an isolated workspace
separate from the global Python system. It helps you manage libraries and dependencies for
each project individually, avoiding conflicts between different versions of libraries that various
projects might require.

Why use a Virtual Environment?

Manage dependencies: Easily manage packages and their versions for each project.
Avoid conflicts: Prevent version conflicts between different projects.

Easy deployment: Ensures that development and production environments use the
same library versions, simplifying project deployment and sharing.

Create a Virtual Environment
To create a virtual environment in Python, you can use the venv module (available in
Python 3.3 and later).

Run the following command in the Terminal:

python -m venv myenv

Explanation:
python -m venv myenv : This command creates a directory named myenv containing
an independent copy of Python and package management tools.

Activate the Virtual Environment


After creating the virtual environment, you need to activate it to start working within it.

On Windows:

myenv\Scripts\activate

On macOS and Linux:

source myenv/bin/activate

Explanation:

After activation, you will see the name of the virtual environment (e.g., (myenv)) appear at
the beginning of the command line, indicating that you are working in that environment.

Install Packages in the Virtual Environment


Once the virtual environment is activated, you can install packages using pip , and they will
only be installed within this environment.

pip install requests

Explanation:

This command installs the requests library into the current virtual environment.

Install Packages from requirements.txt

Assume you have a requirements.txt file in your project directory. This file contains a list of
required packages and their versions.

For example, the contents of requirements.txt might be:

requests==2.25.1
beautifulsoup4==4.9.3
pandas==1.2.3

To install all the packages listed in the requirements.txt file, you use the command:

pip install -r requirements.txt

List Installed Packages


To view the list of packages installed in the virtual environment, you can use:

pip list

Deactivate the Virtual Environment

When you finish your work and want to exit the virtual environment, you can deactivate it by:

deactivate

Explanation:

After deactivation, you will return to the system's default Python environment.

Delete the Virtual Environment


To delete a virtual environment, you simply delete the directory containing it:

Windows (Command Prompt):

rmdir /s /q myenv

macOS/Linux:

rm -rf myenv

Explanation:

This command deletes the entire myenv directory and all its contents.

Web Scraping
Web Scraping is the process of automatically collecting data from websites. In this process, a
program or script sends a request to a website, retrieves the HTML content of the page, and
then extracts the necessary data from this content.

Beautiful Soup
Beautiful Soup is a Python library used to parse HTML and XML documents. It creates a
hierarchical tree from a web page, allowing you to easily search, navigate, and modify elements
on the page.

Extracting the Page Title:

soup.title.text : Retrieves the content within the title tag and returns it as a string.

Extracting the Main Heading ( h1 ):

soup.h1.text : Retrieves the content within the h1 tag and returns it as a string.

Extracting All Paragraphs ( p ):

soup.find_all('p') : Finds all p tags in the HTML page and returns a list of p tag objects.
p.text : Retrieves the text content inside each p tag in the list.

Extracting Links from a Tags:

soup.find_all('a') : Finds all a tags in the document and returns a list of a tag objects.
link['href'] : Retrieves the value of the href attribute in an a tag.
link.text : Retrieves the text content inside an a tag.

Extracting All Headings from h2 Tags:

soup.find_all('h2') : Finds all h2 tags in the document and returns a list of h2 tag
objects.
heading.text : Retrieves the text content inside each h2 tag in the list.

Extracting Content from Tags with Specific Class or ID:

soup.find('div', id='main') : Finds a div tag with the id="main" attribute and returns
this tag object.
soup.find_all('div', class_='sub') : Finds all div tags with the class="sub" attribute
and returns a list of these tag objects.
div.text : Retrieves the text content inside each div tag found.


Extracting Data from an HTML Table:

soup.find('table') : Finds a table tag in the document and returns this tag object.
table.find_all('tr') : Finds all tr rows in the table and returns a list of tr tag objects.
row.find_all('td') : Finds all td data cells in each row and returns a list of td tag objects.
cell.text : Retrieves the text content inside each td tag in the list.
Extracting Content from Nested Tags:
soup.find('span').text : Retrieves the text content inside the span tag, including the
content of nested tags like b and i .
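
The calls listed above can be tried on a small inline page. The HTML below is a hypothetical sample written for this sketch, and it assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page.
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1>Welcome to Web Scraping</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
    <a href="https://example.com">Example link</a>
    <div id="main">Main content</div>
    <div class="sub">Sub one</div>
    <div class="sub">Sub two</div>
    <table>
      <tr><td>A1</td><td>B1</td></tr>
      <tr><td>A2</td><td>B2</td></tr>
    </table>
    <span>Plain <b>bold</b> and <i>italic</i></span>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.text
heading = soup.h1.text
paragraphs = [p.text for p in soup.find_all("p")]
links = [(link["href"], link.text) for link in soup.find_all("a")]
main_text = soup.find("div", id="main").text
sub_texts = [div.text for div in soup.find_all("div", class_="sub")]
table_rows = [
    [cell.text for cell in row.find_all("td")]
    for row in soup.find("table").find_all("tr")
]
span_text = soup.find("span").text  # includes text of the nested b and i tags

print("Title:", title)
print("Heading:", heading)
print("Link:", links[0][0])
```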


Output:

Title: Sample Page
Heading: Welcome to Web Scraping
Link: https://example.com

Requests:
Requests is a Python library used to send HTTP requests in a simple way. Requests allows
you to send GET, POST, PUT, DELETE requests, and more.

Sending a GET Request to a Website:

requests.get(url) : Sends a GET request to the specified URL and returns a Response
object.

Retrieving the Content of a Web Page:

response.text : Returns the content of the web page as a string.
response.content : Returns the content of the web page as bytes (binary data).

Checking the HTTP Status Code:

response.status_code : Returns the HTTP status code of the response (e.g., 200 for
success, 404 for not found).
response.ok : Returns True if the HTTP status code is below 400, i.e., the response is
not a client error (4xx) or server error (5xx).

Sending a POST Request with Data:

requests.post(url, data={'key': 'value'}) : Sends a POST request to the specified URL
with accompanying data as a dictionary.

Sending a Request with Query Parameters:

requests.get(url, params={'key': 'value'}) : Sends a GET request with query
parameters appended to the URL (e.g., ?key=value).

Retrieving Response Headers:

response.headers : Returns a dictionary containing all HTTP headers of the response.
response.headers['Content-Type'] : Returns the value of the Content-Type header,
indicating the type of content in the response.


Sending a Request with Custom Headers:

requests.get(url, headers={'User-Agent': 'my-app'}) : Sends a GET request with
custom HTTP headers, such as User-Agent.

Downloading a File from a Web Page:

requests.get(url, stream=True) : Sends a GET request to download a file and uses
stream=True to stream data in chunks.
response.iter_content(chunk_size=1024) : Iterates through the response content in
chunks, with each chunk being up to 1024 bytes.

Sending a Request with JSON Data:

requests.post(url, json={'key': 'value'}) : Sends a POST request with JSON data
provided as a Python dictionary. The data is automatically converted to JSON format.

Handling Connection Errors:

requests.exceptions.RequestException : Catches all types of exceptions related to
HTTP requests, including connection errors, timeouts, etc.
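
A minimal sketch combining several of the calls above, assuming the requests package is installed. The helper name fetch, the timeout value, and the example URLs are hypothetical. The last lines prepare a request without sending it, which shows how params end up in the URL without needing a network connection.

```python
import requests


def fetch(url, timeout=5):
    """Return (status_code, text) for url, or None if the request fails."""
    try:
        response = requests.get(url, headers={"User-Agent": "my-app"}, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx responses into exceptions
        return response.status_code, response.text
    except requests.exceptions.RequestException as error:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors.
        print(f"Request failed: {error}")
        return None


# Build (but do not send) a GET request to inspect the final URL.
prepared = requests.Request(
    "GET", "https://example.com/search", params={"key": "value"}
).prepare()
print(prepared.url)
```

Calling fetch("https://example.com") would perform a real request; it is left out here so the sketch runs offline.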

Selenium
Selenium is a powerful library that allows you to automate web browsers. Selenium is used to
interact with web pages, such as clicking buttons, filling out forms, and scraping data from
websites requiring dynamic interactions (JavaScript).

Initializing a Browser and Opening a Web Page:

Chrome() : Initializes a Chrome browser session.
driver.get(url) : Opens the web page specified by the URL in the browser.

Find an Element on the Page:

driver.find_element(By.ID, 'element_id') : Find an element on the page by its ID.
driver.find_element(By.NAME, 'element_name') : Find an element on the page by its
NAME.
driver.find_element(By.XPATH, '//tagname[@attribute="value"]') : Find an element
on the page by XPATH.

Get the Text of an Element:

element.text : Returns the text content of the found element.

Enter Text into a Text Field:

element.send_keys('text') : Enter the string 'text' into the selected text field.

Click a Button or Link:

element.click() : Click on the selected element, such as a button or link.

Get an Element’s Attribute:

element.get_attribute('attribute_name') : Get the value of the attribute_name
attribute of the element, e.g., href , src , value , etc.

Wait for an Element to Appear on the Page:

WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID,
'element_id'))) : Wait up to 10 seconds for an element with a specific ID to appear on the
page.

Switch to an iFrame:

driver.switch_to.frame('frame_name') : Switch the control to the specified iFrame.

driver.switch_to.default_content() : Switch the control back to the main content (exit
the iFrame).

Scroll the Page to a Specific Element:

element.location_once_scrolled_into_view : Scroll the page so that this
element appears in the browser's viewable area.

Close the Browser:

driver.quit() : Close the entire browser and end the session.

Take a Screenshot of the Page:

driver.save_screenshot('screenshot.png') : Take a screenshot of the current page
and save it as screenshot.png .

Handle Pop-up Windows:

driver.switch_to.alert : Switch the control to a pop-up alert.
alert.accept() : Accept the pop-up alert.
alert.dismiss() : Dismiss the pop-up alert.

Interact with Dropdown Menus:

Select(element).select_by_visible_text('Option Text') : Select an option from the
dropdown menu based on the visible text.
Select(element).select_by_value('value') : Select an option from the dropdown menu
based on the value of the option.

Handle Multiple Tabs or Windows:

driver.window_handles : Returns a list of the currently open windows or tabs.
driver.switch_to.window(driver.window_handles[1]) : Switch the control to the
second window or tab.

Get the Current URL:

driver.current_url : Returns the URL of the current page the browser is accessing.

Get the Page Title:

driver.title : Returns the title of the current page.


Output
th

Example Domain

Project 4 - Automated Crypto Web Scraper


Your task is to write Python code to automatically scrape all the information for Bitcoin. The
minimum requirements are outlined below.

1 point: Meet the minimum requirements. (In the photo below)

3 points: Retrieve all Bitcoin information, including price, 1h %, 24h %, 7d %, market cap, and
24h volume.

6 points: Gather the same information as in point 3 for three different cryptocurrencies: Bitcoin,
Ethereum, and BNB.

Please use https://coinmarketcap.com/

Does this explanation clarify the requirements? Please let me know if you have any questions.

Please ensure you read the Project Rules carefully.

GUIDE:

1. Send a GET Request to the Website:
URL: "https://coinmarketcap.com/"
2. Parse the HTML Content:
soup = BeautifulSoup(response.text, 'html.parser') : Parse the HTML content of
the page.
3. Find the Row Containing Cryptocurrency Information:
row = soup.find('a', href=f'/currencies/{crypto_name}/') : Find the link to the
page of the specified cryptocurrency ( crypto_name ) and store the result.
parent = row.find_parent('tr') : If the link is found, find its parent element ( tr tag),
which contains all information about the cryptocurrency.
4. Extract Data:
Price: Extract the cryptocurrency price from the specific div tag.
Change in 1h, 24h, 7d: Extract all price changes over 1 hour, 24 hours, and 7 days.
Market Cap: Extract the market cap value.
24h Trading Volume: Extract the 24-hour trading volume value.
Timestamp: Get the current timestamp.
5. Return the Results:
Return a dictionary containing all extracted information.
6. Case of Not Found:
If information is not found, return None .
7. Cryptocurrency List ( crypto_list )

crypto_list = ['bitcoin', 'ethereum', 'bnb'] : List of cryptocurrencies you want to get
data for.
8. Create a DataFrame
Iterate Through Each Cryptocurrency in the List:
for stt, crypto_name in enumerate(crypto_list, start=1) : Use enumerate to
loop through the cryptocurrency list while providing an index ( stt ).
Call the get_crypto_data Function:
result = get_crypto_data(crypto_name) : Call the function to get data for each
cryptocurrency.
Update the DataFrame:
If data is available ( result is not None ), append it to the data list.
result['STT'] = stt : Update the index ( STT ) in the returned result.
Create DataFrame from the Data List:
df = pd.DataFrame(data) : Convert the data list into a DataFrame df .
Print the DataFrame:
print(df.to_string(index=False)) : Print the DataFrame to the screen without
showing the index column.
9. Results

When running the code, the program will access the CoinMarketCap page, retrieve
information about the cryptocurrencies in the crypto_list , and organize this information into
a DataFrame. The DataFrame will include columns:
STT, Cryptocurrency Name, Price, Change 1h, Change 24h, Change 7d, Market Cap, 24h
Trading Volume, and Timestamp.
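
The find, find_parent, and enumerate steps from the guide can be sketched offline. Everything about the markup below is an assumption: it is a simplified, hypothetical table, not the real CoinMarketCap page, whose structure differs and changes over time. Unlike the guide's get_crypto_data(crypto_name), this version also takes the parsed soup as an argument, and the final pd.DataFrame(data) step is left out so that only Beautiful Soup is required.

```python
from datetime import datetime

from bs4 import BeautifulSoup

# Hypothetical, simplified markup; cell positions are assumptions.
SAMPLE_HTML = """
<table>
  <tr>
    <td><a href="/currencies/bitcoin/">Bitcoin</a></td>
    <td>$61,698.87</td><td>0.09%</td><td>2.33%</td><td>6.55%</td>
    <td>$1.22T</td><td>$28,781,554,627</td>
  </tr>
  <tr>
    <td><a href="/currencies/ethereum/">Ethereum</a></td>
    <td>$2,672.56</td><td>0.16%</td><td>2.81%</td><td>3.94%</td>
    <td>$321.55B</td><td>$12,642,405,175</td>
  </tr>
</table>
"""


def get_crypto_data(soup, crypto_name):
    """Return a dict of values for crypto_name, or None if its row is missing."""
    link = soup.find("a", href=f"/currencies/{crypto_name}/")
    if link is None:
        return None
    parent = link.find_parent("tr")   # the table row holding every value
    cells = [td.text.strip() for td in parent.find_all("td")]
    return {
        "Crypto Name": cells[0],
        "Price": cells[1],
        "1h %": cells[2],
        "24h %": cells[3],
        "7d %": cells[4],
        "Market Cap": cells[5],
        "24h Volume": cells[6],
        "Timestamp": datetime.now(),
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
crypto_list = ["bitcoin", "ethereum", "bnb"]

data = []
for stt, crypto_name in enumerate(crypto_list, start=1):
    result = get_crypto_data(soup, crypto_name)
    if result is not None:          # bnb is absent from the sample, so it is skipped
        result["STT"] = stt
        data.append(result)

for row in data:
    print(row["STT"], row["Crypto Name"], row["Price"])
```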

Output CSV:

STT,Crypto Name,Price,1h %,24h %,7d %,Market Cap,24h Volume,Timestamp
1,Bitcoin,$61,698.87,0.09%,2.33%,6.55%,$1.22T,$28,781,554,627,2024-08-23 22:46:03.599016
2,Ethereum,$2,672.56,0.16%,2.81%,3.94%,$321.55B,$12,642,405,175,2024-08-23 22:46:03.780685
3,Bnb,$581.44,0.08%,0.32%,13.34%,$84.83B,$1,910,870,318,2024-08-23 22:46:03.956293

SUMMARY:
This code scrapes cryptocurrency data from CoinMarketCap, extracting prices, percentage
changes, market capitalization, and 24-hour trading volume for cryptocurrencies like Bitcoin and
Ethereum. The data is organized into a table with timestamps. The method can be adapted for
other websites by adjusting the URL and HTML parsing logic, making it versatile for various web
scraping tasks.

Regular Expression
A Regular Expression (RegEx) is a special string of characters used to define a pattern for
searching, matching, and manipulating text. In Python, the re library provides powerful tools for
working with Regular Expressions.

RegEx is widely used in many applications, such as validating email formats, searching for
substrings within text, replacing parts of text, and more.

Basic Components of Regular Expression:

Literal Characters: Matches the exact character. For example, 'a' will match 'a' .
Dot (.): Matches any single character except for a newline character ( \n ).
Caret (^): Matches the position at the start of the string.
Dollar Sign ($): Matches the position at the end of the string.
Asterisk (*): Matches the preceding character or pattern 0 or more times.
Plus (+): Matches the preceding character or pattern 1 or more times.
Question Mark (?): Matches the preceding character or pattern 0 or 1 time.
Square Brackets ([]): Define a set of characters to match:
[abc] : Matches any single character within this set (a, b, or c).
[^abc] : Matches any single character not in this set (not a, b, or c). The ^ inside
square brackets negates the character set.
[a-z] : Matches any lowercase letter from a to z.
[A-Z] : Matches any uppercase letter from A to Z.
[0-9] : Matches any digit from 0 to 9.
[a-zA-Z] : Matches any letter, both lowercase and uppercase.
Curly Braces ({}): Specify the exact number of repetitions.
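
A few of the components above in action. The sample strings are hypothetical and chosen only to make each result easy to check by eye.

```python
import re

starts_with_py = bool(re.search(r"^Py", "Python"))        # caret: start of string
ends_with_on = bool(re.search(r"on$", "Python"))          # dollar: end of string
optional_u = re.findall(r"colou?r", "color colour")       # ? makes the u optional
not_abc = re.findall(r"[^abc]", "aabbccxyz")              # negated character set
three_digits = re.findall(r"[0-9]{3}", "12 345 6789")     # exactly three digits

print(starts_with_py, ends_with_on)
print(optional_u)    # ['color', 'colour']
print(not_abc)       # ['x', 'y', 'z']
print(three_digits)  # ['345', '678']
```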



Special Characters in RegEx:


\d : Matches any digit. Equivalent to [0-9] .
\D : Matches any character that is not a digit. Equivalent to [^0-9] .
\w : Matches any word character (letters, digits, or underscore). Equivalent to
[a-zA-Z0-9_] .
\W : Matches any character that is not a word character. Equivalent to [^a-zA-Z0-9_] .
\s : Matches any whitespace character (space, tab, newline, etc.).
\S : Matches any character that is not a whitespace character.
\b : Matches a word boundary, i.e., the position between a word character and a
non-word character (or the start or end of the string).
\B : Matches a position that is not a word boundary (for example, between two word
characters).
\Z : Matches the end of the string.

Methods of a match object (returned by functions such as re.match() and re.search()):

group() : Returns the matched string.
start() : Returns the starting position of the matched string.
end() : Returns the ending position of the matched string.
span() : Returns a tuple containing the start and end positions of the matched string.
groups() : Returns a tuple containing all captured groups in the matched string.
group(index) : Returns the captured group at the specified position.
groupdict() : Returns a dictionary containing captured groups with group names as
keys.
expand(template) : Returns the template string with its backreferences (such as \1 )
replaced by the corresponding captured groups.
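
A short sketch of these match object methods. The pattern and the sample sentence are hypothetical.

```python
import re

# Named groups make groupdict() and \g<name> backreferences available.
match = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Posted on 2024-08-23")

whole = match.group()          # the full matched text
position = match.span()        # (start, end) indexes of the match
parts = match.groups()         # all captured groups as a tuple
first = match.group(1)         # first captured group
named = match.groupdict()      # captured groups keyed by name
rewritten = match.expand(r"\g<month>/\g<year>")  # template filled from groups

print(whole, position, parts, first, named, rewritten)
```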

Modifiers (Flags) in the re module:

re.IGNORECASE or re.I : Ignores case when matching characters. For example,
re.search('abc', text, re.I) will match "abc", "AbC", "ABC", etc.
re.MULTILINE or re.M : Enables multi-line mode. Changes the behavior of ^ and $ to
match the start and end of each line, rather than the start and end of the whole string.
re.DOTALL or re.S : Makes the dot (.) match any character, including newline characters
( \n ).
re.VERBOSE or re.X : Allows the use of whitespace and comments in regular
expression patterns to increase readability. Whitespace and comments are ignored
unless they are within square brackets or escaped.
re.ASCII or re.A : Restricts character matching to ASCII. Overrides Unicode for
character classes like \w , \W , \b , etc.
re.UNICODE or re.U : Enables Unicode matching. By default, \w , \W , \b , etc., use
Unicode properties for matching characters and word boundaries.
re.DEBUG : Prints debugging information about the regular expression pattern
compilation.
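
The effect of the most common flags can be seen on a small two-line string. The text and patterns are hypothetical examples.

```python
import re

text = "first line\nSecond LINE"

any_case = re.findall(r"line", text, re.IGNORECASE)       # matches both spellings
line_starts = re.findall(r"^\w+", text, re.MULTILINE)     # ^ matches at each line start
without_dotall = re.search(r"line.Second", text)          # dot does not match \n
with_dotall = re.search(r"line.Second", text, re.DOTALL)  # dot now matches \n

# re.VERBOSE allows whitespace and comments inside the pattern.
date_pattern = re.compile(
    r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
    """,
    re.VERBOSE,
)

print(any_case, line_starts)
print(without_dotall is None, with_dotall is not None)
print(date_pattern.search("Report 2024-08").group())
```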

To use Regular Expressions in Python, you need to import the re library. You can then use
functions like match() , search() , findall() , and sub() to work with patterns.


Main Functions in re:


re.match(pattern, string) : Checks if the pattern matches the beginning of the string.
re.search(pattern, string) : Searches for the pattern within the string, returning the first
match found.
re.findall(pattern, string) : Returns a list of all matches of the pattern in the string.
re.sub(pattern, repl, string) : Replaces all occurrences of the pattern in the string with
another string.

Example of Regular Expressions in Python

Explanation:

^[a-zA-Z0-9_.+-]+ : Matches the start of the string, followed by one or more letters,
numbers, dots, underscores, plus signs, or hyphens.
@[a-zA-Z0-9-]+ : Matches the @ symbol followed by one or more letters, numbers, or
hyphens.
\.[a-zA-Z0-9-.]+$ : Matches a dot followed by one or more letters, numbers, dots, or
hyphens, and then ensures the end of the string.
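
The three pattern pieces explained above combine into a simple email check. The function name is_valid_email and the test addresses are hypothetical, and the pattern is a basic format check rather than a complete validator.

```python
import re

# The three parts explained above, joined into one pattern.
EMAIL_PATTERN = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"


def is_valid_email(email):
    """Return True if email has the basic name@domain.tld shape."""
    return re.match(EMAIL_PATTERN, email) is not None


for candidate in ["user@example.com", "user@@example.com", "no-at-sign.com"]:
    print(candidate, is_valid_email(candidate))
```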

Find All Numbers in a String:

Explanation:

\d+ : Matches one or more digits. The findall( ) function finds all occurrences of numbers
in the string and returns them as a list.
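
A minimal sketch with a hypothetical sentence:

```python
import re

text = "Order 66 has 3 items costing 120 dollars"
numbers = re.findall(r"\d+", text)   # every run of one or more digits

print(numbers)  # ['66', '3', '120']
```

The matches are strings; wrap them in int() if you need to calculate with them.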


Replace All Whitespace with Underscores:

Explanation:

\s : Matches any whitespace character (spaces, tabs, etc.).
re.sub( ) : Replaces all whitespace characters in the string with underscores.
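
A minimal sketch with a hypothetical input string:

```python
import re

text = "Learn Data Analytics Together"
result = re.sub(r"\s", "_", text)   # each whitespace character becomes one underscore

print(result)  # Learn_Data_Analytics_Together
```

Using \s+ instead of \s would collapse a run of several whitespace characters into a single underscore.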

Split a String by Commas:

Explanation:

,\s* : Matches a comma followed by any number of whitespace characters (zero or more).
re.split( ) : Splits the string into a list of elements based on the pattern.
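
A minimal sketch with a hypothetical, unevenly spaced list:

```python
import re

text = "apple, banana,cherry,   date"
items = re.split(r",\s*", text)   # split on a comma plus any following whitespace

print(items)  # ['apple', 'banana', 'cherry', 'date']
```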

SUMMARY:

Regular Expression (Regex) is a powerful tool used for searching, replacing, validating, and
manipulating text, often applied in tasks such as text search and replace, data validation (like
emails or phone numbers), extracting specific patterns from strings, parsing complex text, and
web scraping to retrieve data from websites.


Project 5 - Web Scraping + Regular Expression

Focus on extracting key information such as:

Job title

Company name

Job location

Job description

Date posted

Job link

Use regular expressions to filter the job listings based on certain criteria. For example:

Jobs that require specific programming languages (e.g., Python, JavaScript).

Jobs that include certain keywords (e.g., "remote", "full-time").

Exclude jobs that contain unwanted terms (e.g., "internship", "part-time").

Store the filtered job listings in a CSV file or display them in a readable format.

Bonus points:

Pagination: Handle pagination to scrape multiple pages of job listings. (+1)

Scheduling: Set up your script to run periodically to scrape and update the job listings daily or
weekly. (+3)

Notification System: Send an email notification with the latest job listings that match your
criteria. (+5)

GUIDE
1/ Libraries Used

csv: For working with CSV files, helping to store the collected job data.
selenium: Used for automating web browsers to fetch HTML content from websites.
BeautifulSoup: Parses HTML and extracts specific elements from the webpage source.
time and sleep: Used to pause the program for a period, ensuring the webpage is fully
loaded before proceeding with further tasks.
re: Uses Regular Expressions to search and match text patterns in the job data.
smtplib and getpass: Used to send email notifications about newly found jobs.
schedule: Helps schedule function execution at fixed intervals.


2/ The scrape_jobs() Function

Purpose:

This function is designed to collect information about jobs on the VietnamWorks website, filter
the jobs according to specific requirements, save the results to a CSV file, and send email
notifications.

Steps:

Initialize the Chrome browser:

driver = webdriver.Chrome() : Launches the Chrome browser using Selenium.

Create a list to store job information:

filtered_jobs = [] : This list will contain jobs that have been filtered according to the
requirements.

Loop through pages to fetch job information:

for page in range(1, 21) : Iterates through the first 20 pages of search results on
VietnamWorks.
driver.get(f'https://www.vietnamworks.com/viec-lam?q=it&page={page}') : Opens each
search results page.
sleep(3) : Waits for 3 seconds to give the webpage time to load before
proceeding.

Parse HTML and extract job information:

soup = BeautifulSoup(driver.page_source, 'html.parser') : Parses the HTML
content of the webpage to find necessary elements.
job_data = soup.find('div', class_='block-job-list') : Finds the container that holds the
job listings.
job_items = job_data.find_all('div', class_='sc-fwwElh iAsyDt') : Retrieves the list
of jobs from the container.

Extract detailed job information:

job_title, company_name, location, date_posted, salary, skills, job_link :
Extracts specific details such as job title, company name, location, date posted,
salary, skills, and job link.

Filter jobs by programming languages:

required_languages = ['Python', 'JavaScript'] : List of required programming
languages.
if not found_languages : Skips jobs that do not contain the required languages.
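
One way the language filter could work is sketched below. The job records, the unwanted_terms list, and the way the patterns are built are assumptions for illustration; the exclusion of unwanted terms comes from the project brief rather than from the steps listed here.

```python
import re

required_languages = ["Python", "JavaScript"]
unwanted_terms = ["internship", "part-time"]

# \b prevents matches inside longer words (for example "Pythonic");
# re.escape guards any special characters in the terms.
language_pattern = re.compile(
    r"\b(" + "|".join(re.escape(lang) for lang in required_languages) + r")\b",
    re.IGNORECASE,
)
unwanted_pattern = re.compile(
    "|".join(re.escape(term) for term in unwanted_terms), re.IGNORECASE
)

# Hypothetical job records standing in for scraped data.
jobs = [
    {"Job Title": "Backend Developer", "Skills": "Python, Django, SQL"},
    {"Job Title": "Frontend Internship", "Skills": "JavaScript, React"},
    {"Job Title": "QA Engineer", "Skills": "Manual testing"},
]

filtered_jobs = []
for job in jobs:
    text = f"{job['Job Title']} {job['Skills']}"
    found_languages = language_pattern.findall(text)
    if not found_languages:
        continue                      # no required language mentioned
    if unwanted_pattern.search(text):
        continue                      # contains an unwanted term
    filtered_jobs.append(job)

print([job["Job Title"] for job in filtered_jobs])
```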

Store information in the list:

If a job meets the filtering criteria, its information will be added to the filtered_jobs
list.

Save job data to a CSV file:


csv_file = "vietnamworks_job_listings.csv" : Specifies the CSV file name to
store the data.
csv_columns = ['Job Title', 'Company Name', 'Location', 'Date Posted',
'Salary', 'Skills', 'Job Link'] : Defines the columns in the CSV file.
with open(csv_file, mode='w', newline='', encoding='utf-8-sig') as file : Opens
the CSV file for writing data.
writer = csv.DictWriter(file, fieldnames=csv_columns) : Uses DictWriter to
write rows of data from the filtered_jobs list to the CSV file.


Prepare email content:


email_content : This string contains the entire email content, including details about
the found jobs.


Send email notifications:


SMTP('smtp.gmail.com', 587) : Sets up the connection with Gmail's SMTP server.
starttls() : Upgrades the connection to TLS encryption.
login(sender_email, password) : Logs into the Gmail account with the email
address and password.
sendmail(sender_email, receiver_email, message) : Sends an email to the
recipient with the prepared content.
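
A sketch of how the message could be assembled and sent. The function names, the subject line, and the sample job are hypothetical. It uses email.message.EmailMessage and SMTP.send_message from the standard library in place of the sendmail call listed above, which is a substitution made for this sketch. Only the message building runs here; send_message is defined but not called, so no connection is opened.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender_email, receiver_email, jobs):
    """Create an email listing each job title with its link."""
    message = EmailMessage()
    message["Subject"] = f"New job listings: {len(jobs)}"
    message["From"] = sender_email
    message["To"] = receiver_email
    lines = [f"{job['Job Title']} - {job['Job Link']}" for job in jobs]
    message.set_content("\n".join(lines))
    return message


def send_message(message, password):
    """Send the message through Gmail's SMTP server (not called in this sketch)."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()                        # switch to an encrypted connection
        server.login(message["From"], password)
        server.send_message(message)


# Hypothetical addresses and job data.
sample_jobs = [{"Job Title": "Python Developer", "Job Link": "https://example.com/job/1"}]
email = build_message("sender@example.com", "receiver@example.com", sample_jobs)
print(email["Subject"])
print(email.get_content())
```

Reading the password with getpass.getpass() at run time, as the library list suggests, avoids storing it in the script.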


3/ Running the scrape_jobs() Function

scrape_jobs() : This function will be called when the code is executed, helping to collect
job data, save it to a CSV file, and send email notifications.

If you want to automatically run this function daily at a fixed time, you can use the schedule
library:

every().day.at("10:00").do(scrape_jobs) : Schedules the scrape_jobs() function to
run daily at 10:00 AM.
while True : Keeps the program running continuously to check and perform scheduled
tasks.

SUMMARY: This script automates the scraping of job listings from VietnamWorks, filters them
based on programming languages (Python, JavaScript), saves the data to a CSV file, and
sends an email with the job details. It runs daily at 10:00 AM using Selenium, BeautifulSoup,
and smtplib. The approach can be adapted to other websites and use cases, such as monitoring
product prices, aggregating news, or tracking social media trends, making it versatile for
various web scraping and automation tasks.
