Python - Learn Data Analytics Together's Group
Disclaimer
Leveraging insights gained from a remarkable Python challenge hosted by Eric in the
Learn Data Analytics Together Group, I've compiled the following information. A special
thanks to Eric & oducal for their meticulous proofreading support.
Compiler: gento
Proofreaders: Eric and oducal
Full credit goes to Alex The Analyst and Corey Schafer, our dedicated instructors.
Self-Study Data
Learn Data Analytics Together
Introduction
Python is a high-level programming language known for its clear syntax, making it easy to learn
and widely used in fields like web development, data science, AI, and automation.
Easy to Learn and Read: Python's syntax is very close to natural language, making it easy
for beginners to approach.
Cross-Platform: Python can run on various operating systems such as Windows, macOS,
and Linux.
Rich Library Ecosystem: Python has a rich library ecosystem, supporting almost any task,
from scientific computing to web development.
Open Source: Python is an open-source language with a large and strong development
community.
Basic Example: Below is a simple example of how to print "Hello, World!" on the screen in
Python.
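A minimal sketch of such a program; the variable name message is only illustrative:

```python
# Store the greeting in a variable, then print it to the screen.
message = "Hello, World!"
print(message)
```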
1. Data Science:
Data Analysis: Widely used with libraries like Pandas, NumPy, and SciPy.
Data Visualization: Tools like Matplotlib, Seaborn, and Plotly for creating charts.
Machine Learning: Libraries such as Scikit-learn, TensorFlow, and PyTorch are
essential for building and deploying models.
2. Artificial Intelligence:
Computer Vision: OpenCV and deep learning libraries help with image recognition
and facial recognition.
3. Web Development:
Frameworks: Django, Flask, and FastAPI enable rapid and secure web
development.
Backend Development: Commonly used for managing databases, handling HTTP
requests, and building APIs.
4. Automation:
Scripting: Used for automating tasks such as file management and software testing.
Web Scraping: Libraries like BeautifulSoup and Selenium allow data collection
from websites.
5. Game Development:
Pygame Library: Suitable for developing simple games and game development
tools.
Game Logic Development: Used for creating game logic, especially in indie or
educational games.
6. Finance:
Financial Analysis: Python is used to build financial models, forecasts, and risk
analysis.
Algorithmic Trading: Supports developing and testing automated trading
strategies.
7. Internet of Things (IoT):
Microcontroller Programming: MicroPython and CircuitPython are used for
programming IoT devices such as microcontroller boards (for example, the Raspberry Pi Pico).
8. Education:
Learning to Code: Popular in schools due to its easy syntax and abundant
resources.
Developing Educational Applications: Used to create learning apps and
educational games.
9. Software Development:
development tools.
Project Management: Used to develop project management tools, bug trackers,
visualization.
Research Tools: Supports developing tools for research in various fields like
Summary: Python is a versatile tool with widespread applications across modern fields,
making it a top choice for many projects and applications due to its flexibility and strong
development community.
There are many ways to write Python programs using software such as:
1. Python IDE
2. Visual Studio Code
3. Jupyter notebook
4. ….
Online Compilers:
1. Datacamp
2. Google Colab
3. w3schools
4. …
Python is fully object-oriented. Variables in Python do not need to be declared before use or
have their type specified. Every variable is an object, created by assigning a value using the =
operator. Variable names must start with a letter (a-z, A-Z) or an underscore (_), and subsequent
characters can be letters, numbers (0-9), or underscores.
Tips for Naming Variables
Use clear and descriptive variable names that reflect their purpose.
Use CapWords (PascalCase) for class names (capitalize the first letter of each word).
Avoid special characters in variable names, and do not start a name with a number.
Maintain consistency in naming conventions.
Slicing Variables
Slicing in Python lets you extract parts of a sequence, like strings or lists, using indices
without changing the original data.
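A short sketch of slicing on a string and a list; the sample values are made up for illustration:

```python
text = "Python"
numbers = [10, 20, 30, 40, 50]

first_three = text[:3]      # characters at indices 0, 1 and 2
last_two = numbers[-2:]     # the final two items
every_other = numbers[::2]  # step of 2, starting at index 0
reversed_text = text[::-1]  # a negative step walks backwards

print(first_three, last_two, every_other, reversed_text)
```

Each slice builds a new object, so text and numbers are left unchanged.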
Data Types
Numeric Types:
A string is used to store sequences of characters, including letters, numbers, and symbols.
Strings are enclosed in either single quotes ' ' or double quotes " ".
Logical values that can only be either True or False, commonly used in conditional
statements and loops.
List
A list is a data structure that allows you to store an ordered collection of values that are
mutable (can be changed).
Tuple
A tuple is similar to a list, but it cannot be changed once created. Tuples are ordered and
can hold different types of values.
Dictionary
A dictionary is a data structure that holds key-value pairs; since Python 3.7 it preserves the
order in which keys were inserted. Keys must be hashable (for example strings, numbers, or
tuples), while values can be of any data type.
Set
A set is an unordered data structure that does not contain duplicate elements and supports
set operations like union, intersection, and difference.
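The four collection types described above can be compared side by side. This is a small sketch with made-up values:

```python
fruits = ["apple", "banana"]         # list: ordered and mutable
fruits.append("cherry")

point = (3, 4)                       # tuple: ordered, cannot be changed

student = {"name": "An", "age": 20}  # dictionary: key-value pairs
student["age"] = 21                  # update the value stored under a key

a = {1, 2, 3, 3}                     # set: the duplicate 3 is dropped
b = {3, 4}
union = a | b
intersection = a & b
difference = a - b
```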
NoneType
Operators
In Python, operators are special symbols that perform operations on variables and values.
Python supports several types of operators:
Operator   Name               Example
+          Add                x + y
-          Minus              x - y
*          Multiply           x * y
/          Divide             x / y
%          Modulo             x % y
**         Exponentiation     x ** y
//         Integer division   x // y
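A quick sketch of the arithmetic operators in action, using arbitrary sample values for x and y:

```python
x, y = 7, 2

total = x + y
quotient = x / y         # true division always returns a float
floor_quotient = x // y  # integer (floor) division
remainder = x % y
power = x ** y

print(total, quotient, floor_quotient, remainder, power)
```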
Assignment Operators
Operator   Example    Equivalent to
+=         x += 3     x = x + 3
-=         x -= 3     x = x - 3
*=         x *= 3     x = x * 3
/=         x /= 3     x = x / 3
%=         x %= 3     x = x % 3
//=        x //= 3    x = x // 3
**=        x **= 3    x = x ** 3
&=         x &= 3     x = x & 3
^=         x ^= 3     x = x ^ 3
>>=        x >>= 3    x = x >> 3
<<=        x <<= 3    x = x << 3
Bitwise Operators
Types of Loops
In Python, loops are used to repeat a block of code multiple times until a specific
condition is met.
If-Else
The else statement complements the if statement. An else statement contains a block of code
that will be executed if the if statement's condition is false. Here is the basic syntax:
The elif statement (short for "else if") is a combination of else and if. It allows you to check
multiple expressions to determine if they are true and execute a block of code as soon as one
of the conditions is evaluated as true.
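A minimal sketch of if, elif and else working together; the function name describe is hypothetical:

```python
def describe(number):
    """Return a label for a number using if / elif / else."""
    if number > 0:
        return "positive"
    elif number < 0:   # checked only when the first condition is false
        return "negative"
    else:              # runs when none of the conditions above is true
        return "zero"

print(describe(5), describe(-2), describe(0))
```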
For
This for loop will iterate through each character in the string word and print each character.
Using for with range( ) : This for loop will iterate through values from 0 to 4 (a total of 5 values)
and print each one.
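The two loops described above might look like the following sketch. The value of word is an assumption, and the lists letters and values exist only to record what each loop visits:

```python
word = "data"
letters = []
for character in word:   # one iteration per character
    print(character)
    letters.append(character)

values = []
for i in range(5):       # yields 0, 1, 2, 3, 4
    print(i)
    values.append(i)
```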
Tuple
This for loop will iterate through each element in the numbers tuple and print the value of
each element.
Dictionary
When iterating through a dictionary, this for loop will go through each key-value pair in the
student dictionary and print both the key and the corresponding value.
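A sketch of both loops, assuming sample contents for numbers and student (the original values are not shown in the text):

```python
numbers = (1, 2, 3)
for number in numbers:
    print(number)

student = {"name": "An", "age": 20}
pairs = []
for key, value in student.items():  # items() yields (key, value) pairs
    print(key, value)
    pairs.append((key, value))
```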
While
The while loop repeatedly executes code while the condition is true and stops when it's false.
The program asks the user to enter a number. If it's even, it prints "This is an even number" and
continues. If it's odd, it prints "This is an odd number" and prompts for another input. The
program stops when the user enters 0.
A while True loop creates an infinite loop that continues until a break statement is encountered.
The code checks if the user enters 0; if so, the loop ends with break. If the number is even
(number % 2 == 0), it prints "is even" and starts a new iteration with continue. If the number is
odd, it prints "is odd."
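The loop described above can be sketched as follows. To keep the example testable, the number source is passed in as a function (read_number is a hypothetical parameter); in an interactive program it would wrap input( ).

```python
def classify_numbers(read_number):
    """Collect even/odd messages until read_number() returns 0."""
    messages = []
    while True:
        number = read_number()
        if number == 0:
            break                      # leave the loop entirely
        if number % 2 == 0:
            messages.append(f"{number} is even")
            continue                   # jump to the next iteration
        messages.append(f"{number} is odd")
    return messages

# Interactive use (not run here):
# classify_numbers(lambda: int(input("Enter a number (0 to stop): ")))
```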
List Comprehension
List comprehension is a concise way to create and transform lists in Python, allowing for filtering
and modifying elements from sequences or other iterables.
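A brief sketch with made-up numbers, showing one comprehension that transforms every element and one that also filters:

```python
numbers = [1, 2, 3, 4, 5, 6]

squares = [n ** 2 for n in numbers]                     # transform each item
even_squares = [n ** 2 for n in numbers if n % 2 == 0]  # filter, then transform

print(squares)
print(even_squares)
```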
For example, converting between 4 units will earn you 2 bonus points, 5 units will earn you 3
bonus points, and 6 units will earn you 4 bonus points.
GUIDE:
1. Identify Units: Choose the units to convert between, such as meters, kilometers, and
centimeters.
2. Write Conversion Functions: Create functions for converting between each pair of units.
3. General Conversion Function: Combine individual functions into a general one that takes
the value, original unit, and target unit, and performs the conversion.
4. User Interface: Create an interface for users to input values, original units, and target units.
5. Run and Display Results: The program will convert the input value based on the specified
units and display the result.
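One possible shape for the general conversion function in step 3 is sketched below. It is not the guide's own solution: instead of one function per pair of units, it converts through a base unit (meters). The unit set and the names METERS_PER_UNIT and convert are assumptions.

```python
# How many meters one unit of each length represents (assumed unit set).
METERS_PER_UNIT = {"mm": 0.001, "cm": 0.01, "m": 1.0, "km": 1000.0}

def convert(value, from_unit, to_unit):
    """Convert a length by going through meters as the base unit."""
    if from_unit not in METERS_PER_UNIT or to_unit not in METERS_PER_UNIT:
        raise ValueError(f"Unsupported unit: {from_unit!r} or {to_unit!r}")
    meters = value * METERS_PER_UNIT[from_unit]
    return meters / METERS_PER_UNIT[to_unit]

print(convert(2, "km", "m"))
```

Adding a new unit then only needs one more dictionary entry, which helps when aiming for the higher bonus tiers.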
Modules
In Python, built-in modules are libraries that are pre-installed, offering various useful functions
and utilities for common tasks without needing additional external packages. They help you
perform tasks such as data processing, file handling, time management, and more.
math
Provides basic mathematical functions such as calculating square roots, exponents, and
mathematical constants.
datetime
os
Provides functions for interacting with the operating system, such as working with files and
directories.
sys
json
re
random
Provides functions for generating random numbers.
collections
Built-in Modules: These are libraries that come pre-installed in Python, requiring no additional
installation.
Importing modules is the process of bringing libraries or collections of functions, classes, and
variables into your Python program to use their functionalities. This helps in organizing code and
reusing available libraries or custom modules.
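A small sketch of three common import styles, using only standard-library modules named in this chapter:

```python
import math                 # import the whole module
from random import randint  # import a single name from a module
import datetime as dt       # import a module under a shorter alias

root = math.sqrt(16)
roll = randint(1, 6)        # random integer from 1 to 6, both included
today = dt.date.today()

print(root, roll, today)
```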
Summary:
Importing Modules: Provides a way to use the functions and classes available in libraries
or modules.
Standard Library: A collection of built-in modules in Python that help perform many
common tasks.
OS Module - Use Underlying Operating System Functionality
The os module in Python allows interaction with the operating system, enabling tasks like file
and directory management, system information retrieval, and running OS commands from within
a Python program.
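A few of these tasks can be sketched as follows. The example works inside a temporary directory so it leaves nothing behind; the folder name reports is made up.

```python
import os
import tempfile

current_directory = os.getcwd()             # where the program is running

with tempfile.TemporaryDirectory() as base:
    reports = os.path.join(base, "reports")  # build a path portably
    os.makedirs(reports, exist_ok=True)      # create the directory
    entries = os.listdir(base)               # names inside base
    is_directory = os.path.isdir(reports)

print(current_directory, entries, is_directory)
```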
Specific Import:
Function in Python
Function
In Python, a function is defined using the def keyword, followed by the function name and
parentheses containing input parameters. The function body is indented relative to the def
keyword.
Simple Function Without Parameters:
Function with Parameters:
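Both forms named above can be sketched in a few lines; the function names greet and add are illustrative only:

```python
def greet():
    """A function without parameters."""
    return "Hello!"

def add(a, b):
    """A function with two parameters."""
    return a + b

print(greet())
print(add(2, 3))
```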
The make_pizza function takes a size and an indefinite number of toppings (*toppings). These
toppings are gathered into a tuple.
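A sketch of what make_pizza might look like; the body and the return value are assumptions, only the signature follows the description above:

```python
def make_pizza(size, *toppings):
    """Extra positional arguments arrive together as a tuple."""
    description = f"{size} pizza with {', '.join(toppings)}"
    return description, toppings

description, received = make_pizza("large", "ham", "olives")
print(description)
print(received)
```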
Function with kwargs (Variable Keyword Arguments):
The build_profile function takes two required parameters, first and last, along with an indefinite
number of additional key-value pairs (**user_info). These key-value pairs are gathered into a
dictionary.
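A sketch of build_profile under the description above; the body and the sample arguments are assumptions:

```python
def build_profile(first, last, **user_info):
    """Extra keyword arguments arrive together as a dictionary."""
    profile = {"first": first, "last": last}
    profile.update(user_info)  # merge the optional key-value pairs
    return profile

profile = build_profile("Ada", "Lovelace", field="math", city="London")
print(profile)
```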
Lambda Functions
Lambda functions are concise, anonymous functions used for simple and quick operations,
often when passing a function as an argument to other functions. They are defined using the
lambda keyword and can only contain a single expression.
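Two small illustrations with made-up data: a lambda bound to a name, and a lambda passed as the key argument of sorted( ):

```python
double = lambda x: x * 2

people = [("An", 30), ("Binh", 25), ("Chi", 35)]
by_age = sorted(people, key=lambda person: person[1])  # sort by the age field

print(double(4))
print(by_age)
```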
When to Use Lambda Functions:
When you want to write quick code without defining a full function with def.
Print Statement
In Python, the print( ) function is used to output information to the screen. It's a fundamental
tool for displaying text or the value of an expression in the console. Its main purpose is to
provide information to the user or assist with debugging. When used inside a function, print only
shows output on the console and does not affect the program's flow or the function's return
value.
Return Statement
In Python, the return keyword is used in functions to end the function and return a value. It is
essential for determining the output of a function and controlling the flow of execution in a
program.
Sum Function
Summary:
return with an expression: Ends the function and returns the value of the expression.
return without an expression: Returns the default value None.
return multiple values: Can return multiple values separated by commas, which can be
assigned to multiple variables.
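The three cases in this summary can be sketched as follows; the function names are hypothetical:

```python
def total(numbers):
    return sum(numbers)                # return with an expression

def log(message):
    print(message)                     # prints, but returns nothing
    return                             # bare return gives None

def min_max(numbers):
    return min(numbers), max(numbers)  # two values, packed into a tuple

lowest, highest = min_max([4, 9, 1])   # unpack into two variables
print(total([1, 2, 3]), log("hi"), lowest, highest)
```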
Return is crucial in function design and controlling execution flow in Python, enabling clear
input( )
enumerate( )
append( )
len( )
Returns the length of an object (the number of elements in a list, string, dictionary, etc.)
range( )
Returns a sequence of numbers, starting from 0 (by default) and ending before a specified
number.
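A short sketch of len( ), range( ) and enumerate( ) on made-up data. The range and enumerate results are wrapped in list( ) only so they can be printed:

```python
names = ["An", "Binh", "Chi"]

count = len(names)                         # number of elements
first_five = list(range(5))                # 0 up to, not including, 5
evens = list(range(2, 11, 2))              # start, stop, step
indexed = list(enumerate(names, start=1))  # (position, item) pairs

print(count, first_five, evens, indexed)
```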
slice( )
round( )
format( )
strip( )
Removes whitespace or specified characters from the beginning and end of a string.
replace( )
join( )
Joins the elements in an iterable into a single string, using a specified separator.
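The string methods in this group can be tried on sample text like this:

```python
raw = "  hello world  "

clean = raw.strip()                  # remove surrounding whitespace
trimmed = "xxdataxx".strip("x")      # remove a specified character instead
dashed = clean.replace(" ", "-")     # swap every space for a dash
joined = ", ".join(["a", "b", "c"])  # the separator goes between the items

print(clean, trimmed, dashed, joined)
```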
Variable Scope
Variable Scope in Python refers to the region of code where a variable can be accessed.
Python has a scope system to determine where a variable can be used. The variable scope can
be global or local. Common types of scope include:
1. Local Scope: Variables declared within a function can only be used inside that function.
2. Global Scope: Variables declared outside of all functions can be used anywhere in the
code.
3. Enclosing Scope: Variables in the scope of an outer function when there is a nested
function inside.
4. Built-in Scope: Contains names built into Python, such as functions like len() and print(), and
keywords like True and False.
Python follows the LEGB rule to determine the order of variable lookup:
Local: Local scope.
Enclosing: Enclosing scope (outer function).
Global: Global scope.
Built-in: Built-in scope.
Explanation:
x = "global": This is a global variable, accessible from anywhere in the code.
x = "outer": This is a local variable within outer_function( ). It can only be accessed
inside outer_function( ) and any nested functions within it.
x = "inner": This is a local variable within inner_function( ), accessible only inside
inner_function( ).
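The three assignments explained above can be put together in one sketch. The return values are an addition for illustration, so that each scope's value of x can be inspected:

```python
x = "global"

def outer_function():
    x = "outer"            # local to outer_function (enclosing scope)

    def inner_function():
        x = "inner"        # local to inner_function
        return x

    return inner_function(), x

inner_value, outer_value = outer_function()
print(inner_value, outer_value, x)
```

Each assignment creates a separate variable named x, so the global one is still "global" after the call.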
Project 2 - Calculator
Project #2 is to create a Calculator. Specific points for project 2 are as follows:
1 bonus point: Successfully create an input-output for the four basic operations: addition,
subtraction, multiplication, and division.
3 bonus points: Allow inputting calculations with 3 or more number pairs, e.g., aa + bb + cc.
5 bonus points: Include all the above bonus requirements and add continuous calculation
functionality, e.g., A + B = AB, AB + C = ABC.
GUIDE: Function continuous_calculator( )
Purpose: Allow users to continuously input mathematical expressions for calculation.
Steps:
Start with a greeting: "Calculator!".
Create a while True: loop to keep the program running until the user chooses to exit.
Display a menu with 3 options for the user:
Continue with the previous result.
If continuing with the previous result, concatenate the previous result with the
new expression and compute.
If entering a new expression, start fresh and compute.
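The computing part of these steps can be sketched without the menu loop. This is not the guide's own code: it swaps eval( ) for a small evaluator built on the standard ast module that accepts only numbers and the four basic operators. The names evaluate and continue_calculation are hypothetical.

```python
import ast
import operator

# Only the four basic operations are allowed.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expression):
    """Evaluate an arithmetic expression such as '1 + 2 * 3'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("Unsupported expression")
    return walk(ast.parse(expression, mode="eval"))

def continue_calculation(previous_result, expression):
    """Prepend the previous result to a new expression fragment."""
    return evaluate(f"({previous_result}) {expression}")

result = evaluate("1 + 2 + 3")
result = continue_calculation(result, "* 2")
print(result)
```

A while True: menu loop would call these two functions and keep result between iterations; division by zero surfaces as ZeroDivisionError and can be caught there.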
SUMMARY:
Using eval( ) : Be cautious when using eval( ) as it can execute any Python code, which
may lead to security issues if not properly controlled.
Checking for division by zero: The code handles division by zero simply, but it might
require more thorough checks for more complex expressions.
State management: The variable result is used to store the result of the previous
calculation, allowing the user to continue calculations easily.
Simple user interface: The program provides a simple command-line interface, making
Mode   Description
rb     Open a file in binary read mode. The cursor is positioned at the beginning of the file.
rb+    Open a file for reading and writing in binary mode. The cursor is positioned at the beginning of the file.
r+b    Open a file for reading and writing in binary mode. The cursor is positioned at the beginning of the file.
w      Open a file for writing. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
w+     Open a file for reading and writing. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
wb     Open a file for writing in binary mode. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
wb+    Open a file for reading and writing in binary mode. If the file does not exist, it will be created; if the file exists, it will be truncated and the old content will be overwritten.
a      Open a file in append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
a+     Open a file in read and append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
ab     Open a file in binary append mode. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
ab+    Open a file in read and append mode in binary format. If the file already exists, the content will be added to the end of the file; if the file does not exist, a new file will be created and the content will be written to it.
x      Open a file in write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
x+     Open a file in read and write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
xb     Open a file in binary write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
xb+    Open a file in binary read and write mode with exclusive creation. If the file already exists, an error will be raised; if not, a new file will be created and the content will be written to it.
b      Open a file in binary mode.
t      Open a file in text mode (default).
Output: file is opened successfully
close( )
The close( ) method closes a file once all necessary operations have been completed.
After closing the file, the program cannot perform any operations on it. The file needs to be
properly closed. If any exception occurs while performing some operations on the file, the
program will terminate without closing the file.
write( )
To write some text to a file, we need to open the file using the open method with one of the
following access modes:
w: This will overwrite the file if any existing file is present. The cursor is positioned at the
beginning of the file.
a: This will append to the existing file. The cursor is positioned at the end of the file. It
creates a new file if no file exists.
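The difference between the two modes can be seen in a short sketch. It writes to a temporary folder (an assumption, to avoid touching real files) and reuses the file name file2.txt from this chapter:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file2.txt")

file = open(path, "w")   # "w" creates the file or overwrites it
file.write("first line\n")
file.close()

file = open(path, "a")   # "a" keeps the content and adds to the end
file.write("second line\n")
file.close()

file = open(path, "r")
content = file.read()
file.close()
print(content)
```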
Output: file2.txt
Output: file2.txt
We can see that the content of the file has been modified. We opened the file in append
mode (a), and the new content has been appended to the existing file file2.txt.
read( )
To read a file using a Python script, Python provides the read( ) method. The read() method
returns the file's contents as a string in text mode or as bytes in binary mode.
Here, count is the number of characters (text mode) or bytes (binary mode) to read, starting
from the current position. If count is not specified, read() returns the contents of the file
up to the end.
Output:
Output: file2.txt
Python facilitates reading individual lines of a file using the readline( ) method. The readline()
method reads lines of the file from the beginning; that is, if we use the readline( ) method
twice, we will get the first two lines of the file.
Output: file2.txt
Python is the modern day language. It makes things so simple
easy
We called the readline( ) function twice, which is why it read two lines from the file.
Python also provides the readlines( ) method, which is used for reading lines. It returns a list of
lines until it reaches the end of the file (EOF).
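The three reading methods can be compared on one small file. The file content below is made up, and the file lives in a temporary folder:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file2.txt")
file = open(path, "w")
file.write("line one\nline two\nline three\n")
file.close()

file = open(path, "r")
first_four = file.read(4)      # the first 4 characters
rest_of_line = file.readline() # continues to the end of the current line
remaining = file.readlines()   # every remaining line, as a list
file.close()

print(repr(first_four), repr(rest_of_line), remaining)
```

All three calls share one cursor, so each starts where the previous one stopped.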
Output: file2.txt
with statement
The with statement in Python ensures that a file is opened and automatically closed after the
code block is executed, helping manage resources and preventing errors like forgetting to
close the file.
The advantage of using the with statement is that it guarantees the file will be closed regardless
of how the nested block exits.
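A minimal sketch, again using a temporary folder and a made-up file name:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as file:
    file.write("hello\n")
    open_inside_block = not file.closed

closed_after_block = file.closed  # True: closed without calling close()
print(open_inside_block, closed_after_block)
```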
Creating a Custom Context Manager: You can create a custom context manager by using a
class with the __enter__( ) and __exit__( ) methods, or by using the @contextmanager
decorator from the contextlib module.
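A class-based context manager might look like the sketch below. The class name Tracker and its events list are hypothetical; they only record the order in which the methods run.

```python
class Tracker:
    """Records when the with block is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self               # bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        return False              # do not suppress exceptions

with Tracker() as tracker:
    tracker.events.append("body")

print(tracker.events)
```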
Explanation:
The __enter__( ) method is called at the start of the with block and can return a value (if
needed) to the variable used in the with statement.
The __exit__( ) method is called when the with block ends, and it handles cleanup
tasks. If an exception occurs, the parameters exc_type, exc_value, and traceback will
describe it; otherwise all three are None.
Explanation:
The @contextmanager decorator from the contextlib module provides a simpler way to
create a context manager by using a function with the yield keyword.
The code before yield is executed when the with block begins, and the code after yield is
executed when the with block ends.
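The decorator form can be sketched like this. The function name tracked and the events list are hypothetical; the try/finally is an addition that makes the cleanup run even if the block raises.

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(name):
    events.append(f"start {name}")    # runs when the with block begins
    try:
        yield name                    # value bound to the name after "as"
    finally:
        events.append(f"end {name}")  # runs when the with block ends

with tracked("job") as label:
    events.append(f"running {label}")

print(events)
```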
Exception
Error and exception handling is the process of managing unexpected situations that may
occur during program execution, such as division by zero, accessing a non-existent file, or
passing an invalid value to a function. Python provides mechanisms to detect and handle these
except: This block executes if an error occurs in the try block.
else: This block executes if no error occurs in the try block.
finally: This block always executes, regardless of whether an error occurs or not. It is
typically used for cleanup.
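How the blocks interact can be sketched with a division helper. The function name safe_divide and the steps list are hypothetical; steps records which blocks ran.

```python
def safe_divide(a, b):
    """Divide a by b and report which handler blocks executed."""
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        steps.append("except")   # only when the division fails
        result = None
    else:
        steps.append("else")     # only when the division succeeds
    finally:
        steps.append("finally")  # in both cases
    return result, steps

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```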
I/O operations and exception handling in Python. It explains how to open, read, write, and
close files using different access modes and highlights the importance of closing files properly
to avoid resource issues. The chapter also delves into exception handling, teaching how to use
try, except, else, and finally blocks to manage errors effectively. This chapter equips learners
with essential skills for working with files and handling exceptions in Python, ensuring more
robust and reliable code.
CSV Module
The csv module in Python provides tools for reading and writing files in Comma-Separated
Values (CSV) format. CSV is a simple text format used to store tabular data, where each line is
a record and fields are separated by commas or other characters like semicolons, tabs, etc.
The csv module supports various CSV formats by customizing parameters such as delimiter,
quotechar, and more. It provides utility classes and functions to work with CSV data efficiently
and flexibly.
Assume we have a file named data.csv with the following content:
Alice     30   New York
Bob       25   Los Angeles
Charlie   35   Chicago
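Reading such data with csv.reader can be sketched as follows. To keep the example self-contained, an in-memory io.StringIO stands in for the opened data.csv file, and comma separation is assumed.

```python
import csv
import io

# Stand-in for open("data.csv", newline=""); the content mirrors the rows above.
data = io.StringIO("Alice,30,New York\nBob,25,Los Angeles\nCharlie,35,Chicago\n")

rows = list(csv.reader(data))  # each row becomes a list of strings
for row in rows:
    print(row)
```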
Output:
Python code to read a CSV file and convert each row into a dictionary
Explanation:
DictReader(file) reads the CSV file and uses the first row as keys for the dictionaries.
Each subsequent row is converted into a dictionary with the corresponding keys.
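A sketch of DictReader in use. The header names name, age and city are assumptions, and io.StringIO again stands in for the opened file:

```python
import csv
import io

data = io.StringIO(
    "name,age,city\n"
    "Alice,30,New York\n"
    "Bob,25,Los Angeles\n"
    "Charlie,35,Chicago\n"
)

sentences = []
for row in csv.DictReader(data):  # row maps header -> value
    sentences.append(
        f"{row['name']} is {row['age']} years old and lives in {row['city']}."
    )

print("\n".join(sentences))
```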
Output:
Charlie is 35 years old and lives in Chicago.
Write a CSV file using csv.DictWriter
Python code to write data from dictionaries to a CSV file:
Explanation:
writerow(row) writes each dictionary to the file as a CSV row.
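A sketch of writing dictionaries with csv.DictWriter. The field names and sample rows are made up, and an in-memory buffer stands in for a file opened with open(..., "w", newline=""):

```python
import csv
import io

people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()           # first row: the field names
for row in people:
    writer.writerow(row)       # one CSV row per dictionary

output = buffer.getvalue()
print(output)
```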
Use custom delimiter and quotechar
Read and write CSV files with custom delimiter and quotechar:
Python code:
Explanation:
quotechar='"' specifies using double quotes to enclose values when necessary.
quoting=csv.QUOTE_MINIMAL only encloses values with quotechar when necessary
(e.g., when a value contains the delimiter character).
When reading the file, we use the same configuration to ensure that the data is parsed
correctly.
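The settings above can be sketched in a write-then-read round trip. A semicolon delimiter and the sample values are assumptions; the second value contains the delimiter, so it gets quoted.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(
    buffer, delimiter=";", quotechar='"', quoting=csv.QUOTE_MINIMAL
)
writer.writerow(["Alice", "New York; NY"])
written = buffer.getvalue()

buffer.seek(0)  # rewind before reading back
rows = list(csv.reader(buffer, delimiter=";", quotechar='"'))

print(written.strip())
print(rows)
```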
Explanation:
Read one row at a time from the CSV file without loading the entire content into memory,
which is suitable for handling large files.
The process_row(row) function represents any processing you want to perform on each data row.
Explanation:
FileNotFoundError is caught when the file does not exist.
Exception is caught for errors that occur during the process of reading the CSV file.
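The two handlers described above might be arranged as in this sketch; the function name read_rows and the messages are hypothetical:

```python
import csv

def read_rows(path):
    """Return the rows of a CSV file, or None if it cannot be read."""
    try:
        with open(path, newline="") as file:
            return [row for row in csv.reader(file)]
    except FileNotFoundError:
        print(f"File not found: {path}")
    except Exception as error:  # any other problem while reading
        print(f"Error while reading CSV: {error}")
    return None

print(read_rows("this_file_does_not_exist.csv"))
```

The more specific FileNotFoundError handler comes first; Python uses the first except clause that matches.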
In simpler terms:
You will earn bonus points based on the number of different file types you can correctly sort into
separate folders. The more file types you can handle, the higher your score.
If you can sort 3 different types of files: You'll earn 1 bonus point.
If you can sort 4 different types of files: You'll earn 2 bonus points.
If you can sort 6 different types of files: You'll earn 3 bonus points.
GUIDE
Define the source directory and destination directories
First, you need to specify the path to the directory containing the files to be sorted and create a
dictionary to map the destination directories for each file type:
source_directory
destination_directories
Next, you need to define file extensions for each file type:
file_extensions
Create Destination Directories if They Do Not Exist
Before moving files, you need to ensure that the destination directories are created:
Write the File Sorting Function
This function will perform the file movement based on their extensions:
Check if the extension is in the list of extensions for the current file type.
The destination directory corresponding to the file type is determined through the
destination_directories dictionary.
The shutil.move function moves the file from its current location to the destination
directory.
Print a message indicating that the file has been successfully moved.
Exit the loop once the appropriate file type is found to avoid checking additional types.
The final step is to call the function.
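The steps above can be sketched as one function. This is an illustrative reconstruction, not the guide's original code: the function name sort_files, its parameters and the returned list are assumptions, while source_directory, destination_directories and file_extensions follow the names used in this guide.

```python
import os
import shutil

def sort_files(source_directory, destination_directories, file_extensions):
    """Move each file into the folder that matches its extension."""
    # Make sure every destination folder exists before moving anything.
    for directory in destination_directories.values():
        os.makedirs(directory, exist_ok=True)

    moved = []
    for filename in os.listdir(source_directory):
        path = os.path.join(source_directory, filename)
        if not os.path.isfile(path):
            continue  # skip sub-directories
        extension = os.path.splitext(filename)[1].lower()
        for file_type, extensions in file_extensions.items():
            if extension in extensions:
                target = destination_directories[file_type]
                shutil.move(path, os.path.join(target, filename))
                print(f"Moved {filename} to {target}")
                moved.append(filename)
                break  # stop checking other file types
    return moved
```

A call might pass a mapping such as {"images": [".jpg", ".png"], "csv": [".csv"]} for file_extensions, with one destination folder per key; files with unlisted extensions stay where they are.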
SUMMARY:
This script automates file organization by categorizing files based on their extensions (e.g.,
images, documents, CSV files) and moving them to designated folders. It’s adaptable for
various scenarios like document management, media sorting, or preparing files for backup,
Virtual Environment
A Virtual Environment in Python is a tool that allows you to create an isolated workspace
separate from the global Python system. It helps you manage libraries and dependencies for
each project individually, avoiding conflicts between different versions of libraries that various
projects might require.
Manage dependencies: Easily manage packages and their versions for each project.
Avoid conflicts: Prevent version conflicts between different projects.
Easy deployment: Ensures that development and production environments use the
same library versions, simplifying project deployment and sharing.
Create a Virtual Environment
To create a virtual environment in Python, you can use the venv module (available in
Python 3.3 and later).
Explanation:
python -m venv myenv : This command creates a directory named myenv containing
an independent copy of Python and package management tools.
After creating the virtual environment, you need to activate it to start working within it.
On Windows:
Explanation:
After activation, you will see the name of the virtual environment (e.g., (myenv)) appear at
the beginning of the command line, indicating that you are working in that environment.
Explanation:
This command installs the requests library into the current virtual environment.
requests==2.25.1
beautifulsoup4==4.9.3
pandas==1.2.3
To install all the packages listed in the requirements.txt file, you use the command:
When you finish your work and want to exit the virtual environment, you can deactivate it by:
Explanation:
After deactivation, you will return to the system's default Python environment.
Windows:
macOS/Linux:
Explanation:
This command deletes the entire myenv directory and all its contents.
Web Scraping
Web Scraping is the process of automatically collecting data from websites. In this process, a
program or script sends a request to a website, retrieves the HTML content of the page, and
then extracts the necessary data from this content.
Beautiful Soup
Beautiful Soup is a Python library used to parse HTML and XML documents. It creates a
hierarchical tree from a web page, allowing you to easily search, navigate, and modify elements
on the page.
Extracting the Page Title:
soup.title.text : Retrieves the content within the title tag and returns it as a string.
soup.h1.text : Retrieves the content within the h1 tag and returns it as a string.
Extracting All Paragraphs ( p ):
soup.find_all('p') : Finds all p tags in the HTML page and returns a list of p tag objects.
p.text : Retrieves the text content inside each p tag in the list.
soup.find_all('a') : Finds all a tags in the document and returns a list of a tag objects.
link['href'] : Retrieves the value of the href attribute in an a tag.
soup.find_all('h2') : Finds all h2 tags in the document and returns a list of h2 tag
objects.
heading.text : Retrieves the text content inside each h2 tag in the list.
soup.find('div', id='main') : Finds a div tag with the id = "main" attribute and returns
this tag object.
soup.find_all('div', class_='sub') : Finds all div tags with the class="sub" attribute
and returns a list of these tag objects.
div.text : Retrieves the text content inside each div tag found.
soup.find('table') : Finds a table tag in the document and returns this tag object.
table.find_all('tr') : Finds all tr rows in the table and returns a list of tr tag objects.
row.find_all('td') : Finds all td data cells in each row and returns a list of td tag objects.
cell.text : Retrieves the text content inside each td tag in the list.
Extracting Content from Nested Tags:
soup.find('span').text : Retrieves the text content inside the span tag, including the
content of nested tags like b and i .
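Several of the calls listed above can be tried on a small HTML string without fetching a page. This sketch assumes the beautifulsoup4 package is installed; the HTML snippet is made up, and Python's built-in html.parser is used as the parser.

```python
from bs4 import BeautifulSoup

html = """
<html><head><title>Demo</title></head>
<body>
  <h1>Heading</h1>
  <div id="main"><p>First</p><p>Second</p></div>
  <a href="https://example.com">Example</a>
  <span>Plain <b>bold</b> <i>italic</i></span>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.text
paragraphs = [p.text for p in soup.find_all("p")]
links = [link["href"] for link in soup.find_all("a")]
main_text = soup.find("div", id="main").text
span_text = soup.find("span").text  # includes text of nested b and i tags

print(title, paragraphs, links, main_text, span_text)
```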
Requests:
Requests is a Python library used to send HTTP requests in a simple way. Requests allows
you to send GET, POST, PUT, DELETE requests, and more.
Sending a GET Request to a Website:
requests.get(url) : Sends a GET request to the specified URL and returns a Response
object.
response.status_code : Returns the HTTP status code of the response (e.g., 200 for
success).
response.ok : Returns True if the HTTP status code is less than 400 (i.e., the response is
not a 4xx client error or a 5xx server error).
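A minimal sketch of a GET request follows. The function name fetch_status and the 10-second timeout are illustrative choices. The second helper builds Response objects by hand, purely to show offline how ok depends on the status code: in Requests, ok is True for any status code below 400.

```python
import requests


def fetch_status(url, timeout=10):
    """Send a GET request and return (status_code, ok). Needs network access."""
    response = requests.get(url, timeout=timeout)
    return response.status_code, response.ok


def ok_for(status_code):
    """Offline illustration: report what .ok would be for a given status code."""
    response = requests.Response()      # empty Response, no request is sent
    response.status_code = status_code
    return response.ok


for code in (200, 301, 404, 500):
    print(code, ok_for(code))
```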
Sending a Request with JSON Data:
requests.post(url, json={'key': 'value'}) : Sends a POST request with JSON data
provided as a Python dictionary. The data is automatically converted to JSON format.
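To see what the json= argument does without contacting a server, the request can be prepared but not sent. This is a sketch: the URL is a hypothetical endpoint and post_json is an illustrative helper name.

```python
import json

import requests

url = "https://example.com/api"   # hypothetical endpoint, never contacted here
payload = {"key": "value"}

# Build the request without sending it, to inspect what would go over the wire.
prepared = requests.Request("POST", url, json=payload).prepare()
content_type = prepared.headers["Content-Type"]
decoded_body = json.loads(prepared.body)

print(content_type)
print(decoded_body)


def post_json(target_url, data, timeout=10):
    """Actually send the POST request (needs network access)."""
    return requests.post(target_url, json=data, timeout=timeout)
```

The prepared request shows that Requests serializes the dictionary and sets the Content-Type header itself, so neither has to be done by hand.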
Handling Connection Errors:
requests.exceptions.RequestException : The base class for the exceptions raised by
Requests. Catching it handles connection errors, timeouts, invalid URLs, etc.
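A minimal sketch of this error handling is shown below. The helper name safe_get, the timeout value, and the choice to return None on failure are assumptions made for illustration.

```python
import requests


def safe_get(url, timeout=10):
    """Return the Response, or None if the request fails for any reason."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()   # turn 4xx/5xx responses into HTTPError
    except requests.exceptions.RequestException as error:
        # ConnectionError, Timeout, HTTPError, MissingSchema, ... all land here.
        print(f"Request failed: {error}")
        return None
    return response


# A URL without a scheme fails before any network traffic happens.
result = safe_get("not-a-valid-url")
print(result)
```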
Selenium
Selenium is a powerful library that allows you to automate web browsers. Selenium is used to
interact with web pages, such as clicking buttons, filling out forms, and scraping data from
websites requiring dynamic interactions (JavaScript).
driver.find_element(By.ID, 'element_id') : Find an element on the page by its ID.
driver.find_element(By.NAME, 'element_name') : Find an element on the page by its
NAME.
driver.find_element(By.XPATH, '//tagname[@attribute="value"]') : Find an element
on the page by XPATH.
element.send_keys('text') : Enter the string `text` into the selected text field.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID,
'element_id'))) : Wait up to 10 seconds for an element with a specific ID to appear on the
page.
Switch to an iFrame:
driver.save_screenshot('screenshot.png') : Take a screenshot of the current page
and save it as screenshot.png .
Handle Pop-up Windows:
driver.switch_to.alert : Switch the control to a pop-up alert.
alert.accept() : Accept the pop-up alert.
alert.dismiss() : Dismiss the pop-up alert.
driver.current_url : Returns the URL of the current page the browser is accessing.
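The screenshot, alert, and URL calls can be combined as in the sketch below. Both function names are hypothetical, driver is assumed to be an already started WebDriver, and driver.title (the page title) is used in addition to the calls listed above.

```python
def capture_page(driver, url, screenshot_path="screenshot.png"):
    """Open url, save a screenshot, and return (page title, current URL)."""
    driver.get(url)
    driver.save_screenshot(screenshot_path)
    return driver.title, driver.current_url


def close_alert(driver, accept=True):
    """Accept or dismiss the open pop-up alert and return its message text."""
    alert = driver.switch_to.alert
    text = alert.text
    if accept:
        alert.accept()
    else:
        alert.dismiss()
    return text


# Usage sketch (requires a browser and a matching driver):
# from selenium import webdriver
# driver = webdriver.Chrome()
# title, current_url = capture_page(driver, "https://example.com")
# print(title)
# driver.quit()
```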
Output
Example Domain
3 points: Retrieve all Bitcoin information, including price, 1h %, 24h %, 7d %, market cap, and
24h volume.
6 points: Gather the same information as in point 3 for three different cryptocurrencies: Bitcoin,
Ethereum, and BNB.
Please use https://coinmarketcap.com/
Does this explanation clarify the requirements? Please let me know if you have any questions.
Please ensure you read the Project Rules carefully.
GUIDE:
1. Send a GET Request to the Website:
URL: "https://coinmarketcap.com/"
4. Extract Data:
Price: Extract the cryptocurrency price from the specific div tag.
Change in 1h, 24h, 7d: Extract all price changes over 1 hour, 24 hours, and 7 days.
Market Cap: Extract the market cap value.
24h Trading Volume: Extract the 24-hour trading volume value.
Timestamp: Get the current timestamp.
5. Return the Results:
Return a dictionary containing all extracted information.
6. Case of Not Found:
If information is not found, return None .
7. Cryptocurrency List ( crypto_list )
8. Create a DataFrame
Update the DataFrame:
If data is available ( result is not None ), append it to the data list.
result['STT'] = stt : Update the index ( STT ) in the returned result.
Create DataFrame from the Data List:
df = pd.DataFrame(data) : Convert the data list into a DataFrame df .
Print the DataFrame:
print(df.to_string(index=False)) : Print the DataFrame to the screen without
showing the index column.
9. Results
When running the code, the program will access the CoinMarketCap page, retrieve the data for
each cryptocurrency in crypto_list , and organize this information into a DataFrame. The
DataFrame will include columns: STT, Cryptocurrency Name, Price, Change 1h, Change 24h,
Change 7d, Market Cap, 24h Trading Volume, and Timestamp.
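The loop and DataFrame steps of the guide can be sketched as follows. This is only an illustration under stated assumptions: get_crypto_info is a hypothetical stand-in that returns canned sample values instead of scraping CoinMarketCap, and increasing stt only for coins that were found is an assumed choice.

```python
from datetime import datetime

import pandas as pd

# Canned sample values standing in for scraped data (no network access).
SAMPLE = {
    "bitcoin": ("$61,698.87", "0.09%", "2.33%", "6.55%", "$1.22T", "$28,781,554,627"),
    "ethereum": ("$2,672.56", "0.16%", "2.81%", "3.94%", "$321.55B", "$12,642,405,175"),
}


def get_crypto_info(name):
    """Hypothetical stand-in for the scraper: return a dict, or None if not found."""
    values = SAMPLE.get(name)
    if values is None:
        return None
    price, change_1h, change_24h, change_7d, market_cap, volume = values
    return {
        "STT": None,  # filled in by the caller
        "Cryptocurrency Name": name.capitalize(),
        "Price": price,
        "Change 1h": change_1h,
        "Change 24h": change_24h,
        "Change 7d": change_7d,
        "Market Cap": market_cap,
        "24h Trading Volume": volume,
        "Timestamp": datetime.now(),
    }


crypto_list = ["bitcoin", "ethereum", "unknown-coin"]
data = []
stt = 1
for crypto in crypto_list:
    result = get_crypto_info(crypto)
    if result is not None:          # skip coins that were not found
        result["STT"] = stt
        data.append(result)
        stt += 1

df = pd.DataFrame(data)
print(df.to_string(index=False))
```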
Output CSV:
1,Bitcoin,$61,698.87,0.09%,2.33%,6.55%,$1.22T,$28,781,554,627,2024-08-23 22:46:03.599016
2,Ethereum,$2,672.56,0.16%,2.81%,3.94%,$321.55B,$12,642,405,175,2024-08-23 22:46:03.780685
3,Bnb,$581.44,0.08%,0.32%,13.34%,$84.83B,$1,910,870,318,2024-08-23 22:46:03.956293
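Values such as $61,698.87 contain commas themselves, so splitting a row like the ones above on every comma yields more than nine fields. As a minimal sketch, assuming the file is written with Python's csv module (the guide does not say how it was produced), such values are quoted automatically and can be read back intact:

```python
import csv
import io

row = [
    "1", "Bitcoin", "$61,698.87", "0.09%", "2.33%", "6.55%",
    "$1.22T", "$28,781,554,627", "2024-08-23 22:46:03.599016",
]

buffer = io.StringIO()              # in-memory stand-in for a CSV file
csv.writer(buffer).writerow(row)    # fields containing commas get quoted
csv_line = buffer.getvalue()

parsed = next(csv.reader(io.StringIO(csv_line)))

print(csv_line.strip())
print(len(parsed))
```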
SUMMARY:
This code scrapes cryptocurrency data from CoinMarketCap, extracting prices, percentage
changes, market capitalization, and 24-hour trading volume for cryptocurrencies like Bitcoin and
Ethereum. The data is organized into a table with timestamps. The method can be adapted for
other websites by adjusting the URL and HTML parsing logic, making it versatile for various web
scraping tasks.
Regular Expression
A Regular Expression (RegEx) is a special string of characters used to define a pattern for
searching, matching, and manipulating text. In Python, the re library provides powerful tools for
working with Regular Expressions.
RegEx is widely used in many applications, such as validating email formats, searching for
substrings within text, replacing parts of text, and more.
Literal Characters: Matches the exact character. For example, 'a' will match 'a' .
Dot (.): Matches any single character except for a newline character ( \n ).
Caret (^): Matches the position at the start of the string.
Dollar Sign ($): Matches the position at the end of the string.
Asterisk (*): Matches the preceding character or pattern 0 or more times.
Plus (+): Matches the preceding character or pattern 1 or more times.
Question Mark (?): Matches the preceding character or pattern 0 or 1 time.
Square Brackets ([]): Define a set of characters to match:
[abc] : Matches any single character within this set (a, b, or c).
[^abc] : Matches any single character not in this set (not a, b, or c). The ^ inside
the brackets negates the set.
groups( ) : Returns a tuple containing all captured groups in the matched string.
group(index) : Returns the captured group at the specified position.
groupdict( ) : Returns a dictionary containing captured groups with group names as
keys.
expand(template) : Returns the template string with its backreferences (such as \1 or
\g<name> ) replaced by the corresponding captured groups.
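The match object methods can be seen together in a short sketch. The date pattern and the sample sentence are made up for illustration.

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, "Posted on 2024-08-23")

all_groups = match.groups()      # every captured group, in order
month = match.group(2)           # the second group
named = match.groupdict()        # named groups as a dictionary
# expand() fills the template with the captured groups.
reordered = match.expand(r"\g<day>/\g<month>/\g<year>")

print(all_groups)
print(month)
print(named)
print(reordered)
```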
Modifiers (Flags) in the re module:
re.IGNORECASE or re.I : Ignores case when matching characters. For example, with this flag
the pattern abc also matches 'ABC' .
re.DOTALL or re.S : Makes the dot (.) match any character, including newline characters
( \n ).
re.VERBOSE or re.X : Allows the use of whitespace and comments in regular
expressions, which makes complex patterns easier to read.
To use Regular Expressions in Python, you need to import the re library. You can then use
functions like match( ) , search( ) , findall( ) , and sub( ) to work with patterns.
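A short sketch of these functions and flags is given below. The sample text and patterns are made up for illustration.

```python
import re

text = "Python is fun.\npython is popular."

# match() only looks at the start of the string; search() scans the whole string.
starts_with_python = re.match(r"python", text, re.I) is not None
has_popular = re.search(r"popular", text) is not None

# findall() returns every non-overlapping match; sub() replaces them.
all_pythons = re.findall(r"python", text, re.IGNORECASE)
replaced = re.sub(r"python", "Regex", text, flags=re.I)

# Without re.S the dot stops at the newline; with it the match can span lines.
without_dotall = re.search(r"fun.+python", text)
with_dotall = re.search(r"fun.+python", text, re.S)

# re.X lets a pattern carry whitespace and comments.
number_pattern = re.compile(r"""
    \d+        # integer part
    (\.\d+)?   # optional decimal part
""", re.X)
number = number_pattern.search("Price: 61698.87").group()

print(all_pythons, replaced, number, sep="\n")
```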
Explanation:
^[a-zA-Z0-9_.+-]+ : Matches the start of the string, followed by one or more letters,
digits, underscores, dots, plus signs, or hyphens.
Explanation:
\d+ : Matches one or more digits. The findall( ) function finds all occurrences of numbers
in the string and returns them as a list.
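A minimal sketch of this pattern, using a made-up sample string:

```python
import re

text = "Order 66 shipped 3 items on day 15"

numbers = re.findall(r"\d+", text)          # matches come back as strings
as_integers = [int(n) for n in numbers]     # convert when numbers are needed

print(numbers)
print(as_integers)
```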
Split a String by Commas:
Explanation:
,\s* : Matches a comma followed by any number of whitespace characters (zero or more).
re.split( ) : Splits the string into a list of elements based on the pattern.
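A minimal sketch of the split, using a made-up sample string with uneven spacing after the commas:

```python
import re

text = "apple, banana,cherry,   date"

# The comma and any whitespace after it are consumed as the separator.
parts = re.split(r",\s*", text)

print(parts)
```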
SUMMARY:
Regular Expression (Regex) is a powerful tool used for searching, replacing, validating, and
manipulating text, often applied in tasks such as text search and replace, data validation (like
emails or phone numbers), extracting specific patterns from strings, and parsing complex text.
Job title
Company name
Job location
Job description
Date posted
Job link
Use regular expressions to filter the job listings based on certain criteria. For example:
Jobs that require specific programming languages (e.g., Python, JavaScript).
Jobs that include certain keywords (e.g., "remote", "full-time").
Exclude jobs that contain unwanted terms (e.g., "internship", "part-time").
Store the filtered job listings in a CSV file or display them in a readable format.
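The filtering criteria can be sketched with three compiled patterns. This is an illustration only: the name keep_job is hypothetical, and requiring both a language and a keyword is an assumed way of combining the example criteria.

```python
import re

# \b keeps "Java" from matching as "JavaScript" and vice versa.
LANGUAGES = re.compile(r"\b(?:python|javascript)\b", re.IGNORECASE)
KEYWORDS = re.compile(r"\b(?:remote|full-time)\b", re.IGNORECASE)
EXCLUDED = re.compile(r"\b(?:internship|part-time)\b", re.IGNORECASE)


def keep_job(description):
    """Return True if the text names a wanted language and keyword and no excluded term."""
    if EXCLUDED.search(description):
        return False
    return bool(LANGUAGES.search(description)) and bool(KEYWORDS.search(description))


samples = [
    "Senior Python Developer (remote, full-time)",
    "JavaScript internship, remote",
    "Java engineer, full-time",
]
for sample in samples:
    print(keep_job(sample), sample)
```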
Bonus points:
Scheduling: Set up your script to run periodically to scrape and update the job listings daily or
weekly. (+3)
Notification System: Send an email notification with the latest job listings that match your
criteria. (+5)
GUIDE
1/ Libraries Used
csv: For working with CSV files, helping to store the collected job data.
selenium: Used for automating web browsers to fetch HTML content from websites.
BeautifulSoup: Parses HTML and extracts specific elements from the webpage source.
time and sleep: Used to pause the program for a period, ensuring the webpage is fully
loaded before proceeding with further tasks.
re: Uses Regular Expressions to search and match text patterns in the job data.
smtplib and getpass: Used to send email notifications about newly found jobs.
schedule: Helps schedule function execution at fixed intervals.
Purpose:
This function is designed to collect information about jobs on the VietnamWorks website, filter
the jobs according to specific requirements, save the results to a CSV file, and send email
notifications.
Steps:
requirements.
jobitems = job_data.find_all('div', class_='sc-fwwElh iAsyDt') : Retrieves the list
of jobs from the container. The keyword is class_ because class is a reserved word in Python.
Extracts specific details such as job title, company name, location, date posted,
salary, skills, and job link.
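A minimal sketch of this extraction step follows. The HTML structure, the inner company and location class names, and the example.com links are invented for illustration. Only the container class 'sc-fwwElh iAsyDt' comes from the guide, and generated class names like this tend to change, so a real page needs to be inspected first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real VietnamWorks page will differ.
html = """
<div id="jobs">
  <div class="sc-fwwElh iAsyDt">
    <h2><a href="https://example.com/jobs/1">Python Developer</a></h2>
    <span class="company">Example Co</span>
    <span class="location">Ho Chi Minh</span>
  </div>
  <div class="sc-fwwElh iAsyDt">
    <h2><a href="https://example.com/jobs/2">Data Analyst</a></h2>
    <span class="company">Sample Ltd</span>
  </div>
</div>
"""

job_data = BeautifulSoup(html, "html.parser").find("div", id="jobs")
job_items = job_data.find_all("div", class_="sc-fwwElh iAsyDt")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag is missing."""
    return tag.get_text(strip=True) if tag else None


jobs = []
for item in job_items:
    link = item.find("a")
    jobs.append({
        "title": text_or_none(link),
        "company": text_or_none(item.find("span", class_="company")),
        "location": text_or_none(item.find("span", class_="location")),
        "link": link["href"] if link else None,
    })

print(jobs)
```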
if not found_languages : Skips jobs that do not contain the required languages.
list.
address and password.
sendmail(sender_email, receiver_email, message) : Sends an email to the
recipient with the prepared content.
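The email step can be sketched as below. Several parts are assumptions: the helper names, the subject line, and the SMTP host and port (shown for Gmail over SSL). The password would typically be read with getpass rather than stored in the script.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender_email, receiver_email, jobs):
    """Return the email as a string, with one line per job."""
    message = EmailMessage()
    message["Subject"] = f"Job listings found: {len(jobs)}"
    message["From"] = sender_email
    message["To"] = receiver_email
    message.set_content("\n".join(f"{job['title']} - {job['link']}" for job in jobs))
    return message.as_string()


def send_notification(sender_email, password, receiver_email, message,
                      host="smtp.gmail.com", port=465):
    """Log in and send the message (needs network access and valid credentials)."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(sender_email, password)
        server.sendmail(sender_email, receiver_email, message)


sample_jobs = [{"title": "Python Developer", "link": "https://example.com/jobs/1"}]
email_text = build_message("sender@example.com", "receiver@example.com", sample_jobs)
print(email_text)
```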
scrape_jobs( ) : This function will be called when the code is executed, helping to collect
job data, save it to a CSV file, and send email notifications.
If you want to automatically run this function daily at a fixed time, you can use the schedule
library:
SUMMARY: This script automates the scraping of job listings from VietnamWorks, filters them
based on programming languages (Python, JavaScript), saves the data to a CSV file, and
sends an email with the job details. It runs daily at 10:00 AM using Selenium, BeautifulSoup,
and smtplib. The approach can be adapted to other websites and use cases, such as monitoring product
prices, aggregating news, or tracking social media trends, making it versatile for various web