python notes
Smart Syntax
with_suffix
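from pathlib import Path

## Bad
new_filepath = str(Path("file.txt"))[:4] + ".md"  # only works because "file" is four characters long
## Good
new_filepath = Path("file.txt").with_suffix(".md")  # replaces the final suffix, whatever the stem length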
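To see why the slicing version is fragile, here is a small sketch comparing both approaches on a few file names. The helper name slice_rename and the sample file names are made up for illustration.

```python
from pathlib import Path


def slice_rename(name: str) -> str:
    # The "bad" approach: keeps the first four characters and appends ".md".
    return str(Path(name))[:4] + ".md"


print(slice_rename("file.txt"))    # file.md
print(slice_rename("report.txt"))  # repo.md (stem is cut off)

# with_suffix replaces only the last suffix and leaves the stem alone.
good = Path("report.txt").with_suffix(".md")
multi = Path("archive.tar.gz").with_suffix(".zip")
print(good)   # report.md
print(multi)  # archive.tar.zip
```

Note that with_suffix only touches the final suffix, so a name like archive.tar.gz keeps its ".tar" part.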
New Concepts
Walrus Operator
The walrus operator (:=) is an assignment expression introduced in Python 3.8. It allows you to
assign values to variables as part of an expression. Before its introduction, assignment was only
allowed in standalone statements, but the walrus operator enables assignment within expressions.
It can improve readability and reduce the number of lines of code by eliminating the
need for separate assignment steps. Additionally, it helps streamline code when you need to
assign a value and test it in the same step.
while True:
    user_input = input("Enter a valid string (non-empty): ")
    if len(user_input) > 0:
        break
print(f"Valid input received: {user_input}")
In the above code, the input is handled in two separate steps: once when it is read and again in
the length check.
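The same loop can be condensed with the walrus operator, which reads the value and tests it in one expression. This is a minimal sketch: the function name read_non_empty and the scripted answers are assumptions added so the example runs without a real user; in an interactive script you would pass a function that calls input().

```python
from typing import Callable


def read_non_empty(read: Callable[[], str]) -> str:
    # Assign and test in one expression: loop until read() returns a non-empty string.
    while not (user_input := read()):
        print("Empty input, try again.")
    return user_input


# Scripted answers stand in for a real user (two empty entries, then a valid one).
answers = iter(["", "", "hello"])
result = read_non_empty(lambda: next(answers))
print(f"Valid input received: {result}")  # Valid input received: hello
```

Interactively, the call would look like read_non_empty(lambda: input("Enter a valid string (non-empty): ")).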
Memory Slots
You knew saving memory is cool, but saving millions of bytes is even cooler.
In Python, the __slots__ feature is used to reduce memory usage by limiting the attributes an
object can have. Normally, Python uses a dynamic dictionary (__dict__) to store attributes of an
object; __slots__ replaces that dictionary with a fixed set of attributes.
Memory Efficient: Objects with __slots__ use less memory because they avoid the overhead
of a per-instance __dict__.
Well Optimized: Access to attributes can be slightly faster since there’s no need to look them up in a
dynamic dictionary.
Imagine you’re creating millions of lightweight User objects in a system where each user only
needs a few attributes (name, email). Reducing memory usage can drastically improve
performance.
class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email
In this case, every User object stores attributes in a dynamic dictionary, which consumes more
memory. Using __slots__ significantly reduces the memory overhead since attributes get stored in
predefined slots without the need for a dictionary, making the system more efficient when
many objects are created.
class User:
    __slots__ = ['name', 'email']  # Declare fixed attributes

    def __init__(self, name, email):
        self.name = name
        self.email = email
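A quick way to see the difference is to check for __dict__ on both kinds of object and to try setting an undeclared attribute. The class names PlainUser and SlottedUser and the sample values are made up for this sketch.

```python
class PlainUser:
    def __init__(self, name, email):
        self.name = name
        self.email = email


class SlottedUser:
    __slots__ = ("name", "email")  # only these attributes are allowed

    def __init__(self, name, email):
        self.name = name
        self.email = email


plain = PlainUser("Ann", "ann@example.com")
slotted = SlottedUser("Ann", "ann@example.com")

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

plain.age = 30  # fine: stored in the instance __dict__
try:
    slotted.age = 30  # not declared in __slots__
    rejected = False
except AttributeError as exc:
    rejected = True
    print("AttributeError:", exc)
```

The trade-off is flexibility: a slotted object cannot grow new attributes at runtime.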
functools.lru_cache
In Python, functools.lru_cache is a decorator that provides a simple yet effective way to add
caching (memoization) to a function, storing
the results of expensive function calls and returning the cached result when the same inputs
occur again.
This can drastically improve performance for functions that are computationally expensive or
called repeatedly with the same arguments.
lru_cache stands for "Least Recently Used cache", which means that if the cache reaches its
maximum size, it will discard the least recently used items first.
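As a small illustration, a naive recursive Fibonacci function becomes fast once its results are cached. The function fib and the maxsize value are chosen for this sketch only.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n: int) -> int:
    # Without the cache this recursion recomputes the same values many times over.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(fib(30))           # 832040
print(fib.cache_info())  # shows hits, misses, maxsize and current size
```

The arguments of a cached function must be hashable, because they are used as keys in the cache.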
Look, there is nothing inherently wrong with Requests. It’s intuitive, it has a great API, and it’s
practically the mascot of Python HTTP libraries. But it’s overkill for when you just need to make
simple GET/POST requests, and it will lag in environments where you want asynchronous
performance.
Blocking IO: Requests is synchronous, which means each call waits for the previous call to finish.
Heavy: It’s got loads of convenience baked in, but it does have a cost in terms of speed and
memory footprint. Not a big deal on a simple script, but on larger systems this can be a resource
hog.
What You Should Instead Use: httpx
For parallel processing of requests, httpx provides a similar API but with asynchronous support.
So, if you make many API calls, it’ll save you some time and resources because it will process
them concurrently.
Pro Tip: Asynchronous requests can reduce the processing time by a great amount if the task at
hand is I/O-bound.
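The effect of concurrency can be sketched with the standard library alone. In the example below, fake_fetch is a hypothetical stand-in for an HTTP call (asyncio.sleep simulates network latency) and the URLs are placeholders; with httpx, the awaited call would instead be a request made through httpx.AsyncClient.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.1) -> str:
    # Stand-in for a network request: waits without blocking other tasks.
    await asyncio.sleep(delay)
    return f"response from {url}"


async def fetch_all(urls):
    # Start every request at once and wait for all of them together.
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


urls = [f"https://example.com/item/{i}" for i in range(5)]
start = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start

# Five 0.1 s waits overlap, so the total is close to 0.1 s rather than 0.5 s.
print(len(results), f"{elapsed:.2f}s")
```

Run one after another, the same five calls would take roughly the sum of their delays.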
Alright, I know this is controversial. BeautifulSoup has been the go-to library to tackle HTML
parsing for years, but it’s not really performing as well as it used to. Large or complex documents
have the tendency to make BeautifulSoup feel sluggish, and it hasn’t evolved to keep up.
Speed: Not very fast when the size of a document is very big.
Thread blocking: Much like Requests itself, it is not designed with async in mind.
selectolax is a less famous library that uses the Modest and Lexbor engines for better performance and with less memory
consumption.
from selectolax.parser import HTMLParser

html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)
text = tree.css("p")[0].text()  # first <p> element matched by the CSS selector
print(text) # Output: Test
As it turns out, by using selectolax, you keep the core HTML parsing capabilities (such as CSS
selectors) but with much-enhanced speed, making it ideal for web scraping tasks that are quite data-intensive.
“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper
Now, listen up: the thing is, Pandas is great at data exploration and for middle-sized datasets. But
people just use it for everything, like it’s some magic solution that’s going to solve every problem
in data, and quite frankly, it isn’t. Working with Pandas on huge datasets can turn your machine
into a sputtering fan engine, and the memory overhead just doesn’t make sense for some
workflows.
Why It Is Overrated:
Memory Usage: As Pandas operates mainly in-memory, any operation on a large dataset will
consume a lot of RAM.
Limited Scalability: Scaling with Pandas isn’t easy. It was never designed for big data.
Polars is an ultra-fast DataFrame library written in Rust and built on Apache Arrow. Optimized for memory
efficiency and multithreaded performance, it is perfect for when you want to crunch
large datasets.
import polars as pl
df = pl.read_csv("big_data.csv")
filtered_df = df.filter(pl.col("value") > 50)
print(filtered_df)
Why Polars? It can process data that would bring Pandas to its knees, and it often handles operations
in a fraction of the time. Besides that, it also has lazy evaluation, meaning it only computes
what’s needed.
Dictionary
- The hash is calculated on the key. If two keys have the same hash value and compare equal, the
existing key is kept as it is but its value is replaced with the latest one. Hence, when an
equal key is entered several times, the last value is taken.
- A key can only be added to the dictionary if it is hashable. A list, for example, cannot
be added as a key because it is not hashable. But the behavior can be imposed; see the
example function listaskey().
- Whenever we add an object as a dictionary’s key, Python invokes the __hash__
method of that object’s class.
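One way to impose hashing on a list is to subclass it and define __hash__. This is a sketch of the idea, not the listaskey() function referred to above; the class name HashableList is hypothetical.

```python
class HashableList(list):
    def __hash__(self):
        # Hash the contents as a tuple; equality is inherited from list.
        return hash(tuple(self))


try:
    {[1, 2]: "x"}  # a plain list is not hashable
    plain_list_failed = False
except TypeError as exc:
    plain_list_failed = True
    print("TypeError:", exc)

key = HashableList([1, 2])
d = {key: "x"}
print(d[HashableList([1, 2])])  # x
```

Such a key should not be mutated after insertion: its hash would change and the dictionary could no longer find it.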
Hashing
Reason
- In Python, dictionaries match keys by hash (computed using hash()) and equality (==),
not by identity (computed using id()).
- True, 1 and 1.0 all have the same hash (1) and compare equal.
- Since the hash is the same and the keys are equal, they are considered the same key, and the last value assigned is kept.
- The key itself remains as first inserted, because only the value is overwritten.
This is because, at first, True is added as a key and its value is 'bool'. Next, while adding the
key 1, Python finds that it has the same hash as True and compares equal to it.
Thus, the value corresponding to True is overwritten by 'int', while the key (True) is kept as
is.
Finally, while adding 1.0, another match is found with the existing key
True. Yet again, the value corresponding to True, which was updated to 'int' in the previous
step, is overwritten by 'float'.
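The behavior described above can be reproduced with a three-entry dictionary literal. The values 'bool', 'int' and 'float' follow the description; the variable names are chosen for this sketch.

```python
d = {True: "bool", 1: "int", 1.0: "float"}

print(d)  # {True: 'float'}
print(hash(True), hash(1), hash(1.0))  # 1 1 1
print(True == 1 == 1.0)  # True

# The surviving key is the first one inserted; only the value was replaced.
only_key = next(iter(d))
print(type(only_key).__name__)  # bool
```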
__missing__
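class customMissing(dict):
    def __missing__(self, key):
        # Called by dict lookup (d[key]) when the key is absent: store and return a new list
        self[key] = ls = []
        return ls


def implementMissing():
    d = customMissing()
    d['1'] = 'a'
    d['2'] = 'b'
    print(d['fd'])  # 'fd' is absent, so __missing__ inserts and returns []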
Refer
https://www.analyticsvidhya.com/blog/2024/01/datetime-in-python/
1. The date data type stores calendar date information, including the year, month, and
day. It allows you to represent a specific date on the calendar.
2. The time data type stores the time of day, including the hour, minute, second, and
microsecond. It allows you to represent a specific point in time each day.
3. The datetime data type combines the date and time data types to store both calendar
date and time of day information together. It allows you to represent a full timestamp,
specifying both when something happened and what day it occurred on.
4. The timedelta data type represents a duration, such as the difference between two dates or
datetimes (time objects cannot be subtracted directly). It allows you to calculate the amount of time between two points in time so
you can determine how much time has passed or how much time remains until a future
date.
5. The tzinfo data type is the abstract base class used to store timezone information. It allows you to specify the
timezone for a particular time or datetime value (date objects carry no timezone) so you know the local time
represented and can correctly handle daylight saving time and other timezone-related
adjustments.
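The five types listed above can be seen together in a short sketch. The dates and times are arbitrary sample values, and timezone.utc is used as a ready-made tzinfo implementation from the standard library.

```python
from datetime import date, datetime, time, timedelta, timezone

d = date(2024, 1, 15)        # calendar date only
t = time(9, 30)              # time of day only
dt = datetime.combine(d, t)  # both together: 2024-01-15 09:30:00

# Subtracting two datetimes gives a timedelta.
delta = datetime(2024, 1, 20, 9, 30) - dt
print(delta.days)  # 5

# Adding a timedelta moves a datetime forward.
later = dt + timedelta(hours=36)
print(later)  # 2024-01-16 21:30:00

# Attaching a tzinfo makes the datetime timezone-aware.
aware = dt.replace(tzinfo=timezone.utc)
print(aware.isoformat())  # 2024-01-15T09:30:00+00:00
```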
import datetime