
https://github.com/LPU-Org/Professional-Notes/blob/main/GitNotesForProfessionals.pdf

Smart Syntax

with_suffix

from pathlib import Path

## Bad
new_filepath = str(Path("file.txt"))[:-4] + ".md"  # brittle: assumes a 4-character extension

## Good
new_filepath = Path("file.txt").with_suffix(".md")
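To see the result (a minimal sketch; with_name is a related pathlib method, shown here for context and not part of the original note):

from pathlib import Path

p = Path("file.txt")
print(p.with_suffix(".md"))     # file.md  (replaces only the extension)
print(p.with_name("notes.md"))  # notes.md (replaces the whole filename)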

New Concepts

Walrus Operator

Make value assignments on the go!

The walrus operator (:=) is an assignment expression introduced in Python 3.8. It allows you to assign values to variables as part of an expression. Before its introduction, assignment was only allowed in standalone statements, but the walrus operator enables assignment within expressions such as loops, conditions, and function calls.

The walrus operator reduces redundancy by combining assignment and evaluation in a single step, which is especially useful in loops and conditional statements. It enhances readability and efficiency, reducing the number of lines of code by eliminating the need for separate assignment steps. Additionally, it streamlines code when you need to evaluate and assign a value that will be used multiple times.

while True:
    user_input = input("Enter a valid string (non-empty): ")
    if len(user_input) > 0:
        break
print(f"Valid input received: {user_input}")

In the code above, the work happens in two separate statements: one reads the input, and the if condition then checks its length. The walrus operator collapses the read and the check into a single expression:

while len(user_input := input("Enter a valid string (non-empty): ")) == 0:
    print("Invalid input, try again.")
print(f"Valid input received: {user_input}")

Memory Slots

You knew saving memory is cool, but saving millions of bytes is even cooler.

In Python, the __slots__ feature is used to reduce memory usage by limiting the attributes an object can have. Normally, Python uses a dynamic dictionary (__dict__) to store an object's attributes, which allows for flexibility but consumes more memory.

By defining __slots__, you explicitly declare a fixed set of attributes, eliminating the per-instance __dict__ and reducing the memory footprint.

- Memory efficient: Objects with __slots__ use less memory because they avoid the overhead of the attribute dictionary.
- Well optimized: Attribute access is faster since there is no need to look attributes up in a dynamic dictionary.

Imagine you're creating millions of lightweight User objects in a system where each user only needs a few attributes (name, email). Reducing memory usage can drastically improve performance.

class User:
    def __init__(self, name, email):
        self.name = name
        self.email = email

# Creating a million users without `__slots__`
users = [User(f"User{i}", f"user{i}@example.com") for i in range(1000000)]

In this case, every User object stores its attributes in a dynamic dictionary, which consumes more memory. Using __slots__ significantly reduces this overhead: attributes are stored in predefined slots without a dictionary, making the system more efficient when creating large numbers of objects.

class User:
    __slots__ = ['name', 'email']  # Declare fixed attributes

    def __init__(self, name, email):
        self.name = name
        self.email = email

# Creating a million users with `__slots__`
users = [User(f"User{i}", f"user{i}@example.com") for i in range(1000000)]
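A side effect worth knowing (a small sketch, not in the original notes): slotted instances have no __dict__, so assigning an attribute that is not declared in __slots__ raises AttributeError.

u = User("User0", "user0@example.com")
print(hasattr(u, "__dict__"))  # False: no per-instance attribute dictionary

try:
    u.age = 30  # 'age' is not declared in __slots__
except AttributeError as e:
    print(e)  # 'User' object has no attribute 'age'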

functools.lru_cache

In Python, functools.lru_cache is a decorator that provides a simple yet effective way to add memoization to a function. Memoization is a technique used to speed up programs by caching the results of expensive function calls and returning the cached result when the same inputs occur again. This can drastically improve performance for functions that are computationally expensive or frequently called with the same arguments.

The name lru_cache stands for "Least Recently Used cache", which means that if the cache reaches its maximum size, it discards the least recently used items first.

from functools import lru_cache

@lru_cache(maxsize=None)  # maxsize=None means the cache can grow indefinitely
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

# Calculate Fibonacci numbers
print(fibonacci(10))  # Output: 55
print(fibonacci(15))  # Output: 610
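The decorator also exposes cache statistics; a quick look (a small snippet added here, using the fibonacci function above):

# Inspect cache performance: hits, misses, and current size
print(fibonacci.cache_info())  # CacheInfo(hits=14, misses=16, maxsize=None, currsize=16)

# Reset the cache if the cached values should be discarded
fibonacci.cache_clear()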

requests -> httpx

1. Requests (Yes, Really!)

Look, there is nothing inherently wrong with Requests. It's intuitive, it has a great API, and it's practically the mascot of Python HTTP libraries. But it's overkill when you just need to make simple GET/POST requests, and it will lag in environments where you want asynchronous performance.

Why It's Overrated:

Blocking I/O: Requests is synchronous, which means each call waits for the previous one to finish. This is less than ideal when working with I/O-bound programs.

Heavy: It's got loads of convenience baked in, but that has a cost in terms of speed and memory footprint. Not a big deal in a simple script, but on larger systems this can be a resource hog.

What You Should Use Instead: httpx

For concurrent processing of requests, httpx provides a similar API but with asynchronous support. So, if you make many API calls, it will save you time and resources because it can process those requests concurrently.


import asyncio
import httpx

async def fetch_data(url):
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        return response.json()

# Simple and non-blocking; an async function must be driven by an event loop
data = asyncio.run(fetch_data("https://api.example.com/data"))

Pro Tip: Asynchronous requests can reduce processing time by a great amount when the task at hand is web scraping or ingesting data from an external source, as the sketch below shows.
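Since the point of httpx here is concurrency, a minimal sketch (the URLs are placeholders, not from the original notes) of fetching several endpoints at once with asyncio.gather:

import asyncio
import httpx

async def fetch_all(urls):
    async with httpx.AsyncClient() as client:
        # Launch all requests concurrently and wait for every response
        responses = await asyncio.gather(*(client.get(u) for u in urls))
        return [r.json() for r in responses]

# Hypothetical endpoints for illustration
urls = [f"https://api.example.com/data/{i}" for i in range(5)]
results = asyncio.run(fetch_all(urls))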

BeautifulSoup -> selectolax

2. BeautifulSoup (Yup, This One Too)

Alright, I know this is controversial. BeautifulSoup has been the standard library to tackle HTML parsing for years, but it's not really performing as well as it used to. Large or complex documents tend to make BeautifulSoup feel sluggish, and it hasn't evolved to keep up with Python's async-first landscape.

Why It's Overrated:

Speed: Not very fast when documents get very big.

Thread blocking: Much like Requests itself, it is not designed with async in mind, which makes it ill-suited for scraping dynamic websites.

What You Should Use Instead: selectolax

selectolax is a less famous library that wraps fast C HTML engines (Modest and Lexbor) for better performance and lower memory consumption.

from selectolax.parser import HTMLParser

html_content = "<html><body><p>Test</p></body></html>"
tree = HTMLParser(html_content)
text = tree.css("p")[0].text()
print(text) # Output: Test
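To extend the example (a small sketch; css_first and the attributes mapping are part of selectolax's API, but this snippet is not from the original notes):

from selectolax.parser import HTMLParser

html_content = '<html><body><a href="https://example.com" class="link">Home</a></body></html>'
tree = HTMLParser(html_content)

node = tree.css_first("a.link")     # first node matching a CSS selector
print(node.text())                  # Output: Home
print(node.attributes.get("href"))  # Output: https://example.com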

By using selectolax, you retain the same HTML parsing capabilities with much better speed, making it ideal for data-intensive web scraping tasks.

“Do not fall in love with the tool; rather, fall in love with the outcome.” Choosing the proper tool is half the battle.


Pandas -> Polars

3. Pandas for All Data Manipulation Tasks

Now, listen up: the thing is, Pandas is great at data exploration and for middle-sized datasets. But people just use it for everything, like it's some magic solution that's going to solve every problem in data, and quite frankly, it isn't. Working with Pandas on huge datasets can turn your machine into a sputtering fan engine, and the memory overhead just doesn't make sense for some workflows.

Why It Is Overrated:

Memory Usage: Since Pandas operates mainly in-memory, any operation on a large dataset can hit performance hard.

Limited Scalability: Scaling with Pandas isn't easy. It was never designed for big data.

What You Should Use Instead: Polars

Polars is an ultra-fast DataFrame library written in Rust on top of Apache Arrow. It is optimized for memory efficiency and multithreaded performance, which makes it perfect for when you want to crunch data without heating up your CPU.

import polars as pl

df = pl.read_csv("big_data.csv")
filtered_df = df.filter(pl.col("value") > 50)
print(filtered_df)

Why Polars? It will process data that would bring Pandas to its knees, and it handles operations in a fraction of the time. Besides that, it also has lazy evaluation, meaning it only computes what is needed, as sketched below.
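A minimal sketch of that lazy mode (reusing the big_data.csv and "value" column from the snippet above): scan_csv builds a query plan instead of loading the file, and nothing is read until collect() is called.

import polars as pl

# Build a lazy query plan; the CSV is not read yet
lazy_df = pl.scan_csv("big_data.csv").filter(pl.col("value") > 50)

# Execution happens only here, with the filter pushed down into the scan
filtered_df = lazy_df.collect()
print(filtered_df)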

Dictionary
- A hash is computed on the key. If two keys have the same hash and compare equal, the first-inserted key is kept as-is, but its value is replaced with the latest one. Hence, for visually identical keys, the last value wins when there are multiple key entries.
- A key can only be added to a dictionary if it is hashable. A list, for example, cannot be used as a key because it is unhashable, though hashable behavior can be imposed; see the example function listaskey() and the sketch after this list.
- Whenever we add an object as a dictionary key, Python invokes the __hash__ function of that object's class.
Hashing

my_dict = {'1': 'string', True: 'bool', 1: 'int', 1.0: float}

print(my_dict)
# Output: {'1': 'string', True: <class 'float'>}

Reason
- In Python, dictionaries match keys based on equality (==), using the hash (computed with hash()) as a first filter, not on identity (computed with id()).
- The hashes of True, 1, and 1.0 are all the same (1), and the three values compare equal.
- Since the keys are therefore considered the same key, only the most recently assigned value is kept.
- The key itself, however, retains its initial form, because only the value is overwritten.

This is because, at first, True is added as a key with the value 'bool'. Next, while adding the key 1, Python recognizes it as equivalent to an existing key (same hash, compares equal). Thus, the value corresponding to True is overwritten by 'int', while the key (True) is kept as-is.

Finally, while adding 1.0, another equivalence is encountered with the existing key True. Yet again, the value corresponding to True, which was updated to 'int' in the previous step, is overwritten, this time with the class float.
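A quick check of the claim (a small snippet added for illustration):

print(hash(True), hash(1), hash(1.0))  # Output: 1 1 1
print(True == 1 == 1.0)                # Output: True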

__missing__

When you subclass dict and define __missing__, Python calls it from __getitem__ whenever a looked-up key is absent, letting you create a default value on the fly:

class customMissing(dict):
    def __missing__(self, key):
        # Called on d[key] when key is absent: create, store, and return a default list
        self[key] = ls = []
        return ls

d = customMissing()
d['1'] = 'a'
d['2'] = 'b'
print(d['fd'])  # Output: [] (the key 'fd' is created on first access)
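For comparison (an addition, not from the original notes), collections.defaultdict provides the same behavior out of the box:

from collections import defaultdict

d = defaultdict(list)  # missing keys get a fresh empty list
d['1'] = 'a'
print(d['fd'])  # Output: []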

Date and Time

Refer: https://www.analyticsvidhya.com/blog/2024/01/datetime-in-python/

Classes of DateTime Python Module

1. The date data type stores calendar date information, including the year, month, and
day. It allows you to represent a specific date on the calendar.
2. The time data type stores the time of day, including the hour, minute, second, and
microsecond. It allows you to represent a specific point in time each day.
3. The datetime data type combines the date and time data types to store both calendar
date and time of day information together. It allows you to represent a full timestamp,
specifying both when something happened and what day it occurred on.
4. The timedelta data type is used to compute the difference between two dates, times, or
datetimes. It allows you to calculate the amount of time between two points in time so
you can determine how much time has passed or how much time remains until a future
date.
5. The tzinfo data type is used to store timezone information. It allows you to specify the
timezone for a particular date, time, or datetime value so you know the local time
represented and can correctly handle daylight saving time and other timezone-related
adjustments.
import datetime

# Get the current date and time
now = datetime.datetime.now()

# Get the date part of now
date = now.date()

# Get the time part of now
time = now.time()

# Rebuild a datetime from the date and time parts
datetime_now = datetime.datetime.combine(date, time)

# Find the difference between two datetimes
difference = datetime.datetime.now() - datetime.datetime(2017, 7, 1)
print("difference - ", difference)

# Create a timezone-aware datetime
tz_aware_datetime = datetime.datetime(2017, 7, 1, tzinfo=datetime.timezone.utc)
print("tz_aware_datetime - ", tz_aware_datetime)
