Improve Your Python Code Automatically

Uploaded by

Al Wikah
Copyright
© All Rights Reserved

Efficient Python Tricks and Tools for Data Scientists - By Khuyen Tran

Code Review

This section covers tools that automatically review and improve your
code, such as sorting imports and checking for missing docstrings.

isort: Automatically Sort your Python Imports in 1 Line of Code
As your codebase expands, you may find yourself importing numerous
libraries, which can become overwhelming to navigate. To avoid arranging
your imports manually, use isort.

isort is a Python library that automatically sorts imports alphabetically,
grouping them by section and type.

Consider the following example where your imports are unsorted:

from sklearn.metrics import confusion_matrix, f1_score, classification_report, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn import svm
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import TimeSeriesSplit

Running isort name_of_your_file.py rewrites the imports as follows:

from sklearn import svm
from sklearn.metrics import (classification_report,
                             confusion_matrix, f1_score,
                             roc_curve)
from sklearn.model_selection import (GridSearchCV,
                                     StratifiedKFold,
                                     TimeSeriesSplit,
                                     train_test_split)
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

You can use isort with pre-commit by adding the following to your
.pre-commit-config.yaml file:

-   repo: https://github.com/timothycrosley/isort
    rev: 5.12.0
    hooks:
    -   id: isort

Link to isort.
interrogate: Check your Python Code for Missing Docstrings
!pip install interrogate

Sometimes, you might forget to include docstrings for classes and functions.
Instead of manually searching through all your functions and classes for
missing docstrings, use interrogate.

Consider the following example where there are missing docstrings:

# interrogate_example.py
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3

You can use interrogate to identify missing docstrings:

$ interrogate interrogate_example.py

Output:

RESULT: FAILED (minimum: 80.0%, actual: 20.0%)
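
The 20% figure is consistent with counting five documentable objects (the module, the class Math, and its three methods), of which only plus_two has a docstring. The sketch below reproduces that arithmetic with the standard-library ast module. It is an illustration only, not interrogate's actual implementation, and docstring_coverage is a hypothetical helper name.

```python
import ast

# Same code as interrogate_example.py, held in a string for parsing.
SOURCE = '''
class Math:
    def __init__(self, num) -> None:
        self.num = num

    def plus_two(self):
        """Add 2"""
        return self.num + 2

    def multiply_three(self):
        return self.num * 3
'''


def docstring_coverage(source: str) -> float:
    """Return the percentage of modules, classes, and functions with a docstring."""
    tree = ast.parse(source)
    # The module itself counts as one documentable object.
    nodes = [tree] + [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    documented = sum(ast.get_docstring(node) is not None for node in nodes)
    return 100 * documented / len(nodes)


print(f"{docstring_coverage(SOURCE):.1f}%")
```

Adding a docstring to the module, the class, and the two remaining methods would raise the figure above the 80% minimum shown in the output.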

You can use interrogate with pre-commit by adding the following to your
.pre-commit-config.yaml file:

-   repo: https://github.com/pre-commit/mirrors-interrogate
    rev: v1.4.0
    hooks:
    -   id: interrogate

Link to interrogate.
mypy: Static Type Checker for Python

!pip install mypy

Type hinting in Python is useful for other developers to understand the
expected data types to be used in your functions. To automate type checking
in your code, use mypy.

Consider the following file that includes type hinting:

# mypy_example.py
from typing import List, Union

def get_name_price(fruits: list) -> Union[list, tuple]:
    return zip(*fruits)

fruits = [('apple', 2), ('orange', 3), ('grape', 2)]
names, prices = get_name_price(fruits)
print(names)  # ('apple', 'orange', 'grape')
print(prices)  # (2, 3, 2)

When typing the following command on your terminal:

$ mypy mypy_example.py

you will get output similar to this:

mypy_example.py:4: error: Incompatible return value type (got "zip[Any]", expected "Union[List[Any], Tuple[Any, ...]]")
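
The error arises because zip returns an iterator, which is neither a list nor a tuple. One possible fix, sketched below, keeps the declared return type and materializes the iterator with list(). This is only one option under that assumption; changing the annotation to an iterator type would be another.

```python
from typing import Union


def get_name_price(fruits: list) -> Union[list, tuple]:
    """Transpose (name, price) pairs into one tuple of names and one of prices."""
    # list() turns the zip iterator into a list, matching the annotation.
    return list(zip(*fruits))


fruits = [('apple', 2), ('orange', 3), ('grape', 2)]
names, prices = get_name_price(fruits)
print(names)   # ('apple', 'orange', 'grape')
print(prices)  # (2, 3, 2)
```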

You can use mypy with pre-commit by adding the following to your
.pre-commit-config.yaml file:

repos:
-   repo: https://github.com/pre-commit/mirrors-mypy
    rev: v0.910
    hooks:
    -   id: mypy

Link to mypy.
Refurb: Refurbish and Modernize Python Codebases
If you want to have some guidelines to improve and optimize your code, try
Refurb.

For example, if you have a file like this:

# test_refurb.py
for n in [1, 2, 3, 4]:
    if n == 2 or n == 4:
        res = n/2

You can use Refurb to refurbish your code.

$ refurb test_refurb.py

test_refurb.py:1:10 [FURB109]: Replace `in [x, y, z]` with `in (x, y, z)`
test_refurb.py:2:8 [FURB108]: Use `x in (y, z)` instead of `x == y or x == z`

Run `refurb --explain ERR` to further explain an error. Use `--quiet` to silence this message

$ refurb test_refurb.py --explain FURB109

['Since tuple, list, and set literals can be used with the `in` operator, it',
 'is best to pick one and stick with it.',
 '',
 'Bad:',
 '',
 '```',
 'for x in [1, 2, 3]:',
 '    pass',
 '',
 'nums = [str(x) for x in [1, 2, 3]]',
 '```',
 '',
 'Good:',
 '',
 '```',
 'for x in (1, 2, 3):',
 '    pass',
 '',
 'nums = [str(x) for x in (1, 2, 3)]',
 '```']
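
Applying both suggestions (FURB109 and FURB108) to test_refurb.py could look like the sketch below. The two result lists are an addition for this illustration, introduced so the original and refurbished loops can be compared; the original file only assigns res.

```python
# Original loop from test_refurb.py, collecting each result for comparison.
original = []
for n in [1, 2, 3, 4]:
    if n == 2 or n == 4:
        original.append(n / 2)

# Same loop after applying FURB109 (tuple literal) and FURB108 (membership test).
refurbished = []
for n in (1, 2, 3, 4):
    if n in (2, 4):
        refurbished.append(n / 2)

print(original == refurbished)  # True
```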

Refurb only works with Python 3.10 and above.


You can use Refurb with pre-commit by adding the following to your
.pre-commit-config.yaml file:

repos:
-   repo: https://github.com/dosisod/refurb
    rev: REVISION
    hooks:
    -   id: refurb

Link to Refurb.
Pydantic: Enforce Data Types on Your Function Parameters at Runtime
!pip install pydantic

If you want to enforce data types on your function parameters and validate
their values at runtime, use Pydantic.

In the code below, test_size is set to the string "a", which cannot be
converted to a float, so Pydantic raises a ValidationError.

from pydantic import BaseModel

class ProcessConfig(BaseModel):
    drop_columns: list = ["a", "b"]
    target: str = "y"
    test_size: float = 0.3
    random_state: int = 1
    shuffle: bool = True

def process(config: ProcessConfig = ProcessConfig()):
    target = config.target
    test_size = config.test_size
    ...

process(ProcessConfig(test_size="a"))

ValidationError: 1 validation error for ProcessConfig
test_size
  value is not a valid float (type=type_error.float)

Link to Pydantic.

Build a full-stack ML application with Pydantic and Prefect.


perfplot: Performance Analysis for Python Snippets
!pip install perfplot

If you want to compare the performance between different snippets and plot
the results, use perfplot.

Consider the following file that includes three functions that create a list.

import perfplot

def append(n):
    l = []
    for i in range(n):
        l.append(i)
    return l

def comprehension(n):
    return [i for i in range(n)]

def list_range(n):
    return list(range(n))

To visualize the performance of these functions, use the perfplot.show
function.

perfplot.show(
    setup=lambda n: n,
    kernels=[
        append,
        comprehension,
        list_range,
    ],
    n_range=[2**k for k in range(25)],
)
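
For a rough idea of what such a comparison measures, the sketch below times the same three functions with the standard-library timeit module instead of perfplot. It prints numbers rather than plotting them. The input sizes, the repeat counts, and the timings dictionary are assumptions chosen for illustration, not part of perfplot.

```python
import timeit


def append(n):
    l = []
    for i in range(n):
        l.append(i)
    return l


def comprehension(n):
    return [i for i in range(n)]


def list_range(n):
    return list(range(n))


kernels = [append, comprehension, list_range]
calls = 100  # number of calls per timing run (illustrative choice)
timings = {}

for n in (2**4, 2**8, 2**12):
    for kernel in kernels:
        # Take the best of three runs and convert to seconds per call.
        best = min(timeit.repeat(lambda: kernel(n), number=calls, repeat=3))
        timings[(kernel.__name__, n)] = best / calls

for (name, n), seconds in timings.items():
    print(f"{name:>13} n={n:<5} {seconds * 1e6:8.2f} µs per call")
```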

Link to perfplot.
Analyze the Memory Usage of Your Python Code
!pip install memory_profiler

If you want to analyze the memory consumption of your Python code
line-by-line, use memory_profiler. This package allows you to generate a full
memory usage report of your executable and plot it.

$ mprof run memory_profiler_test.py

mprof: Sampling memory every 0.1s

Line #    Mem usage    Increment   Line Contents
================================================
     4     41.9 MiB     41.9 MiB   @profile
     5                             def func():
     6     49.5 MiB      7.6 MiB       a = [1]*(10**6)
     7    202.1 MiB    152.6 MiB       b = [2]*(2*10**7)
     8     49.5 MiB   -152.6 MiB       del b
     9     49.5 MiB      0.0 MiB       return a

Plot the memory usage:

$ mprof plot
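
The contents of memory_profiler_test.py are not shown, but the Line Contents column of the report suggests a function that allocates two lists and deletes the larger one. As a rough cross-check, the sketch below runs that same function under the standard-library tracemalloc module, swapped in here for memory_profiler. tracemalloc tracks Python-level allocations rather than total process memory, so its numbers will not match the report exactly.

```python
import tracemalloc


def func():
    # Body taken from the "Line Contents" column of the report above.
    a = [1] * (10**6)
    b = [2] * (2 * 10**7)
    del b
    return a


tracemalloc.start()
a = func()
# current: memory still allocated; peak: highest level reached while tracing.
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current / 2**20:.1f} MiB, peak: {peak / 2**20:.1f} MiB")
```

The peak should sit well above the current value, reflecting the temporary list b that is released before the function returns.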

Link to memory_profiler.
