Essential Python Libraries and Functions For Data Science 1706295212

This document provides a summary of key Python libraries and tools for data science. It covers libraries for data manipulation (Pandas), numerical operations (NumPy), data visualization (Matplotlib, Seaborn), statistical analysis (SciPy, Statsmodels), machine learning (Scikit-learn, XGBoost, LightGBM), deep learning (TensorFlow, Keras, PyTorch), natural language processing (NLTK, spaCy), and working with databases (SQLAlchemy). For each library, it lists important functions and classes. The goal is to serve as a cheat sheet for the Python data scientist's toolbox.

Uploaded by

Prasad Yarra

# [ The Data Scientist's Python Toolbox ] [ cheatsheet ]

1. Data Manipulation and Analysis

● Pandas: Core library for data manipulation and analysis.

○ pd.read_csv(): Read data from a CSV file into a DataFrame.
○ pd.read_excel(): Read data from an Excel file into a
DataFrame.
○ df.head(): View the first few rows of the DataFrame.
○ df.describe(): Get a summary of statistics.
○ df.info(): Get concise summary of the DataFrame.
○ df['column'].value_counts(): Count unique values in a column.
○ df.groupby(): Group data using a mapper or by a series of
columns.
○ df.pivot_table(): Create a spreadsheet-style pivot table.
○ df.merge(): Merge DataFrame objects.
○ df.to_csv(): Write DataFrame to a comma-separated values
(csv) file.
○ pd.DataFrame(): Create a DataFrame from various data sources.
○ df.filter(): Subset rows or columns by their labels (it does not filter on values).
○ df.sort_values(): Sort data by a column.
○ df.groupby().agg(): Aggregation after grouping.
○ df.join(), df.merge(): Join/Merge operations.
○ df.plot(): Basic plotting.
○ df.apply(): Apply functions.
○ df.to_sql(), pd.read_sql(): Write a DataFrame to, or read a query result from, a SQL database.
○ pd.to_datetime(): Convert a column (or other argument) to datetime, e.g. pd.to_datetime(df['column']).
○ pd.get_dummies(df): Convert categorical variable into
dummy/indicator variables.
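
A few of the Pandas calls listed above can be combined into a short workflow. The following is a minimal sketch; the `orders` and `customers` tables and all of their values are invented purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer": ["ann", "bob", "ann", "cid"],
        "amount": [20.0, 35.5, 12.5, 50.0],
        "placed": ["2024-01-05", "2024-01-06", "2024-02-01", "2024-02-03"],
    }
)
customers = pd.DataFrame(
    {"customer": ["ann", "bob", "cid"], "region": ["north", "south", "north"]}
)

# to_datetime is a top-level pandas function, not a DataFrame method.
orders["placed"] = pd.to_datetime(orders["placed"])

# Merge on the shared key, then aggregate the amounts per region.
merged = orders.merge(customers, on="customer", how="left")
summary = (
    merged.groupby("region")["amount"]
    .agg(["sum", "mean", "count"])
    .sort_values("sum", ascending=False)
)

# One-hot encode the categorical column.
dummies = pd.get_dummies(merged["region"], prefix="region")

print(summary)
print(dummies.columns.tolist())
```

Chaining `groupby()`, `agg()`, and `sort_values()` keeps the aggregation readable as a single expression; splitting it into intermediate variables works just as well.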

2. Numerical Operations

● NumPy: Fundamental package for numerical computations.

○ np.array(): Create an array.
○ np.reshape(): Change array shape.

By: Waleed Mousa

○ np.concatenate(): Concatenate arrays.
○ np.where(): Return elements chosen from x or y depending on
condition.
○ np.linalg.inv(): Compute the multiplicative inverse of a
matrix.
○ np.linalg.eig(): Compute the eigenvalues and right
eigenvectors of a square array.
○ np.arange(): Return evenly spaced values within a given
interval.
○ np.zeros(), np.ones(): Create arrays of zeros or ones.
○ np.linspace(): Create evenly spaced numbers over a specified
interval.
○ np.random.rand(), np.random.randn(): Create arrays of random values drawn from a uniform [0, 1) or a standard normal distribution.
○ np.dot(): Dot product of two arrays.
○ np.sqrt(), np.log(), np.exp(): Element-wise square root, natural logarithm, and exponential.
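
As a rough illustration of how several of the NumPy functions above fit together, here is a small sketch. The matrix `m` and the other values are arbitrary examples, not taken from the cheat sheet.

```python
import numpy as np

# Build a 2x3 array from a range, then select values conditionally.
a = np.arange(6).reshape(2, 3)       # [[0, 1, 2], [3, 4, 5]]
clipped = np.where(a > 2, a, 0)      # keep values above 2, zero out the rest

# Invert a small matrix and check the result with a dot product.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
m_inv = np.linalg.inv(m)
identity = np.dot(m, m_inv)          # approximately the 2x2 identity

# Eigen-decomposition of the same (symmetric) matrix.
eigenvalues, eigenvectors = np.linalg.eig(m)

# Five evenly spaced points from 0 to 1, endpoints included.
grid = np.linspace(0.0, 1.0, 5)

# Element-wise math: exp and log undo each other here.
roundtrip = np.log(np.exp(grid))

print(clipped)
print(np.round(identity, 6))
print(eigenvalues)
```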

3. Data Visualization

● Matplotlib: Basic plotting library.

○ plt.plot(): Plot y versus x as lines and/or markers.
○ plt.scatter(): Make a scatter plot of x vs y.
○ plt.hist(): Plot a histogram.
○ plt.bar(): Make a bar plot.
○ plt.xlabel(), plt.ylabel(): Set the labels for x and y axes.
○ plt.title(): Set a title for the axes.
○ plt.legend(): Place a legend on the axes.
○ plt.figure(): Create a new figure.
○ plt.subplot(): Add a subplot to the current figure.
○ plt.xscale(), plt.yscale(): Set the scaling of the x-axis or
y-axis.
○ plt.xlim(), plt.ylim(): Get or set the x/y limits of the
current axes.
○ plt.colorbar(): Add a colorbar to a plot.

○ plt.errorbar(): Plot y versus x as lines and/or markers with
attached errorbars.
● Seaborn: Statistical data visualization based on Matplotlib.
○ sns.set(): Set aesthetic parameters in one step.
○ sns.pairplot(): Plot pairwise relationships in a dataset.
○ sns.histplot(), sns.displot(): Flexibly plot a univariate distribution of observations (these replace sns.distplot(), deprecated since seaborn 0.11).
○ sns.boxplot(): Draw a box plot to show distributions with
respect to categories.
○ sns.heatmap(): Heatmap representation of data.
○ sns.lmplot(): Plot data and regression model fits.
○ sns.clustermap(): Clustered heatmap.
○ sns.jointplot(): Draw a plot of two variables with bivariate
and univariate graphs.
○ sns.swarmplot(): Draw a categorical scatterplot with
non-overlapping points.
○ sns.countplot(): Show the counts of observations in each
categorical bin using bars.
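
The Matplotlib calls listed in this section are usually used together on one figure. The sketch below assumes a headless environment (hence the `Agg` backend) and writes the image to an in-memory buffer rather than a file; the plotted data is an arbitrary sine curve.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig = plt.figure(figsize=(8, 3))

# Left panel: a line plot plus a few scatter markers.
plt.subplot(1, 2, 1)
plt.plot(x, np.sin(x), label="sin(x)")
plt.scatter(x[::10], np.cos(x[::10]), label="cos(x) samples")
plt.xlabel("x")
plt.ylabel("value")
plt.title("Lines and markers")
plt.legend()

# Right panel: a histogram of the same values.
plt.subplot(1, 2, 2)
plt.hist(np.sin(x), bins=10)
plt.title("Histogram of sin(x)")

fig.tight_layout()

# Render to PNG bytes in memory; pass a filename instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```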

4. Statistical Analysis

● SciPy: Library for scientific computing.

○ stats.ttest_ind(): Calculate the T-test for the means of two
independent samples of scores.
○ stats.pearsonr(): Pearson correlation coefficient and p-value
for testing non-correlation.
○ stats.norm(): Normal continuous random variable.
○ scipy.integrate.quad(): General purpose integration.
○ scipy.optimize.minimize(): Minimization of scalar functions
of one or more variables.
○ scipy.signal.convolve(): Convolve two N-dimensional arrays.
○ scipy.interpolate.interp1d(): Interpolate a 1-D function.
○ scipy.spatial.distance.euclidean(): Computes the Euclidean
distance between two 1-D arrays.
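
To make the SciPy entries above more concrete, here is a short sketch that touches the statistics, integration, optimization, and distance functions. The sample data is randomly generated with a fixed seed and the target functions are simple textbook cases chosen for illustration.

```python
import numpy as np
from scipy import integrate, optimize, stats
from scipy.spatial import distance

# Two synthetic samples with different means (illustrative data only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=0.5, scale=1.0, size=200)
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Pearson correlation of a perfectly linear relationship.
x = np.arange(10, dtype=float)
r, r_p = stats.pearsonr(x, 2 * x + 1)

# Integrate t^2 from 0 to 3; the exact answer is 9.
area, abs_err = integrate.quad(lambda t: t ** 2, 0.0, 3.0)

# Minimize (v - 2)^2; the minimum is at v = 2.
result = optimize.minimize(lambda v: (v[0] - 2.0) ** 2, x0=[0.0])

# Euclidean distance between two 1-D points (a 3-4-5 triangle).
d = distance.euclidean([0, 0], [3, 4])

print(t_stat, p_value)
print(r, area, result.x[0], d)
```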

5. Machine Learning

● Scikit-learn: Core library for machine learning.
○ train_test_split(): Split arrays or matrices into random
train and test subsets.
○ LinearRegression(), LogisticRegression(): Linear and Logistic
Regression models.
○ RandomForestClassifier(), RandomForestRegressor(): Random
Forest models for classification and regression.
○ KMeans(): K-Means clustering.
○ cross_val_score(): Evaluate a score by cross-validation.
○ GridSearchCV(): Search over specified parameter values for an
estimator.
○ confusion_matrix(), classification_report(): Compute confusion
matrix and a text report showing the main classification
metrics.
○ sklearn.preprocessing.StandardScaler(): Standardize features
by removing the mean and scaling to unit variance.
○ sklearn.decomposition.PCA(): Principal component analysis
(PCA).
○ sklearn.metrics.accuracy_score(), roc_auc_score():
Classification metrics.
● XGBoost: Gradient boosting framework.
○ XGBClassifier(), XGBRegressor(): XGBoost classifier and
regressor.
○ xgb.train(): Train a gradient boosting model.
○ xgb.DMatrix(): Optimized data structure for XGBoost.
● LightGBM: Light Gradient Boosting Machine.
○ LGBMClassifier(), LGBMRegressor(): LightGBM classifier and
regressor.
○ lgb.train(): Train a gradient boosting model.
○ lgb.Dataset(): Dataset for LightGBM.
● Statsmodels: Library for statistical models, hypothesis tests, and
data exploration.

○ sm.OLS(), sm.Logit(): Models for linear regression and
logistic regression.
○ sm.tsa.ARIMA(): ARIMA model for time series analysis.
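
The Scikit-learn functions listed in this section typically appear together in a train/evaluate loop. The sketch below is one possible arrangement, not a prescribed workflow; `make_classification` and `make_pipeline` are Scikit-learn helpers that are not in the list above and are used here only to produce synthetic data and to chain the scaler with the model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Hold out 25% of the rows for the final evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled on its own.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training split and evaluate on the held-out split.
model.fit(X_train, y_train)
predictions = model.predict(X_test)
test_accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions)

print(cv_scores.mean(), test_accuracy)
print(matrix)
```

Putting `StandardScaler()` inside the pipeline avoids leaking statistics from the validation folds into the training folds, which would happen if the whole dataset were scaled up front.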

6. Deep Learning

● TensorFlow: Open-source machine learning framework.

○ tf.keras.models.Sequential(): Sequential model for linear
stack of layers.
○ tf.keras.layers.Dense(): Regular densely-connected NN layer.
○ tf.keras.layers.Conv2D(), tf.keras.layers.MaxPooling2D(): 2D
Convolutional and Pooling layers.
○ tf.GradientTape(): Record operations for automatic
differentiation.
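○ tf.data.Dataset: Build input pipelines; tf.data.Dataset.from_tensor_slices() creates a dataset from tensors.
○ tf.keras.Sequential(): Shorter path to the same class as tf.keras.models.Sequential().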
○ tf.keras.models.Model(): Model class with Keras functional
API.
○ tf.keras.layers.LSTM(): Long Short-Term Memory layer.
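○ tf.keras.optimizers.Adam(): Adam optimizer (tf.train.AdamOptimizer() is the TensorFlow 1.x name, available in 2.x only as tf.compat.v1.train.AdamOptimizer()).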
● Keras: High-level neural networks API.
○ keras.models.load_model(): Load a Keras model.
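○ keras.preprocessing.image.ImageDataGenerator(): Generate batches of tensor image data with real-time data augmentation (deprecated in recent releases in favor of Keras preprocessing layers and image_dataset_from_directory()).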
○ keras.Model(): Group layers into an object with training and
inference features.
○ keras.layers.Conv2D(): 2D convolution layer.
○ keras.activations.relu, sigmoid: Activation functions.
○ keras.callbacks.ModelCheckpoint, EarlyStopping: Callbacks for
model training.
● PyTorch: Open source machine learning library.
○ torch.nn.Module: Base class for all neural network modules.
○ torch.Tensor: Multi-dimensional matrix containing elements of
a single data type.
○ torch.optim: Optimization algorithms.

○ torch.utils.data.DataLoader: Combine a dataset and a sampler.

7. Natural Language Processing (NLP)

● NLTK: Leading platform for building Python programs to work with human language data.
○ nltk.tokenize.word_tokenize(): Split a string into word and punctuation tokens.
○ nltk.corpus.stopwords.words(): List of stopwords.
○ nltk.FreqDist(): Frequency distribution of words within a
text.
○ nltk.tag.pos_tag(): Part-of-speech tagging.
○ nltk.corpus: Access to large text corpora.
○ nltk.tokenize.sent_tokenize(): Tokenizer for sentences.
○ nltk.stem.PorterStemmer(): Porter word stemmer.
○ nltk.NaiveBayesClassifier.train(): Train a Naive Bayes classifier on labeled feature sets.
○ nltk.ne_chunk(): Chunk POS-tagged tokens into named entities (part of the nltk.chunk module).
● spaCy: Industrial-strength Natural Language Processing.
○ spacy.load(): Load a trained pipeline by name or path.
○ doc = nlp(text): Process raw text into a Doc object.
○ doc.ents: Named entities found in the document.
○ doc.sents: Iterate over sentence spans.
○ doc.similarity(): Similarity between two documents.
○ spacy.lang: Language-specific data such as tokenizer rules and stop words.

8. Working with Databases

● SQLAlchemy: SQL toolkit and Object-Relational Mapping (ORM) library.
○ create_engine(): Create a database engine from a connection URL.
○ sessionmaker(): Session factory.
○ declarative_base(): Create the declarative base class (conventionally named Base) for ORM models.

● sqlite3: SQLite database library.
○ sqlite3.connect(): SQLite database connection.
○ conn.cursor(): Create a cursor object on an open connection.
○ cursor.execute(): Execute a SQL command through the cursor.
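
The sqlite3 calls above follow a connect, cursor, execute pattern. A minimal sketch using an in-memory database is shown here; the `measurements` table and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# ":memory:" creates a temporary database that disappears on close.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table, invented for this sketch.
cursor.execute(
    "CREATE TABLE measurements (id INTEGER PRIMARY KEY, sensor TEXT, value REAL)"
)

# Parameter placeholders (?) keep values out of the SQL string itself.
cursor.executemany(
    "INSERT INTO measurements (sensor, value) VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)
conn.commit()

# Aggregate per sensor and fetch all result rows as tuples.
cursor.execute(
    "SELECT sensor, AVG(value) FROM measurements GROUP BY sensor ORDER BY sensor"
)
averages = cursor.fetchall()
conn.close()

print(averages)
```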

9. Web Scraping

● BeautifulSoup: Library for pulling data out of HTML and XML files.
○ BeautifulSoup(): Parse an HTML/XML document.
○ .find(), .find_all(): Find elements by tags.
● Scrapy: Open source and collaborative framework for extracting
data from websites.
○ scrapy.Spider: Base class for spiders.
○ response.css(), response.xpath(): Querying the data.
○ yield scrapy.Request(): Generate Requests.
○ parse(): Method to handle responses.

10. Data Visualization (Advanced)

● Plotly: Interactive graphing library.

○ plotly.graph_objs.Scatter(), plotly.graph_objs.Bar(): Create
scatter and bar plots.
○ plotly.subplots.make_subplots(): Create subplots.
○ plotly.express.scatter(), bar(), line(): Quick functions for
scatter, bar, and line plots.
○ plotly.graph_objs.Figure(): Create figures for a more custom
approach.
○ plotly.io.write_html(): Save plot as HTML file.
● Bokeh: Interactive visualization library.
○ figure(): Create a new figure for plotting.
○ output_file(), output_notebook(): Output to static HTML file or
Jupyter Notebook.
○ ColumnDataSource(): Map names of columns to sequences or
arrays.

○ show(), save(): Display or save plots.
○ widgets: Interactive widgets for plots.
● Altair: Declarative statistical visualization library.
○ alt.Chart(): Create a Chart object.
○ mark_point(), mark_line(), mark_bar(): Different types of
marks for visualization.
○ encode(): Encode visual channels.
● Folium: Map plotting library.
○ folium.Map(): Create a base map.
○ folium.Marker(), folium.CircleMarker(): Add markers to the
map.

11. Data Reporting and Business Intelligence

● Dash: Web application framework.

○ dash.Dash(): Create a Dash application.
○ dash.html.Div, dash.dcc: HTML components and core components for Dash (the standalone dash_core_components package is deprecated since Dash 2.0).
○ dash.dcc.Graph: Graph component for Dash.
○ app.callback(): Decorator for callbacks.
● Streamlit: App framework for Machine Learning and Data Science
teams.
○ st.write(): Write data or text.
○ st.dataframe(): Display a dataframe.
○ st.plotly_chart(): Display a Plotly chart.
○ st.sidebar.selectbox(): Add a select box to the sidebar.

12. Advanced Machine Learning

● CatBoost: Gradient boosting on decision trees library.

○ CatBoostClassifier(), CatBoostRegressor(): CatBoost models for
classification and regression.
○ catboost.Pool(): Data structure to store dataset.
● Hyperopt: Distributed Asynchronous Hyperparameter Optimization.
○ fmin(): Minimize a function over a hyperparameter space.
○ hp.choice(), hp.uniform(): Define hyperparameter space.

● Optuna: Hyperparameter optimization framework.
○ optuna.create_study(): Create a study for hyperparameter optimization.
○ study.optimize(): Optimize the objective function over a number of trials.

13. Model Interpretability and Explainability

● SHAP (SHapley Additive exPlanations): Explain the output of machine learning models.
○ shap.TreeExplainer(): Explain predictions of tree-based models.
○ explainer.shap_values(): Compute SHAP values for a set of samples.
● LIME (Local Interpretable Model-agnostic Explanations): Explain
individual predictions.
○ lime.lime_tabular.LimeTabularExplainer(): Explainer for
tabular data.
○ explain_instance(): Explain an individual instance.

14. Data Imputation

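● SimpleImputer from Scikit-learn: Imputation for completing missing values (replaces the older Imputer class, which has been removed).
○ SimpleImputer(): Imputation transformer that fills missing values with a constant or a per-column statistic (mean, median, most frequent).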
● KNNImputer from Scikit-learn: Imputation for filling in missing
values using the k-Nearest Neighbors approach.
○ KNNImputer(): Impute missing values using k-NN.
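
The two imputers above share the same `fit_transform()` interface, so they are easy to compare side by side. This is a small sketch on a made-up 4x2 array with two missing entries.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy matrix with one missing value per column (illustrative only).
X = np.array(
    [
        [1.0, 2.0],
        [np.nan, 4.0],
        [5.0, np.nan],
        [7.0, 8.0],
    ]
)

# Replace each NaN with the mean of the observed values in its column.
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Replace each NaN with the average of that feature over the 2 nearest rows.
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled)
print(knn_filled)
```

Mean imputation ignores relationships between columns, while the k-NN variant uses the other features to find similar rows; which one is preferable depends on the data.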

15. Feature Engineering and Selection

● Feature-engine: Feature engineering library.

○ CategoricalImputer(), MeanMedianImputer(): Impute missing categorical or numerical values.
○ MathFeatures(): Create new features by applying mathematical operations across variables (named MathematicalCombination() in older releases).
● SelectKBest from Scikit-learn: Select features according to the k
highest scores.
○ SelectKBest(): Select features according to the top k scores.
● RFE (Recursive Feature Elimination) from Scikit-learn: Feature
ranking with recursive feature elimination.
○ RFE(): Recursive feature elimination.
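
For the two Scikit-learn selectors above, a short sketch may help show the difference in how they are configured. The data comes from `make_classification`, and the choices of `f_classif` as the scoring function and `LogisticRegression` as the RFE estimator are assumptions made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which 3 carry signal (illustrative only).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Univariate selection: score each feature on its own, keep the top 3.
selector = SelectKBest(score_func=f_classif, k=3)
X_best = selector.fit_transform(X, y)

# Recursive elimination: refit the model, dropping the weakest feature each round.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe.fit(X, y)

print(selector.get_support(indices=True))
print(rfe.ranking_)
```

`SelectKBest` is cheap because it never fits a model, whereas `RFE` accounts for feature interactions at the cost of repeated fitting.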

16. Time Series Analysis

● Prophet: Forecasting time series data.

○ Prophet(): Create a new Prophet object.
○ model.fit(): Fit the Prophet model.
○ model.predict(): Make a future prediction.
● tsfresh: Automatic extraction of relevant features from time
series.
○ extract_features(): Automatically extract time series
features.

17. Image and Video Processing

● OpenCV (cv2): Open Source Computer Vision Library.

○ cv2.imread(), cv2.imshow(): Read and display images.
○ cv2.VideoCapture(): Capture video from a camera or a file.
○ cv2.cvtColor(): Color space conversion.
○ cv2.CascadeClassifier(): Haar cascade classifiers for object
detection.
○ cv2.findContours(): Find contours in a binary image.
● Pillow (PIL): Python Imaging Library.
○ Image.open(), image.save(): Open and save images.
○ image.filter(), ImageEnhance: Apply image filters and enhancements.
○ image.rotate(), image.resize(): Rotate or resize an image.
○ ImageDraw.Draw(): Create object to draw on the image.

18. Model Deployment

● Flask: Micro web framework for building web applications.

○ flask.Flask(): Create a Flask application.
○ app.route(): Define routes for your application.
○ flask.request: Request object to handle query parameters,
URLs, etc.
● FastAPI: Modern, fast (high-performance) web framework.
○ FastAPI(): Create a FastAPI application.
○ @app.get(), @app.post(): Define GET and POST endpoints.
○ pydantic.BaseModel: Define data models.

19. Working with Data Streams

● Apache Kafka for Python (confluent_kafka): Client for Apache Kafka.

○ Producer(), Consumer(): Produce and consume messages.
● PySpark (core and Streaming): Distributed processing of batch data and real-time data streams.
○ StreamingContext(): Main entry point for streaming
functionality.
○ DStream: Discretized stream for processing.
○ SparkContext(): Entry point for Spark functionality.
○ RDD: Resilient Distributed Dataset for fault-tolerant
processing.
○ spark.sql(): Running SQL queries.
○ DataFrame: Distributed collection of data organized into
named columns.

20. Geospatial Data Analysis

● Geopandas: Work with geospatial data in Python.

○ geopandas.read_file(): Read geospatial data.
○ GeoDataFrame(): Geospatial dataframe.
● Rasterio: Access to geospatial raster data.
○ rasterio.open(): Open raster files.
● Folium (continued): Build interactive maps.
○ folium.Map(): Create a base map.
○ folium.features.GeoJson(): Add GeoJSON to a map.

21. Advanced Data Storage and Retrieval

● HDF5 for Python (h5py): Work with HDF5 binary data format.
○ h5py.File(): Open an HDF5 file.
○ create_dataset(): Create a new dataset in an HDF5 file.
● PyTables: Manage large datasets and hierarchical databases.
○ tables.open_file(): Open an HDF5 file.
○ create_table(), create_array(): Create tables and arrays in
the file.

22. Cloud Services and APIs

● Boto3 for AWS: Amazon Web Services SDK for Python.

○ boto3.client(), boto3.resource(): Access AWS services.
● google-cloud-python: Client libraries for Google Cloud services.
○ from google.cloud import storage: Access Google Cloud Storage.

23. Optimization and Solvers

● CVXPY: Domain-specific language for convex optimization problems.

○ cvxpy.Problem(): Create an optimization problem.
○ problem.solve(): Solve the optimization problem.
