The 30 Most Useful Python Libraries For Data Engineering
by ODSC - Open Data Science
For the upcoming Data Engineering Summit on January 18th, we’ve reached
out to some of the top experts in the field to speak on the topic. We observed
from our discussions and research that the most popular data engineering
programming languages include Python, Java, Scala, R, Julia, and C++.
However, Python continues to lead the pack thanks to its growing ecosystem
of libraries, tools, and frameworks for data engineering and related areas
such as machine learning and data science.
Whatever metric you use, many Python libraries are useful for data engineering; how important a given library is depends on the task at hand. Drawing on discussions around our upcoming summit and the Data Engineering (DE) track at ODSC East 2023, we've identified the following as some of the most useful and popular:
1. Library: luigi
First released by Spotify in 2011, Luigi is an open-source data pipeline Python library. Similar to Airflow, it allows DEs to build and define complex pipelines in which tasks depend on one another, ensuring that tasks execute in the correct order while managing failures. Luigi also includes event monitoring that can trigger task execution. It can be used for ETL and data ingestion, cleaning and transforming data before persisting it to data stores such as data lakes and warehouses.
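A minimal sketch of that dependency pattern, assuming Luigi is installed and using illustrative file names; Transform declares Extract as a requirement, so Luigi runs the two in order:

```python
import luigi

class Extract(luigi.Task):
    # Stand-in for reading from a real source
    def output(self):
        return luigi.LocalTarget("raw.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("id,value\n1,10\n2,20\n")

class Transform(luigi.Task):
    # Depends on Extract; Luigi runs upstream tasks first
    def requires(self):
        return Extract()

    def output(self):
        return luigi.LocalTarget("clean.csv")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            for line in src:
                dst.write(line.strip().lower() + "\n")

if __name__ == "__main__":
    # local_scheduler=True avoids needing the central luigid daemon
    luigi.build([Transform()], local_scheduler=True)
```

Because each task's output acts as a completeness marker, rerunning the pipeline skips tasks whose targets already exist.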
2. Library: prefect
For data engineers, Airflow is a trusted tool, but it sometimes lacks the features necessary for the modern data stack; Prefect was designed with these shortcomings in mind. Prefect seeks to provide a simple, intuitive way to build and manage complex data workflows and pipelines. It allows data engineers to define and orchestrate pipelines, schedule and trigger tasks, and manage error handling and retries. Like other workflow Python libraries for data engineering, it can be used to extract data from various sources, transform and clean the data, and load it into a target system or database. It can also monitor the status and progress of tasks and send alerts and notifications when needed.
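A minimal sketch of a Prefect flow; Prefect's API changed substantially between 1.x and 2.x, so this assumes Prefect 2.x, with illustrative task bodies standing in for real extract/load logic:

```python
from prefect import flow, task

@task(retries=2, retry_delay_seconds=10)  # built-in retry handling
def extract():
    return [1, 2, 3]  # stand-in for reading from a real source

@task
def transform(rows):
    return [r * 10 for r in rows]

@task
def load(rows):
    print(f"loaded {len(rows)} rows")  # stand-in for writing to a target

@flow
def etl():
    load(transform(extract()))

if __name__ == "__main__":
    etl()
```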
3. Library: kombu
Kombu and kafka-python are similar in that both are libraries for working with messaging systems in Python. Kombu, however, is a messaging library that provides a high-level API for interacting with message brokers such as RabbitMQ over protocols such as AMQP, with support for message serialization, connection pooling, and retry handling. Data engineers can use Kombu to produce and consume messages from message brokers, which makes it useful for building data pipelines and streaming data between systems: for example, reading data from a database and sending it to a message broker, whose messages are then consumed by another application in the pipeline.
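A minimal sketch using Kombu's high-level SimpleQueue interface, assuming a RabbitMQ broker is reachable at the default local address:

```python
from kombu import Connection

# Broker URL is illustrative; Kombu also supports Redis and other transports
with Connection("amqp://guest:guest@localhost//") as conn:
    queue = conn.SimpleQueue("etl-events")

    # Producer side: publish a message (serialized automatically)
    queue.put({"table": "orders", "rows": 1200})

    # Consumer side: fetch, process, then acknowledge
    message = queue.get(block=True, timeout=5)
    print(message.payload)  # {'table': 'orders', 'rows': 1200}
    message.ack()           # tell the broker the message was handled
    queue.close()
```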
4. Library: pandas
Pandas is one of the most popular Python libraries for working with small- and medium-sized datasets. Built on top of NumPy, Pandas (short for Python Data Analysis Library) is ideal for data analysis and manipulation. It's considered a must-have given its large collection of powerful features, such as data merging, handling missing data, and data exploration, and its overall efficiency. Data engineers use it to quickly read data from various sources, perform analysis and transformation operations on the data, and output the results in various formats. Pandas is also frequently paired with other Python libraries for data engineering, such as scikit-learn for data analysis and machine learning tasks.
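A short sketch of a typical read-transform-write cycle; the file names are illustrative, and writing Parquet assumes a Parquet engine such as pyarrow is installed:

```python
import pandas as pd

# Read from a source, parsing dates on the way in
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])

# Handle missing data and derive a new column
orders["amount"] = orders["amount"].fillna(0)
orders["month"] = orders["order_date"].dt.to_period("M")

# Merge with a second dataset
customers = pd.read_csv("customers.csv")
enriched = orders.merge(customers, on="customer_id", how="left")

# Output in another format
enriched.to_parquet("orders_enriched.parquet", index=False)
```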
5. Library: pyarrow
PyArrow provides the Python bindings for Apache Arrow, a language-independent, in-memory columnar data format. Data engineers use it for fast columnar data interchange between tools and for reading and writing formats such as Parquet.
CLOUD LIBRARIES
6. Library: boto3
AWS is one of the most popular cloud service providers, so it's no surprise that boto3 tops the list. Boto3 is the Software Development Kit (SDK) library that lets programmers write software using a long list of Amazon services, including data engineering favorites such as Glue, EC2, RDS, S3, Kinesis, Redshift, and Athena. In addition to performing common tasks such as uploading and downloading data or launching and managing EC2 instances, data engineers can leverage Boto3 to programmatically access and manage many AWS services, which can be used to build data pipelines and automate data workflow tasks.
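A brief sketch of common S3 tasks with Boto3, assuming AWS credentials are already configured (environment variables, ~/.aws, or an IAM role) and using an illustrative bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Upload and download objects
s3.upload_file("local_data.csv", "my-data-bucket", "raw/data.csv")
s3.download_file("my-data-bucket", "raw/data.csv", "copy_of_data.csv")

# List objects under a prefix
resp = s3.list_objects_v2(Bucket="my-data-bucket", Prefix="raw/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```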
7. Library: google-api-core
google-api-core provides the common plumbing, such as retries, timeouts, and authentication helpers, shared by Google Cloud's Python client libraries, making it a foundation for data engineering work on Google Cloud.
8. Library: azure-core
From another of the top five cloud providers, azure-core is the Python library and API for interacting with Azure cloud services, used by data engineers to access resources and automate engineering tasks. Common tasks include submitting and monitoring batch jobs; accessing databases, data containers, and data lakes; and generally managing resources such as virtual machines and containers. A related Python library is azure-storage-blob, built to manage, retrieve, and store large amounts of unstructured data such as images, audio, video, or text.
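A minimal azure-storage-blob sketch; the connection string and container name are placeholders you would supply from your own Azure account:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
container = service.get_container_client("raw-data")

# Upload unstructured data to a blob
with open("events.json", "rb") as data:
    container.upload_blob(name="2023/01/events.json", data=data, overwrite=True)

# Download it back
blob = container.download_blob("2023/01/events.json")
print(blob.readall()[:100])
```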
9. Library: grpcio
Building distributed API systems and microservices are among the use cases that drive the popularity of the gRPC Python package. gRPC is a modern, open-source, high-performance Remote Procedure Call (RPC) framework that can run in any environment. Features such as load balancing, health checks, authentication, bidirectional streaming, and automatic retries make it a powerful tool for building secure, scalable, and reliable applications. In short, data engineers can use grpcio to build efficient, scalable data pipelines for distributed systems.
10. Library: SQLAlchemy
SQLAlchemy is the Python SQL toolkit that provides a high-level interface for
interacting with databases. It allows data engineers to query data from a
database using SQL-like statements and perform common operations such
as inserting, updating, and deleting data from a database. SQLAlchemy also
provides support for object-relational mapping (ORM), which allows data
engineers to define the structure of their database tables as Python classes
and map those classes to the actual database tables. SQLAlchemy provides a
full suite of well-known enterprise-level persistence patterns, designed for
efficient and high-performing database access such as connection pooling
and connection reuse.
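A short sketch of Core-style usage, assuming SQLAlchemy 1.4+ and an illustrative SQLite database (any supported database URL works the same way):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///demo.db")  # connection pooling is built in

with engine.begin() as conn:  # transaction commits on success
    conn.execute(text("CREATE TABLE IF NOT EXISTS events (id INTEGER, name TEXT)"))
    conn.execute(
        text("INSERT INTO events (id, name) VALUES (:id, :name)"),
        [{"id": 1, "name": "extract"}, {"id": 2, "name": "load"}],
    )
    for row in conn.execute(text("SELECT id, name FROM events")):
        print(row.id, row.name)
```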
Other notable Python libraries for data engineering include PyMySQL and sqlparse.
11. Library: redis-py
Redis is a popular in-memory data store widely used in data engineering due to its ability to scale and handle high volumes of data. It can be installed locally or is already available on the major cloud providers. Redis-py is a Python library that allows users to connect to a Redis database and perform operations such as storing and retrieving data, data transformations, and data analysis. Redis-py can also be used to automate data engineering tasks such as scheduling jobs and integrating data from other sources, for example extracting data from a database or API and storing it in Redis.
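A minimal redis-py sketch, assuming a Redis server is running on localhost; the keys shown are illustrative pipeline-state examples:

```python
import redis

r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

# Simple key/value state
r.set("pipeline:last_run", "2023-01-18T00:00:00Z")
print(r.get("pipeline:last_run"))

# Hashes and counters for lightweight job tracking
r.hset("job:123", mapping={"status": "running", "rows": 0})
r.hincrby("job:123", "rows", 500)
print(r.hgetall("job:123"))
```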
12. Library: pyspark
PySpark is the Python API for Apache Spark, the widely used engine for large-scale distributed data processing, and a staple for building batch and streaming pipelines over big datasets.
13. Library: beautifulsoup4
Data engineering doesn't always mean sourcing data from data stores and warehouses. Often, data has to be extracted from unstructured sources such as the web or documents. Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree. This makes Beautiful Soup a popular Python library for data engineering: it is easy to use and lets developers readily extract and manipulate data from unstructured sources.
PyPI Page: https://pypi.org/project/beautifulsoup4
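A minimal scraping sketch, using requests to fetch a page (the URL is illustrative; always respect a site's robots.txt and terms of use):

```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")

# Pythonic idioms for searching the parse tree
for link in soup.find_all("a"):
    print(link.get("href"), link.get_text(strip=True))
```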
14. Library: PyTorch
PyTorch is an open-source machine learning framework widely used for deep learning. Data engineers typically encounter it when building pipelines that prepare and deliver data for model training and inference.
15. Library: virtualenv
Data engineers have to work with many different Python libraries and package versions, so having an isolated virtual environment is essential. Virtualenv is a tool for creating isolated Python environments, ensuring no interference across your various project setups. Since Python 3.3, a subset of it has been integrated into the standard library under the venv module. Virtualenv is especially important for projects that have complex dependencies or that need to run on different versions of Python.
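A small sketch using the stdlib venv module mentioned above, which covers the common virtualenv use case programmatically:

```python
import venv

# Equivalent to `python -m venv env`: creates an isolated environment
# in ./env with its own interpreter and pip
venv.create("env", with_pip=True)

# Activation still happens in the shell, e.g.:
#   source env/bin/activate   (Linux/macOS)
#   env\Scripts\activate      (Windows)
```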
16. Library: Dask
Dask was created to parallelize NumPy (the prolific Python library used for scientific computing and data analysis) across multiple CPUs and has since evolved into a general-purpose library for parallel computing, with support for Pandas DataFrames and efficient model training on XGBoost and scikit-learn. Data engineers have also adopted Dask because its built-in functions and parallel processing capabilities make large-dataset tasks such as data cleaning, transformation, aggregation, analysis, and exploration (with Matplotlib and Seaborn support) faster and more efficient. Data engineers can also use Dask to scale workloads via a distributed scheduler that schedules jobs across a cluster of machines.
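A minimal Dask DataFrame sketch; the file pattern is illustrative, and the key idea is that operations build a lazy task graph that executes in parallel only when .compute() is called:

```python
import dask.dataframe as dd

# Reads many CSVs as one logical dataframe, partitioned for parallelism
df = dd.read_csv("events-*.csv")

# These steps are lazy: they only build the task graph
clean = df.dropna(subset=["user_id"])
summary = clean.groupby("user_id")["amount"].sum()

# .compute() runs the graph in parallel and returns a pandas object
print(summary.compute().head())
```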
17. Library: Ray
Ray is an open-source framework for scaling Python and machine learning workloads from a laptop to a cluster, giving data engineers a simple API for distributing tasks and services across many machines.
18. Library: Ansible
Ansible is an open-source automation tool for configuration management, provisioning, and application deployment; data engineers use it to automate the infrastructure their pipelines run on.
UTILITY LIBRARIES
19. Library: psutil
psutil (process and system utilities) is a cross-platform library for retrieving information on running processes and system utilization, such as CPU, memory, disk, and network, which makes it handy for monitoring the hosts that run data pipelines.
20. Library: urllib3
urllib3 is a powerful, user-friendly HTTP client for Python, offering features such as connection pooling, TLS verification, and retries; it also underpins higher-level libraries like requests.
1. Library: python-dateutil
The need to manipulate date and time is ubiquitous in Python, and often the
built-in datetime module doesn’t suffice. The dateutil module is a popular
extension to the standard datetime module. If you’re seeking to implement
timezones, calculate time deltas, or want more powerful generic parsing,
then this library is a good choice.
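A few of those features in a short sketch:

```python
from dateutil import parser, tz
from dateutil.relativedelta import relativedelta

# Flexible parsing of messy timestamp strings
dt = parser.parse("18 Jan 2023 09:30 AM")

# Timezone handling beyond the stdlib defaults
dt_utc = dt.replace(tzinfo=tz.gettz("America/New_York")).astimezone(tz.UTC)

# Calendar-aware arithmetic: "same time next month"
print(dt_utc + relativedelta(months=+1))
```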
22. Library: pyyaml
PyYAML is a full-featured YAML parser and emitter for Python, a natural fit for the YAML configuration files that describe many pipelines and deployments.
23. Library: pyparsing
pyparsing is a library for constructing grammars and parsers directly in Python code, a readable alternative to regular expressions when extracting structured data from text.
Start off your new year right and make 2023 the year you make a difference
with your data. Register here for the free Data Engineering Live Summit!