About Deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets. Its purpose is to find errors early, before the data is fed to consuming systems or machine learning algorithms. Deequ depends on Java 8. Deequ 2.x runs only with Spark 3.1, and vice versa. If you rely on an earlier Spark version, use a Deequ 1.x release; the legacy version is maintained in the legacy-spark-3.0 branch, and legacy releases are compatible with Apache Spark 2.2.x to 3.0.x. The Spark 2.2.x and 2.3.x releases depend on Scala 2.11, and the Spark 2.4.x, 3.0.x, and 3.1.x releases depend on Scala 2.12. The maintainers welcome feedback and contributions.
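To make the "unit tests for data" idea concrete, here is a minimal sketch in plain Python. It is not Deequ's API (Deequ is a Scala library that runs on Spark); the `Check` class, the helper functions, and the sample rows are all hypothetical names invented for illustration. The sketch only shows the general pattern of declaring constraints on a dataset and collecting a pass/fail result for each.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A row is a plain dict; None stands for a missing value.
Row = Dict[str, Optional[object]]


@dataclass
class Check:
    """A named constraint evaluated over a whole dataset (hypothetical helper)."""
    name: str
    predicate: Callable[[List[Row]], bool]


def is_complete(column: str) -> Check:
    # Passes when no row has a missing value in the column.
    return Check(
        f"{column} is complete",
        lambda rows: all(r.get(column) is not None for r in rows),
    )


def is_unique(column: str) -> Check:
    # Passes when every value in the column occurs exactly once.
    def predicate(rows: List[Row]) -> bool:
        values = [r.get(column) for r in rows]
        return len(values) == len(set(values))

    return Check(f"{column} is unique", predicate)


def is_non_negative(column: str) -> Check:
    # Passes when every present value is >= 0; missing values are ignored.
    return Check(
        f"{column} is non-negative",
        lambda rows: all(r[column] >= 0 for r in rows if r.get(column) is not None),
    )


def run_checks(rows: List[Row], checks: List[Check]) -> Dict[str, bool]:
    # Evaluate each check and report its result by name.
    return {check.name: check.predicate(rows) for check in checks}


# Made-up sample data with one missing name.
rows = [
    {"id": 1, "name": "Thingy A", "views": 0},
    {"id": 2, "name": None, "views": 12},
    {"id": 3, "name": "Thingy C", "views": 5},
]

results = run_checks(
    rows,
    [is_complete("id"), is_unique("id"), is_complete("name"), is_non_negative("views")],
)

for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

In this sketch only the "name is complete" check fails, because one row has a missing name. A library such as Deequ applies the same idea to Spark datasets that are too large to hold in a Python list.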

About PySpark

PySpark is an interface for Apache Spark in Python. It lets you write Spark applications using Python APIs and also provides the PySpark shell for interactively analyzing your data in a distributed environment. PySpark supports most of Spark's features, such as Spark SQL, DataFrame, Streaming, MLlib (Machine Learning), and Spark Core. Spark SQL is a Spark module for structured data processing; it provides a programming abstraction called DataFrame and can also act as a distributed SQL query engine. The streaming feature, which runs on top of Spark, enables interactive and analytical applications across both streaming and historical data while inheriting Spark's ease of use and fault tolerance.

About Python

Python is easy to learn and use, whether you are a first-time programmer or experienced with other languages. Defining functions is the core of extensible programming, and Python supports mandatory and optional arguments, keyword arguments, and even arbitrary argument lists. Python's documentation will help you along the way, and the mailing lists will keep you in touch; the community also hosts conferences and meetups to collaborate on code. The Python Package Index (PyPI) hosts thousands of third-party modules, which extend what Python's standard library already offers.
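The argument kinds mentioned above can be seen together in one small function. This is only an illustrative sketch; `build_message` and its parameter names are made up for the example.

```python
def build_message(recipient, greeting="Hello", *extras, punctuation="!", **metadata):
    """Combine the argument kinds Python supports in a single signature.

    recipient   -- mandatory positional argument
    greeting    -- optional argument with a default value
    *extras     -- arbitrary list of additional positional arguments
    punctuation -- keyword-only argument (it comes after *extras)
    **metadata  -- arbitrary additional keyword arguments
    """
    parts = [f"{greeting}, {recipient}{punctuation}"]
    parts.extend(str(extra) for extra in extras)
    # Sort the keyword arguments so the output order is predictable.
    parts.extend(f"{key}={value}" for key, value in sorted(metadata.items()))
    return " ".join(parts)


# Only the mandatory argument is supplied; the defaults fill in the rest.
basic = build_message("Ada")

# An optional argument is passed by keyword.
custom = build_message("Ada", greeting="Welcome")

# Extra positional and keyword arguments are collected into extras and metadata.
full = build_message("Ada", "Hi", "PS", "bye", punctuation=".", lang="en", team="core")

print(basic)   # Hello, Ada!
print(custom)  # Welcome, Ada!
print(full)    # Hi, Ada. PS bye lang=en team=core
```

Placing `punctuation` after `*extras` makes it keyword-only, so a caller cannot set it by accident when passing extra positional values.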

About scikit-learn

Scikit-learn is a robust, open source machine learning library for the Python programming language that provides simple and efficient tools for predictive data analysis and modeling. Built on the scientific libraries NumPy, SciPy, and Matplotlib, it offers a wide range of supervised and unsupervised learning algorithms for data scientists, machine learning engineers, and researchers. The library is organized into a consistent and flexible framework whose components can be combined and customized, which makes it easy to build complex pipelines, automate repetitive tasks, and integrate scikit-learn into larger machine learning workflows. Its emphasis on interoperability means it also works with other Python libraries for data processing.

Platforms Supported

The same platforms are listed for all four products (Deequ, PySpark, Python, scikit-learn):

Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook

Audience

Deequ: Anyone looking for a unit testing solution that measures data quality in large datasets
PySpark: DevOps teams looking for an application development solution
Python: Developers interested in a beautiful but advanced programming language
scikit-learn: Engineers and data scientists requiring a solution to manage and improve their machine learning research

Support

The same support options are listed for all four products:

Phone Support
24/7 Live Support
Online

API

All four products: Offers API

Pricing

Deequ: No information available.
PySpark: No information available.
Python: Free
scikit-learn: Free

All four products are listed with a Free Version and a Free Trial.

Reviews/Ratings

Deequ: Overall 0.0 / 5 (ease 0.0, features 0.0, design 0.0, support 0.0); not reviewed yet
PySpark: Overall 0.0 / 5 (ease 0.0, features 0.0, design 0.0, support 0.0); not reviewed yet
Python: Overall 5.0 / 5 (ease 5.0, features 5.0, design 5.0, support 5.0)
scikit-learn: Overall 0.0 / 5 (ease 0.0, features 0.0, design 0.0, support 0.0); not reviewed yet

Training

The same training options are listed for all four products:

Documentation
Webinars
Live Online
In Person

Company Information

Deequ
github.com/awslabs/deequ

PySpark
spark.apache.org/docs/latest/api/python/

Python
Founded: 1991
www.python.org

scikit-learn
United States
scikit-learn.org/stable/

Alternatives

Spark Streaming (Apache Software Foundation)
Gensim (Radim Řehůřek)
ML.NET (Microsoft)
Apache Spark (Apache Software Foundation)
MLlib (Apache Software Foundation)
Apache Mahout (Apache Software Foundation)
Keepsake (Replicate)

Integrations

The same integrations are listed for all four products:

AnyIP
Buffer Editor
Cegal Prizm
Clutch
Devika
EOD Historical Data
GPT-5.1-Codex-Max
Gemini 2.5 Pro Preview (I/O Edition)
Golf
Grok 4 Fast
Maps Scraper AI
Muscula
PlatformIO
Routefusion
Spheron
Strands Agents
Ultralytics
liblab
parsel
scikit-image