Python | Create Test DataSets using Sklearn

Last Updated : 21 Apr, 2023

Python's Sklearn library provides a sample dataset generator that helps you create your own custom datasets. It's fast and very easy to use. The sections below cover three of its generators: make_blobs, make_moons, and make_circles. All of them are imported from sklearn.datasets. Older tutorials import from sklearn.datasets.samples_generator, a module that has since been deprecated and removed from scikit-learn.

Python3

    # importing libraries
    from sklearn.datasets import make_blobs

    # matplotlib for plotting
    from matplotlib import pyplot as plt
    from matplotlib import style

sklearn.datasets.make_blobs

Python3

    # Creating Test DataSets using sklearn.datasets.make_blobs
    from sklearn.datasets import make_blobs
    from matplotlib import pyplot as plt
    from matplotlib import style

    style.use("fivethirtyeight")

    # 100 two-dimensional points spread around 3 cluster centers
    X, y = make_blobs(n_samples=100, centers=3,
                      cluster_std=1, n_features=2)

    # plot the first feature against the second
    plt.scatter(X[:, 0], X[:, 1], s=40, color='g')
    plt.xlabel("X")
    plt.ylabel("Y")

    plt.show()
    plt.clf()

Output: make_blobs with 3 centers

sklearn.datasets.make_moons

Python3

    # Creating Test DataSets using sklearn.datasets.make_moons
    from sklearn.datasets import make_moons
    from matplotlib import pyplot as plt

    # 1000 points on two interleaving half circles, with Gaussian noise
    X, y = make_moons(n_samples=1000, noise=0.1)

    plt.scatter(X[:, 0], X[:, 1], s=40, color='g')
    plt.xlabel("X")
    plt.ylabel("Y")

    plt.show()
    plt.clf()

Output: make_moons with 1000 data points

sklearn.datasets.make_circles

Python3

    # Creating Test DataSets using sklearn.datasets.make_circles
    from sklearn.datasets import make_circles
    from matplotlib import pyplot as plt
    from matplotlib import style

    style.use("fivethirtyeight")

    # 100 points on two concentric circles, with a little Gaussian noise
    X, y = make_circles(n_samples=100, noise=0.02)

    plt.scatter(X[:, 0], X[:, 1], s=40, color='g')
    plt.xlabel("X")
    plt.ylabel("Y")

    plt.show()
    plt.clf()

Output: make_circles with 100 data points

Scikit-learn (sklearn) is a popular machine learning library for Python that provides a wide range of functionality, including data generation.
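The generator calls above draw fresh random data on every run. As a minimal sketch, assuming you want repeatable output, the same three generators can be given a fixed random_state. The value of SEED and the variable names here are illustrative choices, not part of the original article.

```python
import numpy as np
from sklearn.datasets import make_blobs, make_circles, make_moons

SEED = 42  # arbitrary seed chosen for illustration

# Two calls with the same random_state return identical arrays.
X_a, y_a = make_blobs(n_samples=100, centers=3, cluster_std=1.0,
                      n_features=2, random_state=SEED)
X_b, y_b = make_blobs(n_samples=100, centers=3, cluster_std=1.0,
                      n_features=2, random_state=SEED)
same_data = np.array_equal(X_a, X_b) and np.array_equal(y_a, y_b)

# y holds the cluster (or class) index of each row of X.
blob_labels = np.unique(y_a)

X_m, y_m = make_moons(n_samples=1000, noise=0.1, random_state=SEED)
X_c, y_c = make_circles(n_samples=100, noise=0.02, random_state=SEED)

print("identical across calls:", same_data)
print("blobs:", X_a.shape, "labels:", blob_labels)
print("moons:", X_m.shape, "class counts:", np.bincount(y_m))
print("circles:", X_c.shape, "class counts:", np.bincount(y_c))
```

The returned y is also handy for plotting: passing c=y to plt.scatter instead of a single color shows which cluster or class each point belongs to.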
To create test datasets with Sklearn, you can use the generator functions shown above.

Advantages of creating test datasets using Sklearn:

Time-saving: Sklearn provides a quick and easy way to generate test datasets for machine learning tasks, which saves time compared to creating datasets manually.

Consistency: The generated datasets are reproducible when you pass a fixed random_state, which helps keep your experiments and results consistent.

Flexibility: Sklearn provides generator functions for classification, regression, clustering, and more, which makes it a flexible tool for generating test datasets for different types of machine learning tasks.

Control over dataset parameters: Sklearn lets you customize the generated data through parameters such as the number of samples, the number of features, and the level of noise.

Disadvantages of creating test datasets using Sklearn:

Limited dataset complexity: The generated datasets are typically simple and may not reflect the complexity of real-world data, so they may not be suitable for testing how machine learning algorithms perform on complex datasets.

Lack of diversity: Sklearn datasets may not reflect the diversity of real-world datasets, which may limit how well conclusions drawn from them generalize.

Overfitting risk: If the test datasets you generate are too similar to your training datasets, overfitting can go undetected, and the model may then perform poorly on new and unseen data.

Overall, Sklearn provides a useful tool for generating test datasets quickly and efficiently, but it's important to keep in mind the limitations and potential drawbacks of using synthetic datasets for machine learning testing.
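The flexibility and parameter-control points can be illustrated with two other generators from sklearn.datasets, make_classification and make_regression. This is a rough sketch: the specific sizes, noise levels, and seed below are assumptions picked for illustration, not values from the article.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression

# Binary classification data: 200 samples, 5 features, 3 of them informative.
# flip_y randomly reassigns a small fraction of labels to add label noise.
X_cls, y_cls = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=2, flip_y=0.01, random_state=0,
)

# Regression data: same shape, with Gaussian noise added to the target.
X_reg, y_reg = make_regression(
    n_samples=200, n_features=5, n_informative=3,
    noise=10.0, random_state=0,
)

print("classification:", X_cls.shape, "classes:", np.unique(y_cls))
print("regression:", X_reg.shape, "target dtype:", y_reg.dtype)
```

Changing n_samples, n_features, or the noise settings is usually enough to produce a family of related test datasets for comparing models.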
It's recommended to use real-world datasets whenever possible to ensure the most accurate representation of the problem you are trying to solve.

Author: Praveen Sinha
Tags : Machine Learning