AI in HC - 1

The document outlines an experiment to generate a synthetic dataset using the Faker library in Python, which creates realistic-looking fake data. It provides installation instructions for necessary libraries and demonstrates how to generate various data types such as names, addresses, emails, and company information. The final section includes a function to create a custom dataset with specified formats and random values, showcasing the versatility of the Faker library.

Uploaded by

muni.kundalaiml2021

Experiment 1

Aim:- Generate a custom or synthetic dataset using Python.

Theory:- Faker is a Python library that generates fake data. It is useful for creating realistic-looking datasets and can generate many types of data. We’ll explore the ones most relevant for customer demos, but the documentation details all the “providers” of fake data available in the library.

To begin, let’s make sure we have the necessary libraries installed. In addition to Faker and NumPy, we’ll also need the handy pandas library. The hana_ml library is only needed if you want to upload the dataset we create to SAP HANA Cloud; that upload step is not covered in this experiment.

!pip install numpy faker pandas hana_ml

import pandas as pd
from faker import Faker
import numpy as np

fake = Faker()

# Print three random first names
for _ in range(3):
    print(fake.first_name())
Output:-
Tyler
Mark
Susan
Faker groups its generators into providers, one for each kind of data (names, emails, addresses, companies and so on). To fill a field of a fake “customer”, call the corresponding provider method.

# There are specific versions of these generators

# It can generate names
print('Male first names: ' + fake.first_name_male())
print('Female first names: ' + fake.first_name_female())
print('Last names: ' + fake.last_name())
print('Full names: ' + fake.name())

# Generate prefixes and suffixes (there are also gender-specific
# versions, e.g. prefix_female())
print('Prefix: ' + fake.prefix())
print('Suffix: ' + fake.suffix())

# Generate emails
print('Company emails: ' + fake.ascii_company_email())
print('Safe emails: ' + fake.ascii_safe_email())
print('Free emails: ' + fake.ascii_free_email())
print('ASCII Emails: ' + fake.ascii_email())
print('Emails: ' + fake.email())

Output:-
Male first names: Luis
Female first names: Lori
Last names: Burton
Full names: Mitchell Maynard
Prefix: Mr.
Suffix: DDS
Company emails: [email protected]
Safe emails: [email protected]
Free emails: [email protected]
ASCII Emails: [email protected]
Emails: [email protected]
If you prefer to create a company-focused dataset, you can do
that too.

# Company names
print('Company name: ' + fake.company())
print('Company suffix: ' + fake.company_suffix())

# Generate Address components

print('Street address: ' + fake.street_address())
print('Bldg #: ' + fake.building_number())
print('City: ' + fake.city())
print('Country: ' + fake.country())
print('Postcode: ' + fake.postcode())

# Or generate full addresses

print('Full address: ' + fake.address())

# You can even generate a catch phrase or a motto
print('Catch phrase: ' + fake.catch_phrase())
print('Motto: ' + fake.bs())

Output:-
Company name: Park-Osborne
Company suffix: Ltd
Street address: 2694 Hughes View Suite 654
Bldg #: 5802
City: Craigfurt
Country: Iran
Postcode: 78482
Full address: 46463 Juan Fall Apt. 788
Port Benjamin, RI 60825
Catch phrase: Managed 5thgeneration adapter
Motto: redefine 24/365 markets

Generate columns that match specific formats

If you need fake data in a specific format, such as a product code or an iPhone model, you can do that too:

# Use bothify to generate random digits (#) or letters (?). You can
# limit the letters used with letters=
print(fake.bothify('PROD-??-##', letters='ABCDE'))
print(fake.bothify('iPhone-#'))

# Create fake True/False values

# Random True/False
print(fake.boolean())

# Specify % True
print(fake.boolean(chance_of_getting_true=25))

For categorical columns, you can specify a list of values to

randomly choose from. Optionally, you can also specify the
weights to give to each value if you don’t want each element in
the list to have an equal chance of being selected.

import numpy as np

industry = ['Automotive', 'Health Care', 'Manufacturing', 'High Tech', 'Retail']
# Specify the probability of each category (must sum to 1.0)
weights = [0.6, 0.2, 0.1, 0.07, 0.03]
# p= specifies the probabilities of each category
print(np.random.choice(industry, p=weights))

# Generating a choice without weights (equal probability for all elements)
print(np.random.choice(industry))
Output:-
Health Care
Health Care
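Two single draws cannot show whether the weights are being respected. The sketch below uses NumPy's newer `Generator` API (`np.random.default_rng`) rather than the legacy `np.random.choice` used above. It draws a whole column at once and compares the observed frequencies with the requested weights. The seed and sample size are arbitrary.

```python
import numpy as np

industry = ['Automotive', 'Health Care', 'Manufacturing', 'High Tech', 'Retail']
weights = [0.6, 0.2, 0.1, 0.07, 0.03]

# A seeded Generator makes the draws reproducible; size= draws many at once.
rng = np.random.default_rng(seed=7)
sample = rng.choice(industry, size=10_000, p=weights)

# Empirical frequencies should be close to the requested weights.
labels, counts = np.unique(sample, return_counts=True)
freq = dict(zip(labels.tolist(), (counts / counts.sum()).tolist()))
for name, target in zip(industry, weights):
    print(f'{name:<14} target={target:.2f} observed={freq.get(name, 0.0):.3f}')

# Probabilities that do not sum to 1 are rejected with a ValueError.
try:
    rng.choice(industry, p=[0.5, 0.5, 0.5, 0.0, 0.0])
    rejected = False
except ValueError as err:
    rejected = True
    print('Rejected:', err)
```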

# 1st argument is mean of distribution, 2nd is standard deviation

print(np.random.normal(1000, 100))
# Rounded result
print(round(np.random.normal(1000, 100)))

# Generate random integer between 0 and 4

print(np.random.randint(5))

Output:-
1174.2251307283339
961
0
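Calling `np.random.normal` once per row works, but a whole numeric column can be drawn in one call. This is a sketch with NumPy's `Generator` API. The clipping at 0 is an illustrative assumption (negative sales make no sense), not part of the original experiment.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Draw a whole column at once: loc is the mean, scale the standard deviation.
sales = rng.normal(loc=1000, scale=200, size=5_000)
sales_rounded = np.round(sales).astype(int)

# Assumption for illustration: clip negative sales to 0.
sales_rounded = np.clip(sales_rounded, 0, None)

# integers(low, high) excludes high, like np.random.randint(5) above.
priority = rng.integers(0, 5, size=5_000)

print(round(sales.mean()), round(sales.std()))
print(priority.min(), priority.max())
```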

import pandas as pd

print(fake.date_this_century().strftime('%m-%d-%Y'))
print(fake.date_this_decade().strftime('%m-%d-%Y'))
print(fake.date_this_year().strftime('%m-%d-%Y'))
print(fake.date_this_month().strftime('%m-%d-%Y'))
print(fake.time())

# Start and end dates to generate data
my_start = pd.to_datetime('01-01-2021')
my_end = pd.to_datetime('12-31-2021')

print(f'Random date between {my_start} & {my_end}')
print(fake.date_between_dates(my_start, my_end).strftime('%m-%d-%Y'))

Output:-
01-28-2005
07-16-2020
03-19-2023
11-04-2023
18:31:29
Random date between 2021-01-01 00:00:00 & 2021-12-31 00:00:00
11-04-2021

print(fake.year())
print(fake.month())
print(fake.day_of_month())
print(fake.day_of_week())
print(fake.month_name())
print(fake.past_date('-1y'))
print(fake.future_date('+1d'))
Output:-
1994
11
20
Friday
January
2022-12-21
2023-11-25
Finally, combine the calls above to generate a custom dataset.

from faker import Faker
import numpy as np
import pandas as pd

industry = ['Automotive', 'Health Care', 'Manufacturing', 'High Tech', 'Retail']

fake = Faker()

def create_data(x):
    # Dictionary of records keyed by row index
    b_user = {}
    for i in range(0, x):
        b_user[i] = {}
        b_user[i]['name'] = fake.name()
        b_user[i]['job'] = fake.job()
        b_user[i]['birthdate'] = fake.date_of_birth(minimum_age=18, maximum_age=65)
        b_user[i]['email'] = fake.company_email()
        b_user[i]['company'] = fake.company()
        b_user[i]['industry'] = fake.random_element(industry)
        b_user[i]['city'] = fake.city()
        b_user[i]['state'] = fake.state()
        b_user[i]['zipcode'] = fake.postcode()
        b_user[i]['netNew'] = fake.boolean(chance_of_getting_true=65)
        b_user[i]['sales_rounded'] = round(np.random.normal(1000, 200))
        b_user[i]['sales_decimal'] = np.random.normal(1000, 200)
        b_user[i]['priority'] = fake.random_digit()
        b_user[i]['industry2'] = np.random.choice(industry)
    return b_user

# Outer dict keys become columns, so transpose to get one row per record
df = pd.DataFrame(create_data(5)).transpose()
df.head(5)
Output:-
