AI in HC - 1
Theory:- Faker is a Python library that generates fake data.
It is useful for creating realistic-looking datasets and can generate
many types of data (names, addresses, emails, dates, numbers and more).
We'll explore the types most relevant for customer demos, but the
documentation describes all the “providers” of fake data available
in the library.
import pandas as pd
from faker import Faker
import numpy as np
fake = Faker()
# First name
for _ in range(3):
    print(fake.first_name())
Output:-
Tyler
Mark
Susan
Faker has a provider for each type of data, so we can build up a
fake “customer” by calling the appropriate provider for each field
(email, company, and so on).
# Generate emails
print('Company emails: ' + fake.ascii_company_email())
print('Safe emails: ' + fake.ascii_safe_email())
print('Free emails: ' + fake.ascii_free_email())
print('ASCII Emails: ' + fake.ascii_email())
print('Emails: ' + fake.email())
Output:-
# Company names
print('Company name: ' + fake.company())
print('Company suffix: ' + fake.company_suffix())
Output:-
# Boolean with a specified % chance of being True
print(fake.boolean(chance_of_getting_true=25))
Output:-
Automotive
Manufacturing
Output:-
1174.2251307283339
961
0
print(fake.date_this_century().strftime('%m-%d-%Y'))
print(fake.date_this_decade().strftime('%m-%d-%Y'))
print(fake.date_this_year().strftime('%m-%d-%Y'))
print(fake.date_this_month().strftime('%m-%d-%Y'))
print(fake.time())
Output:-
01-28-2005
07-16-2020
03-19-2023
11-04-2023
18:31:29
Output:-
Random date between 2021-01-01 00:00:00 & 2021-12-31 00:00:00
'11-04-2021'
print(fake.year())
print(fake.month())
print(fake.day_of_month())
print(fake.day_of_week())
print(fake.month_name())
print(fake.past_date('-1y'))
print(fake.future_date('+1d'))
Output:-
1994
11
20
Friday
January
2022-12-21
2023-11-25
Combine the providers above to generate a custom dataset.
fake = Faker()
# Industries to sample from (assumed values; extend the list as needed)
industry = ['Automotive', 'Manufacturing']
def create_data(x):
    # Dictionary of fake users, keyed by row number 0 .. x-1
    b_user = {}
    for i in range(0, x):
        b_user[i] = {}
        b_user[i]['name'] = fake.name()
        b_user[i]['job'] = fake.job()
        b_user[i]['birthdate'] = fake.date_of_birth(minimum_age=18, maximum_age=65)
        b_user[i]['email'] = fake.company_email()
        b_user[i]['company'] = fake.company()
        b_user[i]['industry'] = fake.random_element(industry)
        b_user[i]['city'] = fake.city()
        b_user[i]['state'] = fake.state()
        b_user[i]['zipcode'] = fake.postcode()
        # True for about 65% of users
        b_user[i]['netNew'] = fake.boolean(chance_of_getting_true=65)
        # Sales drawn from a normal distribution (mean 1000, sd 200)
        b_user[i]['sales_rounded'] = round(np.random.normal(1000, 200))
        b_user[i]['sales_decimal'] = np.random.normal(1000, 200)
        b_user[i]['priority'] = fake.random_digit()
        b_user[i]['industry2'] = np.random.choice(industry)
    return b_user
# Each inner dict becomes a column, so transpose to get one row per user
df = pd.DataFrame(create_data(5)).transpose()
df.head(5)
Output:-