Salary Prediction

Uploaded by

Edwin

18/08/2023, 20:45  Salary Prediction
localhost:8888/nbconvert/html/Salary Prediction.ipynb?download=false

    # Import the required libraries for data preprocessing
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import warnings
    warnings.filterwarnings('ignore')

Data Preprocessing

* Load the data using the Pandas read function and print the dataset information
* Print the statistics about the data using the describe function
* Visualize the correlation map to understand the correlation between the columns
* Check for null values, and if the data contains any, remove them
* Additionally, inspect for duplicate values and remove them if present.

    # Load the data set and print the top 5 rows
    data = pd.read_csv('C://Users//vinod//Downloads//salary.csv')
    data.head()

[Output: first five rows of the dataset, showing the columns age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race and sex]

    # About the data set
    data.info()

Output:

    RangeIndex: 32561 entries, 0 to 32560
    Data columns (total 15 columns):
     #   Column          Non-Null Count  Dtype
     0   age             32561 non-null  int64
     1   workclass       32561 non-null  object
     2   fnlwgt          32561 non-null  int64
     3   education       32561 non-null  object
     4   education-num   32561 non-null  int64
     5   marital-status  32561 non-null  object
     6   occupation      32561 non-null  object
     7   relationship    32561 non-null  object
     8   race            32561 non-null  object
     9   sex             32561 non-null  object
     10  capital-gain    32561 non-null  int64
     11  capital-loss    32561 non-null  int64
     12  hours-per-week  32561 non-null  int64
     13  native-country  32561 non-null  object
     14  salary          32561 non-null  object
    dtypes: int64(6), object(9)
    memory usage: 3.7+ MB

    # Understanding statistics in the data set
    data.describe().style.background_gradient(cmap='tab20c')

[Output: styled summary-statistics table for the numeric columns age, fnlwgt, education-num, capital-gain, capital-loss and hours-per-week]

    # Checking the correlation matrix
    plt.figure(figsize=(10,4))
    # numeric_only=True (pandas >= 1.5) is needed on pandas 2.x because of the text columns
    sns.heatmap(data.corr(numeric_only=True), annot=True, cmap='winter_r', fmt='.2f', linewidths=1)
    plt.title("correlation map")
    plt.show()

[Figure: Correlation map (heatmap of the numeric columns)]

Data Cleaning Process

    # Checking the null values in the data set
    data.isna().sum()/len(data)*100

Output:

    age               0.0
    workclass         0.0
    fnlwgt            0.0
    education         0.0
    education-num     0.0
    marital-status    0.0
    occupation        0.0
    relationship      0.0
    race              0.0
    sex               0.0
    capital-gain      0.0
    capital-loss      0.0
    hours-per-week    0.0
    native-country    0.0
    salary            0.0
    dtype: float64

    # Checking the percentage of the null values in the dataset
    null_values = data.isna().sum()
    total_shells = np.prod(data.shape)   # np.product in the source; it was removed in NumPy 2.0
    total_missing_values = null_values.sum()
    percentage_missing_values = (total_missing_values/total_shells)*100
    print(f'The data set contains {percentage_missing_values} of values')

Output:

    The data set contains 0.0 of values

    # Checking the duplicate values in the dataset
    duplicate = data.duplicated().sum()
    print(f'There is {duplicate} values in the data set we remove it')

Output:

    There is 24 values in the data set we remove it

    # Remove the duplicate values and store the data set as data variable
    data = data.drop_duplicates()
    after_remove_duplicates = data.duplicated().sum()
    print(f'There is {after_remove_duplicates} values in the data set')

Output:

    There is 0 values in the data set

Exploratory Data Analysis Process

Questions asked of the data:

* We use a for loop to print count plots for numerical columns to understand the most repeated values.
* We also visualize a histogram for the number of hours employees work
* Additionally, we visualize the gender percentages using a pie chart
* Furthermore, we perform data cleaning by replacing unwanted names ('?') in the dataset with "others".
* We create pie charts to understand the distribution of values in categorical columns
* We generate a separate dataset for the USA to analyze the most demanded education and jobs
* We explore the education levels with the most hours worked in the USA
* Using a for loop, we visualize selected categorical columns in the USA dataset using count plots with hue values based on salary
* Additionally, we utilize box plots of age for each categorical column using a for loop
* Furthermore, we create a pivot table for better data understanding.
* Lastly, we visualize the most demanded jobs with work hours ranging from less than 20 hours to 40 hours.

    # Create a count plot to visualize some numerical columns in the data
    numerical = ['education-num', 'capital-gain', 'hours-per-week']
    for i in numerical:
        plt.figure()                     # the figsize argument is not legible in the source
        sns.countplot(data=data, x=i)    # arguments partly illegible in the source; data=data, x=i inferred from the loop
        plt.title([i])
        plt.xticks(rotation=90)
        plt.show()

[Figures: count plots of education-num, capital-gain and hours-per-week]

    plt.figure(figsize=(10,5))
    sns.histplot(data=data, x='hours-per-week')   # a bins argument follows in the source but is not legible
    plt.title("Distribution of the hours-per-week")
    plt.xlabel("Hours")
    plt.ylabel("Count of value")
    plt.show()

[Figure: Distribution of the hours-per-week (histogram; x-axis Hours, y-axis Count of value)]

    # Let's find the percentage of the gender in the data set using the pie chart
    # The second explode value is not legible in the source; 0.1 is assumed here
    data['sex'].value_counts().plot(kind='pie', explode=[0, 0.1], labels=['Male', 'Female'],
                                    colors=['blue', 'gray'], autopct='%1.2f%%', shadow=True)
    plt.title("Visualize the Gender percentage in the data")
    plt.show()

[Figure: Visualize the Gender percentage in the data (pie chart, Male and Female)]

    # Distribution of the age column with the gender
    plt.figure(figsize=(10,6))
    sns.countplot(data=data, x='age', hue='sex', palette='deep')
    plt.xticks(rotation=90)
    plt.show()

[Figure: count plot of age with hue sex (Male, Female)]

    plt.figure(figsize=(10,5))
    data['salary'].value_counts().sort_values(ascending=False).plot(
        kind='bar', color=['#A9E2F3'])   # the remaining colors are cut off in the source
    plt.title("Visualize the Salary values in the data")
    plt.xlabel("Salary")
    plt.ylabel("Count of the values")
    plt.show()

[Figure: Visualize the Salary values in the data (bar chart; x-axis Salary, y-axis Count of the values)]

    # The data contains unwanted information that we would like to replace with new values
    data['native-country'].value_counts().head(3)

Output:

    United-States    29153
    Mexico             638
    ?                  582
    Name: native-country, dtype: int64

    # regex=False makes '?' a literal character on every pandas version
    data['native-country'] = data['native-country'].str.replace('?', 'others', regex=False)
    data['workclass'] = data['workclass'].str.replace('?', 'others', regex=False)
    data['occupation'] = data['occupation'].str.replace('?', 'others', regex=False)

    # Visualize the top 20 countries in the dataset
    data['native-country'].value_counts().nlargest(20)\
        .plot(kind='bar', title='Top 20 country in the data set',
              figsize=(10,5))   # the hatch and color arguments are not legible in the source
    plt.xlabel("Country name")
    plt.ylabel("Count of values")
    plt.show()

[Figure: Top 20 country in the data set (bar chart; x-axis Country name, y-axis Count of values)]

    # Create pie charts to understand the relationship, race and sex percentages
    another_list = ['relationship', 'race', 'sex']
    num_of_columns = len(another_list)
    plt.figure(figsize=(25,8))
    for i, col in enumerate(another_list):
        plt.subplot(1, num_of_columns, i+1)
        data[col].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90)
        plt.title([col])
    plt.tight_layout()
    plt.show()

    # Create pie charts to understand the percentages of workclass, education and marital-status
    work_place = ['workclass', 'education', 'marital-status']
    num_column = len(work_place)
    plt.figure(figsize=(25,8))
    for i, col in enumerate(work_place):
        plt.subplot(1, num_column, i+1)
        data[col].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90)
        plt.title([col])
    plt.tight_layout()
    plt.show()

    # Apply a condition to the data to create a separate data frame for the United States
    usa = data[data['native-country'] == 'United-States']
    # Find which education is most in demand in the United States
    usa['education'].value_counts().sort_values(ascending=False)\
        .plot(kind='bar', figsize=(10,5), hatch='*',
              color=['#81F781', '#FACC2E'])   # the remaining colors are cut off in the source
    plt.title("Most Demanding education in the united states")
    plt.xlabel("Degree")
    plt.ylabel("Count of the values")
    plt.show()

[Figure: Most Demanding education in the united states (bar chart; x-axis Degree, y-axis Count of the values)]

    # Find the total working hours for each education level using the bar chart
    usa.groupby('education')['hours-per-week'].sum().sort_values(ascending=False)\
        .plot(kind='bar', figsize=(10,5), hatch='//',
              color=['#D8F781', '#0080FF', '#086A87'])   # the remaining colors are cut off in the source
    plt.title("Total Working hours with their degree")
    plt.xlabel("Job")
    plt.ylabel("Count of values")
    plt.show()

[Figure: Total Working hours with their degree (bar chart; y-axis Count of values)]

    # Create count plots to understand the information about the United States
    plt.figure(figsize=(13,6))
    for i in ['workclass', 'education', 'race', 'relationship', 'sex']:
        sns.countplot(data=usa, x=i, hue='salary', palette='viridis')
        plt.title(f'information about the {i} column with salary')
        plt.xlabel([i])
        plt.ylabel("Count of the values")
        plt.xticks(rotation=90)
        plt.show()

[Figures: count plots titled "information about the workclass / education / race / relationship / sex column with salary", each with hue salary (<=50K, >50K) and y-axis Count of the values]

    # Create a boxplot of age for each categorical column
    for i in ['workclass', 'education', 'marital-status', 'occupation',
              'relationship', 'race']:   # the end of this list is cut off in the source
        plt.figure(figsize=(16,5))
        sns.boxplot(data=data, x=data[i], y='age', palette='hls')
        plt.title([i])
        plt.xlabel([i])
        plt.xticks(rotation=90)
        plt.ylabel('Age')
        plt.show()
[Figures: box plots of age by workclass, education, marital-status, occupation, relationship and race]

    # Find the jobs in the range 20 to 40 hours with workclass
    # As written in the source, the two conditions together reduce to hours-per-week < 20;
    # the first condition would need to be >= 20 to select the 20 to 40 range named above.
    filterd_jobs = data[(data['hours-per-week'] < 20) & (data['hours-per-week'] <= 40)]
    filterd_jobs['workclass'].value_counts().plot(
        kind='bar', figsize=(10,5))   # the color list is cut off in the source
    plt.title("Top most working jobs in the data")
    plt.xlabel("Jobs")
    plt.ylabel("Count of values")
    plt.show()

[Figure: Top most working jobs in the data (bar chart of workclass counts; x-axis Jobs, y-axis Count of values)]

    # Using groupby, find some interesting facts
    data.groupby('education')['salary'].value_counts()\
        .unstack().style.background_gradient(cmap='gist_heat_r')

[Output: styled table of salary counts (<=50K, >50K) for each education level]
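The salary-by-education table above holds raw counts, so large groups such as HS-grad dominate it. A minimal sketch of the same table expressed as row shares, assuming a small hypothetical frame (`toy`) in place of the notebook's `data`; `pd.crosstab` is swapped in here for the groupby/unstack call and is not the author's method:

```python
import pandas as pd

# Hypothetical toy rows; the notebook itself works on the full `data` frame.
toy = pd.DataFrame({
    "education": ["HS-grad", "HS-grad", "HS-grad", "Bachelors", "Bachelors", "Masters"],
    "salary":    ["<=50K",   "<=50K",   ">50K",    "<=50K",     ">50K",      ">50K"],
})

# Raw counts per education level, the same layout as the groupby/unstack table.
counts = pd.crosstab(toy["education"], toy["salary"])

# Row-normalized shares: every row sums to 1, so levels of different size are comparable.
shares = pd.crosstab(toy["education"], toy["salary"], normalize="index")

print(counts)
print(shares.round(2))
```

Reading the shares instead of the counts answers "what fraction of each education level earns >50K", which the count table only shows indirectly.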
    # Create a pivot table
    pivot_table = data.pivot_table(columns='workclass', index='education',
                                   values=...)   # the values argument is cut off in the source
    pivot_table.style.background_gradient(cmap='cividis_r')

[Output: styled pivot table with education levels as rows and workclass categories (Federal-gov, Local-gov, Never-worked, Private, Self-emp-inc, Self-emp-not-inc, State-gov and others) as columns]

    # Some interesting questions asked from the data
    print('The most demanding education is', data['education'].value_counts().idxmax())
    print("\n the least demanding education is", data['education'].value_counts().idxmin())
    # The next two lines are cut off in the source after value_counts; the endings are
    # inferred from the pattern of the other lines and from the printed output.
    print("\n The highest working hours in the data is", data['hours-per-week'].value_counts().idxmax())
    print("\n The less working hours in the data is", data['hours-per-week'].value_counts().idxmin())
    print("\n Most dominate race is", data['race'].value_counts().idxmax())
    print("\n Less dominate race is", data['race'].value_counts().idxmin())
    print("\n Most dominate occupation is", data['occupation'].value_counts().idxmax())
    # The label says 'race' in the source; the value printed is the least frequent occupation
    print("\n Less dominate race is", data['occupation'].value_counts().idxmin())

Output:

    The most demanding education is HS-grad
    the least demanding education is Preschool
    The highest working hours in the data is 40
    The less working hours in the data is 82
    Most dominate race is White
    Less dominate race is Other
    Most dominate occupation is Prof-specialty
    Less dominate race is Armed-Forces

    # Find the average working hours by occupation for employees working 50 hours or more
    long_hours_jobs = data[(data['hours-per-week'] >= 50)]
    # ascending= is cut off in the source; False is inferred from the order of the bars
    long_hours_jobs.groupby('occupation')['hours-per-week'].mean().sort_values(ascending=False)\
        .plot(kind='bar', figsize=(10,5), color=['#D0F5A9', '#F5BCA9', '#0040FF', '#F781BE'])
    plt.title("Average hours per week different occupation")
    plt.xlabel("occupation")
    plt.ylabel("Average hours per week")
    plt.show()

[Figure: Average hours per week different occupation (bar chart; bars start with Priv-house-serv, Farming-fishing, Protective-serv and end with Adm-clerical, Armed-Forces)]

    # Find the average age of the bachelors degree holders
    bachelors = data[(data['education'] == 'Bachelors')]
    find_the_averge_age = bachelors.groupby('sex')['age'].mean().sort_values(ascending=False)
    plt.figure(figsize=(7,5))
    plt.bar(find_the_averge_age.index, find_the_averge_age.values, color=['#FA5882', '#F6CEEC'])
    plt.title("Average age of the bachelors degree holders")
    plt.xlabel("Gender")
    plt.ylabel("Average age")
    plt.show()

[Figure: Average age of the bachelors degree holders (bar chart; x-axis Gender with Male and Female, y-axis Average age)]

Observations:

* We observed that the majority of working hours per week fall within the 30 to 40 range.
* The pie chart illustrates a higher percentage of males in the dataset.
* The USA has the highest number of records in the dataset
* Most employees fall in the <=50K salary class.
* Within the USA data, the most demanded degree is high school (HS-grad), which also corresponds to the highest total working hours.
* The pie charts show the category shares for relationship, race, sex, workclass, education and marital-status.
* Among employees working 50 or more hours per week, those in private house service (Priv-house-serv) have the highest average hours.
* Working hours for employees in private companies typically range between 20 to 40 hours

Machine Learning Modeling

    # Import all the required libraries for the machine learning modeling
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import LabelEncoder, StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from xgboost import XGBClassifier
    from catboost import CatBoostClassifier

    # Encode every text column as integer labels
    for col in data.select_dtypes(include='object'):
        labelencoder = LabelEncoder()
        labelencoder.fit(data[col].unique())
        data[col] = labelencoder.transform(data[col])

    # Split the data into independent and dependent variables
    # (the left margin of the next three assignments is cut off in the source;
    #  X and y are inferred from their later use)
    X = data.drop(['salary'], axis=1)
    y = data['salary']

    # Normalize the data using the Standard Scaler
    standard = StandardScaler()
    X = standard.fit_transform(X)

    # Split the data into train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=220)

    # Create a function for machine learning modeling
    def machine_learning_model(model, X_train, X_test, y_train, y_test):
        '''
        In the function we write the code for the machine learning model.
        First we fit the train data to the model and predict the values with
        the test data and store the values in a variable, and then print the
        accuracy score along with the classification report and confusion matrix.
        '''
        print(f'The {model}')
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        model_score = accuracy_score(y_test, y_pred)
        print(f"\nThe accuracy score of the {model} is {model_score*100:.2f}")
        print(f"\n{classification_report(y_test, y_pred)}")
        print(f"\n{confusion_matrix(y_test, y_pred)}")
        matrix = confusion_matrix(y_test, y_pred)
        sns.heatmap(matrix, annot=True, cmap='Reds', fmt='.2f')   # a linewidth argument is cut off in the source
        plt.show()
        print("="*30)

    models = {
        'logistic': LogisticRegression(penalty='l2'),
        'decison': DecisionTreeClassifier(criterion='gini', splitter='best'),   # further arguments are cut off in the source
        'Random': RandomForestClassifier(n_estimators=50, criterion='gini'),
        'knn': KNeighborsClassifier(),
        'xg': XGBClassifier(),
        'catboost': CatBoostClassifier(iterations=1)
    }
    for i in range(len(models)):
        model_names = list(models.values())[i]
        names = list(models.keys())[i]
        # And apply the machine learning function to the models
        machine_learning_model(model_names, X_train, X_test, y_train, y_test)

Output:

    The LogisticRegression()
    The accuracy score of the LogisticRegression() is 82.62

                  precision    recall  f1-score   support
               0       0.84      0.95      0.89      4941
               1       0.73      0.45      0.55      1567
        accuracy                           0.83      6508
       macro avg       0.78      0.70      0.72      6508
    weighted avg       0.82      0.83      0.81      6508

    [[4677  264]
     [ 867  700]]

[Figure: confusion-matrix heatmap for LogisticRegression]

    The DecisionTreeClassifier()
    The accuracy score of the DecisionTreeClassifier() is 80.82

                  precision    recall  f1-score   support
               0       0.88      0.87      0.87      4941
               1       0.60      0.61      0.61      1567
        accuracy                           0.81      6508
       macro avg       0.74      0.74      0.74      6508
    weighted avg       0.81      0.81      0.81      6508

    [[4304  637]
     [ 611  956]]

[Figure: confusion-matrix heatmap for DecisionTreeClassifier]

    The RandomForestClassifier(n_estimators=50)
    The accuracy score of the RandomForestClassifier(n_estimators=50) is 85.59

                  precision    recall  f1-score   support
               0       0.88      0.93      0.91      4941
               1       0.74      0.62      0.67      1567
        accuracy                           0.86      6508
       macro avg       0.81      0.77      0.79      6508
    weighted avg       0.85      0.86      0.85      6508

    [[4604  337]
     [ 601  966]]

[Figure: confusion-matrix heatmap for RandomForestClassifier(n_estimators=50)]

    The KNeighborsClassifier()
    The accuracy score of the KNeighborsClassifier() is 83.37

    [Classification report: only the supports 4941, 1567 and 6508 are legible in the source]

    [[4520  421]
     [ 661  906]]

[Figure: confusion-matrix heatmap for KNeighborsClassifier]

    The XGBClassifier(...)
    [Output: the full XGBClassifier parameter listing is printed here, including n_estimators=100]
    The accuracy score of the XGBClassifier(...) is 86.85

                  precision    recall  f1-score   support
               0       0.89      0.94      0.92      4941
               1       0.77      0.65      0.70      1567
        accuracy                           0.87      6508
       macro avg       0.83      0.79      0.81      6508
    weighted avg       0.86      0.87      0.86      6508

    [[4638  303]
     [ 553 1014]]

[Figure: confusion-matrix heatmap for XGBClassifier]

    CatBoostClassifier (the printed model name is not legible in the source)
    Learning rate set to 0.5
    0:      learn: 0.4868985        total: 165ms    remaining: 0us
    The accuracy score is 84.53

                  precision    recall  f1-score   support
               0       0.86      0.95      0.90      4941
               1       0.77      0.51      0.61      1567
        accuracy                           0.85      6508
       macro avg       0.82      0.73      0.76      6508
    weighted avg       0.84      0.85      0.83      6508

    [[4706  235]
     [ 772  795]]

[Figure: confusion-matrix heatmap for CatBoostClassifier]

    Random = RandomForestClassifier()
    machine_learning_model(Random, X_train, X_test, y_train, y_test)

Output:

    The RandomForestClassifier()
    The accuracy score of the RandomForestClassifier() is 85.93

    [Classification report: only the supports 4941, 1567 and 6508 are legible in the source]

    [[4616  325]
     [ 591  976]]

[Figure: confusion-matrix heatmap for RandomForestClassifier()]
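The accuracy, precision and recall figures in the reports above all follow from the confusion matrix alone. A minimal sketch in plain Python, using the matrix printed for RandomForestClassifier(n_estimators=50); the helper `per_class_metrics` is a hypothetical name introduced for illustration, and the row/column layout assumed is scikit-learn's (rows are true classes, columns are predicted classes):

```python
# Confusion matrix printed for RandomForestClassifier(n_estimators=50).
# Assumed layout (scikit-learn's convention): rows = true class, columns = predicted class.
matrix = [[4604, 337],
          [601, 966]]


def per_class_metrics(cm, cls):
    """Return (precision, recall) for class index `cls` of a square confusion matrix."""
    true_pos = cm[cls][cls]
    predicted_as_cls = sum(row[cls] for row in cm)  # column sum: everything predicted as cls
    actually_cls = sum(cm[cls])                     # row sum: everything that truly is cls
    return true_pos / predicted_as_cls, true_pos / actually_cls


total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / total

precision_1, recall_1 = per_class_metrics(matrix, 1)

print(f"accuracy    = {accuracy * 100:.2f}")  # 85.59, matching the printed score
print(f"precision_1 = {precision_1:.2f}")     # 0.74
print(f"recall_1    = {recall_1:.2f}")        # 0.62
```

The recall of 0.62 for class 1 means that roughly four in ten of the >50K earners in the test set are missed, which the overall accuracy of 85.59 does not reveal on its own.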
    # Let's dump the model
    import pickle

    # Write the fitted random forest to disk; the with-block closes the file afterwards
    with open('RandomForest.pkl', 'wb') as f:
        pickle.dump(Random, f)
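Dumping the model is only half of the round trip; the file has to be read back before it can be used for prediction. A minimal sketch of dump followed by load, assuming a plain dictionary (`stand_in_model`, hypothetical) in place of the fitted `Random` classifier and a temporary directory in place of the notebook's working directory:

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the fitted model; in the notebook this would be `Random`.
stand_in_model = {"name": "RandomForestClassifier", "n_estimators": 100}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "RandomForest.pkl"

    # Write the object in binary mode, as in the notebook.
    with open(model_path, "wb") as f:
        pickle.dump(stand_in_model, f)

    # Read it back; only unpickle files from a trusted source,
    # because loading a pickle can execute arbitrary code.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

With the real classifier, the restored object would expose the same `predict` method, provided that the scikit-learn version used for loading is compatible with the one used for dumping.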
