Data analysis often requires converting string values to numbers (int/float). One approach is to assign each distinct string a unique integer so that the values can still be told apart.
For this example, the data comes from a Comma Separated Values (CSV) file. Say we have a file containing CSV data as follows:
Company | Industry | Recommendation |
---|---|---|
HDFC Bank | Finance | Hold |
Apollo | Healthcare | Buy |
Hero | Automobile | Underperform |
Yes Bank | Finance | Hold |
M&M | Automobile | Underperform |
Fortis | Healthcare | Buy |
Maruti | Automobile | Underperform |
The rows above are only a few lines from a large dataset. We need to give each recommendation (Buy, Hold, Underperform, etc.) an integer value that links to our metadata. For the above input, the expected output is something like:
Company | Industry | Recommendation |
---|---|---|
HDFC Bank | Finance | 2 |
Apollo | Healthcare | 1 |
Hero | Automobile | 3 |
Yes Bank | Finance | 2 |
M&M | Automobile | 3 |
Fortis | Healthcare | 1 |
Maruti | Automobile | 3 |
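To try the examples that follow without the original file, the sample rows can be loaded from an in-memory string. This is only a sketch: the variable names `csv_text` and `sample` are illustrative, and the comma-separated layout is an assumption based on the table above.

```python
import io
import pandas as pd

# Sample rows copied from the input table above (assumed comma-separated).
csv_text = """Company,Industry,Recommendation
HDFC Bank,Finance,Hold
Apollo,Healthcare,Buy
Hero,Automobile,Underperform
Yes Bank,Finance,Hold
M&M,Automobile,Underperform
Fortis,Healthcare,Buy
Maruti,Automobile,Underperform
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
sample = pd.read_csv(io.StringIO(csv_text))

# Uncomment to create the file that the examples read from disk.
# sample.to_csv("data_pandas1.csv", index=False)

print(sample)
```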
Here is one way to replace the strings in a column with integers.
Code 1
    # Import the required library
    import pandas as pd

    # Load the CSV file into a DataFrame with read_csv()
    dataframe = pd.read_csv("data_pandas1.csv")

    # Dictionary of key-value pairs: each key is an old value (string)
    # and each value is its new value (integer)
    Recommendation = {'Buy': 1, 'Hold': 2, 'Underperform': 3}

    # Look up every entry of the column in the dictionary above
    dataframe.Recommendation = [Recommendation[item] for item in dataframe.Recommendation]

    # New table
    print(dataframe)
Result
         Company    Industry  Recommendation
    0  HDFC Bank     Finance               2
    1     Apollo  Healthcare               1
    2       Hero  Automobile               3
    3   Yes Bank     Finance               2
    4        M&M  Automobile               3
    5     Fortis  Healthcare               1
    6     Maruti  Automobile               3
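The dictionary lookup above can also be written with `Series.map`, which applies the dictionary to the whole column in one call. The sketch below uses a small in-memory DataFrame instead of the CSV file, and the label "Sell" is a hypothetical value added only to show what happens when a string is missing from the dictionary.

```python
import pandas as pd

# Small in-memory stand-in for the CSV data (same column names assumed).
df = pd.DataFrame({
    "Company": ["HDFC Bank", "Apollo", "Hero"],
    "Recommendation": ["Hold", "Buy", "Underperform"],
})

codes = {"Buy": 1, "Hold": 2, "Underperform": 3}

# map() looks up each column value in the dictionary.
df["Recommendation"] = df["Recommendation"].map(codes)
print(df)

# A value missing from the dictionary becomes NaN instead of raising
# KeyError, unlike the list comprehension with codes[item].
unmapped = pd.Series(["Buy", "Sell"]).map(codes)
print(unmapped)
```

Checking the result for NaN after mapping is a cheap way to catch labels that the dictionary does not cover.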
The same result can be obtained without a dictionary, by directly assigning a new value to the entries of the column (Recommendation here) that match a condition.
    # Import the required library
    import pandas as pd

    # Load the CSV file into a DataFrame with read_csv()
    dataf = pd.read_csv("data_pandas1.csv")

    # Assign an integer to each entry of the Recommendation column that
    # matches a condition, e.g. every "Buy" becomes 1. Using .loc updates
    # the DataFrame itself rather than a temporary copy.
    dataf.loc[dataf.Recommendation == 'Buy', 'Recommendation'] = 1
    dataf.loc[dataf.Recommendation == 'Hold', 'Recommendation'] = 2
    dataf.loc[dataf.Recommendation == 'Underperform', 'Recommendation'] = 3

    print(dataf)
Result
         Company    Industry  Recommendation
    0  HDFC Bank     Finance               2
    1     Apollo  Healthcare               1
    2       Hero  Automobile               3
    3   Yes Bank     Finance               2
    4        M&M  Automobile               3
    5     Fortis  Healthcare               1
    6     Maruti  Automobile               3
These are only a couple of ways to replace string data in a table (a CSV file) with integer values. The same requirement, changing a data field from string to integer, comes up in many situations.
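When the particular integers do not matter and each distinct string only needs some unique code, `pandas.factorize` can generate the codes automatically. This is a sketch of an alternative, not the method used above: the codes start at 0 and follow the order of first appearance, so they will not match the 1/2/3 mapping chosen earlier.

```python
import pandas as pd

recommendations = pd.Series(["Hold", "Buy", "Underperform", "Hold"])

# factorize returns an array of integer codes and the unique labels;
# labels[code] recovers the original string for each code.
codes, labels = pd.factorize(recommendations)

print(codes)
print(list(labels))
```

Keeping `labels` around makes it possible to translate the integers back to the original strings later.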