Data Dict Dataframes Lists

data_dict_dataframes_lists

Uploaded by

Suresh Kumar
Copyright
© © All Rights Reserved
3/8/24, 1:20 PM
Python Pandas Dataframe Tutorial for Beginners
https://www.projectpro.io/article/python-pandas-dataframe-tutorials/405

What is a pandas dataframe?

Pandas is a Python library used for data analysis. It provides data structures and tools for understanding and analysing data.

The simplest way to understand a dataframe is to think of it as MS Excel inside Python. Just as Excel stores data in rows and columns and lets you perform operations on that data, you can do all of the same with a dataframe.

There are many ways to deal with data in Python, including series, lists and dictionaries, but the dataframe is the structure of choice used by data scientists. Dataframes can deal with large amounts of data and support powerful functions to manipulate the data.

Creating dataframes from a csv file, dictionary or list, adding rows and columns, using dataframe indexes and working with missing data are all part of the EDA (exploratory data analysis) stage of a data science project.

By convention a dataframe is stored in a variable named 'df' in Python code, and dataframe operations are written as 'df.[operation]'.

Where can dataframes be created from?

Dataframes can be created from the following data sources: dictionaries, lists, arrays, series, csv files, a MySQL connection to a database, etc.

What is a pandas series vs dataframe?

A series is a 1-dimensional representation of data and hence has only one column, while a dataframe is a 2-dimensional table.

Numpy versus Pandas

Numpy is another popular library used for data manipulation, but it is largely used for numerical data. Dataframes, however, provide powerful functions to work across tables containing multiple data types.
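To make the series / dataframe / NumPy distinction concrete, here is a small sketch. The names and values are made up for illustration and are not part of the original tutorial; it only shows how the dimensions and data types differ.

```python
import numpy as np
import pandas as pd

# A Series is one-dimensional: a single column of values with an index.
ages = pd.Series([24, 26, 32], name="Age")

# A DataFrame is two-dimensional, and each column can hold its own data type.
people = pd.DataFrame({
    "Name": ["Alan", "Vivian", "Alister"],  # text column
    "Age": [24, 26, 32],                    # integer column
})

# A NumPy array has a single dtype shared by all of its elements.
numbers = np.array([[1, 2], [3, 4]])

print(ages.ndim)      # number of dimensions of the Series
print(people.ndim)    # number of dimensions of the DataFrame
print(people.dtypes)  # one dtype per column
print(numbers.dtype)  # one dtype for the whole array
```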
Table of Contents

Python Pandas Dataframe Basics
1. How to create a Dataframe
2. How to sort rows within a pandas dataframe
3. How to find the largest value in a pandas dataframe
4. How to list unique value in a pandas dataframe
5. How to delete duplicates from a pandas dataframe
6. Rename column header in a pandas dataframe
7. Search pandas dataframe for a value
8. Drop row and column in a pandas dataframe
9. Replace multiple values in a pandas dataframe
10. Save pandas dataframe as a csv file
11. Randomly sample a pandas dataframe
12. How to filter in a pandas dataframe
13. How to calculate moving average in a pandas dataframe
14. How to normalise a column in a pandas dataframe
15. How to assign new columns in a pandas dataframe
16. How to rank a pandas dataframe in ascending and descending order

Python Pandas Dataframe Basics

1. How to create a Dataframe

Every piece of code that uses dataframes starts with the following line:

    import pandas as pd

Once you have identified where your data is coming from and have stored it in an object, for example "data", you can create your dataframe with the following command. It converts the data stored in the "data" object into a 2-dimensional dataframe.

    df = pd.DataFrame(data)

Example Tutorial
Check out the first few lines of this pandas dataframe example to see how a dataframe is created.
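Arrays and series are listed as possible data sources but are not demonstrated elsewhere in this tutorial, so here is a minimal sketch of both. The variable names and values are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data held in a NumPy array: 3 rows, 2 columns.
data = np.array([[1, 10], [2, 20], [3, 30]])
df_from_array = pd.DataFrame(data, columns=["a", "b"])

# Hypothetical data held in a Series; to_frame() turns it into
# a one-column DataFrame named after the Series.
scores = pd.Series([71, 82, 89], name="Marks")
df_from_series = scores.to_frame()

print(df_from_array)
print(df_from_series)
```

Passing columns= is optional for the array case; without it pandas labels the columns 0, 1, and so on.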
Here are some of the ways to create a DataFrame in Python Pandas.

Creating a DataFrame from a list of lists:

    # First import the pandas library
    import pandas as pd

    # create a list of lists
    list_of_lists = [['January', 31], ['February', 28], ['March', 31]]

    # creating the Pandas DataFrame
    df = pd.DataFrame(list_of_lists, columns=['Month', 'Days'])

    # to display the DataFrame
    print(df)

The above code snippet generates the following DataFrame:

          Month  Days
    0   January    31
    1  February    28
    2     March    31

Creating a DataFrame from a dict of lists:

While creating a DataFrame in Pandas from a dictionary of lists, all the lists within the dictionary have to be of the same length. If an index is also passed while creating the DataFrame, the length of the index must equal the length of the lists in the dictionary. If no index is passed, the index of the DataFrame will be range(n) by default, where n is the length of each list in the dictionary.

The keys of the dictionary become the column names of the DataFrame, and their values, which are lists, form the rows of each column.

    import pandas as pd

    # create dict of lists
    dict_of_lists = {'Students': ['Alan', 'Vivian', 'Alister', 'Jade'],
                     'Age': [24, 26, 32, 29]}

    df = pd.DataFrame(dict_of_lists)

df contains the following data:

      Students  Age
    0     Alan   24
    1   Vivian   26
    2  Alister   32
    3     Jade   29

Creating an indexed DataFrame from a dict of lists:

Indices of a DataFrame are not restricted to numbering and can be specified as follows:

    import pandas as pd

    # create dict of lists
    dict_of_lists = {'Students': ['Alan', 'Vivian', 'Alister', 'Jade'],
                     'Age': [24, 26, 32, 29]}

    # creating the DataFrame
    df = pd.DataFrame(dict_of_lists,
                      index=['Student1', 'Student2', 'Student3', 'Student4'])

In such a case, the DataFrame looks like:

             Students  Age
    Student1     Alan   24
    Student2   Vivian   26
    Student3  Alister   32
    Student4     Jade   29

Creating a DataFrame from a list of dicts:

DataFrames in Pandas can be created from a list of dictionaries. The keys of the dictionaries are taken as the column names by default.

    import pandas as pd

    # create a list of dictionaries
    list_of_dicts = [{'column_a': 1, 'column_b': 2, 'column_c': 3},
                     {'column_a': 10, 'column_b': 20, 'column_c': 30}]

    # creating the DataFrame
    df = pd.DataFrame(list_of_dicts)

The above snippet generates the following DataFrame:

       column_a  column_b  column_c
    0         1         2         3
    1        10        20        30

If some of the values are missing in a dictionary, like in the code snippet below:

    import pandas as pd

    # create a list of dictionaries
    list_of_dicts = [{'column_a': 1, 'column_c': 3},
                     {'column_a': 10, 'column_b': 20, 'column_c': 30}]

    # creating the DataFrame
    df = pd.DataFrame(list_of_dicts)

Then df will contain the following DataFrame, with NaN in place of the missing value:

       column_a  column_b  column_c
    0         1       NaN         3
    1        10      20.0        30

Creating a DataFrame from a list of dicts and specifying the row indices:
    import pandas as pd

    # create a list of dictionaries
    list_of_dicts = [{'column_a': 1, 'column_b': 2, 'column_c': 3},
                     {'column_a': 10, 'column_b': 20, 'column_c': 30}]

    # creating the DataFrame
    df = pd.DataFrame(list_of_dicts, index=['row_1', 'row_2'])

Then df will contain the following DataFrame:

           column_a  column_b  column_c
    row_1         1         2         3
    row_2        10        20        30

Creating a DataFrame from a list of dicts and specifying both the row indices and the column indices:

The names specified in the columns list have to match the keys of the dictionaries. If there is no match, the rows corresponding to that particular column will contain NaN.

    import pandas as pd

    # create a list of dictionaries
    list_of_dicts = [{'column_a': 1, 'column_c': 3},
                     {'column_a': 10, 'column_b': 20, 'column_c': 30}]

    # creating the DataFrame
    df = pd.DataFrame(list_of_dicts, index=['row1', 'row2'],
                      columns=['column_a', 'column_c'])

Then df will contain the following DataFrame:

          column_a  column_c
    row1         1         3
    row2        10        30

'column_b' does not get added to the DataFrame here since it is not mentioned in the columns list while creating the DataFrame.

Consider the following code:

    import pandas as pd

    # create a list of dictionaries
    list_of_dicts = [{'column_a': 1, 'column_c': 3},
                     {'column_a': 10, 'column_b': 20, 'column_c': 30}]

    # creating the DataFrame
    df = pd.DataFrame(list_of_dicts, index=['row1', 'row2'],
                      columns=['column_a', 'column_d'])

Since column_d is not a key in either of the dictionaries, the DataFrame generated looks like:

          column_a  column_d
    row1         1       NaN
    row2        10       NaN

Creating a DataFrame from a list of tuples:

    import pandas as pd

    # create a list of tuples
    list_of_tuples = [(8, 'August', 1998), (2, 'January', 1987),
                      (17, 'July', 2021), (24, 'June', 1932)]

    # creating the DataFrame
    df = pd.DataFrame(list_of_tuples, columns=['Date', 'Month', 'Year'])

This will generate the DataFrame df:

       Date    Month  Year
    0     8   August  1998
    1     2  January  1987
    2    17     July  2021
    3    24     June  1932

Creating a DataFrame using the zip() function:

In Python, the zip() function can be used to merge two lists. It returns a zip object, which is an iterator of tuples in which the items of the iterables passed to zip() are paired together: the first item of the first iterable is paired with the first item of the second iterable, the second item of the first with the second item of the second, and so on. If the iterables passed to zip() vary in length, the length of the zip object is determined by the shortest one.

    import pandas as pd

    # List 1
    students = ['Alan', 'Vivian', 'Alister', 'Jade']
    # List 2
    age = [24, 26, 32, 29]

    # using zip to merge the two lists
    list_of_tuples = list(zip(students, age))
    # list(zip(students, age)) will return
    # [('Alan', 24), ('Vivian', 26), ('Alister', 32), ('Jade', 29)]

    # converting the list of tuples into a pandas DataFrame
    df = pd.DataFrame(list_of_tuples, columns=['Students', 'Age'])

Here, df will contain:

      Students  Age
    0     Alan   24
    1   Vivian   26
    2  Alister   32
    3     Jade   29

Creating an empty DataFrame:

    import pandas as pd
    df = pd.DataFrame()

The above code will create an empty DataFrame in Python Pandas. To create an empty DataFrame with column headers:

    import pandas as pd
    df = pd.DataFrame(columns=['column1', 'column2', 'column3'])

2. How to sort rows within a pandas dataframe

Many times in data analysis you will need to get a sense of the data and its magnitude. Sorting rows enables this.
The df.sort_values() function does this: it sorts the rows by the columns passed to it in the by parameter.

For example, the following command sorts the dataframe by the "reports" column in descending order:

    df.sort_values(by='reports', ascending=False)

The following command sorts the dataframe by the "reports" column in ascending order:

    df.sort_values(by='reports', ascending=True)

The following command sorts the dataframe first by the "coverage" column and then by the "reports" column:

    df.sort_values(by=['coverage', 'reports'])

Example Tutorial
Check out this pandas dataframe example to see various ways to sort rows inside a dataframe.

3. How to find the largest value in a pandas dataframe

In the exploratory stage of analytics, you will occasionally want to get a sense of the largest values in your dataset. This tells you directionally the shape of your data, what operations to perform on the data and what a visualisation might look like.

The idxmax() function returns the index of the row with the highest value. The idxmin() function returns the index of the row with the lowest value.

When used like this:

    df['preTestScore'].idxmax()

the command returns the index of the row that contains the maximum value of the column "preTestScore" in your dataframe (df). To get the largest value itself rather than its index, use df['preTestScore'].max().

Example Tutorial
Check out this pandas dataframe example to see how to find the largest value in a dataframe.

4. How to list unique value in a pandas dataframe

Finding unique values in a dataset is useful in many scenarios: to categorize the number of rows belonging to a specific entity, to find the most popular and least popular entities, etc.
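As a rough sketch of those scenarios, the snippet below uses a made-up "name" column (the data is an assumption for illustration). Besides unique(), it uses nunique() and value_counts(), which are standard pandas Series methods not mentioned in the original text.

```python
import pandas as pd

# Hypothetical data: one row per report, with repeated names.
df = pd.DataFrame({"name": ["Tina", "Jake", "Tina", "Amy", "Jake", "Tina"]})

# Distinct values, in order of first appearance.
unique_names = df["name"].unique()

# How many distinct values there are.
n_unique = df["name"].nunique()

# Number of rows per value, most frequent first.
counts = df["name"].value_counts()

print(unique_names)
print(n_unique)
print(counts)
```

The first entry of counts is the most popular value and the last entry is the least popular one.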
The following command lists the unique values in the "name" column of the dataframe:

    df.name.unique()

Example Tutorial
Check out this pandas dataframe example to see how to find unique values in a dataframe.

5. How to delete duplicates from a pandas dataframe

Deleting duplicate values largely serves the purpose of reducing the memory usage of your dataset. It can also be used if you don't want a specific value to be over-represented in your dataset.

7. Search pandas dataframe for a value

The following code finds all values of Age where Salary > 50,000. The where function helps to search a pandas dataframe for a value; rows that do not meet the condition are returned as NaN.

    print(df['Age'].where(df['Salary'] > 50000))

Example Tutorial
Check out this data science tutorial to see an example of how to search for a value in a pandas dataframe.

8. Drop row and column in a pandas dataframe

Many times in data analysis you will have to delete rows and columns that don't fit your modeling needs. The df.drop() function helps achieve this.

    df.drop('reports', axis=1)

will drop a column named "reports". axis=1 indicates that we are referring to a column and not a row.

You can also drop rows based on conditions:

    df.drop(df[df['name'] != 'Tina'].index)

will drop every row where the value of 'name' is not 'Tina'.

Example Tutorial
Check out this code recipe to see an example of how to drop rows and columns in a pandas dataframe.

9. Replace multiple values in a pandas dataframe

While data munging, you might inherit a dataset with lots of null values, junk values, duplicate values, etc. In such instances you will need to replace these values in bulk. The df.replace() function helps to replace values in a pandas dataframe. This function can be used to replace a string, regex, list, dictionary, series, number, etc. in a dataframe.

    df.replace(-999, np.nan)

will replace all occurrences of -999 with NaN (null) values. This requires numpy to be imported as np.

    df.replace(to_replace=["Tennis", "Cricket"], value="Sports")

will replace the values 'Tennis' and 'Cricket' with the value 'Sports'.

Example Tutorial
Check out this code recipe to see an example of how to replace multiple values in a pandas dataframe.

10. Save pandas dataframe as a .csv file

As you must have noticed from the above functions, pandas is a very powerful library for data cleaning and preparation. Once you are done with the various data manipulations using the above commands, you will often need to save your dataframe as a csv file, for example so that the prepared data can later be split into training and test data for model building and accuracy checking.

The df.to_csv() function writes a pandas dataframe to a .csv file.

    df.to_csv(r'C:\Users\Admin\Desktop\file3.csv', index=False)

will store the .csv in a specific location.

Example Tutorial
Check out this code recipe to see an example of how to save a pandas dataframe as a csv file.

11. Randomly sample a pandas dataframe

Trying to understand a dataset involves getting a quick insight into what type and range of data it contains. Pandas provides functions to pick random rows from the dataset.

    df.take(np.random.permutation(len(df))[:2])

this code snippet picks 2 rows at random.

    df.take(np.random.permutation(len(df))[:4])

this code snippet picks 4 rows at random.

Example Tutorial
Check out this Pandas tutorial on how to randomly sample a pandas dataframe.

12. How to filter in a pandas dataframe

Filtering a dataframe enables you to view specific rows and columns, either based on order or by matching specific conditions.

print(df[:2]) will print the first 2 rows in the dataframe.
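As a rough sketch of the two kinds of filtering this section describes (by order and by condition), the snippet below uses a made-up dataframe with 'name', 'coverage' and 'reports' columns. The data is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "name": ["Jason", "Molly", "Tina", "Jake", "Amy"],
    "coverage": [25, 94, 57, 62, 70],
    "reports": [4, 24, 31, 2, 3],
})

# Filtering by order: the first two rows (equivalent to df.head(2)).
first_two = df[:2]

# Filtering by condition: build a boolean mask, one True/False per row.
# Each comparison needs its own parentheses because & binds more tightly.
mask = (df["coverage"] > 50) & (df["reports"] < 4)
matching = df[mask]

# .loc can filter rows and select columns in one step.
name_and_reports = df.loc[mask, ["name", "reports"]]

print(first_two)
print(matching)
print(name_and_reports)
```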
print(df[(df['coverage'] > 50) & (df['reports'] < 4)]) will print the rows where the column 'coverage' is greater than 50 and the column 'reports' is less than 4.

Example Tutorial
Check out this data science tutorial on how to filter in a pandas dataframe.

13. How to calculate moving average in a pandas dataframe

As part of data munging, you have to try to understand the trends in your dataset. But when your data values are very spiky, it is tough to spot trends. Calculating a moving average, such as a 7-day average, helps to smooth out the variability in the data and gives you a directional trend.

The dataframe.rolling() function provides the rolling window calculation, and by calling mean() on its result, the average can be calculated.

    df[['preTestScore', 'postTestScore']].rolling(window=2).mean()

this calculates a moving average with a window of 2 on the columns 'preTestScore' and 'postTestScore'. A window of 2 means that each value is averaged with the value in the row before it, and this happens for the entire dataframe. The first row has no previous value, so its result is NaN.

Example Tutorial
Check out this data science tutorial on how to calculate moving average in a pandas dataframe.

14. How to normalise a column in a pandas dataframe

In the data munging step of your data science project, you will often get data with wide variability across positive and negative values. Normalisation is done to reduce the data range when data of different scales are involved.

Normalising a dataset (234, 24, 14) by its largest value would result in approximately (1, 0.10, 0.06): using 234 as the anchor value, all other values are represented relative to 234.

Example Tutorial
Check out this data science tutorial on how to normalise a column in a pandas dataframe.

15. How to assign new columns in a pandas dataframe

There are a couple of reasons why you might want to add new columns during data processing. You might have data in 2 different dataframes that you want to bring into a single dataframe, or you might want to add a new column that is the result of a function on 2 or more other columns.

There are multiple ways to add new columns to a pandas dataframe: by declaring a new list as a column, by using dataframe.insert(), by using dataframe.assign(), or by using a dictionary.

The dataframe.assign() function adds a new column at the end of the dataframe. You cannot specify the position at which to add the column; for that you will need to use dataframe.insert().

    df = df.assign(Marks=[71, 82, 89])

will add a new column "Marks" with the values 71, 82, 89 as the last column in the dataframe. The number of values has to match the number of rows in the dataframe.

Example Tutorial
Check out this data science recipe on how to assign new columns in a pandas dataframe.

16. How to rank a pandas dataframe in ascending and descending order

By now you must have realised that Python is an excellent language for data analysis. This is primarily because of the powerful data analysis packages, like pandas, that are available for Python.

Ranking a pandas dataframe returns a rank for every index (row) in the series passed to the function. Both numeric and string values can be ranked by df.rank().

    df['coverageRanked'] = df['coverage'].rank(ascending=True)

this will create a new column 'coverageRanked' and assign to it the ascending ranks of the values in the 'coverage' column. Passing ascending=False ranks the values in descending order instead.

Example Tutorial
Check out this data science tutorial on how to rank a pandas dataframe.

17. Add row to a DataFrame

There are several ways to add a row or rows to an existing DataFrame in Python Pandas.
Adding a single row using the DataFrame.loc indexer:

To add the row at the end of the DataFrame, the length of the DataFrame has to be found to determine the position at which the new row is to be added.

    import pandas as pd

    data = {'Student': ['Peter', 'James', 'Ella', 'Charlotte'],
            'Age': [28, 24, 35, 27],
            'Major': ['Chemistry', 'Biology', 'Physics', 'Chemistry']}

    # creating a DataFrame from the dict of lists
    df = pd.DataFrame(data)

Here, df would look like this:

         Student  Age      Major
    0      Peter   28  Chemistry
    1      James   24    Biology
    2       Ella   35    Physics
    3  Charlotte   27  Chemistry

    # adding a new row
    df.loc[len(df.index)] = ['Mike', 33, 'Physics']

Now, df would look like:

         Student  Age      Major
    0      Peter   28  Chemistry
    1      James   24    Biology
    2       Ella   35    Physics
    3  Charlotte   27  Chemistry
    4       Mike   33    Physics

Using the DataFrame.append() function:

The DataFrame.append() function in Python Pandas may be used to append a single row, or to append multiple rows belonging to another DataFrame, to the end of a particular DataFrame, returning a new DataFrame object in the process. Any columns which are not present in the original DataFrame are added as new columns, and the new cells created in the original DataFrame get populated with NaN. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the code in this section only runs on older versions of pandas.

The syntax for the append() function is as follows:

    DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None)

where:

other: the list of rows to be appended, or a DataFrame object or dictionary object of the rows to be appended.
ignore_index: takes True or False; default is False. If set to True, the index labels are not used.
verify_integrity: takes True or False; default is False. If set to True, a ValueError gets raised on creating an index with duplicates.
sort: sorts the columns if the columns of the original DataFrame and the new rows are not aligned. sort=True silences the warning and sorts. sort=False silences the warning and does not sort.

Returns: a DataFrame object with the appended rows.

Using append() to add a single row:

    import pandas as pd

    data = {'Student': ['Peter', 'James', 'Ella', 'Charlotte'],
            'Age': [28, 24, 35, 27],
            'Major': ['Chemistry', 'Biology', 'Physics', 'Chemistry']}
    df = pd.DataFrame(data)

    # the new row is a dict keyed by column name
    new_row = {'Student': 'Mike', 'Age': 29, 'Major': 'Biology'}
    df = df.append(new_row, ignore_index=True)

Using append() to add the rows from a new DataFrame to an existing DataFrame:

    import pandas as pd

    # first DataFrame
    df1 = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [5, 6, 7, 8]})

    # second DataFrame
    df2 = pd.DataFrame({"foo": [9, 8, 7], "bar": [5, 4, 3]})
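DataFrame.append() is not available in pandas 2.0 and later, so here is a sketch of the same two operations written with pd.concat(), which is the usual substitute. This is not taken from the original tutorial; the data mirrors the append() examples above in shortened form.

```python
import pandas as pd

df = pd.DataFrame({
    "Student": ["Peter", "James"],
    "Age": [28, 24],
    "Major": ["Chemistry", "Biology"],
})

# Adding a single row: wrap the dict in a one-row DataFrame, then concatenate.
new_row = {"Student": "Mike", "Age": 29, "Major": "Biology"}
df_plus_row = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

# Adding all rows of one DataFrame to the end of another.
df1 = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [5, 6, 7, 8]})
df2 = pd.DataFrame({"foo": [9, 8, 7], "bar": [5, 4, 3]})
combined = pd.concat([df1, df2], ignore_index=True)

print(df_plus_row)
print(combined)
```

As with append(), ignore_index=True discards the original index labels and renumbers the rows from 0; without it, the rows keep the labels they had in their source DataFrames.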
