7.2 - Data Frame Basics - mp4

data structures

Uploaded by

raju111yadav123

In Python, pandas is a very popular library, especially in data science, for manipulating
data. In pandas, there is a very important data structure, or object, called a data frame. A
data frame, in a nutshell, is like a table. You can think of it like any table that you draw,
or like an Excel spreadsheet. Pandas data frames are used extensively across data science to
do simple operations on data.

So now we will load some data. Let's see some simple code here. I'm importing pandas as pd,
and in this folder I have a data set called weather_data.csv. CSV stands for comma-separated
values. What does that mean? Imagine I have a table where my first row is a1, a2, a3, a4, my
second row is b1, b2, b3, b4, my third row is c1, c2, c3, c4, and so on. When I store this
table as a CSV, it is stored as follows: a1, then a comma, a2, then a comma, a3, then a
comma, and a4. So the first row of my table becomes the first line in my file, and the
columns are separated by commas. That's why it's called comma-separated values: each entry is
a value, and the values are separated by commas. Similarly, the next line in my file has b1,
b2, b3, b4. This is literally my CSV file. CSV files are used extensively in data science to
store data in a simple format.

Now, in this folder we have
weather_data.csv, which has some data. If I want to load the data in the CSV file into a data
frame, all I have to do is call the pandas read_csv function. It's literally one line of
code. When I look at the data, I have six rows and four columns. The columns are day,
temperature, wind speed, and event, and the six rows correspond to six individual days. So
here we are taking a CSV file and using the pandas read_csv function to read it into a data
frame.

There are other ways to construct data frames. One such way is from a list of tuples. What I
have here is a list, and each item in it is a tuple. I can construct my data frame from this
list of tuples, with these as my column names, and it works just the same. But here you have
to list all the data yourself manually; you are not reading it from a file. What people often
do is store all of their data in a CSV file and load it into a pandas data frame. That's the
more commonly used approach, but building a data frame from a list of tuples is perfectly
valid.

Now for the other important properties of a data frame. So,
if I say df.shape, it tells me that I have six rows and four columns. Another useful function
is df.head(). What head does is print the first few rows of your data; here it prints the top
five rows. Similarly, there is a function called df.tail(), which prints the last few rows of
your data. Those of you who have done Unix programming or Unix shell scripting will recognize
head and tail: they are literally Unix commands, and data frames have borrowed the same
terminology. They are often used just to look at the top few or the bottom few rows of your
data.

Now, the other interesting thing is how you slice your data. Suppose I have my data, which is
six rows and four columns, and I want the rows between the second and the fourth. I can just
say df[2:5]. It will take the rows with index 2, 3 and 4, but not the row with index 5. Keep
in mind that the index starts at 0, so counting from the top these are the third through
fifth rows of the table.
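
The loading and inspection steps described so far can be sketched as follows. This is a minimal sketch, not the code from the video: the CSV contents are illustrative stand-ins for weather_data.csv (the real file is not shown in the transcript), the column name windspeed is an assumption, and the text is read from an in-memory string so the example runs without a file on disk.

```python
import io

import pandas as pd

# Illustrative stand-in for weather_data.csv. These values are assumed,
# chosen only to give six rows and four columns as in the transcript.
csv_text = """day,temperature,windspeed,event
1/1/2017,32,6,Rain
1/2/2017,35,7,Sunny
1/3/2017,28,2,Snow
1/4/2017,24,7,Snow
1/5/2017,32,4,Rain
1/6/2017,31,2,Sunny
"""

# With a real file this would be pd.read_csv("weather_data.csv").
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (6, 4): six rows, four columns
print(df.head())   # first five rows by default
print(df.tail(2))  # last two rows

# Row slice: start is inclusive, stop is exclusive, counting from 0.
subset = df[2:5]
print(subset)      # rows with index 2, 3 and 4
```

With a default index, the slice keeps the rows labelled 2, 3 and 4 and leaves out the row labelled 5.
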
Similarly, I can say df.columns, and what I get is the list of columns in my data table. If I
want to pick one specific column, there are two ways to do it. I can just say df.day, because
day is the name of a column, or I can say df and give the column name within square brackets,
df['day']. Both return the same thing: just that one column of the data (in pandas terms, a
Series rather than a full data frame).
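
As a rough sketch of the list-of-tuples construction and the two ways of picking a column, assuming made-up weather values and the column name windspeed (neither is shown in the transcript):

```python
import pandas as pd

# Each tuple is one row. The values are illustrative, not the real data set.
rows = [
    ("1/1/2017", 32, 6, "Rain"),
    ("1/2/2017", 35, 7, "Sunny"),
    ("1/3/2017", 28, 2, "Snow"),
    ("1/4/2017", 24, 7, "Snow"),
    ("1/5/2017", 32, 4, "Rain"),
    ("1/6/2017", 31, 2, "Sunny"),
]
df = pd.DataFrame(rows, columns=["day", "temperature", "windspeed", "event"])

print(list(df.columns))  # the column names, in order

# Two equivalent ways to pick a single column.
by_attribute = df.day
by_key = df["day"]

print(by_attribute.equals(by_key))  # True
print(type(by_key).__name__)        # Series
```

Attribute access such as df.day only works when the column name is a valid Python identifier and does not clash with an existing data frame attribute, so the square-bracket form is the safer default.
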
Now, if I want data from two columns, not one, then the syntax goes like this. If I want both
the day column and the event column, this is what the syntax looks like. I now get a new data
frame which has six rows but only two columns instead of four.

Again, here is an example that gets all the temperatures. If I want to find the maximum
temperature, I say df.temperature, which gives me all the temperatures, and then I compute
the max of it. Similarly, I can compute the min.

There is another function called describe, which is often very useful in data science. When
I say df.temperature, I'm operating on only the temperature column of my data. When I say
describe, it tells me that there are six values and that the mean, or average, is about 30.
It also reports the standard deviation, the minimum value, which is 24, and the 25th, 50th
and 75th percentiles. These are all statistical parameters. Those of you interested in
understanding these, please check out our Applied AI Course at appliedaicourse.com. There are
a bunch of free videos with the title exploratory data analysis where we explain what each of
these percentiles means, what standard deviation means, and things like that. You also get
the max. By just running one command, you get a good idea of what the data looks like. I can
quickly say that my minimum temperature is 24 degrees, my maximum temperature is 35 degrees,
my mean temperature is about 30 degrees, and I have six values. Very simple: in one line, I
get a good sense of the values in a given column.

Now here is an interesting thing. Imagine I want to
select the row that has the maximum temperature. How do I do it? First, I say
df.temperature.max(). This returns the maximum temperature. Next, if I compare df.temperature
with that maximum using ==, I get a True or False value for every row, marking the row whose
temperature is equal to the maximum. Finally, I put that inside df with square brackets, and
this returns the row with the maximum temperature. Notice that we are using df three times
here: once to compute the maximum temperature, once to say which row has that maximum
temperature, and once to get the actual row itself.

Similarly, suppose I want only the day column of the row that has the maximum temperature.
This part gives me the maximum temperature value, this part picks out the row with the
maximum temperature, and for that row I want only the day column. This is how you do it, and
you get the value 1/2/2017, which is the day on which I have the maximum temperature.

These are very elegant and simple operations. Data frames are super popular nowadays: pandas
data frames are something that most data scientists and machine learning engineers use
extensively when they work in Python. We also use pandas data frames extensively in this
workshop, and we'll see more code using data frames in the actual workshop code itself.
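
The two-column selection and the summary statistics mentioned in the talk might look like the sketch below. The data values and the column name windspeed are assumptions made for illustration; they were picked so that the minimum is 24 and the maximum is 35, as quoted in the talk, and the real data set may differ.

```python
import pandas as pd

# Illustrative data only; not the contents of the real weather_data.csv.
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017",
            "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Passing a list of column names returns a new data frame with those columns.
day_event = df[["day", "event"]]
print(day_event.shape)  # (6, 2)

temps = df["temperature"]
print(temps.max())       # 35
print(temps.min())       # 24

# count, mean, std, min, 25%, 50%, 75% and max in a single call.
print(temps.describe())
```

Note the double square brackets in the two-column selection: the outer pair indexes the data frame and the inner pair is an ordinary Python list of column names.
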

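The "use df three times" pattern for picking the row with the maximum temperature can be sketched step by step. The data values and the column name windspeed are illustrative assumptions, and the intermediate names max_temp, mask, hottest and hottest_days are introduced here only for readability.

```python
import pandas as pd

# Illustrative data only; not the contents of the real weather_data.csv.
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017",
            "1/4/2017", "1/5/2017", "1/6/2017"],
    "temperature": [32, 35, 28, 24, 32, 31],
    "windspeed": [6, 7, 2, 7, 4, 2],
    "event": ["Rain", "Sunny", "Snow", "Snow", "Rain", "Sunny"],
})

# Step 1: the maximum temperature itself.
max_temp = df["temperature"].max()

# Step 2: a True/False value per row, True where the temperature is the maximum.
mask = df["temperature"] == max_temp

# Step 3: index the data frame with the mask to keep only the matching rows.
hottest = df[mask]
print(hottest)

# Only the day column of the matching rows.
# df.loc[mask, "day"] is an equivalent spelling.
hottest_days = df["day"][mask]
print(hottest_days.tolist())  # ['1/2/2017']
```

If several rows share the maximum temperature, the mask is True for each of them and all of those rows are returned.
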