Module 3 Data Types
Introduction
Data science is all about experimenting with raw or structured data. Data is the fuel that can steer a business in the right direction, or at least provide actionable insights that help plan current campaigns, organize the launch of new products, or try out new experiments.
All of these activities share one common driving component: data. We have entered a digital era in which we produce enormous amounts of data. For instance, a company like Flipkart produces more than 2 TB of data on a daily basis.
Because data is so important, it must be stored and processed properly and without errors. When dealing with datasets, the category of the data plays an important role in determining which preprocessing strategy will work for a particular set and which type of statistical analysis should be applied to get the best results. Let’s dive into some of the commonly used categories of data.
Qualitative Data
Qualitative (or categorical) data describes an object using a finite set of classes rather than measured numbers. It is usually extracted from audio, images, or text. For example, a smartphone brand may provide information about a phone's current rating, its color, its category, and so on. All of this information can be categorized as qualitative data. There are two subcategories under this:
1. Nominal
Nominal values are categories that don’t possess a natural ordering. Let’s understand this with some examples. The color of a smartphone can be considered nominal data, as we can’t rank one color above another.
It is not possible to state that ‘Red’ is greater than ‘Blue’. The gender of a person is another example: male, female, and other cannot be placed in any order. Mobile phone categories (budget, midrange, or premium) are often listed as nominal as well, although, because these tiers follow a price order, they can also be treated as ordinal.
Nominal data in statistics is not quantifiable and cannot be measured in numerical units. Nominal data is valuable in qualitative research because it gives respondents the freedom to express their opinions.
2. Ordinal
Ordinal values are categories that have a natural ordering. Clothing sizes, for example, can easily be sorted by their name tags in the order small < medium < large. The grades used to mark candidates in a test are also ordinal, since an A+ is definitely better than a B. The order is meaningful, but the gap between two neighboring levels is not measured.
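A minimal Python sketch of why ordering matters for ordinal data, assuming sizes are stored as plain strings: sorting by an explicit rank gives small < medium < large, whereas plain alphabetical sorting does not. The `SIZE_ORDER` list and `sort_sizes` helper are hypothetical names chosen for this illustration. Nominal values such as colors have no such rank, so only equality checks are meaningful for them.

```python
# Hypothetical ordering for an ordinal feature: position in the list is the rank.
SIZE_ORDER = ["small", "medium", "large"]

def sort_sizes(sizes: list[str]) -> list[str]:
    """Sort clothing sizes by their natural order instead of alphabetically."""
    return sorted(sizes, key=SIZE_ORDER.index)

orders = ["large", "small", "medium", "small"]
print(sort_sizes(orders))  # ['small', 'small', 'medium', 'large']
print(sorted(orders))      # alphabetical: ['large', 'medium', 'small', 'small']
```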
These categories help us decide which encoding strategy to apply to which type of data. Encoding qualitative data is important because machine learning models are mathematical in nature and can’t handle categorical values directly, so those values need to be converted to numbers.
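For nominal data, where the categories can’t be compared, one-hot encoding can be applied: each category gets its own binary (0/1) column, which works well when the number of categories is small. For ordinal data, label encoding can be applied. This is a form of integer encoding in which each category is mapped to an integer, and the integers should follow the natural order of the categories (for example, small = 0, medium = 1, large = 2).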
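The two encodings can be sketched in plain Python as follows. This is an illustrative sketch, not a library API: `one_hot` and `ordinal_encode` are hypothetical helpers, and the category order for the ordinal feature is assumed to be supplied by hand.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode a nominal feature: one 0/1 column per distinct category."""
    categories = sorted(set(values))  # column order is arbitrary; sorted for repeatability
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def ordinal_encode(values: list[str], order: list[str]) -> list[int]:
    """Map each ordinal category to an integer that preserves its rank."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

colors = ["Red", "Blue", "Red", "Green"]
print(one_hot(colors))
# (['Blue', 'Green', 'Red'], [[0, 0, 1], [1, 0, 0], [0, 0, 1], [0, 1, 0]])

sizes = ["medium", "small", "large"]
print(ordinal_encode(sizes, ["small", "medium", "large"]))  # [1, 0, 2]
```

Passing the order explicitly is a deliberate choice: an encoder that assigns integers alphabetically would place "large" before "medium" and lose the natural order. In practice, libraries such as scikit-learn provide `OneHotEncoder` and `OrdinalEncoder` for the same tasks.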
Difference Between Nominal and Ordinal Data
Quantitative Data
Quantitative data quantifies things using numerical values that can be counted or measured. The price of a smartphone, the discount offered, the number of ratings on a product, the frequency of a smartphone's processor, and the RAM of that phone all fall under the category of quantitative data.
A key point is that a quantitative feature can take a very large, even infinite, number of values. For instance, the price of a smartphone can range from some minimum amount to any higher value, and it can be broken down further into fractional amounts. The two subcategories of quantitative data are:
1. Discrete
Numerical values that are integers or whole numbers, typically counts, are placed under this category. The number of speakers in a phone, its cameras, the cores in its processor, and the number of SIMs it supports are all examples of discrete data.
2. Continuous
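Continuous values are measurements that can take any value within a range, including fractions. Examples include the operating frequency of a processor, the Wi-Fi frequency, the temperature of the cores, and so on. A version number such as an Android release (for example, 12.1) contains a decimal point but comes from a fixed set of labels, so it is better treated as ordinal than continuous.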
Unlike discrete data, which takes whole, fixed values, continuous data can be broken down into ever smaller pieces and can take any value within a range. For example, variable quantities such as temperature and a person's weight are continuous. Continuous data is often represented with a line graph whose highs and lows show how the value fluctuates over a period of time.
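A rough way to tell the two apart in a dataset is to check whether every value is a whole number. The Python sketch below does this; `looks_discrete` is a hypothetical helper, and the check is only a heuristic, since a continuous quantity recorded in whole units (such as temperature rounded to whole degrees) would also pass.

```python
def looks_discrete(values: list[float]) -> bool:
    """Heuristic: treat a numeric feature as discrete if every value is a whole number.

    Only a first check: a continuous quantity rounded to whole units would also pass.
    """
    return all(float(v).is_integer() for v in values)

camera_counts = [1, 2, 3, 4]          # counts: whole numbers only
core_temps_c = [41.5, 43.25, 39.8]    # measurements: any value in a range

print(looks_discrete(camera_counts))  # True
print(looks_discrete(core_temps_c))   # False
```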
Difference between Discrete Data and Continuous Data
Importance of Qualitative and Quantitative Data
Quantitative data works with numerical values that can be measured, answering questions such as ‘how much’, ‘how many’, or ‘how many times’. Because quantitative data contains precise numerical values, organizations can use these figures to gauge what has improved and what has gone wrong, and to predict future trends.
Excel Data Types
Microsoft Excel recognizes four kinds of values: text, number, logical, and error. You can perform different functions with each type, so it's important to know which ones to use and when to use them. Also keep in mind that some values may change type when you export data into a spreadsheet.
Here's a list of the four data types you can find in Microsoft Excel, with information
about the ways you can use them:
1. Number data
Data in this category includes any kind of number, from large numbers to small fractions. Number data may be quantitative or qualitative, and it's important to remember the difference, because some numbers don't represent an amount of something. For example, you might enter a number that represents financial earnings in one cell and a number that represents a date in another. Both count as number data, but Excel may store and display them differently. Make sure you use the appropriate symbols and formats so that Excel reads your number data accurately. Examples of number data include:
Monetary totals
Whole numbers
Percentages
Decimals
Dates
Times
Integers
Phone numbers
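Dates are a good example of number data that does not represent an amount: in Excel's default 1900 date system, a date is stored as a serial number counting days, with 1 meaning January 1, 1900. The Python sketch below converts such a serial number to a calendar date. It assumes the 1900 date system and only handles serials of 61 and above, because Excel treats 1900 as a leap year (serial 60 is the nonexistent February 29, 1900); `excel_serial_to_date` is a hypothetical helper, not an Excel or library function.

```python
from datetime import date, timedelta

# Assumption: Excel's 1900 date system. Because Excel counts a nonexistent
# 1900-02-29 as serial 60, offsetting from 1899-12-30 is correct for serials >= 61.
EXCEL_EPOCH = date(1899, 12, 30)

def excel_serial_to_date(serial: int) -> date:
    """Convert an Excel 1900-system serial number (>= 61) to a date."""
    if serial < 61:
        raise ValueError("this sketch only handles serials from 61 (1900-03-01) onward")
    return EXCEL_EPOCH + timedelta(days=serial)

print(excel_serial_to_date(44927))  # 2023-01-01
```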
2. Text data
Text data can include alphabetical characters, numerals, and special symbols. The primary difference between number data and text data is that you can perform calculations with number data but not with text data. Since the two types can overlap (a cell may contain digits that should be treated as text), you may manually change the format of a cell to ensure it behaves the way you want. You can also use text data to label columns or rows to help keep track of different categories. For example, you may label a row "revenue" and a column "January 2022."
By default, Excel treats entries it doesn't recognize as numbers as text data, so it's important to format your data to fit the type you want. Examples of text data may include the following (dates and times count as text only when Excel doesn't recognize their format):
Words
Sentences
Dates
Times
Addresses
3. Logical data
Data of this type is either TRUE or FALSE, usually as the result of a test or comparison. This means you can use a function to determine whether the data in your spreadsheet meets certain criteria. For example, you may use your spreadsheet to set sales goals and check whether your sales performance meets them. You can run these tests using logical functions for different scenarios. Four commonly used logical functions are:
AND: This function returns TRUE only if all of its conditions are TRUE. For example, you might use it to test whether the value in one cell is larger than a certain amount and the value in another cell is also larger than another amount.
OR: You may use this function to determine whether at least one of your arguments meets your conditions. If none of the data matches your conditions, Excel returns FALSE.
XOR: This function stands for "Exclusive Or." With two arguments, it returns TRUE when exactly one of them is TRUE; with more arguments, it returns TRUE when an odd number of them are TRUE. For example, you might use it to check that only one of two cells contains a certain value.
NOT: This function reverses a logical value, turning TRUE into FALSE and FALSE into TRUE. You might use it to flag data that doesn't match a condition, so you can assess possible patterns in that data.
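The behavior of these four functions can be mimicked in Python, which may help when checking what a formula will return. This is a simplified sketch: the `excel_*` names are hypothetical, the sales figures are made up, and it does not model how Excel handles ranges, text, or empty cells. The XOR version follows Excel's rule of returning TRUE when an odd number of inputs are TRUE.

```python
def excel_and(*conditions: bool) -> bool:
    """AND: TRUE only if every condition is TRUE."""
    return all(conditions)

def excel_or(*conditions: bool) -> bool:
    """OR: TRUE if at least one condition is TRUE."""
    return any(conditions)

def excel_xor(*conditions: bool) -> bool:
    """XOR: TRUE when an odd number of conditions are TRUE."""
    return sum(bool(c) for c in conditions) % 2 == 1

def excel_not(condition: bool) -> bool:
    """NOT: reverses a single logical value."""
    return not condition

# Hypothetical sales check against two regional targets.
north_sales, south_sales = 12_000, 8_500
print(excel_and(north_sales > 10_000, south_sales > 9_000))  # False
print(excel_or(north_sales > 10_000, south_sales > 9_000))   # True
print(excel_xor(north_sales > 10_000, south_sales > 9_000))  # True
print(excel_not(south_sales > 9_000))                        # True
```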
4. Error data
This type of data occurs when Excel recognizes a mistake or missing information while processing your entry. For example, if you try to add a number to a cell that contains text, Excel produces the error value #VALUE!. This helps you identify where the issue is so you can correct it and produce the result you want. Every error value begins with a "#" character, which helps you recognize these instances easily. Knowing the different error values can help you understand how to resolve different mistakes or add the appropriate information. These values are:
#NAME?: You may see this value if a formula contains text without double quotes, or with an opening or closing quote missing. It also appears if there's a typo in the formula, such as a misspelled function name.
#DIV/0!: This error value appears if you try to divide a number by zero (or by an empty cell, which Excel treats as zero). Since the result is undefined, Excel displays #DIV/0! in place of a result so you can revise the formula.
#REF!: This invalid cell reference error appears when a formula refers to a cell that is no longer valid, for example because the cells it referenced were deleted or pasted over. To correct this issue, you can undo your previous action or edit the formula so that it refers to valid cells.
#NUM!: A #NUM! value may appear if a formula or function contains a numeric value that isn't valid, such as the square root of a negative number. It may also appear if the result of a formula or function is too large or too small for Excel to represent.
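#N/A: This error value means a value is not available to a formula, for example when a lookup function such as VLOOKUP can't find the value it searches for. You may also enter it yourself (for example with the NA() function) to mark areas where you plan to enter a value later. Excel may also display it if imported data contains empty or unreadable cells.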
#VALUE!: This error indicates that an argument or operator in a function or formula is invalid. For example, if you add cells with the + operator (such as =A1+A2) and one of them contains alphabetical characters, you get a #VALUE! result. The SUM function, by contrast, ignores text in a range.
#NULL!: This error value appears when a formula uses the intersection operator (a space) between two ranges that don't actually intersect. It often happens by accident when the comma separating two ranges in a function is replaced with a space.
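To see how two of these error values arise, the following Python sketch imitates a single division formula. It is a simplified model under stated assumptions: `excel_divide` is a hypothetical helper, text that can't be read as a number yields #VALUE!, and an empty cell (represented here as `None`) is treated as zero, so dividing by it yields #DIV/0!.

```python
from typing import Union

def excel_divide(numerator: object, denominator: object) -> Union[float, str]:
    """Divide two cell values, returning Excel-style error strings on failure."""
    try:
        # Assumption: an empty cell (None) counts as 0, as it does in Excel arithmetic.
        num = 0.0 if numerator is None else float(numerator)
        den = 0.0 if denominator is None else float(denominator)
    except (TypeError, ValueError):
        return "#VALUE!"  # text that can't be read as a number
    if den == 0:
        return "#DIV/0!"  # division by zero is undefined
    return num / den

print(excel_divide(10, 4))     # 2.5
print(excel_divide(10, 0))     # #DIV/0!
print(excel_divide(10, None))  # #DIV/0!
print(excel_divide("abc", 2))  # #VALUE!
```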
Tips for working with Excel data types
Use the TEXT function
The TEXT function converts a value to text in a format you specify. This can help you maintain a display format in situations where Excel might otherwise change it to a default format. For example, if you want to display dates using numbers and alphabetical characters, you can set the cells to text so Excel doesn't convert them to its default date format. It can also help you display numbers that start with one or more zeros, such as ID codes, since Excel removes leading zeros from number data.
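To use this function, select an empty cell and type a formula such as the following into the Formula Bar, where value is the number or cell reference to convert and format_text is a format code in double quotes:
=TEXT(value, format_text)
For example, =TEXT(A2, "00000") displays the number in cell A2 as five digits, keeping any leading zeros.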
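Python's string formatting offers a rough analogy to the TEXT function, which may help clarify what format_text does. The format codes differ between the two, the cells A2 and B2 in the comments are hypothetical, and `%b` assumes the default English (C) locale.

```python
from datetime import date

# Python analogies for Excel's TEXT(value, format_text); the format codes differ.
employee_id = 42
print(f"{employee_id:05d}")        # 00042, like =TEXT(A2, "00000")

hired = date(2023, 1, 5)
print(hired.strftime("%d %b %Y"))  # 05 Jan 2023, like =TEXT(B2, "dd mmm yyyy")
```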
Keep the 15-digit limit in mind
Microsoft Excel allows you to enter large numbers, but it keeps only 15 significant digits, not counting commas and decimal points. For a number with more digits, Excel replaces every digit past the fifteenth with a zero. It's important to keep a separate record of any number data that needs more than 15 digits for accuracy. You can also work around this limit by storing the value as text: format the cell as Text before typing, or type an apostrophe (') before the number. This is common for long identifiers such as 16-digit card numbers, which aren't used in calculations anyway.
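The digit-replacement rule described above can be modeled in a few lines of Python. This sketch, with the hypothetical helper `excel_15_digit`, covers whole numbers only: it keeps the first 15 digits and sets the rest to zero. It also shows that the same value kept as text retains every digit.

```python
def excel_15_digit(n: int) -> int:
    """Keep the first 15 digits of a whole number and replace the rest with zeros."""
    sign = -1 if n < 0 else 1
    digits = str(abs(n))
    if len(digits) <= 15:
        return n
    kept = digits[:15] + "0" * (len(digits) - 15)
    return sign * int(kept)

card = 1234567890123456       # 16 digits
print(excel_15_digit(card))   # 1234567890123450
print(str(card))              # 1234567890123456 (kept as text, all digits survive)
```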
Check exported data
When you export data into a spreadsheet, Excel may change some formats based on what it thinks certain values represent. For example, an error value may arrive in the new spreadsheet as text. Comparing the new spreadsheet with the original can help you identify missing or altered formats so you can correct them.
Search for error values
If you want to check your spreadsheet for error values, you can press Ctrl+F and search for the "#" symbol, with Look in set to Values so that Excel searches formula results. Since all error values start with the number sign (#), you can find them this way, although the search also matches any ordinary text that contains "#". This can help you fix missing or incorrect entries to ensure accuracy in your functions.
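If a sheet's values have been exported (for example, as a grid read from a CSV file), the same search can be scripted. The Python sketch below looks for exact error values instead of any "#", which avoids matching ordinary text such as "#1 seller". The `EXCEL_ERRORS` set covers only the error values listed in this module, written as Excel displays them (including the "!" in #DIV/0!); the sample sheet and the `find_error_cells` helper are hypothetical.

```python
EXCEL_ERRORS = {"#NAME?", "#DIV/0!", "#REF!", "#NUM!", "#N/A", "#VALUE!", "#NULL!"}

def find_error_cells(grid: list[list[object]]) -> list[tuple[str, str]]:
    """Return (cell address, error value) pairs for cells holding an Excel error."""
    found = []
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row):
            if isinstance(value, str) and value in EXCEL_ERRORS:
                address = f"{chr(ord('A') + c)}{r}"  # assumes at most 26 columns
                found.append((address, value))
    return found

sheet = [
    ["Region", "Sales", "Share"],
    ["North", 12000, 0.6],
    ["South", "#N/A", "#DIV/0!"],
]
print(find_error_cells(sheet))  # [('B3', '#N/A'), ('C3', '#DIV/0!')]
```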
Determine number data display
It's important to determine how you want to display number data, such as dates, percentages, or fractions. For example, if you enter 0 2/8 (the leading zero and space tell Excel the entry is a fraction rather than a date), Excel reduces the fraction and displays 1/4. If you want to keep the unreduced fraction, you can apply a fraction format such as "As eighths" to the cell in the Format Cells dialog. Deciding how you want to display certain numbers before you enter all your data can help ensure your spreadsheet is uniform and organized.
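Python's standard `fractions` module reduces fractions in the same way, which makes the behavior easy to try out. The sketch below reduces 2/8 and then re-expresses the value in eighths, rounding to the nearest eighth, which is similar in spirit to choosing a fixed-denominator fraction format in Excel; it is an analogy, not a model of Excel's internals.

```python
from fractions import Fraction

value = Fraction(2, 8)
print(value)           # 1/4 (reduced automatically)

# Express the same value over a fixed denominator of 8, rounding to the
# nearest eighth, similar in spirit to an "as eighths" fraction format.
eighths = round(value * 8)
print(f"{eighths}/8")  # 2/8
```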
Prepared by: