0% found this document useful (0 votes)
43 views

Data Filtering: Slicing Rows

Filtering data involves reducing noise or errors to focus on important features. It allows displaying only certain data by criteria like age ranges. Filtering slices or dices datasets into subsets for specific analysis needs, such as seeing temperatures over time (rows) or across locations (columns). Python tools let users filter rows by criteria and select columns for tasks like tracking sales increases in specific products or locations.

Uploaded by

Inderpal Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views

Data Filtering: Slicing Rows

Filtering data involves reducing noise or errors to focus on important features. It allows displaying only certain data by criteria like age ranges. Filtering slices or dices datasets into subsets for specific analysis needs, such as seeing temperatures over time (rows) or across locations (columns). Python tools let users filter rows by criteria and select columns for tasks like tracking sales increases in specific products or locations.

Uploaded by

Inderpal Singh
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 3

Data filtering

Data filtering is the task of reducing the content of noise or errors from measured
process data. It is an important task because measurement noise masks the important
features in the data and limits their usefulness in practice.
Filtering data in a spreadsheet means to allow only certain data to be displayed. This
function is useful when you want to focus only on specific information in a large dataset or
table. Filtering doesn't remove or modify data; it just changes which records appear on your
screen.

Python is a useful tool for data science. You may not need to work with all the
data in a dataset. In fact, looking at just one particular column might be
beneficial, such as age, or a set of rows with a significant amount of
information. You perform two steps to obtain just the data you need to perform
a particular task:

 Filter rows to create a subject of the data that meets the criterion you
select (such as all the people between the ages of 5 and 10).

 Select data columns that contain the data you need to analyze. For
example, you probably don’t need the individuals’ names unless you
want to perform some analysis based on name.

The act of slicing and dicing data, gives you a subset of the data suitable for
analysis. Here are various ways to obtain specific pieces of data to meet
particular needs.

Slicing rows
Slicing can occur in multiple ways when working with data, but the technique
of interest here is to slice data from a row of 2D or 3D data. A 2D array may
contain temperatures (x axis) over a specific timeframe (y axis). Slicing a row
would mean seeing the temperatures at a specific time. In some cases, you
might associate rows with cases in a dataset.

A 3D array might include an axis for place (x axis), product (y axis), and time
(z axis) so that you can see sales for items over time. Perhaps you want to
track whether sales of an item are increasing, and specifically where they are
increasing. Slicing a row would mean seeing all the sales for one specific
product for all locations at any time. The following example demonstrates how
to perform this task:

x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9],],


[[11,12,13], [14,15,16], [17,18,19],],
[[21,22,23], [24,25,26], [27,28,29]]])
x[1]

In this case, the example builds a 3D array. It then slices row 1 of that array to
produce the following output:

array([[11, 12, 13],


[14, 15, 16],
[17, 18, 19]])

Slicing columns
Using the examples from above, slicing columns would obtain data at a 90-
degree angle from rows. In other words, when working with the 2D array, you
would want to see the times at which specific temperatures occurred.
Likewise, you might want to see the sales of all products for a specific location
at any time when working with the 3D array. In some cases, you might
associate columns with features in a dataset. The following example
demonstrates how to perform this task using the same array:

x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9],],


[[11,12,13], [14,15,16], [17,18,19],],
[[21,22,23], [24,25,26], [27,28,29]]])
x[:,1]

Notice that the indexing now occurs at two levels. The first index refers to the
row. Using the colon (:) for the row means to use all the rows. The second
index refers to a column. In this case, the output will contain column 1. Here’s
the output you see:

array([[ 4, 5, 6],
[14, 15, 16],
[24, 25, 26]])

Dicing
The act of dicing a dataset means to perform both row and column slicing
such that you end up with a data wedge. For example, when working with the
3D array, you might want to see the sales of a specific product in a specific
location at any time. The following example demonstrates how to perform this
task using the same array as above:

x = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9],],


[[11,12,13], [14,15,16], [17,18,19],],
[[21,22,23], [24,25,26], [27,28,29]]])
print x[1,1]
print x[:,1,1]
print x[1,:,1]
print
print x[1:2, 1:2]

This example dices the array in four different ways. First, you get row 1, column 1. Of
course, what you may actually want is column 1, z axis 1. If that’s not quite right, you
could always request row 1, z axis 1 instead. Then again, you may want rows 1 and 2
of columns 1 and 2. Here’s the output of all four requests:

[14 15 16]
[ 5 15 25]
[12 15 18]
[[[14 15 16]
[17 18 19]]
[[24 25 26]
[27 28 29]]]

You might also like