Data Processing Using Advanced Tools
TOOLS
Data Sorting
Pivot Table
Data Analysis
Filtering
A filter is a list of conditions that each entry has to meet to be displayed. Calc provides three
types of filter:
Standard – specifies the logical conditions to filter your data.
The filter criteria used in standard filtering define a filter by indicating the type of line, the name
of the field, a logical condition, and a value or a combination of arguments.
Operator – for the following arguments, you can choose between the logical
operators AND / OR.
Field name – specifies the field names from the current table to set them in the
argument. You will see the column identifiers if no text is available for the field names.
Condition – specifies the comparative operators through which the entries in the
Field name and Value fields can be linked.
Value – specifies a value to filter the field. The Value list box contains all possible values
for the specified Field name. Select a value to be used in the filter, including Empty and
Not Empty entries.
Case sensitive – distinguishes between uppercase and lowercase letters when filtering
the data.
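The way these fields combine can be sketched in Python. This is only an illustration of the idea, not Calc's implementation: the `Criterion` type, the `matches` helper and the sample rows are hypothetical, and the sketch assumes the criteria are evaluated from left to right.

```python
import operator
from dataclasses import dataclass

# Comparison conditions, keyed by the symbol chosen in the Condition field.
CONDITIONS = {
    "=": operator.eq, "<>": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}

@dataclass
class Criterion:
    link: str         # "AND" or "OR"; ignored for the first criterion
    field: str        # Field name
    condition: str    # Condition
    value: object     # Value

def matches(row, criteria, case_sensitive=False):
    """Return True if the row meets the criteria (evaluated left to right)."""
    def norm(v):
        # Without the Case sensitive option, text is compared in lowercase.
        return v.lower() if isinstance(v, str) and not case_sensitive else v

    result = None
    for c in criteria:
        test = CONDITIONS[c.condition](norm(row[c.field]), norm(c.value))
        if result is None:
            result = test
        elif c.link == "AND":
            result = result and test
        else:
            result = result or test
    # With no criteria at all, every row stays visible.
    return True if result is None else result

# Hypothetical sample data.
rows = [
    {"Employee": "Brigitte", "Sales": 1200},
    {"Employee": "Hans", "Sales": 800},
    {"Employee": "brigitte", "Sales": 300},
]
criteria = [
    Criterion("", "Employee", "=", "Brigitte"),
    Criterion("AND", "Sales", ">", 500),
]
visible = [r for r in rows if matches(r, criteria)]
```

Only the first row remains visible here. With `case_sensitive=True`, the third row would also fail the Employee test on its own.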
Applying an AutoFilter
An AutoFilter adds a drop-down list to the top row of one or more data columns which lets you select
the rows to be displayed. The list includes every unique entry in the selected cells sorted into lexical
order (see http://sheepsystems.com/bookdog/HelpBook/LexicalOrder.html for an explanation).
Note: The options for advanced filtering are the same as those used for standard
filtering; see “Applying a standard filter” on page 73 for more information.
Filters
Use filters to limit the visible rows in a spreadsheet. Generic filters, common to all sorts of data
manipulations, are automatically provided by the auto filter capability. You can also define your
own filters.
Caution: After applying a filter, some rows are visible and some rows are not. If you select
multiple rows in one operation, you will also select the invisible rows
contained between the selected visible rows. Operations, such as delete, act
on all of the selected rows. To avoid this problem, you must individually select
each of the filtered rows using the Control key.
Sort options
On the Options page of the Sort dialog (Figure 56), you can set additional options:
Case Sensitivity – sorts first by uppercase letters and then by lowercase letters. For
Asian languages, special handling applies.
Note: For Asian languages, entries are first compared in their primitive forms, with their
cases and diacritics ignored. If they evaluate as the same, their diacritics are
taken into account for the second-level comparison. If they still evaluate as the
same, their cases, character widths, and Japanese Kana difference are
considered for the third-level comparison.
Range contains column/row labels – omits the first row or the first column in the
selection from the sort. The Direction setting at the bottom of the dialog defines the
name and function of this check box.
Include formats – preserves the current cell formatting.
Enable natural sort – natural sorting is a sort algorithm that sorts string-prefixed numbers
based on the value of the numerical element in each sorted number, instead of the
traditional way of sorting them as ordinary strings. For instance, assume you have a
series of values such as, A1, A2, A3, A4, A5, A6, ..., A19, A20, A21. When you put these
values into a range of cells and run the sort, it will become A1, A10, A11, A12, ..., A19,
A2, A20, A21, A3, A4, A5, ..., A9. While this sorting behavior may make sense to those
who understand the underlying sorting mechanism, to the rest of the population it seems
completely bizarre, if not outright inconvenient. With natural sorting selected, values such
as the ones in the above example are sorted correctly, which improves the convenience
of sorting operations in general.
Copy sort results to – copies the sorted list to the cell range that you specify. Select a
named cell range where you want to display the sorted list, or enter a cell range in the
input box.
Custom sort order – select this option and then select the custom sort order that you
want to apply. To define a custom sort order, go to Tools > Options >
LibreOffice Calc > Sort Lists (on macOS, LibreOffice > Preferences >
LibreOffice Calc > Sort Lists).
Language – select the language for the sorting rules.
Options – select a sorting option for the language. For example, select the
"phonebook" option for German to include the umlaut special character in the sorting.
Top to Bottom (Sort Rows) – sorts rows by the values in the active columns of
the selected range.
Left to Right (Sort Columns) – sorts columns by the values in the active rows of
the selected range.
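The difference between ordinary string sorting and the Enable natural sort option described above can be illustrated with a short Python sketch. It is not Calc's own code; the `natural_key` helper is a hypothetical name, and the sketch simply compares runs of digits as numbers.

```python
import re

def natural_key(text):
    """Split text into alternating text and digit runs; digit runs become integers."""
    parts = re.split(r"(\d+)", text)
    # re.split with a capturing group puts the digit runs at the odd positions.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

values = [f"A{n}" for n in range(1, 22)]      # A1 ... A21

plain = sorted(values)                        # ordinary string comparison
natural = sorted(values, key=natural_key)     # numeric part compared as a number

print(plain[:4])     # ['A1', 'A10', 'A11', 'A12']
print(natural[:4])   # ['A1', 'A2', 'A3', 'A4']
```

Sorting as plain strings compares character by character, which is why A10 lands before A2. The key function avoids that by comparing 10 and 2 as integers.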
Quick sort
If the columns in your spreadsheet have a header with a text format, you can use a quick sort.
Select a cell or a cell range to be sorted.
Click the Sort Ascending or Sort Descending icons on the Standard toolbar.
Click on the Options tab (see Figure 306) to set the sort options. Check the Range contains
column labels checkbox to prevent column headers from being sorted with the rest of the data.
The Sort by list box in Figure 305 displays the columns using the column headers if the Range
contains column labels checkbox in Figure 306 is checked. If the Range contains column
labels checkbox is not checked, however, then the columns are identified by their column name;
Column A, for example.
Normally, sorting the data causes the existing data to be replaced by the newly sorted data.
The Copy sort results to checkbox, however, leaves the selected data unchanged
and writes a sorted copy of the data to the specified location. You can either directly enter a
target address (Sheet3.A1, for example) or select a predefined range.
Check the Custom sort order checkbox to sort based on a predefined list of values. To set
your own predefined lists, use Tools > Options > LibreOffice Calc > Sort Lists and then enter
your own sort lists. Predefined sort lists are useful for sorting lists of data that should not be
sorted alphabetically or numerically; for example, sorting days of the week by name in calendar order.
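As a rough illustration of a custom sort order and of copying the sort results elsewhere, the following Python sketch ranks each entry by its position in a predefined list. The list `DAYS`, the helper `custom_rank` and the sample data are hypothetical; entries missing from the list are placed at the end, which is an assumption of this sketch rather than documented Calc behavior.

```python
# A predefined sort list, comparable to an entry under Sort Lists.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
RANK = {day: position for position, day in enumerate(DAYS)}

def custom_rank(value):
    """Position of the value in the sort list; unknown values sort last."""
    return RANK.get(value, len(DAYS))

data = ["Friday", "Monday", "Wednesday", "Tuesday"]

alphabetical = sorted(data)                # ordinary alphabetical sort
custom = sorted(data, key=custom_rank)     # order taken from the sort list

print(alphabetical)  # ['Friday', 'Monday', 'Tuesday', 'Wednesday']
print(custom)        # ['Monday', 'Tuesday', 'Wednesday', 'Friday']
```

Because `sorted` returns a new list, `data` itself is left unchanged, much as the Copy sort results to option leaves the source range untouched.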
Database preconditions
The first thing needed to work with the Pivot Table is a list of raw data, similar to a database table,
consisting of rows (data sets) and columns (data fields). The field names are in the first row above
the list.
The data source could be an external file or database. For the simplest case, where data is
contained in a Calc spreadsheet, Calc offers sorting functions that do not require the Pivot Table.
For processing data in lists, the program needs to know where in the spreadsheet the table is. The
table can be anywhere in the sheet, in any position. A spreadsheet can contain several unrelated
tables.
Calc recognizes your lists automatically. It uses the following logic: Starting from the cell you have
selected (which must be within the list), Calc checks the surrounding cells in all 4 directions (left,
right, above, below). The border is recognized if the program discovers an empty row or column,
or if it hits the left or upper border of the spreadsheet.
This means that the described functions can only work correctly if there are no empty rows or
columns in your list. Avoid empty lines (for example for formatting). You can format your list by
using cell formats.
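The recognition logic described above can be approximated in a few lines of Python. This is a simplified sketch under stated assumptions, not Calc's actual routine: the sheet is modeled as a list of rows, empty cells are `""` or `None`, and `detect_list` is a hypothetical name.

```python
def detect_list(grid, row, col):
    """Return (top, left, bottom, right) of the data block around grid[row][col]."""
    n_rows = len(grid)
    n_cols = max(len(r) for r in grid)

    def filled(r, c):
        return c < len(grid[r]) and grid[r][c] not in (None, "")

    top = bottom = row
    left = right = col
    changed = True
    while changed:
        changed = False
        # Grow in each direction while the neighboring row or column holds data;
        # an empty row or column, or the edge of the sheet, stops the growth.
        if top > 0 and any(filled(top - 1, c) for c in range(left, right + 1)):
            top -= 1
            changed = True
        if bottom < n_rows - 1 and any(filled(bottom + 1, c) for c in range(left, right + 1)):
            bottom += 1
            changed = True
        if left > 0 and any(filled(r, left - 1) for r in range(top, bottom + 1)):
            left -= 1
            changed = True
        if right < n_cols - 1 and any(filled(r, right + 1) for r in range(top, bottom + 1)):
            right += 1
            changed = True
    return top, left, bottom, right

# Hypothetical sheet: a list in columns 0-2, an empty column, and an empty row.
grid = [
    ["Name",     "Region", "Sales", "", "Other"],
    ["Hans",     "North",  100,     "", "x"],
    ["Brigitte", "South",  200,     "", "y"],
    ["",         "",       "",      "", ""],
    ["Total",    "",       300,     "", ""],
]
print(detect_list(grid, 1, 1))   # (0, 0, 2, 2)
```

The empty row 3 cuts the Total row off from the list, which is the reason to avoid empty rows or columns inside a list.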
If you select more than one single cell before you start sorting, filtering, or calling the Pivot Table,
then the automatic list recognition is switched off. Calc assumes that the list matches exactly the
cells you have selected.
Rule: For sorting, filtering, or using the Pivot Table, always select only one cell.
A relatively common source of errors is to declare a list inadvertently and then sort
that list. If you select multiple cells (for example, a whole column), then the sorting mixes up the
data that should be together in one row.
In addition to these formal aspects, the logical structure of your table is also very important.
Rule: Calc lists must have the normal form; that is, they must have a simple linear
structure.
When entering the data, do not add outlines, groups, or summaries. Here are some
mistakes commonly made by inexperienced spreadsheet users:
You made several unnecessary sheets; for example, a sheet for each group of articles.
In this case, analyses are then possible only within each group.
In a Sales list, instead of only one column for the amount, you made a column for the amounts
for each employee. In this case, the system will have difficulty grouping data from the various
columns together. Thus, an analysis with the Pivot Table would no longer be possible.
Data sources
At this time, the possible data sources for the Pivot Table are a Calc spreadsheet or an
external data source that is registered in LibreOffice.
Calc spreadsheet
Analyzing a list in a Calc spreadsheet is the simplest and most often used case. Lists might
be updated regularly, or the data might be imported from a different application.
The behavior of Calc while inserting data from a different application depends on the format of the
data. If the data is in a common spreadsheet format, it is copied directly into Calc. However, if the
data is in plain text format, the Text Import dialog (Figure 171) appears after you select the file
containing the data; see Chapter 1, Introducing Calc, for more information about this dialog.
Figure 171: Selecting the source data for the Pivot Table
Basic layout
In the Pivot Table dialog (Figure 172) are four white areas that show the layout of the result. Beside
these white areas are buttons with the names of the fields in your data source. To choose a layout,
drag and drop the field buttons into the white areas.
The Data Fields area in the middle must contain at least one field. Advanced users can use more
than one field here. For the Data Field an aggregate function is used. For example, if you move
the sales field into the Data Fields area, it appears there as Sum – sales.
Row Fields and Column Fields indicate the groups by which the result will be sorted. Often more
than one field is used at a time to get partial sums for rows or columns. The order of the fields
gives the order of the sums from overall to specific.
For example, if you drag region and employee into the Row Fields area, the sum is divided
among the employees. Within each employee, the different regions are listed (see Figure
173).
Figure 173: Pivot Table field order for analysis, and resulting layout in pivot table
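What the layout computes can be sketched in Python: rows are grouped by the fields placed in Row Fields, and the field in Data Fields is aggregated with Sum. The function `pivot_sum` and the sales figures are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict

def pivot_sum(rows, row_fields, data_field):
    """Group rows by the given row fields and sum the data field per group."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[field] for field in row_fields)
        totals[key] += row[data_field]
    return dict(sorted(totals.items()))

# Hypothetical source list: one row per sale.
rows = [
    {"employee": "Brigitte", "region": "North", "sales": 100},
    {"employee": "Brigitte", "region": "South", "sales": 250},
    {"employee": "Hans",     "region": "North", "sales": 300},
    {"employee": "Brigitte", "region": "North", "sales": 50},
]

# Two row fields: partial sums per employee, subdivided by region.
by_employee_region = pivot_sum(rows, ["employee", "region"], "sales")

# One row field: the overall sum per employee.
by_employee = pivot_sum(rows, ["employee"], "sales")

# A page field acts like a filter applied before the aggregation.
only_hans = pivot_sum([r for r in rows if r["employee"] == "Hans"],
                      ["region"], "sales")
```

The order of the entries in `row_fields` corresponds to the order of the field buttons: the first field gives the outer grouping, the second the subdivision within it.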
Fields that are placed into the Page Fields area appear in the result above as a drop down list.
The summary in your result takes only that part of your base data into account that you have
selected. For example, if you use employee as a Page Field, you can filter the result shown for
each employee.
To remove a field from the white layout area, just drag it past the border and drop it (the cursor
will change to a crossed symbol), or select it and click the Remove button.
More options
To expand the Pivot Table dialog and show more options, click More.
Selection from
Shows the sheet name and the range of cells used for the Pivot Table.
Results to
Results to defines where your result will be shown. Setting Results to as – undefined – and
then entering a cell reference tells the Pivot Table where to show the results. An error dialog
is displayed if you fail to enter a cell reference. Selecting Results to as – new sheet – adds a
new sheet to the spreadsheet file and places the results there. The new sheet is named using
the format Pivot Table_sheetname_X; where X is the number of the table created, 1 for first,
2 for second and so on. For the source shown in Figure 3, the new sheet for the first table
produced would be named Pivot Table_sheetname_1. Each new sheet is inserted next to the
source sheet.
Identify categories
With this option selected, if the source data has missing entries in a list and does not meet the
recommended data structure (see Figure 175), the Pivot Table adds it to the listed category above
it. If this option is not chosen, then the Pivot Table inserts (empty) (see Figure 177).
Note: For the Results to setting, the word – undefined – is misleading because the output position is in fact defined.
Add filter
Use this option to add or hide the cell labeled Filter above the Pivot Table results. This cell is
a convenient button for additional filtering options within the Pivot Table.
In the Displayed value section, you can choose other possibilities for analysis by using the
aggregate function. Depending on the setting for Type, you may have to choose definitions
for Base field and Base item.
The table below lists the possible types of displayed value and associated base field and
item, together with a note on usage.
Figure 180: Original Pivot Table (top) and a Difference from example (below)
Figure 184: Division of the regions for employees (two row fields) without subtotals
You can remove a column, row, or page field from the Pivot Table by clicking on it and dragging
it out of the table. The cursor changes to that shown in Figure 191. A field removed in error cannot
be recovered; you must return to the Pivot Table dialog to add it again.
Note: Before you can group, you have to produce a Pivot Table with ungrouped data. The
time needed for creating a Pivot Table depends mostly on the number of columns
and rows and not on the size of the basic data. Through grouping you can
produce the Pivot Table with a small number of rows and columns. The Pivot
Table can contain a lot of categories, depending on your data source.
Tip: You can select several non-contiguous cells in one step by pressing and holding the
Control key while left-clicking with the mouse.
Given the input data shown in Figure 195, execute the Pivot Table with Department in the Row
Field and Sum (Sick Days) in the Data Field. The output should look like that in Figure 196.
With the mouse, select the Departments Accounting, Purchasing and Sales.
Choose Data > Group and Outline > Group from the Menu bar or press F12. The output
should now look like that in Figure 197. Repeat this for all groups that you want to create from
the different categories (select Assembly, Production and Warehouse and group again). The
output should look like Figure 198.
You can change the default names for the groups and the newly created group field by editing
the name in the input field (for example changing 'Group2' to 'Technical'). The Pivot Table will
remember these settings, even if you change the layout later on. For the following figures, the
dialog was opened again (right-click, Edit Layout), the “Department 2” icon was selected, then
Options, and finally Automatic was chosen from the preferences menu. This generated the
partial sum results shown in Figure 199. Double-clicking Group 1 and Technical collapses the
entries, as shown in Figure 200.
Sort automatically
To sort automatically, right-click within the Pivot Table and choose Edit Layout. This opens the
Pivot Table dialog (Figure 172). Within the Layout area of the dialog, double-click the row or column
field you want to sort. In the Data Field dialog which opens (Figure 186), click Options to display
the Data Field Options dialog.
For Sort by, choose either Ascending or Descending. On the left side is a drop-down list where
you can choose the field this setting should apply to. With this method, you can specify that sorting
does not happen according to the categories but according to the results of the data field.
Figure 204: Before the drill down for the category golfing
A dialog appears allowing you to select the field to use for further subdivision. In
this example, employee.
Figure 207: New table sheet after the drill down for a value in a data field
Filtering
To limit the Pivot Table analysis to a subset of the information contained in the source
data, you can filter within the Pivot Table.
Note: An AutoFilter or default filter used on the sheet has no effect on the Pivot Table
analysis process. The Pivot Table always uses the complete list that was
selected when it was started.
To do this, click Filter on the top left side above the results.
Figure 208: Filter field in the upper left area of the Pivot Table
In the Filter dialog, you can define up to 3 filter options that are used in the same way as
Calc’s default filter.
Note: Even if they are not called a filter, page fields are a practical way to filter the results.
The advantage is that the filtering criteria used are clearly visible.
Cell formatting
The cells in the results area of the Pivot Table are automatically formatted in a simple format
by Calc. You can change this formatting using all the tools in Calc, but note that if you make
any change in the design of the Pivot Table or any updates, the formatting will return to the
format applied automatically by Calc.
For the number format in the data field, Calc uses the number format that is used in the
corresponding cell in the source list. In most cases, this is useful (for example, if the values are in
the currency format, then the corresponding cell in the result area is also formatted as currency).
However, if the result is a fraction or a percentage, the Pivot Table does not recognize that this
might be a problem; such results must either be without a unit or be displayed as a percentage.
Although you can correct the number format manually, the correction stays in effect only until the
next update.
Using shortcuts
If you use the Pivot Table very often, you might find the frequent use of the menu paths (Data
> Pivot Table > Create and Data > Group and Outline > Group) inconvenient.
For grouping, a shortcut is already defined: F12. For starting the Pivot Table, you can define your
own keyboard shortcut. If you prefer to have toolbar icons instead of keyboard shortcuts, you can
create a user-defined symbol and add it to either your own custom made toolbar or the Standard
toolbar.
For an explanation of how to create keyboard shortcuts or add icons to toolbars, see Chapter 14,
Setting Up and Customizing Calc.
The problem
Normally, you create a reference to a value by entering the address of the cell that contains the
value. For example, the formula =C6*2 creates a reference to cell C6 and returns the doubled
value.
If this cell is located in the results area of the Pivot Table, it contains the result that was calculated
by referencing specific categories of the row and column fields. In Figure 210, the cell C6
contains the sum of the sales values of the employee Hans in the category Sailing. The formula in
the cell C12 uses this value.
Figure 211: The value that you really want to use can be found now in
a different location.
Consolidating data
Data > Consolidate provides a way to combine data from two or more ranges of cells into a new
range while running one of several functions (such as Sum or Average) on the data. During
consolidation, the contents of cells from several sheets can be combined into one place. The
effect is that copies of the identified ranges are stacked with their top left corners at the specified
result position, and the selected operation is used in each cell to calculate the result value.
Open the document containing the cell ranges to be consolidated.
Choose Data > Consolidate to open the Consolidate dialog. Figure 215 shows this
dialog after making the changes described below.
The Source data range list contains any existing named ranges (created using Data
> Define Range) so you can quickly select one to consolidate with other areas.
If the source range is not named, click in the field to the right of the drop-down list and
either type a reference for the first source data range or use the mouse to select the
range on the sheet. (You may need to move the Consolidate dialog or click on the Shrink
icon to reach the required cells.)
Click Add. The selected range is added to the Consolidation ranges list.
Select additional ranges and click Add after each selection.
Specify where you want to display the result by selecting a target range from the
Copy results to drop-down list.
If the target range is not named, click in the field next to Copy results to and enter the
reference of the target range or select the range using the mouse or position the cursor
in the top left cell of the target range. Copy results to takes only the first cell of the target
range instead of the entire range as is the case for Source data range.
Select a function from the Function list. This specifies how the values of the consolidation
ranges will be calculated. The default setting is Sum, which adds the corresponding cell
values of the Source data range and gives the result in the target range.
Most of the available functions are statistical (such as Average, Min, Max, Stdev), and
the tool is most useful when you are working with the same data over and over.
At this point you can click More in the Consolidate dialog to access the following
additional settings:
In the Options section, select Link to source data to insert the formulas that generate
the results into the target range, rather than the actual results. If you link the data, any
values modified in the source range are automatically updated in the target range.
Caution: The corresponding cell references in the target range are inserted in
consecutive rows, which are automatically ordered and then hidden from view.
Only the final result, based on the selected function, is displayed.
In the Consolidate by section, select either Row labels or Column labels if the cells
of the source data range are not to be consolidated corresponding to the identical
position of the cell in the range, but instead according to a matching row label or
column label. To consolidate by row labels or column labels, the label must be
contained in the selected source ranges. The text in the labels must be identical, so
that rows or columns can be accurately matched. If the row or column label of one
source data range does not match any that exist in other source data ranges, it is
added to the target range as a new row or column.
Click OK to consolidate the ranges.
Tip: If you are continually working with the same range, then you probably want to use
Data > Define Range to give it a name.
The consolidation ranges and target range are saved as part of the document. If you later open a
document in which consolidation has been defined, this data is still available.
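Consolidating by row labels can be pictured with a small Python sketch. It is an illustration of the idea only: each source range is modeled as a list of (label, value) pairs, and the function name `consolidate` and the figures are hypothetical.

```python
def consolidate(ranges, func=sum):
    """Combine (label, value) pairs from several ranges by matching label."""
    collected = {}
    for source_range in ranges:
        for label, value in source_range:
            # A label not seen before is added as a new row of the result.
            collected.setdefault(label, []).append(value)
    return {label: func(values) for label, values in collected.items()}

# Hypothetical source ranges, for example from two different sheets.
sheet1 = [("Sailing", 100), ("Golfing", 200)]
sheet2 = [("Golfing", 50), ("Tennis", 70)]

totals = consolidate([sheet1, sheet2])             # default function: Sum
maxima = consolidate([sheet1, sheet2], func=max)   # another function: Max

print(totals)   # {'Sailing': 100, 'Golfing': 250, 'Tennis': 70}
```

Golfing appears in both ranges, so its values are combined. Sailing and Tennis each appear in only one range and are carried over as separate rows.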
Figure 216: AutoFilter applied and Brigitte selected in the Employee column
Select the location for the subtotal to be displayed by clicking in the chosen cell.
Select Insert > Function from the Menu bar, or click the Function Wizard button on
the Function Bar, or press Ctrl+F2 to open the Function Wizard.
Select SUBTOTAL from the function list in the Function Wizard dialog and click Next>>
at the bottom of the dialog.
Enter the required information into the two input boxes as shown in Figure 217. The range
is selected from the filtered data, and the function is selected from the list of available
possible functions as shown in the Help file extract of Figure 218. In our example we
select the sales figures (column B) and we require the sum total (function index 9).
Click OK to return the summed values of Brigitte’s sales (Figure 219).
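The effect of SUBTOTAL on a filtered range can be imitated in Python as follows. The sketch covers only a few of the function indexes (1 = AVERAGE, 4 = MAX, 5 = MIN, 9 = SUM); the `subtotal` helper and the sales figures are hypothetical.

```python
from statistics import mean

# A subset of the SUBTOTAL function indexes.
FUNCTIONS = {1: mean, 4: max, 5: min, 9: sum}

def subtotal(function_index, values, visible):
    """Apply the chosen function to the values in rows that are still visible."""
    shown = [v for v, is_visible in zip(values, visible) if is_visible]
    return FUNCTIONS[function_index](shown)

# Hypothetical data: column A holds the employee, column B the sales figure.
employees = ["Brigitte", "Hans", "Brigitte", "Fritz"]
sales = [100, 200, 300, 400]

# The AutoFilter on the Employee column leaves only Brigitte's rows visible.
visible = [name == "Brigitte" for name in employees]

print(subtotal(9, sales, visible))   # 400
print(sum(sales))                    # 1000
```

An ordinary sum over the whole range would include the hidden rows, which is why SUBTOTAL is used on filtered data.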
A partial view of the results using our example data is shown in Figure 221. Subtotals for Sales
by Employee and Category were used.
Calc inserts, to the left of the row numbering labels, an outline area that graphically represents
the structure of the subtotals. Number 1 represents the highest level of grouping, the Grand Total.
Numbers 2 to 4 show reducing grouping levels, with number 4 showing individual entries. The
number of levels depends on the number of groupings in the subtotals.
Figure 222: Click the plus buttons to expand the elements again
Optionally add some information to the Comment box. The example shows the default
comment. This information is displayed in the Navigator when you click the Scenarios
icon and select the desired scenario.
Optionally select or deselect the options in the Settings section. See below for more
information about these options.
Click OK to close the dialog. The new scenario is automatically activated.
You can create several scenarios for any given range of cells.
Settings
The lower portion of the Create Scenario dialog contains several options. The default settings
(as shown in Figure 224) are likely to be suitable in most situations.
Display border
Places a border around the range of cells that your scenario alters. To choose the color of the
border, use the field to the right of this option. The border has a title bar displaying the name
of the active scenario. Click the arrow button to the right of the scenario name to open a drop-
down list of all the scenarios that have been defined for the cells within the border. You can
choose any of the scenarios from this list at any time.
Prevent changes
Prevents changes to a scenario enabled as a Copy back, when the sheet is protected but the cells
are not. Also prevents changes to the settings described in this section while the sheet is protected.
A fuller explanation of the effect this option has in different situations is given below.
Changing scenarios
Scenarios have two aspects that can be altered independently:
Scenario properties (the settings described above)
Scenario cell values (the entries within the scenario border)
The extent to which either of these aspects can be changed is dependent upon both the
existing properties of the scenario and the current protection state of the sheet and cells.
In the Formulas field of the Multiple Operations dialog, enter the cell reference to the formula that
you wish to use.
The arrangement of your alternative values dictates how you should complete the rest of the
dialog. If you have listed them in a single column, you should complete the field for Column input
cell. If they are along a single row, complete the Row input cell field. You may also use both in
more advanced cases. Both single and double-variable versions are explained below.
The above can be explained best by examples. Cell references correspond to those in
the following figures.
Let’s say you produce toys that you sell for $10 each (cell B1). Each toy costs $2 to make (cell
B2), in addition to which you have fixed costs of $10,000 per year (cell B3). How much profit will
you make in a year if you sell a particular number of toys?
Tip: You may find it easier to mark the required reference in the sheet if you click the
Shrink icon to reduce the Multiple operations dialog to the size of the input field. The
icon then changes to the Maximize icon; click it to restore the dialog to its original size.
Caution: Beware of entering the cell reference of a variable into the wrong field. The Row input
cell field should contain not the cell reference of the variable which changes down the
rows of your results table, but that of the variable whose alternative values have been
entered along a single row.
With the cursor in the Formulas field of the Multiple operations dialog, click cell B5 (profit).
Set the cursor in the Row input cell field and click cell B1. This means that B1, the
selling price, is the horizontally entered variable (with the values 8, 10, 15 and 20).
Set the cursor in the Column input cell field and click cell B4. This means that B4,
the quantity, is the vertically entered variable.
Click OK. The profits for the different selling prices are now shown in the range
E2:H11 (See Figure 231).
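The table that Multiple Operations fills in for this two-variable case can be reproduced with a short Python sketch. The selling prices 8, 10, 15 and 20, the unit cost and the fixed costs come from the example; the profit formula (quantity times margin, minus fixed costs) and the ten quantities from 500 to 5000 are assumptions made for illustration.

```python
def profit(price, quantity, unit_cost=2, fixed_costs=10_000):
    """Assumed formula in B5: quantity * (price - unit cost) - fixed costs."""
    return quantity * (price - unit_cost) - fixed_costs

prices = [8, 10, 15, 20]                    # alternative values along a row (B1)
quantities = list(range(500, 5001, 500))    # alternative values down a column (B4), assumed

# One result row per quantity, one result column per price, like E2:H11.
table = [[profit(p, q) for p in prices] for q in quantities]

for quantity, row in zip(quantities, table):
    print(quantity, row)
```

Each cell of the result re-evaluates the same formula with one value substituted for the row input cell and one for the column input cell.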
Choose Tools > Solver. The Solver dialog (Figure 235) opens.
Click in the Target cell field. In the sheet, click in the cell that contains the target value.
In this example it is cell B4 containing total interest value.
Select Value of and enter 1000 in the field next to it. In this example, the target cell value is
1000 because your target is a total interest earned of $1000. Select Maximum or Minimum
if the target cell value needs to be one of those extremes.
Click in the By changing cells field and click on cell C2 in the sheet. In this example,
you need to find the amount invested in Fund X (cell C2).
Enter limiting conditions for the variables by selecting the Cell reference, Operator and
Value fields. In this example, the amount invested in Fund X (cell C2) should not be
greater than the total amount available (cell C4) and should not be less than 0.
Click OK. A dialog appears informing you that solving finished successfully. Click
Keep Result to enter the result in the cell with the variable value. The result is shown in
Figure 236.
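The kind of problem set up in these steps can be sketched in Python. The sketch uses simple bisection, which is not the algorithm Calc's Solver uses. The total of 10,000 and the interest rates of 8% for Fund X and 12% for the other fund are hypothetical values chosen for illustration; only the target of 1000 and the limits (not less than 0, not more than the total) come from the steps above.

```python
def total_interest(fund_x, total=10_000, rate_x=0.08, rate_y=0.12):
    """Interest earned when fund_x goes into Fund X and the rest into the other fund."""
    return fund_x * rate_x + (total - fund_x) * rate_y

def solve_for_target(f, target, low, high, tol=1e-9):
    """Bisection: find x in [low, high] with f(x) close to target (f monotonic)."""
    f_low = f(low) - target
    if f_low * (f(high) - target) > 0:
        raise ValueError("target is not reachable within the limits")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = f(mid) - target
        if f_low * f_mid <= 0:
            high = mid                  # the solution lies in the lower half
        else:
            low, f_low = mid, f_mid     # the solution lies in the upper half
    return (low + high) / 2

# Limiting conditions: 0 <= amount in Fund X <= total amount available.
amount_x = solve_for_target(total_interest, target=1000, low=0, high=10_000)
print(round(amount_x, 2))
```

Under these assumed rates, the target is met when half of the total is placed in Fund X. A target outside the reachable range raises an error instead of returning a misleading value.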
Tip: Use Tools > Options > LibreOffice > Advanced and select the Enable macro
recording option to enable the macro recorder.
11) Click OK to create a new module named Module1. Select the newly created Module1, type
PasteMultiply in the Macro name box at the upper left, and click Save. (See Figure 290.)
The created macro is saved in Module1 of the Standard library in the Untitled 1 document. Listing 1
shows the contents of the macro.
Listing 1. Paste special with multiply.
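sub PasteMultiply
rem --------------------------------------------------------------
rem define variables
dim document as object
dim dispatcher as object
rem --------------------------------------------------------------
rem get access to the document
document = ThisComponent.CurrentController.Frame
dispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
rem --------------------------------------------------------------
dim args1(5) as new com.sun.star.beans.PropertyValue
args1(0).Name = "Flags"
args1(0).Value = "A"
args1(1).Name = "FormulaCommand"
args1(1).Value = 3
args1(2).Name = "SkipEmptyCells"
rem the source listing breaks off here; the remaining lines are assumed from the macro recorder's usual output
args1(2).Value = false
args1(3).Name = "Transpose"
args1(3).Value = false
args1(4).Name = "AsLink"
args1(4).Value = false
args1(5).Name = "MoveMode"
args1(5).Value = 4
dispatcher.executeDispatch(document, ".uno:InsertContents", "", 0, args1())
end sub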
More detail on recording macros is provided in Chapter 13, Getting Started with Macros, in the
Getting Started guide; we recommend you read it if you have not already done so. More detail
is also provided in the following sections, but not as related to recording macros.