Data Manipulation Workshop
Objectives
Introduction
This workshop is an introduction to the use of a spreadsheet. It introduces the student to a powerful tool
widely used in science. This tutorial is based on Microsoft's Excel spreadsheet, an application that
can be installed on all major operating systems (OS) such as Windows, Mac and Android. It can also be
used online (Microsoft Office Excel Online) on operating systems, such as Linux, that do not support
Windows applications. All students registered at Dawson College have an account with Microsoft, so
everyone can use Excel Online on any device.
Note that several spreadsheet programs similar to Excel are available on Mac (Numbers), on Linux
(OpenOffice Calc) or directly online (Google Docs Spreadsheet). Lastly, third-party freeware for all
systems (WPS Office, LibreOffice) is also available.
B- Handling data
Basic commands are available under the “Home” tab. For example, you can change the text size, font,
alignment, border type and color of each cell. These commands are similar to those found in
Microsoft Word. In this respect, Excel behaves like Word.
Any cell in a sheet can be selected with the mouse or the keyboard. To enter information in a cell,
double-click it and type. The entry is stored once the cell is no longer selected (after pressing either
“Enter” or “Tab”).
Figure 2. Entering the information in a cell
Besides the mouse, it is also possible to use the keyboard to navigate across the worksheet:
Table 1. Some shortcut keys used in Excel
Figure 5. Managing the column and the rows
The size of each column and row is adjustable, as shown
in Figure 5. Also, the position of any column or row in the
sheet can be changed by using the cut/paste option.
After selecting a complete column (by clicking its letter at
the top), right-click to open a specific menu. From this
menu, a complete column (or row) can be added
or removed.
The data in a range can be ordered by magnitude (numbers) or in alphabetical order (text). Once
a data range is selected, right-click to access the menu used to order the data.
In this process, the values (column C) remain linked to their respective constants in column B. The “Sort”
option applies to the first column of the range. If the numbers are to be ordered instead, a custom sort is needed.
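The same idea can be sketched outside the spreadsheet. The short Python example below is only an illustration: the three rows are hypothetical stand-ins for the constants (column B) and values (column C) of the worksheet. Each row is kept as one unit, so a value always stays attached to its name whichever column drives the sort.

```python
# Hypothetical rows standing in for columns B (name) and C (value).
rows = [
    ("Planck constant", 6.626e-34),
    ("Avogadro number", 6.022e23),
    ("Gas constant", 8.314),
]

# Sort on the first column (text, alphabetical order), like the default "Sort".
by_name = sorted(rows, key=lambda row: row[0])

# Sort on the second column (numbers, ascending), like a custom sort.
by_value = sorted(rows, key=lambda row: row[1])

for name, value in by_value:
    print(f"{name:20s} {value:.3e}")
```

Because whole rows are moved, the link between a constant and its value is never broken.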
C- Formula and functions
To do a calculation, a formula is used. It is written in a cell. However, some symbols, normally used in
mathematical operations, are written differently on a worksheet:
Table 2. The Excel instructions to perform mathematical operations.
Operation      Mathematical notation      Excel instruction
Product        $10 \times 5$              10 * 5
Power          $2^{12}$                   2^12
Power of 10    $2.18 \times 10^{-18}$     2.18E-18
Square root    $\sqrt{2}$                 sqrt(2) or 2^0.5
Logarithm      $\log 5$ , $\ln 8$         log(5) , ln(8)
Exponential    $e^{2.8}$                  exp(2.8)
π              3.14159...                 pi()
Note: Your spreadsheet may be configured to use a comma instead of a dot as the decimal separator.
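For readers who also program, the operations of Table 2 have close counterparts in most languages. The sketch below uses Python's standard `math` module purely as a comparison. The variable names are arbitrary, and the Excel equivalents appear in the comments.

```python
import math

product = 10 * 5           # Excel: =10*5
power = 2 ** 12            # Excel: =2^12
small = 2.18e-18           # Excel: 2.18E-18
root = math.sqrt(2)        # Excel: =SQRT(2) or =2^0.5
log10_5 = math.log10(5)    # Excel: =LOG(5), base 10 by default
ln_8 = math.log(8)         # Excel: =LN(8), natural logarithm
exp_28 = math.exp(2.8)     # Excel: =EXP(2.8)
pi = math.pi               # Excel: =PI()

print(product, power, small)
print(round(root, 5), round(log10_5, 5), round(ln_8, 5))
print(round(exp_28, 4), round(pi, 5))
```

Note that the natural logarithm is `ln` in Excel but `math.log` in Python, a frequent source of confusion when moving between the two.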
A mathematical operation always starts with the symbol “=”. In the sample calculation of Figure 8,
the value of Planck's constant (C2) is multiplied by the speed of light and divided by a wave
frequency.
Figure 8. Sample of a mathematical operation performed in a cell
The cell E4 is selected and the formula is entered in this cell. The formula bar at the top can also be used
to enter the equation. Still, the formula will be stored in cell E4. As soon as the symbol located at the
end of the formula bar is pressed or the cell is unselected, the answer is calculated (figure 9).
Figure 9. Result of a mathematical operation (from figure 8)
The answer is displayed in the cell where the formula was entered.
The formula bar shows that the corresponding equation is still present in the cell E4. The formula can be
changed any time and a new result will be calculated.
D- Tools and Statistics
Several statistical functions are already available in Excel. To see the complete list, click the “fx” symbol
at the front of the formula bar. Figure 10 shows a sampling of the possible calculations that can be
performed.
Table 3. Most common functions used in Excel. The symbol “( )” stands for the data range.
Here, the data range (A2:D7) is indicated in the formula bar. The functions are entered in column F (from
F2 to F6). The formulas are not visible in the cells, which display only the computed results. However,
when a cell is selected (here, F2), the corresponding formula is displayed in the formula bar.
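As a rough illustration of what such functions compute, the Python sketch below applies five common statistics to a 6-row by 4-column block. The numbers are hypothetical placeholders for the range A2:D7, and the five functions chosen (SUM, AVERAGE, MAX, MIN, STDEV) are an assumption, since they may differ from those used in the figure.

```python
import statistics

# Hypothetical 6-row by 4-column block standing in for the range A2:D7.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20],
    [21, 22, 23, 24],
]

# A range such as A2:D7 is treated as one flat collection of values.
values = [cell for row in grid for cell in row]

results = {
    "SUM": sum(values),
    "AVERAGE": statistics.mean(values),
    "MAX": max(values),
    "MIN": min(values),
    "STDEV": statistics.stdev(values),  # sample standard deviation
}

for name, result in results.items():
    print(f"{name:8s} {result}")
```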
First, the volume and the temperature are converted according to the units of “R”.
Figure 12. Conversion of the volume and the temperature
Now, the main calculation can be performed (Figure 13). The formula to calculate the pressure is stored in
cell B10. It is also displayed in the formula bar.
Figure 13. Formula for the calculation of the pressure (window left) and final result (window right)
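The two steps above (unit conversion, then the main calculation) can be sketched in Python. This is only an illustration under stated assumptions: the ideal gas law P = nRT/V is assumed to be the formula used, R is taken as 0.08206 L·atm/(mol·K), and the input values are hypothetical rather than those of Figures 12 and 13.

```python
R = 0.08206  # gas constant in L·atm/(mol·K), assumed units


def celsius_to_kelvin(t_celsius):
    """Convert a temperature from degrees Celsius to kelvins."""
    return t_celsius + 273.15


def millilitres_to_litres(v_ml):
    """Convert a volume from millilitres to litres."""
    return v_ml / 1000.0


def pressure_atm(n_mol, v_litres, t_kelvin):
    """Ideal gas law solved for the pressure: P = nRT / V."""
    return n_mol * R * t_kelvin / v_litres


# Hypothetical inputs, chosen only to illustrate the procedure.
n_mol = 1.00
volume_l = millilitres_to_litres(24450.0)
temperature_k = celsius_to_kelvin(25.0)

pressure = pressure_atm(n_mol, volume_l, temperature_k)
print(f"P = {pressure:.3f} atm")
```

Converting first and calculating second, as in the worksheet, keeps each intermediate value visible and easy to check.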
F- Making a chart
A chart is a visual representation of a set of data. It can take many forms (column, pie, bar, etc.). It is more
than a simple illustration of a trend: it is also an analytical tool used to extract some parameters of a
system. In this part of the tutorial, a range of data will be plotted and processed with some basic chart
options to produce a proper scientific chart.
A chart often used in science is the “scatter plot” with Cartesian coordinates (x-y). As illustrated in
Figure 15, a chart is obtained in Excel in 3 steps: 1-select the range of data, 2-click on the “Insert” tab
and 3-select the appropriate chart. Since a trend line (or a fit) is often added later, it is more
appropriate to select “Scatter” without any line drawn between the data points.
At this point, a chart has been created. However, some adjustments are needed for it to become a
valuable scientific document. At the least, the three following modifications have to be completed.
The data points should always fill the full range of the chart, which is not the case in Figure 16. The range
of the y-axis (ordinate) should be adjusted to expand the region of the chart where the data are.
Both axes must be identified with a correct label, including the corresponding units if any.
A meaningful title is added either directly on the chart or in a chart legend. Often, the title corresponds to
the objective of the experiment (example: To find the cooling rate of a solution) or a characteristic name
like “van't Hoff plot” or “Calibration curve of the spectrometer 27”. Never write a title like “temperature vs.
time”. It is redundant and useless information since it is already indicated on the axes.
Figure 17. A proper chart presentation
To make any adjustment to the chart, double-click on the object you want to modify. For example, if the
range of the y-axis is incorrect, double-click on the axis numbers to open the corresponding menu. The
same procedure holds for any label on the chart: double-click on it to make a modification.
Adding a trend line or a fit
Figure 18. Menu Chart Elements and trendline
Figure 19. The Format Trendline menu
The two boxes “Display Equation” and “Display R-squared” should always be checked.
Also, R-squared ($R^2$) is used to measure the quality of the fit. Since $R^2$ ranges from 0 to 1,
the closer its value is to 1, the better the fit.
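To make the meaning of the displayed equation and of R-squared more concrete, the Python sketch below fits a straight line by ordinary least squares and computes $R^2 = 1 - SS_{res}/SS_{tot}$. The function name `linear_fit` and the five data points are hypothetical, and the code is not a description of how Excel computes its trendline internally.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, with R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: scatter of the points around the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: scatter of the points around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical, slightly noisy data.
x_data = [0, 1, 2, 3, 4]
y_data = [1.1, 2.9, 5.2, 7.1, 8.8]

slope, intercept, r_squared = linear_fit(x_data, y_data)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.4f}")
```

The small scatter in the data keeps $R^2$ just below 1. Points lying exactly on a line would give exactly 1.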
Note that it is not possible to add a trend line to a chart in the online version of Excel. In this case,
another online application such as “Google Docs Spreadsheet” can be used instead.
G- Data series
A data series is made by using a formula or directly with the “crosshair” tool available in several
spreadsheet programs. Data series, combined with mathematical equations, are useful to see a trend or to
make a curve from a mathematical function (see section H).
a. Write, in two subsequent cells, the first number of the series followed by a second number with the
required increment, here 1 unit.
b. Select the two cells showing the trend. When the pointer is placed at the bottom right corner, it
becomes a crosshair symbol.
c. Slide down the crosshair until the number of the trend generated is equal to the last number of the
data series needed.
A series can be made out of any trend. It can be a number, a time, a date, etc. Column F of the
spreadsheet below is an example of a series involving dates with an increment of 7 days.
In the formula bar, the date shown (year-month-day) looks different from the one displayed in cell F4. Still,
it is the same date, simply presented in a way that is more convenient to read and understand. Again, this
display format is highly customizable.
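The logic behind a series (a first value plus a repeated increment) can be sketched in Python. The helper names `number_series` and `date_series` and the starting date are hypothetical. The last lines also show one stored date displayed in two different formats, as in the worksheet.

```python
from datetime import date, timedelta


def number_series(first, step, count):
    """Numbers starting at `first`, increasing by `step`, `count` items long."""
    return [first + i * step for i in range(count)]


def date_series(first_day, step_days, count):
    """Dates starting at `first_day`, spaced by `step_days` days."""
    return [first_day + timedelta(days=i * step_days) for i in range(count)]


numbers = number_series(1, 1, 10)
weekly = date_series(date(2024, 1, 1), 7, 4)  # hypothetical start date

print(numbers)
for day in weekly:
    # The same stored date, shown as year-month-day and in a longer form.
    print(day.isoformat(), "|", day.strftime("%d %B %Y"))
```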
H- Creating a set of data from an equation (simulation)
An interesting application of the spreadsheet is the possibility to plot a chart out of a mathematical
equation.
Consider the simple Boyle's Law for gases: PV = k, where k = 24.45 L·atm at 25 °C.
Figure 19. Boyle's law
To make a chart from this equation, two sets of data are created
in two columns. The independent variable (abscissa or x) is the
pressure and the dependent one (ordinate or y) is the volume.
The first column is made with a data series (see section G).
The second column, the volume, is obtained from the quotient
of the constant k over P, as illustrated in the formula bar. Then,
the formula in B7 is copied to the other cells below to get
the corresponding curve for Boyle's Law.
However, when a formula is copied into another cell, it changes.
For instance, if the cell B7, with the formula =B3/A7, is copied into
the cell below (B8), its formula becomes =B4/A8. The pressure
(A8) is read correctly, but the value of the constant k is lost
because the wrong cell (B4 instead of B3) is now read. To tell Excel
that the B3 reference must not change when a copy is made, a pair of
dollar signs is used. $B$3 means that this reference, when copied into
another cell, will still be $B$3.
Therefore, the formula in B7 is =$B$3/A7. When the contents of B7 are copied to the cell below (B8), the
formula becomes =$B$3/A8. The value of the constant k is maintained as the pressure changes.
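The two-column construction can be sketched in Python as follows. The constant `K` plays the role of the fixed cell $B$3, while each volume uses the pressure of its own row, like the relative references A7, A8 and so on. The pressure range (0.5 to 5.0 atm in steps of 0.5) is a hypothetical choice and may differ from the one in Figure 19.

```python
K = 24.45  # L·atm at 25 °C; the same value is used on every row, like $B$3

# Pressure column built as a data series (hypothetical range and increment).
pressures = [0.5 * i for i in range(1, 11)]

# Volume column: the row's own pressure changes, the constant K does not.
volumes = [K / p for p in pressures]

print("P (atm)   V (L)")
for p, v in zip(pressures, volumes):
    print(f"{p:7.2f} {v:7.2f}")
```

If `K` were allowed to shift from row to row, as the unprotected reference B3 does when copied, the product P × V would no longer stay constant.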
Figure 20. Chart of Boyle's law from the data point of Figure 19
A trend line of the data based on a “power” fit ($y = ax^b$) gives the value of the constant “k” as the
coefficient a, with an exponent b of -1. Naturally, an R-squared value of 1 is expected since it is a
simulation with values obtained from an equation.
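One common way to fit a power law $y = ax^b$ is to fit a straight line to the logarithms, since $\ln y = \ln a + b \ln x$. The Python sketch below applies this approach to simulated Boyle's law data. The pressure range is hypothetical, and the method shown is an assumption for illustration, not a statement of how Excel computes its power trendline.

```python
import math

K = 24.45  # L·atm, the constant used to simulate the data

# Simulated data: V = K / P over a hypothetical pressure range.
pressures = [0.5 * i for i in range(1, 11)]
volumes = [K / p for p in pressures]

# Linear least squares on the logarithms: ln V = ln a + b ln P.
log_p = [math.log(p) for p in pressures]
log_v = [math.log(v) for v in volumes]
n = len(log_p)
mean_x = sum(log_p) / n
mean_y = sum(log_v) / n
sxx = sum((x - mean_x) ** 2 for x in log_p)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_p, log_v))

b = sxy / sxx
a = math.exp(mean_y - b * mean_x)

# R-squared of the straight-line fit in log-log space.
ss_res = sum((y - (math.log(a) + b * x)) ** 2 for x, y in zip(log_p, log_v))
ss_tot = sum((y - mean_y) ** 2 for y in log_v)
r_squared = 1 - ss_res / ss_tot

print(f"a = {a:.2f}, b = {b:.2f}, R^2 = {r_squared:.4f}")
```

Because the data come from the equation itself, the fit recovers a = k and b = -1 with an R-squared of 1, up to rounding error.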
Procedure
The procedure for this laboratory is a set of exercises to perform in the lab and/or at home.
* A “.csv” (Comma Separated Values) file is a very practical way to share data. This file is a standard
ASCII document. Therefore, it can be read by any operating system, on any device and by any software.
The downside, however, is that only values can be stored. Information about the document format
(font, paragraph, table, etc.) is not stored.
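The Python sketch below illustrates what a “.csv” file contains, using the standard `csv` module and an in-memory buffer instead of a real file. The table is a hypothetical example. The text produced holds only the values separated by commas, and everything read back is plain text.

```python
import csv
import io

# Hypothetical table: one header row followed by two rows of values.
rows = [
    ["P (atm)", "V (L)"],
    [0.5, 48.9],
    [1.0, 24.45],
]

# Write the table as CSV text into an in-memory buffer.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Read it back: every value returns as plain text, with no formatting.
rows_back = list(csv.reader(io.StringIO(text, newline="")))
print(rows_back)
```

Fonts, colors, column widths and formulas have no place in this format, which is why it opens in practically any program.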
The student's lab report is the workbook submission (all spreadsheets together in a single
document), which contains problems 4, 5 and 6.
Your instructor may add an additional assignment.
5a. Use the Rydberg equation to calculate the wavelength, in nanometers, of the photon corresponding to
the electron transition from the excited state $n_o = 5$ to the excited state $n_i = 2$ in a hydrogen atom.
5b. Perform the same calculation for the transition $n_o = 4$ to $n_i = 1$ for the same atom.
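As a guide for setting up this kind of calculation, here is a minimal Python sketch. It assumes the usual form of the Rydberg equation, $1/\lambda = R\,(1/n_i^2 - 1/n_o^2)$, with $R = 1.0973731568 \times 10^7$ m$^{-1}$, since the equation itself is not printed in this handout. The function name `wavelength_nm` is hypothetical, and the example uses a different transition ($n_o = 3$ to $n_i = 2$) from those of the exercise.

```python
RYDBERG = 1.0973731568e7  # Rydberg constant in 1/m (assumed value)


def wavelength_nm(n_outer, n_inner):
    """Wavelength in nm of the photon emitted for n_outer -> n_inner."""
    inverse_wavelength = RYDBERG * (1 / n_inner**2 - 1 / n_outer**2)  # in 1/m
    wavelength_m = 1 / inverse_wavelength
    return wavelength_m * 1e9  # metres to nanometres


# Illustrative transition only: n_o = 3 to n_i = 2.
example = wavelength_nm(3, 2)
print(f"{example:.1f} nm")
```

In a worksheet, the same structure applies: one cell for the constant, one cell for each quantum number, and a formula cell that references them.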
$P = P_o e^{kt}$
Assuming that the initial number of people infected is $P_o$, and that k is the rate of growth, answer the
two following questions:
a. If the initial number of people infected is 2000, calculate the total number of people infected
after 1 month (30 days) if the infection rate is 5%. Make a graph to show the progression.
b. What will be the total number of people infected after 3 weeks, starting from the same initial number of
people infected but with an infection rate of 10%? Compare your graph to the one obtained in 7a.
Note: when k = 0.05, one can say that the rate of growth is close to 5% per unit of t.
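The Python sketch below illustrates the exponential growth equation and the note above. The function name `infected` and the input values ($P_o = 100$, k = 0.02, 10 days) are hypothetical and deliberately different from those of the exercise. The last lines compare k = 0.05 with the actual relative increase over one unit of t, which is $e^{k} - 1$.

```python
import math


def infected(p_initial, k, t):
    """Exponential growth: P = P_o * exp(k * t)."""
    return p_initial * math.exp(k * t)


# Hypothetical example: 100 people initially, k = 0.02 per day, 10 days.
days = list(range(0, 11))
progression = [infected(100, 0.02, t) for t in days]

for t, p in zip(days, progression):
    print(f"day {t:2d}: {p:7.1f}")

# Relative increase over one unit of t when k = 0.05.
increase_per_unit = math.exp(0.05) - 1
print(f"k = 0.05 gives an increase of {increase_per_unit:.2%} per unit of t")
```

The increase per unit of t is slightly above 5%, which is why the note says “close to” rather than “equal to”.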