Data Manipulation Workshop

The document outlines a workshop focused on using Microsoft Excel for data manipulation, including making calculations, processing data, and creating charts. It covers basic functionalities of Excel, such as managing information, handling data, using formulas and functions, and generating visual representations of data. Additionally, it provides a practical procedure for students to apply these skills through exercises involving data management and statistical calculations.

Uploaded by

Maya Ka
Data Manipulation Workshop

Objectives

• To use a spreadsheet to make simple calculations
• To perform basic data processing
• To make a chart

Introduction
This workshop introduces the spreadsheet, a powerful tool widely used in science. The tutorial is
based on Microsoft's Excel spreadsheet, an application that can be installed on major operating
systems (OS) such as Windows, Mac and Android. It can also be used online (Microsoft Office Excel
Online) on operating systems that do not support Windows applications, such as Linux. All students
registered at Dawson College have a Microsoft account, so everyone can use Excel Online on any device.
Note that several spreadsheet programs similar to Excel are available on Mac (Numbers), on Linux
(OpenOffice Calc) or directly online (Google Docs Spreadsheet). Lastly, third-party freeware for all
systems (WPS Office, LibreOffice) is also available.

A- Managing the information


A spreadsheet is a data processing file in which the information is presented in a tabular form. Excel has
a menu bar with tabs used to access a toolbar.
Figure 1. The Excel worksheet

B- Handling data
Basic commands are available in the “Home” tab. For example, you can change the text size, font,
alignment, border type and color of each cell. These commands are similar to those found in
Microsoft Word; in this respect, Excel behaves like a Word document.
Any cell in a sheet can be selected with the mouse or the keyboard. To enter information in a cell,
double-click it; the entry is stored once the cell is no longer selected (after pressing either
“Enter” or “Tab”).
Figure 2. Entering the information in a cell

The difference with Word is that Excel is capable of managing and processing alphanumerical
information.
Each entry has a specific address. Here, the number 52.3 is in the cell B3.

Besides the mouse, it is also possible to use the keyboard to navigate across the worksheet:
Table 1. Some shortcut keys used in Excel

Keyboard shortcut    Displacement on the spreadsheet
Enter                down ↓
Shift + Enter        up ↑
Tab                  right →
Shift + Tab          left ←

Figure 3. A range of data


Most of the time, the data is treated as a group called a range. Just like a cell, a range also has
an address. It is named after the combination of two cells: the cell at the top left and the cell at
the bottom right of the range.
For instance, the name of the range in Figure 3 is: (B3:C7)

Figure 4. Some number format


By clicking Home > Numbers in the menu tab, numbers can be displayed with a customizable format
(Figure 4). Even if a cell shows only 1 digit, the corresponding 5 sig. fig. number is still stored
(and visible in the formula bar).
Note: in Excel, the number $6 \times 10^{-2}$ is presented as: 6E-02.
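
The same idea (the stored number keeps all its digits, only the display changes) can be sketched outside Excel. The short Python example below is only an illustration; the value 0.061234 is a hypothetical 5 sig. fig. entry, not one taken from Figure 4.

```python
# The stored value keeps all its digits; only the display format changes.
value = 0.061234  # hypothetical 5-significant-figure entry

shown_one_digit = format(value, ".0E")  # scientific notation, no decimals
shown_full = format(value, ".4E")       # scientific notation, 4 decimals

print(shown_one_digit)  # 6E-02
print(shown_full)       # 6.1234E-02
```

Changing the format string plays the role of the number format menu: `value` itself is never modified.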

Figure 5. Managing the column and the rows
The size of each column and row is adjustable, as shown in Figure 5. Also, the position of any
column or row in the sheet can be changed by using the cut/paste option. After selecting a complete
column (by clicking its letter at the top), a specific menu is accessible by clicking the right
mouse button. From this menu, a complete column (or row) can be added or removed.

The data in a range can be ordered by magnitude (numbers) or in alphabetical order (text). Once
a data range is selected, right-click to access the menu used to order the data.

Figure 6. Sorting data in alphabetical order.

In this process, the values (column C) stay linked to their respective constants in column B. The
“Sort” option applies to the first column of the range. If the numbers are to be ordered instead, a
custom sort is needed.
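
As an illustration of what "linked" means here, the sorting behaviour can be sketched in Python. The three rows below are hypothetical (they are not the contents of Figure 6): each row pairs a name (column B) with its value (column C), and sorting moves whole rows, never a single column.

```python
# Hypothetical rows: (name in column B, value in column C)
rows = [("Planck", 6.626e-34), ("Avogadro", 6.022e23), ("Faraday", 96485.0)]

# Default sort: ordered by the first column; each value stays with its name.
by_name = sorted(rows)

# Custom sort: ordered by the second column (the numbers) instead.
by_value = sorted(rows, key=lambda row: row[1])

for name, value in by_name:
    print(name, value)
```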

Figure 7. Sort in ascending order of the second row of the range

C- Formula and functions
To do a calculation, a formula is used. It is written in a cell. However, some symbols, normally used in
mathematical operations, are written differently on a worksheet:
Table 2. The Excel instructions to perform mathematical operations.

Operator       Mathematical expression     Excel instruction
Product        $10 \times 5$               10 * 5
Power          $2^{12}$                    2^12
Power of 10    $2.18 \times 10^{-18}$      2.18E-18
Square root    $\sqrt{2}$                  sqrt(2) or 2^0.5
Logarithm      $\log 5$, $\ln 8$           log(5), ln(8)
Exponential    $e^{2.8}$                   exp(2.8)
π              3.14159...                  pi()

Note: Your spreadsheet could be configured to use the comma instead of a dot for the decimal point.
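
For readers who want to cross-check the results of Table 2 outside Excel, here is a minimal Python sketch of the same operations. It is not part of the Excel procedure; the variable names are arbitrary. Note that Excel's log( ) with a single argument is a base-10 logarithm, so it corresponds to `math.log10`, while ln( ) corresponds to `math.log`.

```python
import math

product = 10 * 5          # Excel: 10 * 5
power = 2 ** 12           # Excel: 2^12
small = 2.18e-18          # Excel: 2.18E-18
root = math.sqrt(2)       # Excel: sqrt(2) or 2^0.5
log10_5 = math.log10(5)   # Excel: log(5), base 10
ln_8 = math.log(8)        # Excel: ln(8), natural logarithm
expo = math.exp(2.8)      # Excel: exp(2.8)
pi = math.pi              # Excel: pi()

print(product, power, small, root, log10_5, ln_8, expo, pi)
```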
A formula always starts with the symbol “=”. In the sample calculation in Figure 8, the value of
Planck's constant (C2) is multiplied by the speed of light and divided by a wave frequency.
Figure 8. Sample of a mathematical operation performed in a cell

The cell E4 is selected and the formula is entered in this cell. The formula bar at the top can also be used
to enter the equation. Still, the formula will be stored in cell E4. As soon as the symbol located at the
end of the formula bar is pressed or the cell is unselected, the answer is calculated (figure 9).
Figure 9. Result of a mathematical operation (from figure 8)

The answer is displayed in the cell where the formula was entered.
The formula bar shows that the corresponding equation is still present in the cell E4. The formula can be
changed any time and a new result will be calculated.

D- Tools and Statistics

Several statistical functions are already available in Excel. To see the complete list, click the symbol
at the front of the formula bar. Figure 10 shows a sampling of the possible calculations that can be
performed.
Table 3. Most common functions used in Excel. The symbol “( )” stands for the range.

Statistic             Excel command    Tools                           Excel command
average               AVERAGE( )       find the largest value          MAX( )
median                MEDIAN( )        find the smallest value         MIN( )
standard deviation    STDEV.S( )       add all the numbers together    SUM( )

Figure 10. Sample calculation using standard functions available on Excel

Here, the data range (A2:D7) is indicated in the formula bar. The functions are entered in column F
(from F2 to F6). The formula in a cell is not visible because the cell displays only the result.
However, by selecting a cell (here, F2), the corresponding formula is displayed in the formula bar.
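
The functions of Table 3 have close equivalents in most programming languages, which can be handy for checking a spreadsheet result. The sketch below uses Python's standard `statistics` module on a small hypothetical data set (not the data of Figure 10). `statistics.stdev` computes the sample standard deviation, like STDEV.S( ).

```python
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # hypothetical range of values

average = statistics.mean(data)    # AVERAGE( )
median = statistics.median(data)   # MEDIAN( )
std_dev = statistics.stdev(data)   # STDEV.S( ): sample standard deviation
largest = max(data)                # MAX( )
smallest = min(data)               # MIN( )
total = sum(data)                  # SUM( )

print(average, median, round(std_dev, 4), largest, smallest, total)
```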

E- Using a spreadsheet to do a simple calculation


Sometimes, it is more convenient to use a spreadsheet than a calculator. The advantage is being able
to see the input numbers as they are used in the calculation. Afterwards, any number can be changed
without having to retype the others. In addition, the result is precise since a spreadsheet like
Excel can handle numbers ranging from about $2.2 \times 10^{-308}$ to $1.80 \times 10^{308}$, with a
precision of 15 digits.
Figure 11. A set of data to be processed.
This example shows a simple ideal gas
calculation. All values (V, n, R, T) are
already entered in the worksheet. The
objective is to calculate the pressure of this
gas.
The worksheet always starts with a title and
all the numbers are written with their
appropriate units.
The pressure is unknown.

First, the volume and the temperature are converted according to the units of “R”.
Figure 12. Conversion of the volume and the temperature

Now, the main calculation can be performed (Figure 13). The formula to calculate the pressure is stored in
cell B10. It is also displayed in the formula bar.

Figure 13. Formula for the calculation of the pressure (window left) and final result (window right)

Figure 14. Changing the volume


Now if the gas volume increases to 825 mL, the final pressure will automatically be recalculated as
soon as the volume is entered. This spreadsheet could also calculate the pressure after any change
in the number of moles or the temperature without having to make any changes to the formula.
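
The logic of this worksheet (convert the inputs, then apply one formula that is reused whenever an input changes) can be sketched in Python. This is only an illustration under stated assumptions: the values of n and T and the initial volume of 500 mL are hypothetical (the actual numbers are in Figure 11), R is assumed to be expressed in L·atm/(mol·K), and the conversions are assumed to be mL to L and °C to K. Only the 825 mL volume comes from the text above.

```python
R = 0.08206  # gas constant in L·atm/(mol·K) (assumed unit system)

def pressure_atm(volume_mL, n_mol, temp_C):
    """Ideal gas pressure P = nRT/V after converting V to L and T to K."""
    volume_L = volume_mL / 1000.0   # mL -> L
    temp_K = temp_C + 273.15        # °C -> K
    return n_mol * R * temp_K / volume_L

# Hypothetical inputs: 0.0250 mol at 25 °C, initial volume 500 mL.
p_initial = pressure_atm(500.0, 0.0250, 25.0)
# Changing only the volume reuses the same formula, as in Figure 14.
p_expanded = pressure_atm(825.0, 0.0250, 25.0)

print(round(p_initial, 4), round(p_expanded, 4))
```

The function plays the role of the formula stored in cell B10: the inputs change, the formula does not.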

F- Making a chart
A chart is a visual representation of a set of data. It can take many forms (column, pie, bar, etc.).
It is more than a simple illustration of a trend: it is also an analytical tool used to extract some
parameters of a system. In this part of the tutorial, a range of data will be plotted and processed
with some basic chart options to produce a proper scientific chart.
A chart often used in science is the “scatter plot” with Cartesian coordinates (x-y). As illustrated
in Figure 15, a chart is obtained in Excel in 3 steps: 1- select the range of data, 2- click on the
“Insert” tab and 3- select the appropriate chart. Since a trend line (or a fit) is often added
later, it is more appropriate to select “Scatter” without any line drawn between the data points.

Figure 15. The scatter chart menu

A chart will automatically be generated and inserted in the current sheet.


Figure 16. The basic scatter chart

At this point, a chart has been created. However, some adjustments are needed for it to become a
valuable scientific document. At the least, the three following modifications have to be completed.

1. The data points should always be presented “full range”, which is not the case in Figure 16. The
range of the y-axis (ordinate) should be adjusted to expand the region of the chart where the data
are.
2. Both axes must be identified with a correct label, with their corresponding units if any.
3. A meaningful title is added either directly on the chart or in a chart legend. Often, the title
corresponds to the objective of the experiment (example: To find the cooling rate of a solution) or
a characteristic name like “van't Hoff plot” or “Calibration curve of the spectrometer 27”. Never
write a title like “temperature vs. time”: it is redundant and useless information since it is
already indicated on the axes.
Figure 17. A proper chart presentation

To make any adjustments to the chart, “double click” on the object you want to modify. For example, if the
range of the y-axis is incorrect, double click on the axis numbers to open the corresponding menu. The
same procedure holds for any label on the chart, just double click on it to make a modification.

Adding a trend line or a fit

Figure 18. Menu Chart Elements and trendline

To complete the chart, a trendline (from a curve fitting tool) is added to the graph (Figure 17).
The formula obtained from a trendline gives access to some curve parameters (slope, intercept,
correlation coefficient).

These “Chart Elements” are available by clicking the + icon placed at the top right position of the
chart (Figure 17). Once this menu is open, you will notice that several options are already
selected, like the Axes, the Chart Title and the Gridlines. The axis titles can be added here, as
well as the trendline.
In Chart Elements > Trendline, a choice of different curve fitting options is presented. If
“More options...” is clicked, a window to the right of the graph offers even more choices. This
window is shown in Figure 19.

Figure 19. The Format Trendline menu

The two boxes “Display Equation” and “Display R-squared” should always be checked.
The R-squared value (the square of the correlation coefficient) is used to measure the quality of
the fit. The closer the $R^2$ value is to 1, the better the fit.
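
To make the meaning of the displayed equation and of R-squared more concrete, here is a minimal Python sketch of a linear least-squares fit. It is an illustration only, not a description of Excel's internal code; the function name `linear_fit` and the five data points are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares line y = slope*x + intercept, with its R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical x data
ys = [1.1, 2.9, 5.2, 6.8, 9.1]   # hypothetical y data, nearly linear
slope, intercept, r_squared = linear_fit(xs, ys)
print(slope, intercept, r_squared)
```

Because the points lie almost on a straight line, the R-squared obtained is very close to 1.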
Note that it is not possible to put a trend line on a chart in the online version of Excel. In this
case, another online application such as “Google Docs Spreadsheet” can be used instead.

G- Data series
A data series is made by using a formula or directly with the “crosshair” tool available in several
spreadsheet programs. Data series, combined with mathematical equations, are useful to see a trend
or to make a curve from a mathematical function (see section H).

Figure 18. Using the crosshair tool to make a data series

a. Write, in two subsequent cells, the first number of the series followed by a second number with
the required increment, here 1 unit.
b. Select the two cells showing the trend. When the pointer is placed at the bottom right corner, it
becomes a crosshair symbol.
c. Slide down the crosshair until the number of the trend generated is equal to the last number of
the data series needed.

A series can be made out of any trend. It can be a number, a time, a date, etc. Column F of the
spreadsheet below is an example of a series of dates with an increment of 7 days.

Figure 18. Several automatic data trends available with Excel.

In the formula bar, the date shown (year-month-day) looks different from the one displayed in the
cell F4. It is the same date, simply presented in a way that is more convenient to read. Again, this
format is highly customizable.
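
The two kinds of series described in this section can be sketched in Python as follows. The start date, the number of terms and the display format are hypothetical choices made for the illustration; they are not read from the figures.

```python
from datetime import date, timedelta

# Numeric series: first value 1, increment of 1 unit, 10 terms.
numbers = [1 + i * 1 for i in range(10)]

# Date series: hypothetical start date, increment of 7 days, 5 terms.
start = date(2024, 1, 1)
weekly = [start + timedelta(days=7 * i) for i in range(5)]

# The same stored date can be displayed in different formats.
stored = weekly[1].isoformat()             # year-month-day
displayed = weekly[1].strftime("%d/%m/%Y") # day/month/year

print(numbers)
print(stored, displayed)
```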

H- Creating a set of data from an equation (simulation)
An interesting application of the spreadsheet is the possibility to plot a chart out of a mathematical
equation.
Consider the simple Boyle's Law for gases: $PV = k$, where $k = 24.45$ L·atm at 25 °C.
Figure 19. The Boyle's law
To make a chart from this equation, two sets of data in two columns are created. The independent
variable (abscissa or x) is the pressure and the dependent one (ordinate or y) is the volume.
The first column is made with a data series (see section G).
The second column, the volume, is obtained from the quotient of the constant k over P, as
illustrated in the formula bar. Then, the formula in B7 is copied to the cells below to get the
corresponding curve for Boyle's Law.
However, when a formula is copied into another cell, its cell references shift. For instance, if
the cell B7, with the formula =B3/A7, is copied into the cell below (B8), its formula becomes
=B4/A8. The pressure (A8) is read correctly, but the value of the constant k is lost because the
wrong cell (B4 instead of B3) is now read. To tell Excel that the B3 reference must not change when
a copy is made, a pair of dollar signs is used: $B$3 means that this reference, when copied into
another cell, will still be $B$3.
Therefore, the formula in B7 is =$B$3/A7. When the contents of B7 are copied to the cell below (B8),
the formula becomes =$B$3/A8. The value of the constant k is maintained as the pressure changes.
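
The same simulation can be sketched in Python, which may help to see why the constant must stay fixed. In the sketch below, `K` plays the role of the absolute reference $B$3 (one value shared by every row), while each pressure plays the role of the relative reference A7, A8, etc. The pressure series itself is hypothetical; the actual values are in Figure 19.

```python
K = 24.45  # Boyle's law constant in L·atm at 25 °C (value given above)

# Hypothetical pressure series: 0.5 atm to 4.0 atm in steps of 0.5 atm.
pressures = [0.5 + 0.5 * i for i in range(8)]

# Every row uses the same K (like $B$3) and its own pressure (like A7, A8, ...).
volumes = [K / p for p in pressures]

for p, v in zip(pressures, volumes):
    print(f"{p:4.1f} atm  {v:7.3f} L")
```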

Figure 20. Chart of Boyle's law from the data point of Figure 19

A trend line of the data based on a “power” fit ($y = ax^b$) gives the value of the constant “k”:
since $V = k/P$, the fit returns $a = k$ and $b = -1$. Naturally, a correlation coefficient of 1 is
expected since it is a simulation with values obtained from an equation.
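
One common way to obtain a power fit is to fit a straight line to the logarithms of the data, since $\ln y = \ln a + b \ln x$. The Python sketch below applies this approach to simulated Boyle's law data; it is an illustration under that assumption, not a description of Excel's internal code, and the four pressures are hypothetical.

```python
import math

K = 24.45                          # L·atm, constant given in the text
pressures = [1.0, 2.0, 4.0, 8.0]   # hypothetical x data (atm)
volumes = [K / p for p in pressures]  # simulated y data (L)

# Linear least squares on (ln x, ln y): slope = b, intercept = ln a.
log_x = [math.log(p) for p in pressures]
log_y = [math.log(v) for v in volumes]
n = len(log_x)
mean_x = sum(log_x) / n
mean_y = sum(log_y) / n
sxx = sum((x - mean_x) ** 2 for x in log_x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, log_y))

b = sxy / sxx
a = math.exp(mean_y - b * mean_x)

print(a, b)  # a is close to 24.45 and b is close to -1
```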

Procedure
The procedure for this laboratory is a set of exercises to perform in the lab and/or at home.

Managing information and handling data (Sections A and B)


1. Open the file "Zumdahl_Table.csv"* and arrange its contents to look, as much as possible, like
the following table (note that the subjects are in alphabetical order):

* A “.csv” (Comma Separated Values) file is a very practical way to share data. This file is a
standard ASCII document; therefore, it can be read by any OS, on any device and by any software.
The downside, however, is that it can only store values. No information about the document format
(font, paragraph, table, etc.) is stored.

Formula and function / statistics (Sections C and D)


2. Calculate the average, the median and the standard deviation for the following set of data:
(5, 3, 4, 6, 5, 7, 6, 5, 4, 3, 5, 8, 6, 7, 4, 6)
3. Make a simple molar mass calculator based on the following template. The molar mass formula is
located in cell H3. It will be calculated according to the data in the cells D4 to F4.

The student's lab report is the workbook submission (spreadsheets altogether in a single
document) which contains problems 4, 5 and 6.
Your instructor may add another assignment.

Using a spreadsheet to make a simple calculation (section E)


de Broglie wavelength: $\lambda = h/(mv)$, where $h$ is the Planck constant:
$6.626 \times 10^{-34}\ \mathrm{kg\,m^2\,s^{-1}}$.
4a. Calculate the de Broglie wavelength ($\lambda$) of a proton (mass = $1.67 \times 10^{-27}$ kg)
moving at 52% of the speed of sound (speed of sound = 340.29 m/s).
4b. What will be the wavelength for an electron ($9.11 \times 10^{-31}$ kg) moving at 95% of the
speed of sound?

5a. Use the Rydberg equation to calculate the wavelength, in nanometers, of the photon corresponding
to the electron transition from the excited state $n_o = 5$ to the excited state $n_i = 2$ in a
hydrogen atom.
5b. Perform the same calculation for the transition $n_o = 4$ to $n_i = 1$ for the same atom.

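The Rydberg equation for a hydrogen atom is: $\frac{1}{\lambda} = R_H \left( \frac{1}{n_i^2} - \frac{1}{n_o^2} \right)$, with $R_H = 1.0974 \times 10^{7}\ \mathrm{m^{-1}}$.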

Making a chart (Section F)


6. Open the file “He_Emission_Spectrum.csv” corresponding to the calibration of the spectrometer
“J”. It contains the 12 lines of the emission spectrum of the He atom measured by this spectrometer.
Make a chart (the x-axis is the spectrometer scale). Write the axis labels and provide an
appropriate title. Also, use the “Power trendline” to fit the data. Show the equation and the $R^2$
value. Can you get a better fit with another trendline? (Example: polynomial)

Data series and simulation - making a chart (Sections F, G and H)


7. You want to make a table of data (simulation) describing the progression of a virus infection.
The virus progresses according to the following equation:

$P = P_0 e^{kt}$

Assuming that the initial number of people infected is $P_0$ and that $k$ is the rate of growth,
answer the two following questions:
a. If the initial number of people infected is 2000, calculate the total number of people infected
after 1 month (30 days) if the infection rate is 5%. Make a graph to show the progression.
b. What will be the total number of people infected after 3 weeks, starting from the same initial
number of people infected but with an infection rate of 10%? Compare your graph to the one obtained
in 7a.
Note: when k = 0.05, one can say that the rate of growth is close to 5% per unit of t.
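
The note above can be justified with a short calculation. Taking the ratio of the infected population at two consecutive units of time (a step that is not spelled out in the original text) gives the growth factor per unit of t:

```latex
\[
\frac{P(t+1)}{P(t)} = \frac{P_0\, e^{k(t+1)}}{P_0\, e^{kt}} = e^{k},
\qquad
e^{0.05} \approx 1.0513
\]
```

The population is therefore multiplied by about 1.0513 per unit of t, an increase of about 5.13%, which is close to 5%. More generally, $e^{k} \approx 1 + k$ when $k$ is small, so the approximation becomes less accurate as $k$ grows.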
