MSM User Guide
User Guide
EFCOVAL
Work package WP3A
The Multiple Source Method
Short description
The Multiple Source Method (MSM) is a new statistical method for estimating the usual dietary
intake of nutrients and foods, including episodically consumed foods, for populations as well as
individuals. The strength of the method lies in its ability to combine dietary intake data, such
as 24h dietary recalls or food records, with supporting data on consumption frequency from
food frequency questionnaires, food propensity questionnaires or other external sources.
This information is used to distinguish the proportions of habitual consumers and habitual
non-consumers. The MSM offers several options to include such data (see section 'Assessment
of habitual non-consumers').
The method can make use of covariate information, such as consumption frequency
information from an FFQ, to improve the modelling of consumption probability and intake
amount. A precondition is that every individual in the sample provides this covariate
information (e.g. frequency of consumption).
MSM calculates dietary intake for individuals first and then constructs the population
distribution based on the individual data. Although the method is able to estimate usual
dietary intake on the basis of 24h-recalls or food records only, in the case of rarely consumed
foods it is highly recommended to make use of additional consumption information from
sources such as food frequency questionnaires.
By default, in the absence of other information, the MSM assumes that all individuals in the
dataset are habitual consumers. An intake estimate is calculated for all habitual consumers.
If information about habitual consumption frequency is available, it needs to be specified
before carrying out the calculations. MSM handles the different types of this information as
follows:
• assume that all individuals are habitual consumers (the default, see above), or
• use population-specific data from other sources, such as surveys, that define
a certain percentage of individuals as habitual consumers, or
Statistical Background
The usual dietary intake is estimated with MSM in a three-step procedure (Figure 1). In the
first step, the probability of eating a certain food on a random day is estimated for each
individual. In the second step, the usual amount of food intake on a consumption day is
estimated. Finally, the results of steps one and two are multiplied to estimate the usual
daily intake of each individual.
Step 1:
The individual probability of consuming a certain food or nutrient is estimated by a logistic
regression model that may contain a set of covariates assumed to be predictive for
consumption, such as gender and age, as well as information on consumption frequency if
available. The corresponding residuals are transformed to the real line, and inter- and
intra-individual variances are estimated. The residuals are shrunk by the quotient of
inter-individual variance by intra-individual variance, back-transformed to the original scale
and used to estimate the probability of consumption for an individual on a random day.
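The shrinkage idea in Step 1 can be illustrated with a small Python sketch. It is not the MSM implementation: it assumes a balanced design (every person has the same number of days), residuals already mapped to the real line, method-of-moments variance components from a one-way random-effects model, and the textbook factor s_b² / (s_b² + s_w²/n) for pulling each person's mean residual towards zero. The exact shrinkage used by MSM is described in Haubrock et al.; the function names here are hypothetical.

```python
from statistics import mean


def variance_components(residuals_by_person):
    """Method-of-moments between/within variance for a balanced one-way design.

    residuals_by_person: dict mapping person id -> list of residuals (real line).
    Returns (between_var, within_var).
    """
    sizes = {len(r) for r in residuals_by_person.values()}
    if len(sizes) != 1 or min(sizes) < 2 or len(residuals_by_person) < 2:
        raise ValueError("sketch assumes >= 2 persons, each with the same number (>= 2) of days")
    n = sizes.pop()
    k = len(residuals_by_person)
    person_means = {pid: mean(r) for pid, r in residuals_by_person.items()}
    grand_mean = mean(person_means.values())
    # Within-person mean square: scatter of each day around the person's own mean.
    ss_within = sum((x - person_means[pid]) ** 2
                    for pid, r in residuals_by_person.items() for x in r)
    ms_within = ss_within / (k * (n - 1))
    # Between-person mean square: scatter of person means around the grand mean.
    ms_between = n * sum((m - grand_mean) ** 2 for m in person_means.values()) / (k - 1)
    between_var = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at 0
    return between_var, ms_within


def shrink_person_means(residuals_by_person):
    """Pull each person's mean residual towards 0 (hypothetical helper)."""
    s_b, s_w = variance_components(residuals_by_person)
    shrunk = {}
    for pid, r in residuals_by_person.items():
        denom = s_b + s_w / len(r)
        factor = s_b / denom if denom > 0 else 0.0
        shrunk[pid] = factor * mean(r)
    return shrunk
```

For example, shrink_person_means({1: [1.0, 3.0], 2: [-1.0, -3.0], 3: [0.0, 0.0]}) moves the person means 2 and -2 to about 1.67 and -1.67: the larger the within-person variance relative to the between-person variance, the stronger the pull towards zero.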
Step 2:
The usual intake on consumption days is estimated by a linear regression model of the
observed food intake on covariates that are assumed to be predictive for dietary intake,
e.g. gender and age as well as consumption frequency if available. The residuals of this
linear regression model are then transformed to normality/symmetry by a two-parameter
Box-Cox transformation. The transformed residuals are used to estimate inter- and
intra-individual variance, which is used to shrink the mean food intake of an individual
towards a grand mean. The quantities calculated in this shrinkage process for each
individual are back-transformed to the original scale and added to the estimate from the
linear regression model described above, resulting in an estimate of the usual intake of an
individual on a consumption day.
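The two-parameter Box-Cox transformation named in Step 2 (characterised in the comparison with the NCI method as having a power equal to the reciprocal of a positive integer k and a real-valued location parameter c) can be written as y = k · ((x + c)^(1/k) - 1). The Python sketch below shows this transform and its exact inverse under that reading; it is not the MSM source, the names boxcox2 and inv_boxcox2 are hypothetical, and how MSM back-transforms the shrunk quantities in detail is described in Haubrock et al.

```python
def boxcox2(x, k, shift):
    """Box-Cox transform with power 1/k (k a positive integer) and location `shift`."""
    if not (isinstance(k, int) and k >= 1):
        raise ValueError("k must be a positive integer")
    z = x + shift
    if z <= 0:
        raise ValueError("x + shift must be positive")
    lam = 1.0 / k
    return (z ** lam - 1.0) / lam


def inv_boxcox2(y, k, shift):
    """Inverse of boxcox2: returns x such that boxcox2(x, k, shift) == y."""
    lam = 1.0 / k
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("y lies outside the range of the transform")
    return base ** k - shift
```

For instance, boxcox2(3.0, 2, 1.0) takes the square root of 4 and returns 2.0, and inv_boxcox2(2.0, 2, 1.0) recovers 3.0. Restricting the power to 1/k keeps the inverse a simple integer power.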
Step 3:
The individual's probability of consumption on a random day (Step 1) is multiplied by the
individual's usual intake on a consumption day (Step 2), giving an estimate of the usual
daily food or nutrient intake for each individual. Subsequently, descriptive statistics based on
the individual estimates are calculated to characterize the dietary intake distribution of the
entire study population.
A more detailed description of the method is available in the publication by Haubrock et al.
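Step 3 amounts to a per-person product followed by population summaries. The short Python sketch below assumes the Step 1 probabilities and Step 2 amounts are already available as dictionaries keyed by person id; the function names are hypothetical, and the summary is limited to a few statistics from Python's statistics module.

```python
from statistics import mean, median


def usual_daily_intake(prob_by_person, amount_by_person):
    """Step 3: P(consumption on a random day) x usual amount on a consumption day."""
    return {pid: prob_by_person[pid] * amount_by_person[pid] for pid in prob_by_person}


def describe(values):
    """A few population-level descriptive statistics of the individual estimates."""
    values = list(values)
    return {"n": len(values), "mean": mean(values), "median": median(values),
            "min": min(values), "max": max(values)}
```

For two persons with probabilities 0.5 and 1.0 and consumption-day amounts of 200 g and 50 g, the usual daily intakes are 100 g and 50 g, with a population mean of 75 g.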
MSM Program
Overview
The MSM program has been developed to provide a user-friendly interface for the Multiple
Source Method (MSM) as the basis of a service for nutritional scientists who need to estimate
dietary intake in human populations. The program was implemented as a web-based program
built with Open Source components based on proven standard protocols and procedures. The
program is written in Perl using the application framework Catalyst
(http://www.catalystframework.org). The R system (http://www.r-project.org) is used as the
statistical engine. For communication between R and the program, the Statistics::R package
is used. To support concurrent use of the program by multiple users and efficient
distribution of computing resources, the program employs a multi-user system with a resource
bag pattern (round robin) design for the R engine. User interaction with the program makes use
of the JavaScript library The Yahoo! User Interface Library (YUI)
(http://developer.yahoo.com/yui).
MSM Website
The program is a web-based tool and can be accessed through the MSM website at
https://nugo.dife.de/msm/. This website encrypts the data sent between browser
and website (see Appendix: Data Security/internal data handling - Encryption). Browsers mark
this secure connection specially. With Firefox version 3, the icon in front of
Figure 2: Starting page of the MSM website
Overview
When you open the link to the MSM program, the basic MSM screen will appear. In the main
window you can see four buttons (Figure 3) that will guide you through the program, helping
you to execute the four main steps of the program. At the top of the page you will see an
application menu. This menu contains the Help drop-down menu, which provides further
information about the program and MSM, including a copy of this user guide. The menu also
contains two other items: File for data handling, and Calculation for setup and analysis.
The functions of both menus are explained in later sections.
The panel below the application menu shows the current date and time on the left-hand side.
The right side of the panel shows your current session ID (SID: xxxxxx) and the number
of the R process you are using (Process: Rx, where x is a number; see the Appendix for more
information about sessions and R processes).
A status panel at the bottom of the page shows selected program-specific messages.
Throughout the program, question mark icons are attached to input fields and other
program options. If you need more information about a field or option, move the mouse
over the question mark and a 'tool tip' with a help text will appear. Clicking the icon shows
the same help text in a larger pop-up panel, which can be moved and closed with the
closing x.
Step 1: Data Import
The first step is to import your data into the program. Select Load Data in the File
menu at the top left-hand corner or use the link provided by the button (Figure 4).
The data file must contain at least the following variables: a column with numbers that
identify individuals in the data set and one column with the dietary intake data (response).
This can be data from 24h-recalls, food records or any other short-term dietary assessment
method. For each individual, at least two measurements of the intake should be present; the
minimum requirement is at least one individual with two intake measurements. In
addition to the two mandatory columns, other variables can be included. A variable providing
additional information on consumption frequency is highly recommended in order to harness
the full power of the MSM analysis, especially for episodically consumed foods. These data
are usually obtained from long-term assessment instruments such as food frequency
questionnaires or food propensity questionnaires. The variable can have continuous numeric
values. Additional columns for explanatory variables (covariates) such as gender, age or BMI
can be added (see the example in Table 1). Because the program allows only one column for
the response, a separate row has to be created for each intake measurement, with the
individual's id, explanatory variables and food frequency information being identical in
each row. By defining a grouping variable you can analyse the intake of more than
one group item, such as foods, food groups or nutrients, in one program run. This grouping
variable currently needs to be a numeric variable.
Table 1: Small fictitious data set with a structure suitable for MSM analysis. gram: intake data
(response); ffq: long-term consumption frequency information from a frequency
questionnaire; group: food group; agesex: interaction term for age and sex (age × sex)
id gram ffq sex age agesex group
1 439.9 1.24 2 45 90 1
1 5 1.24 2 45 90 1
2 9.7 1.53 2 47 94 1
2 0 1.53 2 47 94 1
3 684.8 1.08 1 57 57 1
3 849.2 1.08 1 57 57 1
4 108 1.46 2 36 72 1
4 174.8 1.46 2 36 72 1
5 150 1.14 2 53 106 1
5 0 1.14 2 53 106 1
6 251.6 2.51 2 54 108 1
6 487.4 2.51 2 54 108 1
7 224.9 2.87 2 59 118 1
7 512.9 2.87 2 59 118 1
8 707.5 1.65 1 57 57 1
8 230 1.65 1 57 57 1
9 392 1.1 2 47 94 1
9 0 1.1 2 47 94 1
10 0 0 1 40 40 1
10 0 0 1 40 40 1
11 0 0.93 1 54 54 1
11 460.2 0.93 1 54 54 1
12 0 0.64 1 40 40 1
12 144.5 0.64 1 40 40 1
13 0 0.46 1 41 41 1
13 0 0.46 1 41 41 1
14 823.2 3.02 2 35 70 1
14 160 3.02 2 35 70 1
15 9.7 0 1 47 47 1
15 4.3 0 1 47 47 1
16 480 3.01 1 63 63 1
16 423.8 3.01 1 63 63 1
Import
The Upload Data panel (Figure 5) lets you import your data into the MSM program.
First, click the Browse.. button next to the Data file input field and choose a suitable
local data file, or type the location of the file into the field. After a file is selected,
the program will propose a Data set name based on the file name. You can change this
name as necessary. Do not use special characters or spaces in the data set name, since this
will lead to an error; the only exception is the underscore (_), which can be used without
problems. The program will try to detect the Column separator automatically using a set of
standard separators. If you use an uncommon Column separator, you need to specify it in the
corresponding input field. After file name, column separator and data set name are specified,
click the Upload button to import your data. Depending on the size of the file, this might
take a few seconds.
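The naming rule above can be checked before upload. The sketch below assumes that "no special characters" means ASCII letters, digits and the underscore only; the exact character set accepted by the MSM program is not documented here, and both function names are hypothetical.

```python
import re

# Assumption: ASCII letters, digits and "_" are safe; everything else is rejected.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name):
    """True if the name is non-empty and contains only letters, digits or underscores."""
    return _VALID_NAME.fullmatch(name) is not None


def suggest_dataset_name(filename):
    """Derive a safe name: drop the extension, replace other characters by '_'."""
    stem = filename.rsplit(".", 1)[0]
    return re.sub(r"[^A-Za-z0-9_]", "_", stem)
```

For example, suggest_dataset_name("my data-file.csv") gives "my_data_file", which passes is_valid_dataset_name.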
Figure 6: Upload data panel after successful data import
After the upload is completed, you will get a message indicating that the data has been
imported successfully, e.g. “Upload of [data file] to /var/tmp/msm/upload and import into
[Data set name] data set was successful” (Figure 6). If there is no message, check the
Explorer frame on the left. If it is empty, the import was not successful.
After the successful import of a data set you can either upload another data set or proceed
to Step 2, defining the statistical model.
Explorer Panel
Once you have successfully imported a dataset, you can see the Data set structure panel in
the Explorer section on the left side. The panel lists data sets and their variables. Click on the
plus icon next to the data set name. A tree-like list of variable names will appear. Close
this list by clicking on the minus icon next to the data set name.
The Explorer section shows a second panel labelled Data sets. This panel lists Input data
sets and Result data sets. You can view the data by selecting the respective data set name and
clicking the view button at the bottom of the Data sets panel. The data will be shown in a
separate window panel that you can move around or close with the closing icon.
Step 2: Define the statistical model
To proceed to the form for defining the calculation parameters, click on Calculation and
choose Setup in the application menu, or use the link with the button.
You will be directed to the Setup tab in the main section of the window (Figure 7). The other
tabs, Results, Log and History, are explained in the following sections.
The program will preselect the first data set from the alphabetically ordered list of available
input data sets. Select a data set of your choice and provide information about your data
structure.
First, specify the name of the ID variable. The program has already
preselected the first variable from the data set selected earlier. Use this choice or select the
variable which identifies the study participants from the drop-down list of variables in the
data set. Next, specify whether your data set contains
information about more than one food group. If this is the case, select the check box
Analysis by groups and specify the name of the group variable from the drop-down list that
appears below the check box (Figure 8).
Start defining the MSM Model structure by specifying the name of the response variable.
This dependent variable should contain the intake data of the food or nutrient of interest from
the short-term measurement (24h-recall or food record). Then define the right-hand side of the
model in the field MSM regression model using the explanatory variables from your data
set that have an influence on the response, for example age or gender. The example data set
(Table 1) contains the covariates "sex", "age", "agesex" and "ffq". A suitable model would
then be specified as "sex+age+agesex+ffq". The interaction term "agesex" needs to be added
to the data set before the analysis, since interactions between covariates are currently not
directly supported.
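Because interaction terms have to be precomputed, they can be added as ordinary columns. In Table 1, agesex equals age multiplied by sex (for example 45 × 2 = 90 for id 1). A minimal Python sketch of that construction; the meaning of such a product depends on how sex is coded (1/2 in Table 1), and the helper name is hypothetical.

```python
def add_interaction(rows, a="age", b="sex", name="agesex"):
    """Return copies of the rows with an extra product column name = a * b."""
    return [{**row, name: row[a] * row[b]} for row in rows]
```

Applied to ids 1 and 3 of Table 1, add_interaction reproduces the agesex values 90 and 57.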
Figure 9: Section for defining consumption frequency and habitual consumer information
Figure 10: Default Consumption frequency option assigning habitual consumer status to all
individuals
If you have a variable in your dataset that provides information about long-term consumption
for every individual participant, it can be specified in the drop-down variable list "Select an
additional frequency variable". Information from this variable is used to identify whether a
participant is a consumer or not in case there is no intake according to the short-term
measurement (see also section "Assessment of habitual consumers"). To enable
this selection list, deselect the default option described above. After you have made a
selection, all other options in this section are disabled (Figure 11).
Figure 11: Consumption frequency option that specifies a variable in the dataset describing
the individual consumption frequency
If you do not have information about long-term consumption for each individual, but know
the long-term consumption probability within the population, you can provide this
information in the input field “Specify a consumption probability from external source”
(Figure 12). The value needs to be between 0 and 1 (use the dot (.) as decimal point). This
value is used to determine the overall proportion of consumers in the analysis if it is
larger than the proportion determined from the 24h recall data. Otherwise the consumer status
is derived only from the values specified in the 24h recall. If no value is specified in this field
and no additional food frequency variable is given, the method uses a default procedure
for assigning the consumer status: of those participants recording no consumption in the
short-term instrument, one half is assigned to the consumer class, whereas the other half is
considered to be 'true' non-consumers. Individuals are assigned to the consumer and
non-consumer groups by random sampling.
If you want to analyse your data based on the short-term data only, without any
information on consumption frequency being considered, you need to set the value for
“Specify a consumption probability from external source” to 0. If you assume that all
participants are consumers and there are no non-consumers (e.g. in the analysis of
nutrients), you have to set the value to 1.
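The rules for the external consumption probability can be summarised as a small function. This Python sketch follows the description above: an external value takes effect only if it exceeds the consumer share observed in the 24h recalls, and without any frequency information half of the apparent non-consumers are counted as consumers. It works on proportions only (the random assignment of individuals is not shown), and the function name is hypothetical.

```python
def consumer_proportion(observed, external=None):
    """Share of the sample treated as habitual consumers.

    observed: share of participants with at least one positive short-term intake.
    external: optional consumption probability from an external source (0..1).
    """
    if not 0.0 <= observed <= 1.0:
        raise ValueError("observed must lie in [0, 1]")
    if external is None:
        # Default rule described above: half of the apparent non-consumers are consumers.
        return observed + 0.5 * (1.0 - observed)
    if not 0.0 <= external <= 1.0:
        raise ValueError("external must lie in [0, 1]")
    # The external value only takes effect if it is larger than the observed share.
    return max(external, observed)
```

With 80% observed consumers, an external value of 0 leaves 0.8 (short-term data only), a value of 1 gives 1.0 (everyone a consumer), and leaving the field empty gives 0.9.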
Figure 12: Consumption frequency option that defines a common consumption probability
value
If you have no consumption frequency data available but do not want to use the default setting,
you can select the last option in the section Consumption frequency information. This
check-box option, "Use a probability value of 0.5 (50%) to assign habitual consumer status"
(Figure 13), assumes that a certain percentage (50%) of the individuals who did not consume a
food during the short-term measurement (24-hour recall) period are in fact habitual consumers.
Therefore, 50% of those not consuming in the 24h recall are randomly assigned the habitual
consumer status. For those selected in this way, an intake estimate will be calculated.
For example, if 20% of the sample are non-consumers according to the 24h recalls, this
assumption will treat 50% of those 20%, that is an additional 10% of the sample, as habitual
consumers. Therefore, 10% of the sample will remain habitual non-consumers, and 90% will be
considered habitual consumers by the MSM and will have an intake estimate calculated.
If this option is selected, all other options in this section are disabled.
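The random assignment behind the 50% option can be sketched in Python as follows. This illustrates the idea and is not the MSM code: it uses Python's random module (whose draws differ from R's even with the same seed), rounds the number of reassigned individuals half-up, and the function name is hypothetical. With the numbers from the example above (20 of 100 individuals without intake), 90 individuals end up as habitual consumers.

```python
import random


def assign_habitual_consumers(ids, had_intake, share=0.5, seed=123456789):
    """Return {id: True/False} habitual-consumer status.

    Everyone with a recorded intake stays a consumer; a random `share` of the
    individuals without any recorded intake is additionally labelled a consumer.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    no_intake = [i for i in ids if not had_intake[i]]
    n_extra = int(share * len(no_intake) + 0.5)  # round half up
    extra = set(rng.sample(no_intake, n_extra))
    return {i: had_intake[i] or i in extra for i in ids}
```

Which 10 of the 20 non-consuming individuals are relabelled depends on the seed; the counts do not.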
Figure 13: Consumption frequency option using a fixed probability value for assigning
habitual consumer status
In the section Output you can define the result files and other related output. You can change
the name for the output in the field "Specify the name of the output". The program provides
a predefined value in this field that is derived from the dataset name.
After you have finished filling in the form, you can start the calculation by clicking the
“Submit model to MSM” button. Alternatively, you can use the button link or the Run MSM
option from the Calculation menu.
Step 3: Calculation
Your browser will submit the form parameters to the MSM web program to start the analysis.
The program will check whether all required variables (data set, id and response) are properly
specified. If not, the setup page will be re-displayed and the incorrect fields are highlighted
with a red border and a yellow explanation message next to the field. In addition, a pop-up
box will alert you that there are errors on the page (Figure 14). In this case, click OK to close
the alert box, correct the errors in the setup and resubmit your analysis.
If the setup was correct, the view will remain on the Results tab. While the calculations are
ongoing, a progress indicator animation is displayed at the lower end of the Results tab
(Figure 15). While the calculations are in progress you can move between the tabs in the main
section, but you should not select a menu link or reload the page. Doing so will interrupt the
communication with the MSM program and, although the calculation will continue to run, the
results might not be displayed properly.
various sized datasets. If your program runs longer than 60 minutes, you will receive a
time-out error from the web server. If this problem persists, check your data set and consider
splitting it and/or running only one (or a few) food group(s) at a time.
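Splitting a large input file by food group, so that each part can be uploaded and analysed separately, can be scripted. A minimal Python sketch, assuming a tab-separated file with a header row and a grouping column named group as in Table 1; it returns one text per group that you would then save to separate files. The function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def split_by_group(tsv_text, group_col="group"):
    """Return {group value: tab-separated text} with the header repeated in each part."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    parts = defaultdict(list)
    for row in reader:
        parts[row[group_col]].append(row)
    out = {}
    for key, rows in parts.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=reader.fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        out[key] = buf.getvalue()
    return out
```

All rows of a group, and therefore all recalls of each individual within that group, stay together in the same part.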
Step 4: Review your results
The program will automatically display the Results tab after it has finished the calculation.
If you see an error message or if no results are shown in the Results tab, an error occurred
during calculation. In that case, select the Log tab and check the diagnostic output from the
program, which will indicate which error occurred. If possible, for example with
misspecified models, correct the values in the Setup form and re-run your analysis.
See the section Troubleshooting for more details on handling problems.
If the calculations were completed successfully, you will see a section with the results
displayed (Figure 16). In the case of a group-wise analysis you will receive a section for each
group.
The result section is divided by analysis and contains several items under the numbered,
time-stamped Analysis heading: one or, in the case of group-wise analysis, more tables with
descriptive statistics for the resulting variables including percentiles; one or more density
plot(s) showing the distribution of the resulting variables; and a table with links to the
respective resulting data sets, including the log files.
page 23
In case of errors during the calculation, or if certain conditions are encountered during the
analysis, error or warning messages will be displayed in a text box beneath the Univariate
statistics block (Figure 17).
Figure 17: Warning note for a calculation with severely skewed distribution
If you run repeated analyses with the same or other already loaded datasets, then these results
will be appended below the already existing result sections.
Figure 18: Univariate statistics for example result data
At least three output files are generated for each analysis: a response estimate file, a
univariate statistics file and a log file. If a group-wise analysis is performed, you can choose
in the Setup form to have separate files generated for each group (default) or a common result
file that contains all group results. A common log file is generated for a group-wise
analysis. The files are named according to the output name specified during setup and can be
downloaded using the links in the result file table.
The Univariate output file (Output-name_Univar.txt) contains all descriptive statistics shown
above in the Results tab. The Result file (Output-name.txt) contains the result variables
described above for each individual as well as an overview of the input data (intake on day 1
and day 2 of the 24h-recall), consumer status on day 1 (C1) and day 2 (C2), overall
consumer status (0 – non-consumer, 1 – consumer) and probability of consumption
(P_response) (Figure 19). The Log file lists the parameters of the analysis and the output of
the analysis.
The numbers are stored in the file as tab-separated values using the dot (.) as decimal point.
The files can be viewed with a text editor such as Wordpad on Windows systems (do not use
the Notepad program, often the default editor, since it cannot handle the line endings
properly) or imported into other programs such as spreadsheets. On import, take care to
specify the correct column separator and decimal point to ensure that the values are
imported correctly.
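Outside a spreadsheet, the result files can be read with a few lines of Python. The sketch relies only on what is stated above (tab-separated values, dot as decimal point) plus the assumption of a header row; it converts every field that parses as a number to float and leaves other fields, such as missing-value codes, as text. Column names in the test example other than P_response are placeholders, and the function names are hypothetical.

```python
import csv
import io


def _to_number(value):
    """Convert a field to float when possible, otherwise keep the text."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def read_msm_table(text):
    """Parse tab-separated MSM output (header row assumed) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: _to_number(val) for key, val in row.items()} for row in reader]
```

To read a downloaded file, open it with open(path, newline="") and pass its contents to read_msm_table; the csv module handles the Unix line endings that Notepad struggles with.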
Figure 19: Content of result dataset for example analysis
Plot
The program automatically generates a distribution plot for the four different output variables
(for the variable description see section Tables and Files above). A kernel density estimation
is used with a Gaussian kernel and a common bandwidth for all output variables. Bandwidth
and number of observations are displayed at the bottom of the plot. This plot is meant for
illustrative, not analytical, purposes. Users are encouraged to base their assessment of the
resulting distribution on their own analysis of the resulting intake estimates. Next to the
univariate result table, a thumbnail of the plot is shown. To see the full-size image, click on
the plot. The plot will appear in a panel at the top of the page (Figure 20). To close the panel,
click on the closing icon. To store the image locally, right-click on the plot and use the Save
image as.. option.
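For your own analysis of the estimates, a Gaussian kernel density with one bandwidth shared by all variables can be computed as below. This Python sketch assumes the shared bandwidth is taken from the first variable using Silverman's rule of thumb as in R's bw.nrd0 (the default of R's density function); whether MSM uses exactly this rule is an assumption. Function names are hypothetical.

```python
import math
from statistics import quantiles, stdev


def bw_nrd0(x):
    """Silverman's rule of thumb, following R's bw.nrd0."""
    hi = stdev(x)
    q1, _, q3 = quantiles(x, n=4, method="inclusive")  # same quartiles as R's default type 7
    lo = min(hi, (q3 - q1) / 1.34)
    if not lo:  # fall-backs used by bw.nrd0 when the spread is zero
        lo = hi or abs(x[0]) or 1.0
    return 0.9 * lo * len(x) ** -0.2


def gaussian_kde(x, grid, bw):
    """Kernel density estimate of sample x at each grid point, Gaussian kernel."""
    norm = 1.0 / (len(x) * bw * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x) for g in grid]


def common_bandwidth_densities(variables, grid):
    """Densities for several variables, all using the bandwidth of the first one."""
    bw = bw_nrd0(variables[0])
    return bw, [gaussian_kde(v, grid, bw) for v in variables]
```

Sharing one bandwidth makes the curves directly comparable, at the cost of over- or under-smoothing variables whose spread differs much from the first one.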
Figure 20: Example density plots for the four result distributions
Log
You can find further information about the analysis and diagnostic output from the calculation
in the Log tab (Figure 21). The output first lists the program version and the parameters of the
analysis and then lets you follow the individual steps of the analysis. It shows information
and summary statistics of the regression modelling in MSM step 1 and step 2. The Box-Cox
transformation parameters are displayed, as well as warnings if there were problems with
transforming the data (see Internal procedures in special cases). Timestamps for start and end
of the analysis are also provided.
Figure 21: Example log content showing analysis parameters and regression statistics from
the Log tab
History
The MSM program remembers the analyses performed throughout a session. It records the
result files generated during your session and displays them in the History tab (Figure 22).
Each row in the table shows a result file with its analysis number, a link to download the file,
the time it was created and the size of the file. The list is refreshed after each calculation,
but it can also be refreshed manually by clicking the Refresh list button.
Troubleshooting
The following error messages or warning notes can appear in the result section after an
analysis.
Error messages
The following error messages can appear when the data are not suitable for MSM analysis:
1. ERROR --> execution stopped: only 1 recall per subject!
2. ERROR --> no subjects with more than 1 positive intake in 24h-recalls, within person
variance can not be estimated!
1: The MSM needs at least two short-term measurements for at least one individual. No
analysis is possible if there is only one measurement for every participant.
2: The MSM needs at least one participant who records consumption greater than zero on two
occasions in the short-term instrument. Otherwise the within-person variance cannot be
estimated and the overall analysis will fail.
Correct these errors by making sure that the data set contains measurements that meet the
conditions described above.
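Both conditions can be checked before uploading. The Python sketch below mirrors the two requirements described above; the returned messages paraphrase the MSM errors rather than reproduce the program's output, and the function name and default column names (taken from Table 1) are assumptions.

```python
from collections import defaultdict


def check_msm_input(rows, id_col="id", response="gram"):
    """Return a list of problems that would stop an MSM run (empty list = OK)."""
    intakes = defaultdict(list)
    for row in rows:
        intakes[row[id_col]].append(float(row[response]))
    problems = []
    # Requirement 1: at least one individual with two or more short-term measurements.
    if not any(len(v) >= 2 for v in intakes.values()):
        problems.append("only one recall per subject")
    # Requirement 2: at least one individual with two or more positive intakes.
    if not any(sum(x > 0 for x in v) >= 2 for v in intakes.values()):
        problems.append("no subject with more than one positive intake")
    return problems
```

Table 1 passes both checks, since for example id 1 has two positive intakes (439.9 g and 5 g).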
Warning notes
The following warning notes can appear in the result section when certain conditions are
met during analysis. They indicate special handling of the analysis by the MSM program. See
also the section Internal procedures in special cases for detailed explanations of these
conditions. The word response in these notes refers to the short-term intake variable.
1. NOTE: only positive values for skewness of variable response during Box-Cox-
transformation!
NOTE: back transformation carried out with minimal skewness
2. NOTE: only negative values for skewness of variable response during Box-Cox-
transformation!
NOTE: continue calculation with parameters where skewness is closest to zero
Internal procedures in special cases
Extremely skewed data with only positive values for skewness
When only positive values for skewness are encountered during the Box-Cox transformation
of residuals, MSM will use the parameter estimates that lead to a residual distribution
skewness closest to zero. A corresponding warning note is displayed in the Log and
Result tabs (Note 1).
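The selection rule can be illustrated with a grid search over the two Box-Cox parameters that keeps the pair whose transformed values have skewness closest to zero. In this Python sketch the grid (powers 1/k for k = 1..10, and the location values passed in) and the moment-based skewness estimator are assumptions, not the MSM settings; the function names are hypothetical. For strongly right-skewed data such as [1, 2, 3, 4, 100], every candidate still has positive skewness, which is the situation reported by Note 1, and the search then returns the least skewed candidate.

```python
import itertools
from statistics import mean


def skewness(values):
    """Moment-based sample skewness m3 / m2**1.5 (values must not all be equal)."""
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / len(values)
    m3 = sum((v - m) ** 3 for v in values) / len(values)
    return m3 / m2 ** 1.5


def boxcox2(x, k, shift):
    """Box-Cox transform with power 1/k and location `shift` (requires x + shift > 0)."""
    lam = 1.0 / k
    return ((x + shift) ** lam - 1.0) / lam


def best_boxcox(values, ks=range(1, 11), shifts=(0.0,)):
    """Return (k, shift, skewness) with skewness closest to zero over the grid."""
    best = None
    for k, shift in itertools.product(ks, shifts):
        if min(values) + shift <= 0:
            continue  # transform undefined for this location value
        s = skewness([boxcox2(v, k, shift) for v in values])
        if best is None or abs(s) < abs(best[2]):
            best = (k, shift, s)
    if best is None:
        raise ValueError("no admissible parameter pair in the grid")
    return best
```

For [1, 2, 3, 4, 100] the search ends at the strongest transformation in the grid (k = 10) with skewness still above zero; for already symmetric data such as [1, 2, 3, 4, 5] it keeps k = 1.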
Comparison with other method(s)
The MSM is a new tool to estimate usual food intake using information from repeated 24h
dietary recalls and food frequency questionnaires. A comprehensive comparison of the MSM
with the widely used NCI method (Tooze et al., 2006; Subar et al., 2006) can be found in
Table 2. A more detailed comparison of methods to estimate usual dietary intake
distributions through simulation and application studies can be found in the paper by
Souverein et al. (2011). This analysis compared the 2-day within-person mean with four
methods, the Iowa State University method (ISU), the National Cancer Institute method (NCI),
the Multiple Source Method (MSM) and the Statistical Program for Analysis of Dietary
Exposure (SPADE), and showed that MSM performs equally well compared to these
alternative methods.
Table 2: Comparison of the NCI method and the MSM
Criterion: Normality transformation
NCI method: The amount part of the model is transformed to normality conditionally on
covariates using a one-parameter Box-Cox transformation with a positive real-valued power
parameter.
MSM: The residual with respect to covariates is transformed to normality using a
two-parameter Box-Cox transformation with power equal to the reciprocal of a positive
integer and a real-valued location parameter.
Contributions
The Multiple Source Method was conceived and developed by Kurt Hoffmann. Minor
modifications of the method were implemented by Jennifer Haubrock, Heiner Boeing, Sven
Knüppel and Wolfgang Bernigau. Ulrich Harttig designed and maintains the web-based
program and programmed the statistical functions in R, assisted by Wolfgang Bernigau in
statistical issues.
This User Guide was prepared by Ulrich Harttig with support from Jennifer Haubrock, Sven
Knüppel, Karina Meidtner and Heiner Boeing.
This work has been carried out at the German Institute of Human Nutrition Potsdam-
Rehbruecke (DIfE) in the Department of Epidemiology headed by Heiner Boeing. Funding
for this work has been provided by the European Commission, 6th Framework Programme,
(FOOD-CT-2006-022895) through the EFCOVAL project (www.efcoval.eu). This guide is
the deliverable D3A.4 of EFCOVAL work package 3A.
References
Harttig U, Haubrock J, Knüppel S, Boeing H. 2011. The MSM program: web-based statistics
package for estimating usual dietary intake using the Multiple Source Method. Eur J Clin
Nutr, 65 Suppl 1, S87-91.
Haubrock J, Nöthlings U, Volatier JL, Dekkers A, Ocké M, Harttig U, Illner AK, Knüppel S,
Andersen LF, Boeing H; European Food Consumption Validation Consortium. 2011.
Estimating usual food intake distributions by using the multiple source method in the
EPIC-Potsdam Calibration Study. J Nutr, 141, 914-20.
Souverein OW, Dekkers AL, Geelen A, Haubrock J, de Vries JH, Ocké MC, Harttig U,
Boeing H, van 't Veer P. 2011. Comparing four methods to estimate usual intake
distributions. Eur J Clin Nutr, 65 Suppl 1, S92-S101.
Subar AF, Dodd KW, Guenther PM, Kipnis V, Midthune D, McDowell M, Tooze JA,
Freedman LS, Krebs-Smith SM. 2006. The food propensity questionnaire: concept,
development, and validation for use as a covariate in a model to estimate usual food
intake. J Am Diet Assoc, 106, 1556-63.
Tooze JA, Midthune D, Dodd KW, Freedman LS, Krebs-Smith SM, Subar AF, Guenther PM,
Carroll RJ, Kipnis V. 2006. A new statistical method for estimating the usual intake of
episodically consumed foods with application to their distribution. J Am Diet Assoc,
106, 1575-87.
R Development Core Team. 2008. R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL
http://www.R-project.org.
Appendix
Encryption
The communication between a user's browser and the web server that hosts the MSM program
is encrypted using the SSL/TLS protocol via port 443 (https). This encryption protects data
sent to and from the web server against eavesdropping by third parties. To ensure that the
browser communicates with the correct web server, the web server identifies itself using a
certificate issued by an established Certification Authority.
This and additional information about the certificate, and therefore the web server, can be
obtained using the browser: most browsers indicate the use of encrypted communication by
displaying a lock symbol in the status bar or next to the URL address field. Clicking on the
lock opens a dialogue that presents information about the certificate.
Encapsulation
The MSM program uses a session mechanism, a standard procedure for managing user
communication with a web-based application or website. The main feature of this mechanism is
a unique character string, the session ID, that is assigned to each user. All input data files
and data sets, as well as all resulting data sets and output files, are tagged with this session
ID and are therefore uniquely assigned to the correct user. This mechanism enables the program
to encapsulate the data and analyses of each user and to keep different users separate from
each other, which makes concurrent use possible. Users can only see and analyse their own data,
tagged with their own session ID, even if multiple users are using the program at the same time.
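The tagging scheme can be illustrated with a short Python sketch. Based on the file names shown in Figure 23 (SID1.ABC, SID1.XYZ), the sketch assumes that the session ID is used as a file-name prefix. The function names and the ID format (32 hexadecimal characters from the secrets module) are hypothetical and do not describe the actual MSM implementation.

```python
import secrets


def new_session_id() -> str:
    # 128 bits from a cryptographically secure source, hex-encoded (32 characters).
    return secrets.token_hex(16)


def tag(session_id: str, filename: str) -> str:
    # Prefix every stored file with the owner's session ID, e.g. "SID1.ABC".
    return f"{session_id}.{filename}"


def visible_files(stored_names: list[str], session_id: str) -> list[str]:
    """Return only the files that belong to the given session, with the tag stripped."""
    prefix = session_id + "."
    return sorted(name[len(prefix):] for name in stored_names if name.startswith(prefix))
```

The separator after the ID keeps the prefix test unambiguous. For example, files of a session "SID1" are not confused with those of "SID10".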
Temporary Storage
Uploaded data files and the resulting data files are stored only temporarily, in specifically
designated directories to which only the program and the web server administrator have
access (Figure 23). Other users do not have access to these storage areas. The storage is
cleaned after the session has expired or 30 days after the files were created. Data sets used
by the R statistical engine are kept only in the memory of the respective R process and are
deleted when that process ends.
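[Figure 23: Schema of file handling by the MSM program. Labels: session-tagged files SID1.ABC, SID1.XYZ, .png; Process files; directories ./msm/Rx and ./msm/scratch; Administrator (r/w).]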
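A cleanup job of the kind described above might look like the following Python sketch. The function names, the session-ID prefix convention and the use of the file modification time as a stand-in for the creation time are all assumptions. The sketch does not document the MSM server's actual cleanup procedure. The decision logic is kept separate from the deletion so that it can be checked without touching the disk.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # 30 days


def files_to_delete(mtimes: dict[str, float], active_sessions: set[str],
                    now: float, max_age: float = MAX_AGE_SECONDS) -> list[str]:
    """Decide which files to remove.

    mtimes maps a session-tagged file name (e.g. "SID1.ABC") to its modification time.
    A file is selected if its session is no longer active or it is older than max_age.
    """
    doomed = []
    for name, mtime in mtimes.items():
        session_id = name.split(".", 1)[0]
        if session_id not in active_sessions or now - mtime > max_age:
            doomed.append(name)
    return sorted(doomed)


def purge(directory: str, active_sessions: set[str]) -> list[str]:
    """Apply files_to_delete() to a scratch directory and remove the selected files."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    mtimes = {p.name: p.stat().st_mtime for p in files}
    doomed = files_to_delete(mtimes, active_sessions, now=time.time())
    for name in doomed:
        (Path(directory) / name).unlink()
    return doomed
```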
Internal calculation details
Random numbers
Random numbers are generated with the R function rnorm (package:stats) after the seed has
been set to a predefined value (123456789). rnorm draws on R's default random number
generator, an implementation of the "Mersenne-Twister" algorithm.
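The effect of fixing the seed can be sketched in Python, whose random module is also based on the Mersenne-Twister. The sketch converts uniform draws into normal deviates by inversion, which is also R's default method for rnorm. It will not reproduce R's stream of numbers, because the two languages seed and consume the generator differently. The function name is hypothetical; only the seed value is taken from the text above.

```python
import random
from statistics import NormalDist

SEED = 123456789  # seed value quoted in the text


def normal_draws(n, mean=0.0, sd=1.0, seed=SEED):
    """Return n reproducible normal deviates generated by inversion."""
    rng = random.Random(seed)   # Mersenne-Twister (MT19937) generator
    dist = NormalDist(mean, sd)
    draws = []
    while len(draws) < n:
        u = rng.random()        # uniform on [0.0, 1.0)
        if u > 0.0:             # inv_cdf needs 0 < u < 1
            draws.append(dist.inv_cdf(u))
    return draws
```

Calling `normal_draws(5)` twice gives identical lists. This reproducibility is the property that the fixed seed provides in the MSM.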
Quantiles
Quantiles are calculated using the quantile function (package:stats) with algorithm type 2,
which estimates the inverse of the empirical distribution function with averaging at
discontinuities.
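Type 2 is simple enough to write out. Let x(1) <= ... <= x(n) be the sorted sample, with j = floor(n*p) and g = n*p - j. The estimate is x(j+1) when g > 0, and the average of x(j) and x(j+1) when g = 0. Indices are clamped to the range 1..n. The Python sketch below follows that definition. R's own implementation additionally guards against floating-point round-off in n*p, which this sketch does not do. The function name is hypothetical.

```python
import math


def quantile_type2(values, p):
    """Sample quantile of type 2: inverse empirical CDF, averaging at discontinuities."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(values)
    n = len(x)
    np_ = n * p
    j = math.floor(np_)
    g = np_ - j
    # 1-based order statistics x(j) and x(j+1), clamped to x(1) and x(n).
    x_j = x[max(j, 1) - 1]
    x_j1 = x[min(j + 1, n) - 1]
    if g > 0:
        return x_j1          # p lies strictly between the ECDF levels j/n and (j+1)/n
    return (x_j + x_j1) / 2  # p equals an ECDF level: average the neighbouring values
```

For an even-sized sample such as [1, 2, 3, 4], the median (p = 0.5) is the average 2.5. This is the averaging at a discontinuity that the text describes.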
Density
For the distribution plot, the distribution density is calculated for each result variable. The
density function (package:stats) computes kernel density estimates, using the Gaussian kernel
by default. The bandwidth of the first density estimate is used for all subsequent density
estimates. The image is generated with the plot command, and subsequent distributions are
added with the lines command.
Detailed documentation of these R functions can be found in the R reference documentation and
online at http://cran.r-project.org/manuals.html.
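The density step for the distribution plot can be sketched in Python as follows. The sketch assumes R's default bandwidth rule for density(), bw.nrd0 (Silverman's rule of thumb: 0.9 times the smaller of the standard deviation and IQR/1.34, times n^(-1/5)). It computes the bandwidth once from the first sample and reuses it for every sample, as described for the MSM plot. R spreads the data over a regular grid and convolves it with the kernel using the fast Fourier transform. The sketch instead evaluates the Gaussian kernel sum directly at each grid point. All function names are hypothetical.

```python
import math
import statistics


def bw_nrd0(x):
    """Bandwidth by Silverman's rule of thumb, following the logic of R's bw.nrd0."""
    sd = statistics.stdev(x)
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")  # matches R's default type 7
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:  # fallbacks for degenerate samples
        spread = sd or abs(x[0]) or 1.0
    return 0.9 * spread * len(x) ** -0.2


def gaussian_kde(x, grid, bw):
    """Gaussian kernel density estimate of sample x evaluated at each grid point."""
    scale = 1.0 / (len(x) * bw * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((g - xi) / bw) ** 2) for xi in x) for g in grid]


def densities_with_shared_bw(samples, grid):
    """Estimate all densities with the bandwidth derived from the first sample."""
    bw = bw_nrd0(samples[0])
    return bw, [gaussian_kde(s, grid, bw) for s in samples]
```

To mirror the R plot, draw the first returned density against the grid on a new plot and add the remaining densities as lines.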
Document Changes
2 September 2009
Draft version
11 September 2009
Original version 1.0 (Ulrich Harttig)
17 September 2009
Expanded the Appendix section (Ulrich Harttig)
27 November 2009
Updated the internal handling section (Ulrich Harttig)
Added the Document Changes section (Ulrich Harttig)
7 May 2010
Typo fixes (Ulrich Harttig)
10 August 2010
Added a description of the modified handling of additional consumption frequency
information (Ulrich Harttig)
20 October 2010
Updated the certificate information (Ulrich Harttig)
17 December 2010
Added a table with the expected analysis duration
Added a new section "Program and Algorithm changes" (Ulrich Harttig)
27 January 2011
Added the Table Index (Ulrich Harttig)
1 August 2011
Updated the References (Ulrich Harttig)
The MSM no longer requires all individuals to have the same number of short-term
measurements (Ulrich Harttig)
Index
access, 7
Box-Cox, 6, 27, 31, 33
consumption, 3-6, 10, 16, 17, 18, 19, 25, 33
dietary, 1, 3, 5, 6, 10, 32
distribution, 3, 6, 7, 23, 26, 31, 32, 33, 35, 38
Log, 6, 15, 23, 27, 31, 33
MSM, 1, 3, 5, 7, 8, 11, 12, 16, 20, 24, 31-33, 35, 36
Multiple Source Method, 1, 3, 7, 34, 35
residual, 6, 31, 33
simulation, 32, 35
skewness, 24, 29, 31
transform, 6, 27, 31, 33
Illustration Index
Figure 1: Structure of the Multiple Source Method (page 5)
Figure 2: Starting page of the MSM website (page 8)
Figure 3: Screen shot of the basic MSM page (page 9)
Figure 23: Schema of file handling by the MSM program (page 37)
Table Index
Table 1: Small fictitious data set with a structure suitable for MSM analysis (gram: intake data
(response); ffq: long-term consumption frequency information from a frequency
questionnaire; group: food group; agesex: interaction term for age and sex) (page 11)
Table 2: Average duration of MSM analyses using the MSM web application; durations are
means of duplicate measurements, in minutes (page 22)
Table 3: Comparison of the MSM with the NCI method (page 33)