MSM UserGuide

The Multiple Source Method (MSM) is a statistical approach for estimating usual dietary intake, combining short-term dietary data with consumption frequency information. It involves a three-step process to assess individual probabilities of food consumption and estimate usual intake, ultimately providing a population distribution of dietary intake. The MSM program is a web-based tool designed for nutritional scientists to facilitate this analysis, requiring specific data formats for input.

MSM

Multiple Source Method (MSM) for estimating usual dietary intake from short-term
measurement data

User Guide

EFCOVAL
Work package WP3A

Potsdam, 1 August 2011


Contents
The Multiple Source Method
  Short description
    Assessment of habitual consumers
  Statistical Background
MSM Program
  Overview
  MSM Website
  Running the Program
    Overview
    Step 1: Data Import
      Input file format
      Import
      Explorer Panel
    Step 2: Define the statistical model
    Step 3: Calculation
    Step 4: Review your results
      Tables and Files
      Plot
      Log
      History
    Troubleshooting
      Error messages
      Warning notes
  Internal procedures in special cases
    Extremely skewed data with only positive values for skewness
    Extremely skewed data with only negative values for skewness
    Rare consumed foods, small sample size or missing data
  Comparison with other method(s)
Contributions
References
Appendix
  Data security and internal data handling
    Encryption
      Web server certificate (as of 20/10/10)
    Encapsulation
    Temporary Storage
  Internal calculation details
    Random numbers
    Quantiles
    Density
  Document Changes
  Program and Algorithm changes
Index
Illustration Index
Table Index

The Multiple Source Method

Short description
The Multiple Source Method (MSM) is a new statistical method for estimating usual dietary
intake of nutrients and foods, including episodically consumed foods, for populations as well as
individuals. The strength of the method lies in its ability to combine dietary intake data, such
as 24h dietary recalls or food records, with supporting data on consumption frequency from
food frequency questionnaires or food propensity questionnaires and other external sources.
This information is used to distinguish the proportions of habitual consumers and habitual
non-consumers. The MSM offers several options to include such data (see section 'Assessment of
habitual consumers').

The method can make use of covariate information, such as consumption frequency
information from a food frequency questionnaire (FFQ), to improve the modelling of
consumption probability and intake amount. The precondition is that everybody in the sample
provides this covariate information.

MSM calculates dietary intake for individuals first and then constructs the population
distribution based on the individual data. Although the method is able to estimate usual
dietary intake on the basis of 24h-recalls or food records only, in the case of rarely consumed
foods it is highly recommended to make use of additional consumption information from
sources such as food frequency questionnaires.

Assessment of habitual consumers


Defining how habitual consumers and non-consumers are determined is a prerequisite
step for MSM.

The default assumption of the MSM is, in the absence of other information specified, that all
individuals in the dataset are habitual consumers. For all habitual consumers an estimate of
intake is calculated.

If information about habitual consumption frequency is available, then this information needs
to be specified before carrying out the calculations. MSM handles the different types of this
information as follows:

• If there is individual consumption frequency information for everybody within the
  sample, then:
    • No reported frequency of use means habitual non-consumer
    • Reported frequency of use means habitual consumer

• If there is no individual consumption frequency information available, one can
  specify one of the following common settings, which will apply to all members
  of the dataset:

    • assume that all are habitual consumers (this is the default, see above), or

    • use population-specific data from other sources, such as surveys, that define
      a certain percentage of individuals as habitual consumers, or

    • assume that there is a certain percentage of habitual consumers among the
      individuals who did not consume a food according to the short-term
      measurement (24h recall).
      For example, one can assume that 50% of those not consuming in the 24h
      recall period will consume during other periods, so that they are habitual
      consumers. Given 20% non-consumers in the recalls, this
      assumption will treat 50% of those 20%, that is an additional 10% of the
      sample, as habitual consumers. Therefore, 10% of the sample will remain
      habitual non-consumers, and 90% will be considered habitual consumers by the
      MSM.
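
The arithmetic of the 50% example above can be checked with a short sketch. The function name and its parameters are illustrative only and are not part of the MSM program.

```python
def habitual_consumer_share(non_consumer_share, assumed_fraction=0.5):
    """Share of the sample treated as habitual consumers.

    non_consumer_share: fraction of the sample with no intake in the recalls.
    assumed_fraction:   fraction of those assumed to be habitual consumers.
    """
    observed_consumers = 1.0 - non_consumer_share
    reassigned = assumed_fraction * non_consumer_share
    return observed_consumers + reassigned


share = habitual_consumer_share(0.20, 0.5)
print(f"habitual consumers: {share:.0%}")          # 90%
print(f"habitual non-consumers: {1 - share:.0%}")  # 10%
```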

Statistical Background
The usual dietary intake is estimated with MSM in a three-step procedure (Figure 1). In the
first step, the probability of eating a certain food on a random day is estimated for each
individual. Secondly, the usual amount of food intake on a consumption day is estimated. The
resulting numbers from steps one and two are finally multiplied by each other to estimate the
usual daily intake for each individual.

Figure 1: Structure of the Multiple Source Method

Step 1:
The individual probability of consuming a certain food or nutrient is estimated by a logistic
regression model that may contain a set of covariates assumed to be predictive of
consumption, such as gender and age, as well as information on consumption frequency if
available. The corresponding residuals are transformed to the real numbers, and inter- and
intra-individual variances are estimated. The residuals are shrunk by the quotient of
inter-individual variance to intra-individual variance, back-transformed to the original scale and
used to estimate the probability of consumption for an individual on a random day.

Step 2:
The usual intake on consumption days is estimated by applying a linear regression model with
the observed food intake as a function of covariates that are assumed to be predictive of
dietary intake, e.g. gender and age as well as consumption frequency if available. Then, the
corresponding residuals of the linear regression model are transformed to normality/symmetry
by a two-parameter Box-Cox transformation. The transformed residuals are employed to
estimate inter- and intra-individual variance, which is used to shrink the mean food intake of
an individual to a grand mean. The quantities calculated in this shrinkage process for each
individual are back-transformed to the original scale and added to the estimate from the linear
regression model described above, resulting in an estimate for usual intake of an individual on
a consumption day.

Step 3:
Each individual's probability of consumption on a random day (Step 1) is multiplied by the
usual intake of that individual on a consumption day (Step 2), giving an estimate of the usual
daily food or nutrient intake for each individual. Subsequently, descriptive statistics based on
the individual estimates are calculated to characterize the dietary intake distribution of the entire
study population.
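
As a rough illustration of Step 3 only, the sketch below multiplies per-individual probabilities by per-individual amounts and then summarizes the results. All numbers are invented for the example; the regression, transformation and shrinkage of Steps 1 and 2 are not reproduced here.

```python
from statistics import mean, quantiles

# Hypothetical Step 1 output: probability of consumption on a random day.
probability = {1: 0.90, 2: 0.50, 3: 1.00, 4: 0.25}
# Hypothetical Step 2 output: usual intake (g) on a consumption day.
amount_g = {1: 200.0, 2: 100.0, 3: 700.0, 4: 160.0}

# Step 3: usual daily intake = probability x amount, for each individual.
usual_intake = {pid: probability[pid] * amount_g[pid] for pid in probability}

# Descriptive statistics of the individual estimates describe the population.
values = sorted(usual_intake.values())
print(usual_intake)
print("mean:", mean(values))
print("quartiles:", quantiles(values, n=4))
```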

A more detailed description of the method is available in the publication by Haubrock et al.

MSM Program

Overview

The MSM program has been developed to provide a user-friendly interface for the Multiple
Source Method (MSM) as the basis of a service for nutritional scientists who need to estimate
dietary intake in human populations. The program was implemented as a web-based program
built with Open Source components based on proven standard protocols and procedures. The
program is written in Perl using the application framework Catalyst
(http://www.catalystframework.org). As the statistical engine, the R system
(http://www.r-project.org) is used. For communication between R and the program, the
Statistics::R package is used. To support the concurrent use of the program by multiple users
and the efficient distribution of computing resources, the program employs a multi-user system
with a resource bag pattern (round robin) design for the R engine. User interactions with the
program make use of the JavaScript library The Yahoo! User Interface Library (YUI)
(http://developer.yahoo.com/yui).

MSM Website
The program is a web-based tool and can be accessed through the MSM website, which can be
reached at https://nugo.dife.de/msm/. This website encrypts the data sent between browser
and website (see Appendix: Data security and internal data handling, Encryption). This secure
connection is specially marked by the browser. With Firefox version 3, the icon in front of
the web address is marked by a blue background, and a lock symbol appears in the status bar.
Internet Explorer version 7 and up depicts a lock symbol after the address. In both cases, a
click on the lock symbol reveals information about the encrypted connection. The alternative
address http://nugo.dife.de/msm is automatically redirected to the secure connection.
The website provides a short introduction and the link to the MSM program (Figure 2).

Figure 2: Starting page of the MSM website

Running the Program

Overview
When you open the link to the MSM program, the basic MSM screen will appear. In the main
window you can see four buttons (Figure 3) that guide you through the four main steps of the
program. At the top of the page you will see an application menu. This menu contains the Help
drop-down menu, which provides further information about the program and MSM, including a
copy of this user guide. The application menu contains two other menus: File for data handling,
and Calculation for setup and analysis. The functions of both menus are explained in later sections.
The panel below the application menu shows the current date and time on the left-hand side.
The right side of the panel shows your current session ID (SID: xxxxxx) and the number
of the R process you are using (Process: Rx, where x is a number; see the Appendix for more
information about sessions and R processes).
At the bottom of the page, a status panel is located. It is used to show selected
program-specific messages.

Throughout the program, question mark icons are attached to input fields and other
program options. If you need more information about a field or option, move the mouse
over the question mark and a 'tool tip' with a help text will appear. When you click on the icon, the
same help text will appear in a larger pop-up panel, which can be moved, and closed through
the closing x.

Figure 3: Screen shot of the basic MSM page

Step 1: Data Import
The first step is to import your data into the program. Please select Load Data in the File
menu at the top left-hand corner or use the link provided by the button (Figure 4).

Figure 4: Start with step 1 - loading data

Input file format


The data file must be a character-delimited text (ASCII) file and must contain a header row
with variable names that uniquely identify the data columns. Characters allowed as
column separators are: comma (,), tabulator (\t), semicolon (;), pipe (|), colon (:) and space ( ).
The decimal character for numbers is a dot (.). If your original data is in a different format,
such as a SAS or SPSS data file, a spreadsheet or any other binary format, please use the
original program to export the data set into a delimited text file such as *.csv.
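
Before uploading, it can help to check a file locally against the rules above. The following is a minimal sketch using only the Python standard library; the function name check_header and the extra check for a consistent field count per row are assumptions made for this example and do not describe how the MSM program itself validates input.

```python
import csv
import io

# Column separators listed in this guide.
ALLOWED_SEPARATORS = [",", "\t", ";", "|", ":", " "]


def check_header(text, separator):
    """Return the column names, or raise ValueError if the file looks unusable."""
    if separator not in ALLOWED_SEPARATORS:
        raise ValueError(f"separator {separator!r} is not in the allowed list")
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    header = next(reader)
    if any(not name.strip() for name in header):
        raise ValueError("empty variable name in the header row")
    if len(set(header)) != len(header):
        raise ValueError("variable names in the header row must be unique")
    for line_no, row in enumerate(reader, start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
    return header


sample = "id;gram;ffq\n1;439.9;1.24\n1;5;1.24\n"
print(check_header(sample, ";"))  # ['id', 'gram', 'ffq']
```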

The data file must contain at least the following variables: a column with numbers that
identify individuals in the data set, and one column with the dietary intake data (response).
This can be data from 24h recalls, food records or any other short-term dietary assessment
method. For each individual, at least two measurements of the intake should be present. The
minimum requirement is to have at least one individual with two sets of intake data. In
addition to the two mandatory columns, other variables can be included. A variable providing
additional information on consumption frequency is highly recommended in order to harness
the full power of the MSM analysis, especially for episodically consumed foods. These data
are usually obtained from long-term assessment instruments such as food frequency
questionnaires or food propensity questionnaires. The variable can have continuous numeric
values. Additional columns for explanatory variables like gender, age or BMI (covariates) can
be added (see example in Table 1). As a requirement of the program, only one column for
the response is allowed; therefore, for each intake measurement, a separate row has to be
created, with the individual’s id, explanatory variables and food frequency information being
identical in each row. By defining a grouping variable you can analyse the intake of more than
one group item, such as foods, food groups or nutrients, in one program run. This grouping
variable currently needs to be a numeric variable.

Table 1: Small fictitious data set with a structure suitable for MSM analysis. gram: intake data
(response); ffq: long-term consumption frequency information from a frequency
questionnaire; group: food group; agesex: interaction term for age and sex
id gram ffq sex age agesex group
1 439.9 1.24 2 45 90 1
1 5 1.24 2 45 90 1
2 9.7 1.53 2 47 94 1
2 0 1.53 2 47 94 1
3 684.8 1.08 1 57 57 1
3 849.2 1.08 1 57 57 1
4 108 1.46 2 36 72 1
4 174.8 1.46 2 36 72 1
5 150 1.14 2 53 106 1
5 0 1.14 2 53 106 1
6 251.6 2.51 2 54 108 1
6 487.4 2.51 2 54 108 1
7 224.9 2.87 2 59 118 1
7 512.9 2.87 2 59 118 1
8 707.5 1.65 1 57 57 1
8 230 1.65 1 57 57 1
9 392 1.1 2 47 94 1
9 0 1.1 2 47 94 1
10 0 0 1 40 40 1
10 0 0 1 40 40 1
11 0 0.93 1 54 54 1
11 460.2 0.93 1 54 54 1
12 0 0.64 1 40 40 1
12 144.5 0.64 1 40 40 1
13 0 0.46 1 41 41 1
13 0 0.46 1 41 41 1
14 823.2 3.02 2 35 70 1
14 160 3.02 2 35 70 1
15 9.7 0 1 47 47 1
15 4.3 0 1 47 47 1
16 480 3.01 1 63 63 1
16 423.8 3.01 1 63 63 1
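
Data are often stored with one row per person and one column per recall day. The sketch below shows one possible way to reshape such data into the one-row-per-measurement layout of Table 1. The column names recall1 and recall2 and the helper to_long are hypothetical; the values are those of individuals 1 and 2 in Table 1.

```python
import csv
import sys

# Hypothetical wide-format records: one row per person, one column per recall.
wide = [
    {"id": 1, "sex": 2, "age": 45, "ffq": 1.24, "recall1": 439.9, "recall2": 5},
    {"id": 2, "sex": 2, "age": 47, "ffq": 1.53, "recall1": 9.7, "recall2": 0},
]
recall_columns = ["recall1", "recall2"]


def to_long(rows, recall_columns, group=1):
    """Create one output row per intake measurement; covariates are repeated."""
    long_rows = []
    for row in rows:
        for column in recall_columns:
            long_rows.append({
                "id": row["id"], "gram": row[column], "ffq": row["ffq"],
                "sex": row["sex"], "age": row["age"], "group": group,
            })
    return long_rows


long_rows = to_long(wide, recall_columns)

# Write a semicolon-delimited text file (here to standard output).
writer = csv.DictWriter(sys.stdout, fieldnames=list(long_rows[0]), delimiter=";")
writer.writeheader()
writer.writerows(long_rows)
```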

Import

Figure 5: Upload Data panel for importing data sets

The Upload Data panel (Figure 5) lets you import your data into the MSM program.
First, click on the Browse.. button next to the Data file input field and choose a suitable
local data file, or enter the location of the file manually in the field. After a file is selected,
the program will suggest a Data set name based on the file name. You can change this
name as necessary. Please take care not to use any special characters or spaces in the
data set name, since this will lead to an error. The exception to this rule is the underscore (_),
which can be used without problems. The program will try to detect the Column
separator automatically using a set of standard separators. If you use an uncommon Column
separator, you need to specify it in the corresponding input field. After file name, column separator
and data set name are specified, click the Upload button to import your data. Depending on
the size of the file to upload, this might take a few seconds.
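
A data set name can be checked against the naming rule before upload. The sketch below assumes that ASCII letters, digits and the underscore are the safe characters; the guide only states that special characters and spaces cause an error, so the exact set accepted by the program is an assumption, and both helper functions are hypothetical.

```python
import re

# Assumption: ASCII letters, digits and the underscore are safe characters.
SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_safe_dataset_name(name):
    """True if the name contains only characters assumed to be safe."""
    return SAFE_NAME.fullmatch(name) is not None


def suggest_dataset_name(file_name):
    """Derive a safe name from a file name (hypothetical helper)."""
    stem = file_name.rsplit(".", 1)[0]
    # Replace every run of unsafe characters with a single underscore.
    return re.sub(r"[^A-Za-z0-9_]+", "_", stem).strip("_")


print(is_safe_dataset_name("calibration_2009"))  # True
print(is_safe_dataset_name("my data-set"))       # False
print(suggest_dataset_name("my data-set.csv"))   # my_data_set
```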

Figure 6: Upload data panel after successful data import

After the upload is completed, you will get a message indicating that the data has been
imported successfully, e.g. “Upload of [data file] to /var/tmp/msm/upload and import into
[Data set name] data set was successful” (Figure 6). If there is no message, please check the
Explorer frame on the left. If it is empty, your import was not successful.
After the successful import of a dataset you can either upload another dataset or go on to
step 2, defining your analysis model.

Explorer Panel
Once you have successfully imported a dataset, you can see the Data set structure panel in
the Explorer section on the left side. The panel lists data sets and their variables. Click on the
plus icon next to the data set name, and a tree-like list of variable names will appear. Close
this list by clicking on the minus icon next to the data set name.

The Explorer section shows a second panel labelled Data sets. This panel lists Input data
sets and Result data sets. You can view the data by selecting the respective data set name and
clicking on the view button at the bottom of the Data sets panel. The data will be shown in a
separate window panel that you can move around or remove through the closing icon.

Step 2: Define the statistical model
To proceed to the form for defining the calculation parameters, click on Calculation and
choose Setup in the application menu, or use the link with the button.

You will be directed to the Setup tab in the main section of the window (Figure 7). The other
tabs, Results, Log and History, will be explained in the following sections.

Figure 7: Setup form for defining calculation parameters

The program will preselect the first data set from the alphabetically ordered list of available
input data sets. Select a dataset of your choice and provide information about your data
structure.

As the first item, you need to specify the name of the ID variable. The program has already
preselected the first variable from the data set selected earlier. Use this choice or select the
appropriate variable, which identifies the study participants, from the drop-down list of
variables in the dataset. Now you can specify whether your data set contains
information about more than one food group. If this is the case, select the check box
Analysis by groups and specify the name of the group variable from the drop-down list that
appears below the check box (Figure 8).

Figure 8: Specifying by-group analysis

Start defining the MSM Model structure by specifying the name of the response variable.
This dependent variable should contain the intake data of the food or nutrient of interest from
the short-term measurement (24h recall or food record). Then define the right-hand side of the
model in the field MSM regression model using the explanatory variables from your data
set that have an influence on the response. These can be age or gender. The example data set
(Table 1) contains the covariates "sex", "age", "agesex" and "ffq". A suitable model would then
be specified as "sex+age+agesex+ffq". The interaction term "agesex" needs to be added to the
data set before the analysis, since interactions between covariates are currently not directly supported.
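
Because the interaction term has to exist as a column before upload, it can be computed when the data file is prepared. The sketch below assumes the interaction is the product of age and sex, which is consistent with the values in Table 1 (for example age 45, sex 2, agesex 90); the two rows are taken from that table.

```python
# Rows taken from Table 1, without the precomputed "agesex" column.
rows = [
    {"id": 1, "gram": 439.9, "ffq": 1.24, "sex": 2, "age": 45, "group": 1},
    {"id": 3, "gram": 684.8, "ffq": 1.08, "sex": 1, "age": 57, "group": 1},
]

# Assumption: the interaction term is the product age * sex (matches Table 1).
for row in rows:
    row["agesex"] = row["age"] * row["sex"]

# Right-hand side as it would be typed into the "MSM regression model" field.
covariates = ["sex", "age", "agesex", "ffq"]
model = "+".join(covariates)

print([row["agesex"] for row in rows])  # [90, 57]
print(model)                            # sex+age+agesex+ffq
```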

The inclusion of Consumption frequency information is a strength and an important part of
the MSM (see also section "Short description"). This information can be provided in the
following section (Figure 9).

Figure 9: Section for defining consumption frequency and habitual consumer information

If no Consumption frequency information is available or specified, the default of the
method is to assume that all individuals are habitual consumers. This option, named "Default
option assuming all as habitual consumer" (Figure 10), is preselected at the beginning of the
setup, and all other options in this section are disabled.

Figure 10: Default Consumption frequency option assigning habitual consumer status to all
individuals

If you have a variable in your dataset that provides information about long-term consumption
for all individual participants, it can be specified in the drop-down variable list "Select an
additional frequency variable". Information from this variable is used to identify whether a
participant is a consumer or not in case there is no intake according to the short-term
measurement (see also section "Assessment of habitual consumers"). To enable
this selection list, deselect the default option described above. After you have made a
selection, all other options in this section are disabled (Figure 11).

Figure 11: Consumption frequency option that specifies a variable in the dataset describing
the individual consumption frequency

If you do not have information about long-term consumption for each individual, but know
the long-term consumption probability within the population, you can provide this
information in the input field “Specify a consumption probability from external source”
(Figure 12). The value needs to be between 0 and 1 (use the dot (.) as decimal point). This
value is used to determine the overall proportion of consumers in the analysis if it is
larger than the proportion determined from the 24h recall data. Otherwise the consumer status
is derived only from the values specified in the 24h recall. If no value is specified in this field
and no additional food frequency variable is given, then the method uses a default procedure
for assigning the consumer status: of those participants recording no consumption in the
short-term instrument, one half are assigned to the consumer class, whereas the other half are
considered to be 'true' non-consumers. Individuals are assigned to the consumer and
non-consumer groups by a random sampling method.
If you want to analyse your data based on the short-term data only and do not want any
information on consumption frequency to be considered, then you need to set the value for
“Specify a consumption probability from external source” to 0. If you assume that all
participants are consumers and there are no non-consumers (e.g. in the analysis of nutrients),
you have to set the value to 1.
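
One way to read the rule for the external probability value is sketched below: the entered value only takes effect when it exceeds the proportion of consumers observed in the 24h recalls. The function is an interpretation of the text above, not the program's actual code, and its name and parameters are hypothetical.

```python
def overall_consumer_proportion(observed, external):
    """Proportion of consumers used in the analysis, as read from this guide.

    observed: proportion of consumers seen in the 24h recall data.
    external: value entered in the external probability field (0 to 1).
    """
    if not 0.0 <= external <= 1.0:
        raise ValueError("external probability must be between 0 and 1")
    # The external value is used only if it is larger than the observed one.
    return external if external > observed else observed


print(overall_consumer_proportion(0.80, 0.95))  # 0.95
print(overall_consumer_proportion(0.80, 0.0))   # 0.8 (short-term data only)
print(overall_consumer_proportion(0.80, 1.0))   # 1.0 (everybody is a consumer)
```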

Figure 12: Consumption frequency option that defines a common consumption probability
value

If you have no consumption frequency data available but do not want to use the default setting,
you can select the last option in the section Consumption frequency information. This
check-box option, "Use a probability value of 0.5 (50%) to assign habitual consumer status"
(Figure 13), assumes that there is a certain percentage (50%) of real habitual consumers
among the individuals who did not consume a food during the short-term measurement
(24h recall) period. Therefore, 50% of those not consuming in the 24h recall are randomly
assigned habitual consumer status. For those selected in this way, an intake estimate will be
calculated.
For example, given 20% non-consumers in the 24h recalls, this assumption
will treat 50% of those 20%, that is an additional 10% of the sample, as habitual consumers.
Therefore, 10% of the sample will remain habitual non-consumers, and 90% will be considered
habitual consumers by the MSM and will have an intake estimate calculated.
If this option is selected, all other options in this section are disabled.
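
The random assignment described above can be illustrated as follows. This is a sketch under the assumption of a simple random sample without replacement; the sampling scheme used inside MSM is not described at this level of detail, and the function name and participant ids are invented for the example.

```python
import random


def assign_habitual_consumers(non_consumer_ids, fraction=0.5, seed=None):
    """Randomly pick a fraction of recall non-consumers as habitual consumers.

    Returns two sets: ids treated as habitual consumers, and ids that remain
    habitual non-consumers.
    """
    rng = random.Random(seed)
    ids = list(non_consumer_ids)
    n_selected = round(fraction * len(ids))
    selected = set(rng.sample(ids, n_selected))
    return selected, set(ids) - selected


# 100 participants, 20 of whom (ids 81 to 100) reported no intake in the recalls.
non_consumers = range(81, 101)
consumers, true_non_consumers = assign_habitual_consumers(non_consumers, seed=1)
print(len(consumers), len(true_non_consumers))  # 10 10
```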

Figure 13: Consumption frequency option using a fixed probability value for assigning
habitual consumer status

In the section Output you can define the result files and other related output. You can change
the name of the output in the field "Specify the name of the output". The program provides
a predefined value in this field that is derived from the dataset name.

After you have finished filling in the form, you can start the calculation by clicking the “Submit
model to MSM” button. Alternatively, you can use the button link or the Run MSM
option from the Calculation menu.

Step 3: Calculation

Your browser will submit the form parameters to the MSM web program to start the analysis.
The program will check whether all required variables (data set, id and response) are properly
specified. If not, the setup page will be re-displayed and the incorrect fields will be highlighted
with a red border and a yellow explanation message next to the field. In addition, a pop-up
box will alert you that there are errors on the page (Figure 14). In this case, click OK to close
the alert box, correct the errors in the setup and resubmit your analysis.

Figure 14: Error messages caused by incorrect setup values

If the setup was correct, the view will remain at the Results tab. While the calculations are
ongoing, a progress indicator animation is displayed at the lower end of the Results tab (Figure
15). While the calculations are in progress you can move between the tabs in the main section,
but you should not select a menu link or reload the page. Doing so will interrupt the
communication with the MSM program and, although the calculation will continue to run, the
results might not be displayed properly.

Depending on the size of your dataset (number of participants, number of repeated measurements,
number of groups to analyse), the calculations can last from a few seconds to several minutes
or longer. An example data set from a calibration study with 393 people, two 24h recalls and
39 food groups, with a total of more than 30,000 data rows, usually takes up to 4 minutes using
the current (19/11/09) MSM program installation. Table 2 shows the expected duration for
datasets of various sizes. If your program runs longer than 60 minutes, you will receive a
time-out error from the web server. If this problem persists, you might need to check your data
set and possibly split it and/or run only one (or a few) food group(s) at a time.

Figure 15: Progress indicator during a MSM calculation

Individuals   Replicate      Foods/groups   Observations            Duration (min.)
              measurements                  (lines of text file)
2,000         7              1              14,000                  0.4
10,000        7              1              70,000                  2.0
20,000        7              1              140,000                 5.1
50,000        7              1              350,000                 21.2
2,000         3              3              18,000                  1.2
393           2              39             30,654                  3.9
10,000        3              3              90,000                  6.1
20,000        3              3              180,000                 13.1
Table 2: Average duration of MSM analyses using the MSM Web application. Duration
numbers are mean of duplicate measurements in minutes.
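The observation counts in Table 2 are consistent with one row per individual, replicate measurement and food group. The short Python sketch below shows that arithmetic for a complete, balanced data set; the function name is hypothetical, and real data sets with missing measurements will have fewer rows.

```python
def expected_rows(individuals, replicates, groups):
    """Rows in a balanced data set: one per individual, replicate and food group."""
    return individuals * replicates * groups


# Rows reproduced from Table 2
print(expected_rows(2000, 7, 1))
print(expected_rows(393, 2, 39))
```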

Step 4: Review your results

The program automatically displays the Results tab after it has finished the calculation. If
you see an error message, or if no results are shown in the Results tab, an error occurred
during calculation. In that case select the Log tab and check the diagnostic output from the
program. This will indicate which error occurred. If possible, for example with
misspecified models, correct the values in the Setup form and re-run your analysis.
Please see the section Troubleshooting for more details on handling problems.
If the calculations were completed successfully, you will see a section with the results
displayed (Figure 16). In the case of a group-wise analysis you will receive a section for each
group.

Figure 16: Result section after a successfully completed analysis

The result section is divided by analysis and contains several items under the numbered,
time-stamped Analysis heading: one or, in the case of group-wise analysis, more tables with
descriptive statistics for the resulting variables including percentiles; one or more density
plot(s) showing the distribution of the resulting variables; and a table with links to the
respective resulting data sets, including the log files.

In case of errors during the calculation, or if certain conditions are encountered during the
analysis, error or warning messages are displayed in a text box beneath the Univariate
statistics block (Figure 17).

Figure 17: Warning note for a calculation with severely skewed distribution

If you run repeated analyses with the same or other already loaded datasets, then these results
will be appended below the already existing result sections.

Tables and Files


The output table displayed in the Results tab (Figure 18) shows the mean, standard deviation
(sd), kurtosis, skewness and the 5th to 95th percentiles of the resulting variables that are based
on your response variable. The resulting variables are named after the response variable; their
type is indicated by a suffix:
• Response_c_m: the mean of the response on consumption days in the short-term
measurement
• Response_c_usual: the usual intake for the response variable in consumers from the
short-term measurement calculated by the MSM
• Response_all_m: the mean of the response for all days of the short-term measurement
• Response_all_usual: usual daily intake for the response variable for all participants
calculated by the MSM
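The difference between the two short-term means can be illustrated with a small Python sketch. It assumes that a consumption day is a day with an intake larger than zero; the function name is hypothetical, and returning None for a person without any consumption day is a choice made for this example, not necessarily what the MSM program does. The usual intake variables require the full MSM and are not reproduced here.

```python
def short_term_means(intakes):
    """Return (mean on consumption days, mean over all days) for one person.

    intakes: recorded intakes of one person, one value per measurement day.
    The first value is None if there is no day with intake > 0.
    """
    positive = [x for x in intakes if x > 0]
    consumption_day_mean = sum(positive) / len(positive) if positive else None
    all_day_mean = sum(intakes) / len(intakes)
    return consumption_day_mean, all_day_mean


# Three recall days, one of them without consumption
c_m, all_m = short_term_means([0, 120, 80])
```

For the three days above, the mean on consumption days is 100, while the mean over all days is 200/3, roughly 66.7.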

Figure 18: Univariate statistics for example result data

At least three output files are generated for each analysis: a response estimate file, a univariate
statistics file and a log file. If a group-wise analysis is performed, you can choose in the
Setup form to have separate files generated for each group (default) or a common result
file that contains all group results. A common log file is generated for a group-wise
analysis. The files are named according to the output name specified during setup and can be
downloaded using the links in the result file table.
The Univariate output file (Output-name_Univar.txt) contains all descriptive statistics shown
above in the Results tab. The Result file (Output-name.txt) contains the result variables
described above for each individual as well as an overview of the input data (intake on day 1
and day 2 of the 24h-recall), consumer status on day 1 (C1) and day 2 (C2), overall
consumer status (0 = non-consumer, 1 = consumer) and probability of consumption
(P_response) (Figure 19). The Log file lists the parameters of the analysis and the output of the
analysis.
The numbers are stored in the file as tab-separated values using dots (.) as the decimal point.
The files can be viewed with a text editor such as WordPad on Windows systems (please do
not use Notepad, which is often the default editor, since it cannot handle the line
endings properly) or imported into other programs such as spreadsheets. Upon import,
take care to specify the correct column separator and decimal point to
ensure that the values are imported correctly.
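As an illustration of such an import, the following Python sketch reads a tab-separated file with dots as decimal points. It assumes that the first line of the file holds the column names; the function name is hypothetical, and the column names in your own file depend on the response variable you specified.

```python
import csv


def read_result_file(path):
    """Read a tab-separated result file into a list of dicts.

    Fields that parse as numbers (dot as decimal point) are converted to
    float; everything else is kept as text.
    """
    records = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            parsed = {}
            for name, text in row.items():
                try:
                    parsed[name] = float(text)
                except (TypeError, ValueError):
                    parsed[name] = text
            records.append(parsed)
    return records
```

Opening the file with newline="" leaves line-ending handling to the csv module, so the reader is not confused by Windows or Unix line endings.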

Figure 19: Content of result dataset for example analysis

Plot
The program automatically generates a distribution plot for the four different output variables
(for the variable description see section Tables and Files above). A kernel density estimation
is used with a Gaussian kernel and a common bandwidth for all output variables. Bandwidth
and number of observations are displayed at the bottom of the plot. This plot is meant for
illustrative, not analytical, purposes. Users are encouraged to base their assessment of the
resulting distribution on their own analysis of the resulting intake estimates. Next to the
univariate result table, a thumbnail representation of the plot is shown. To see the image at
its original size, click on the plot. The plot will appear in a panel at the top of the page (Figure
20). To close the panel, click on the closing icon. To store the image locally, right-click on the
plot and use the Save image as... option.

Figure 20: Example density plots for the four result distributions

Log
You can find further information about the analysis and diagnostic output from the calculation
in the Log tab (Figure 21). The output first lists the program version and the parameters of the
analysis and then lets you follow the individual steps of the analysis. It shows
information and summary statistics of the regression modelling in MSM step 1 and step 2.
The Box-Cox transformation parameters are displayed, as well as warnings if there were
problems with transforming the data (see Internal procedures in special cases). Timestamps
for the start and end of the analysis are also provided.

Figure 21: Example log content showing analysis parameters and regression statistics from
the Log tab

History
The MSM program remembers the analyses performed throughout a session. It records the
result files generated during your session and displays these in the History tab (Figure 22).
Each row in the table shows a result file with its analysis number, a link to the file for download,
the time it was created and the size of the file. The list is refreshed after each calculation, but
this can also be done manually by clicking the Refresh list button.

Figure 22: Example content of the History tab

Troubleshooting
The following error messages or warning notes can appear in the result section after an
analysis.

Error messages

The following error messages can appear when the data is not suitable for MSM analysis:
1. ERROR --> execution stopped: only 1 recall per subject!
2. ERROR --> no subjects with more than 1 positive intake in 24h-recalls, within person
variance can not be estimated!

1: The MSM needs at least two short-term measurements for at least one individual. No
analysis is possible if there is only one measurement for every participant.

2: The MSM needs at least one participant who records consumption larger than zero on at
least two occasions in the short-term instrument. Otherwise the within-person variance
cannot be estimated and the overall analysis will fail.

Correct these errors by including only measurements in the data set that meet the
conditions described above.
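Both conditions can be checked before uploading a data set. The following Python sketch is one possible pre-check and is not the code used by the MSM program. It assumes the data for a single food group is available as (subject id, intake) pairs, one pair per short-term measurement; the function name and the returned texts are hypothetical.

```python
from collections import defaultdict


def check_msm_input(records):
    """Return a list of problems that would prevent an MSM analysis.

    records: iterable of (subject_id, intake) pairs for one food group,
    one pair per short-term measurement.
    """
    intakes = defaultdict(list)
    for subject_id, intake in records:
        intakes[subject_id].append(intake)

    problems = []
    # Condition 1: at least one subject needs two or more measurements.
    if not any(len(values) >= 2 for values in intakes.values()):
        problems.append("only 1 recall per subject")
    # Condition 2: at least one subject needs two or more positive intakes.
    if not any(sum(1 for x in values if x > 0) >= 2 for values in intakes.values()):
        problems.append("no subject with more than 1 positive intake")
    return problems
```

An empty list means that neither of the two conditions was detected. For a group-wise analysis the check would have to be repeated for each group.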

Warning notes

The following warning notes can appear in the result section when certain conditions are
met during analysis. They indicate special handling of the analysis by the MSM program. See
also the section Internal procedures in special cases for detailed explanations of these
conditions. The response text identifies the short-term intake variable.

1. NOTE: only positive values for skewness of variable response during Box-Cox-
transformation!
NOTE: back transformation carried out with minimal skewness
2. NOTE: only negative values for skewness of variable response during Box-Cox-
transformation!
NOTE: continue calculation with parameters where skewness is closest to zero

Internal procedures in special cases
Extremely skewed data with only positive values for skewness
When only positive values for skewness are encountered during Box-Cox transformation of
residuals, MSM uses the parameter estimates that bring the skewness of the residual
distribution closest to zero. A warning note to that effect is displayed in the Log and
Results tab (Note 1).

Extremely skewed data with only negative values for skewness


When only negative values for skewness are encountered during Box-Cox transformation of
residuals, MSM uses the parameters that lead to the best skewness value for the residual
distribution (defined as the value closest to zero) to determine Lambda, W and
skewness for the Box-Cox transformation. Warning notes are issued to the Log and
Results tab (Note 2).
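The idea of choosing the transformation whose skewness is closest to zero can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MSM implementation: it works on raw positive values instead of regression residuals, keeps the location parameter (called shift here) fixed instead of estimating W, and searches a hypothetical grid of powers 1/k for k = 1 to max_k. All function names are hypothetical.

```python
def skewness(values):
    """Sample skewness based on the second and third central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def box_cox(values, power, shift=0.0):
    """Box-Cox transformation with a location shift; needs v + shift > 0."""
    return [((v + shift) ** power - 1.0) / power for v in values]


def pick_power(values, max_k=10, shift=0.0):
    """Pick the power 1/k whose transformed data has skewness closest to zero.

    Returns (power, skewness, note). The note is set when every candidate
    gives a skewness of the same sign, mirroring the two warning cases.
    """
    candidates = []
    for k in range(1, max_k + 1):
        power = 1.0 / k
        candidates.append((power, skewness(box_cox(values, power, shift))))
    best_power, best_skew = min(candidates, key=lambda c: abs(c[1]))
    if all(s > 0 for _, s in candidates):
        note = "only positive skewness"
    elif all(s < 0 for _, s in candidates):
        note = "only negative skewness"
    else:
        note = None
    return best_power, best_skew, note
```

In the two special cases the sketch still returns the candidate closest to zero, which corresponds to the behaviour described above of continuing with the best available parameters and issuing a note.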

Rarely consumed foods, small sample size or missing data


If no subjects with more than 1 positive intake in the 24h-recalls are encountered, the program
stops with an error message, since the within-person variance cannot be estimated under this
condition (Error 2).

Comparison with other method(s)
The MSM is a new tool to estimate the usual food intake using information from repeated
24h dietary recalls and food frequency questionnaires. A comparison of the MSM
with the widely used NCI method (Tooze et al., 2006; Subar et al., 2006) can be found in
Table 3. A more detailed comparison of four different methods to estimate usual dietary intake
distributions through simulation and application studies can be found in the paper by
Souverein et al. (2011). That analysis considered the 2-day within-person mean, the
Iowa State University method (ISU), the National Cancer Institute method (NCI), the Multiple
Source Method (MSM) and the Statistical Program for Analysis of Dietary Exposure (SPADE)
and shows that the MSM performs equally well compared to these alternative methods.

Criterion: General concept
  NCI method: Use of a two-part mixed-effects regression model to simulate data for
  estimating the usual intake distribution
  MSM: Twofold use of the shrinkage technique to estimate individual usual intake and
  usual intake distribution

Criterion: Application
  NCI method: Can be applied to intake of nutrients and foods, including episodically
  consumed foods
  MSM: Can be applied to intake of nutrients and foods, including episodically consumed
  foods

Criterion: Adjustment
  NCI method: Estimates of usual intake on consumption days and estimates of the
  probability of consumption can both be adjusted for covariates
  MSM: Estimates of usual intake on consumption days and estimates of the probability of
  consumption can both be adjusted for covariates

Criterion: Allowing for correlation
  NCI method: Correlation between consumption-day amount and probability of consumption
  is allowed for by an additional model parameter
  MSM: Correlation between consumption-day amount and probability of consumption is
  allowed for by starting the shrinkage procedures from individual means

Criterion: Normality transformation
  NCI method: The amount part of the model is transformed to normality conditionally on
  covariates using a one-parameter Box-Cox transformation with positive real-valued power
  parameter
  MSM: The residual with respect to covariates is transformed to normality using a
  two-parameter Box-Cox transformation with power equal to the reciprocal of a positive
  integer and a real-valued location parameter

Criterion: Back-transformation
  NCI method: Use of an approximate formula that underestimates all percentiles of the
  usual intake distribution as long as the power parameter of the Box-Cox transformation
  is small
  MSM: Use of a mathematically derived formula that is precise for all considered Box-Cox
  transformations
Table 3: Comparison of the MSM with the NCI method

Contributions

The Multiple Source Method was conceived and developed by Kurt Hoffmann. Minor
modifications of the method were implemented by Jennifer Haubrock, Heiner Boeing, Sven
Knüppel and Wolfgang Bernigau. Ulrich Harttig designed and maintains the web-based
program and programmed the statistical functions in R, assisted by Wolfgang Bernigau in
statistical issues.
This User Guide was prepared by Ulrich Harttig with support from Jennifer Haubrock, Sven
Knüppel, Karina Meidtner and Heiner Boeing.

This work has been carried out at the German Institute of Human Nutrition Potsdam-
Rehbruecke (DIfE) in the Department of Epidemiology headed by Heiner Boeing. Funding
for this work has been provided by the European Commission, 6th Framework Programme,
(FOOD-CT-2006-022895) through the EFCOVAL project (www.efcoval.eu). This guide is
the deliverable D3A.4 of EFCOVAL work package 3A.

References

Harttig U, Haubrock J, Knüppel S, Boeing H. 2011 The MSM program: web-based statistics
package for estimating usual dietary intake using the Multiple Source Method. Eur J
Clin Nutr. 65 S1:S87-91

Haubrock J, Nöthlings U, Volatier JL, Dekkers A, Ocké M, Harttig U, Illner AK, Knüppel S,
Andersen LF, Boeing H; European Food Consumption Validation Consortium. 2011
Estimating usual food intake distributions by using the multiple source method in the
EPIC-Potsdam Calibration Study. J Nutr. 141:914-20

Souverein OW, Dekkers AL, Geelen A, Haubrock J, de Vries JH, Ocké MC, Harttig U,
Boeing H, van 't Veer P. 2011 Comparing four methods to estimate usual intake
distributions. Eur J Clin Nutr. 65 S1:S92-S101

Subar, A. F., Dodd, K. W., Guenther, P. M., Kipnis, V., Midthune, D., McDowell, M., Tooze,
J. A., Freedman, L. S. & Krebs-Smith, S. M. 2006. The food propensity questionnaire:
concept, development, and validation for use as a covariate in a model to estimate
usual food intake. J Am Diet Assoc, 106, 1556-63.

Tooze, J. A., Midthune, D., Dodd, K. W., Freedman, L. S., Krebs-Smith, S. M., Subar, A. F.,
Guenther, P. M., Carroll, R. J. & Kipnis, V. 2006. A new statistical method for
estimating the usual intake of episodically consumed foods with application to their
distribution. J Am Diet Assoc, 106, 1575-87.

R Development Core Team (2008). R: A language and environment for statistical computing.
R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
http://www.R-project.org.

Appendix

Data security and internal data handling


Data security in the MSM program is provided by a three-step approach: encryption to
protect the data transfer between user and program from third parties, encapsulation to
separate the data of multiple users, and temporary storage limited to the duration of the user's
interaction (session) with the program.

Encryption
The communication between a user's browser and the web server that hosts the MSM program
is encrypted using the SSL/TLS protocol via port 443 (https). This encryption prevents the
interception of data sent to and from the web server by third parties. To ensure that the web
server with which the browser communicates is the correct one, the web server identifies
itself using a certificate issued by an established Certification Authority.

Web server certificate (as of 20/10/10)


The web server uses a server certificate issued by Go Daddy Secure Certification Authority
(http://www.godaddy.com) and has the serial number 4F14A709085C73.
The certificate has the following fingerprints:
SHA1: 90:E4:C5:6E:52:C2:BA:DB:3C:EF:A0:8B:C7:FC:C4:08:62:EC:43:A6
MD5: A0:15:FC:43:ED:65:59:E5:90:8C:EB:21:CB:A8:EF:F8
The certificate is valid until: Oct 29 00:53:29 2011 GMT

This and additional information about the certificate, and therefore the web server, can be
obtained using the browser: most browsers indicate the use of encrypted communication by
displaying a lock symbol in the status bar or next to the URL address field. Clicking on the
lock opens a dialogue that presents information about the certificate.
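A fingerprint in the colon-separated form shown above is a hash of the certificate in its binary (DER) encoding. The following Python sketch shows how such a string could be computed and compared outside the browser. The function names are hypothetical, and the host name has to be supplied by the user since it is not given in this section. The calls ssl.get_server_certificate and ssl.PEM_cert_to_DER_cert are part of the Python standard library.

```python
import hashlib
import ssl


def fingerprint(der_bytes, algorithm="sha1"):
    """Format the hash of DER-encoded certificate bytes as AA:BB:CC:..."""
    digest = hashlib.new(algorithm, der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def server_fingerprint(host, port=443, algorithm="sha1"):
    """Fetch a server certificate over the network and return its fingerprint.

    host is a placeholder for the web server to check; this call needs
    network access and is not executed here.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint(ssl.PEM_cert_to_DER_cert(pem), algorithm)
```

The returned string can be compared directly with the fingerprint published for the server.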

Encapsulation
The MSM program uses a session mechanism, a standard procedure for managing user
communication with a web-based application or web site. The main feature of this mechanism is
the use of unique character strings that are assigned to each user. Using this unique session ID,
all input data files and data sets and all resulting data sets and output files are tagged and
therefore uniquely assigned to the correct user. This mechanism enables the program to
encapsulate the data and analyses of a user and to separate different users from each other,
making concurrent use possible. Users can only see and analyse their own data, tagged with
their own session ID, even if multiple users are using the program at the same time.
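The tagging principle can be illustrated with a short Python sketch. It is a generic illustration of session-based file tagging and not the code of the MSM web application; the function names and the naming scheme (session ID followed by a dot and a suffix) are assumptions made for this example.

```python
import secrets


def new_session_id():
    """Create a random, hard-to-guess session ID (32 hexadecimal characters)."""
    return secrets.token_hex(16)


def tagged_name(session_id, suffix):
    """Build a file name that is tagged with the session ID."""
    return f"{session_id}.{suffix}"


def files_for_session(all_files, session_id):
    """Return only the files that carry the given session ID as a prefix."""
    prefix = session_id + "."
    return [name for name in all_files if name.startswith(prefix)]
```

Because every lookup filters on the session ID, a user only ever receives files created within that user's own session.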

Temporary Storage
Uploaded data files and the resulting data files are stored only temporarily in specifically
designated directories to which only the program and the administrator of the web server have
access (Figure 23). Other users do not have access to these storage areas. The storage is
cleaned after the session has expired or 30 days after creation. Data sets used by the R
statistical engine are kept only in the memory of the respective R process and are deleted after
the R process has ended.
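An age-based clean-up of this kind can be sketched in Python as follows. This is an illustration only and not the mechanism used on the MSM web server. It uses the modification time of a file as a stand-in for its creation time, and the function name and parameters are hypothetical.

```python
import os
import time


def remove_expired(directory, max_age_days=30, now=None):
    """Delete regular files in directory that are older than max_age_days.

    Age is judged by the modification time. Returns the sorted names of
    the files that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```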

[Diagram: on the web server, the uploaded input file is stored in ./msm/upload (SID1.IN). The
R process (Rx) holds the data in memory and uses process files (./msm/Rx, ./msm/scratch). The
result files (SID1.OUT, SID1.out.png) are exported for display. Access to the directories:
user (-/-), administrator (r/w).]

Figure 23: Schema of file handling by the MSM program

Internal calculation details
Random numbers
Random numbers are provided by using the R rnorm function (package:stats) after setting a
seed value to a predefined number (123456789). The rnorm function uses an implementation
of the "Mersenne-Twister" algorithm for random number generation.
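The effect of a fixed seed can be illustrated in Python, whose random module also uses the Mersenne Twister generator. The sketch below only demonstrates reproducibility; the function name is hypothetical, and the numbers it produces will not match those of R's rnorm, since the two languages derive normal values from the generator differently.

```python
import random


def seeded_normals(count, seed=123456789, mean=0.0, sd=1.0):
    """Draw normally distributed numbers from a generator with a fixed seed."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.gauss(mean, sd) for _ in range(count)]


first_run = seeded_normals(5)
second_run = seeded_normals(5)
```

Both runs return the same five numbers, which is why repeated analyses of the same data give identical results.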

Quantiles
Quantiles are calculated using the quantile function (package:stats) with algorithm type 2
which estimates the inverse of the empirical distribution function with averaging at
discontinuities.
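The type 2 rule can be written out in a few lines. The following Python sketch follows the definition given above: the quantile is the next order statistic, except where n times p falls exactly on an integer, in which case two neighbouring order statistics are averaged. The function name is hypothetical, and the sketch omits the small numerical tolerance that R applies when testing for that case.

```python
import math


def quantile_type2(values, p):
    """Quantile as inverse empirical distribution function with averaging
    at discontinuities (type 2 of R's quantile function)."""
    xs = sorted(values)
    n = len(xs)
    h = n * p
    j = math.floor(h)
    g = h - j
    # 1-based order statistics x[j] and x[j+1], clamped to the data range
    lower = xs[min(max(j, 1), n) - 1]
    upper = xs[min(max(j + 1, 1), n) - 1]
    return 0.5 * (lower + upper) if g == 0 else upper
```

For the values 1 to 10, the median is the average of the fifth and sixth value (5.5), while the 25th percentile is the third value (3).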

Density
For the distribution plot, the distribution density is calculated for each result variable. The
density function (package:stats) computes kernel density estimates using the Gaussian kernel
as default. The bandwidth from the first distribution density is used for all subsequent
density estimates. The image is generated with the plot command; subsequent distributions
are added with the lines command.
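The use of one bandwidth for several density estimates can be sketched in Python. The bandwidth rule below is a Silverman-style rule of thumb of the kind used as default by R's density function, but the sketch is not guaranteed to reproduce R's numbers exactly. The function names are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(values):
    """Rule-of-thumb bandwidth: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(values)
    sd = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if IQR is 0
    return 0.9 * spread * n ** -0.2


def gaussian_kde(values, bandwidth, grid):
    """Gaussian kernel density estimate evaluated at each point of grid."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        for x in grid
    ]


# The bandwidth of the first variable is reused for the second one.
first = [1.0, 2.0, 3.0, 4.0, 5.0]
second = [2.0, 2.5, 3.0, 10.0]
common_bw = rule_of_thumb_bandwidth(first)
grid = [i * 0.5 for i in range(-10, 31)]
density_first = gaussian_kde(first, common_bw, grid)
density_second = gaussian_kde(second, common_bw, grid)
```

Sharing the bandwidth applies the same amount of smoothing to all curves, which makes them directly comparable in one plot.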

The detailed documentation of the R functions can be found in the R reference and on-line at
http://cran.r-project.org/manuals.html

Document Changes
02. September 2009
Draft version
11 September 2009
Original version 1.0 Ulrich Harttig
17 September 2009
Expanded on Appendix section Ulrich Harttig
27 November 2009
Updated internal handling section Ulrich Harttig
Added Document Changes Section Ulrich Harttig
07 May 2010
Typo fixes Ulrich Harttig
10 August 2010
Added description of modified handling of additional consumption frequency
information Ulrich Harttig
20. October 2010
Update of certificate information Ulrich Harttig
17. December 2010
added table with expected analysis duration
added new section "Program and Algorithm changes" Ulrich Harttig
27. January 2011
added Table index Ulrich Harttig
01. August 2011
updated References Ulrich Harttig

Program and Algorithm changes


17. December 2010
Log files are now part of the result files and also show in the 'History' tab
Ulrich Harttig

The MSM no longer requires that all individuals have the same number of
short-term measurements Ulrich Harttig


Illustration Index
Figure 1: Structure of the Multiple Source Method
Figure 2: Starting page of the MSM website
Figure 3: Screen shot of the basic MSM page
Figure 4: Start with step 1 - loading data
Figure 5: Upload Data panel for importing data sets
Figure 6: Upload data panel after successful data import
Figure 7: Setup form for defining calculation parameters
Figure 8: Specifying by-group analysis
Figure 9: Section for defining consumption frequency and habitual consumer information
Figure 10: Default Consumption frequency option assigning habitual consumer status to all
individuals
Figure 11: Consumption frequency option that specifies a variable in the dataset describing
the individual consumption frequency
Figure 12: Consumption frequency option that defines a common consumption probability
value
Figure 13: Consumption frequency option using a fixed probability value for assigning
habitual consumer status
Figure 14: Error messages caused by incorrect setup values
Figure 15: Progress indicator during a MSM calculation
Figure 16: Result section after a successfully completed analysis
Figure 17: Warning note for a calculation with severely skewed distribution
Figure 18: Univariate statistics for example result data
Figure 19: Content of result dataset for example analysis
Figure 20: Example density plots for the four result distributions
Figure 21: Example log content showing analysis parameters and regression statistics from
the Log tab
Figure 22: Example content of the History tab
Figure 23: Schema of file handling by the MSM program

Table Index
Table 1: Small fictive data set with a structure suitable for MSM analysis: gram: intake data
(response), ffq: long-term consumption frequency information from a frequency
questionnaire, group: food group, agesex: interaction term for age and sex
Table 2: Average duration of MSM analyses using the MSM Web application. Duration
numbers are mean of duplicate measurements in minutes
Table 3: Comparison of the MSM with the NCI method
