Dependency methods
These methods assume that the variables analyzed are divided into two groups: dependent variables and independent variables. Their goal is to determine whether the set of independent variables affects the set of dependent variables, and how. They can be classified into two subgroups according to whether the dependent variable(s) is (are) quantitative or qualitative. If the dependent variable is quantitative, some techniques that can be applied are: Regression Analysis, Survival Analysis, Analysis of Variance and Canonical Correlation. If the dependent variable is qualitative, some techniques that can be applied are: Discriminant Analysis, Logistic Regression Models and Conjoint Analysis.
Regression analysis
This technique is appropriate when the analysis involves one or several metric dependent variables whose values depend on one or more metric independent variables. For example, trying to predict a person's annual Christmas expenditure from their income level, education level, gender and age.
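As a minimal sketch of the idea, the following fits a regression line with a single predictor by least squares. The data are made up for illustration (income and Christmas spending, both in thousands), and the function name fit_simple_regression is hypothetical; a real analysis with several predictors would normally rely on a statistics library.

```python
def fit_simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-product and squared deviations around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL data, invented for this sketch (thousands of currency units).
income = [20.0, 30.0, 40.0, 50.0, 60.0]
spending = [0.5, 0.7, 0.9, 1.1, 1.3]

intercept, slope = fit_simple_regression(income, spending)
# Predicted spending for a person with an income of 45.
predicted = intercept + slope * 45.0
```

The slope is read as the expected change in spending per unit change in income, holding nothing else fixed because this sketch has only one predictor.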
Survival Analysis
Similar to regression analysis, but with the difference that the dependent variable is the survival time of an individual or object. For example, trying to predict the time an individual spends unemployed from their level of education and age.
Analysis of variance
It is used in situations where the total sample is divided into several groups based on one or more non-metric independent variables and the dependent variables analyzed are metric. It aims to find out whether there are significant differences between the groups in terms of the dependent variables. For example, are there differences in cholesterol level by gender? Does the type of occupation also have an effect?
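The comparison between groups can be sketched with the one-way F statistic, which is the ratio of the variability between group means to the variability within groups. The cholesterol values below are invented for illustration and the function name one_way_f_statistic is hypothetical; the sketch stops at the statistic and does not compute a p-value.

```python
def one_way_f_statistic(groups):
    """F statistic for a one-way analysis of variance.

    groups: list of lists, one list of metric observations per group.
    """
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: observations around their group mean.
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# HYPOTHETICAL cholesterol levels (mg/dL), invented for this sketch.
women = [180.0, 195.0, 210.0, 200.0]
men = [205.0, 220.0, 215.0, 230.0]
f_value = one_way_f_statistic([women, men])
```

A large F value suggests that the group means differ by more than the within-group variability would explain; judging significance requires comparing it with the F distribution with k - 1 and n - k degrees of freedom.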
Canonical Correlation
Its aim is to relate several metric independent and dependent variables simultaneously, by defining linear combinations of each set of variables that maximize the correlation between the two sets. For example, analyzing how the time a person dedicates to work and to leisure is related to their income level, age and education level.
Discriminant Analysis
This technique provides optimal rules for classifying new observations whose group of origin is unknown, based on the information provided by the values that the independent variables take for them. For example, determining the financial ratios that best discriminate between profitable and unprofitable companies.
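One common way to build such a classification rule is Fisher's linear discriminant for two groups. The sketch below assumes exactly two variables per observation, equal prior probabilities and a common covariance matrix; the financial ratios are invented and the function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def fit_two_group_discriminant(group_a, group_b):
    """Return weights w and cutoff c; classify as group A when w.x > c.

    Assumes two variables per observation and a nonsingular pooled covariance.
    """
    mean_a, mean_b = mean_vector(group_a), mean_vector(group_b)
    # Pooled within-group covariance matrix (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((group_a, mean_a), (group_b, mean_b)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(group_a) + len(group_b) - 2
    s = [[s[i][j] / dof for j in range(2)] for i in range(2)]
    # Weights: inverse pooled covariance times the difference of means.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (-s[1][0] * diff[0] + s[0][0] * diff[1]) / det]
    # Cutoff: discriminant score of the midpoint between the two means.
    c = sum(wi * (ma + mb) / 2 for wi, ma, mb in zip(w, mean_a, mean_b))
    return w, c


def classify(x, w, c):
    score = w[0] * x[0] + w[1] * x[1]
    return "profitable" if score > c else "unprofitable"


# HYPOTHETICAL ratios (liquidity ratio, debt ratio), invented for this sketch.
profitable = [(2.0, 0.30), (2.4, 0.25), (1.9, 0.35), (2.2, 0.28)]
unprofitable = [(1.0, 0.60), (1.2, 0.70), (0.9, 0.65), (1.1, 0.55)]

weights, cutoff = fit_two_group_discriminant(profitable, unprofitable)
label = classify((2.1, 0.30), weights, cutoff)
```

The size and sign of each weight indicate how strongly, and in which direction, the corresponding ratio separates the two groups.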
Logistic regression models
These are regression models in which the dependent variable is non-metric. They are used as an alternative to discriminant analysis when the normality assumption cannot be made.
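A minimal sketch of a logistic regression with a binary dependent variable and a single predictor follows. It is fitted here by plain gradient ascent on the log-likelihood, which is one simple option chosen for illustration rather than the method a statistics package would use; the data and the names fit_logistic and sigmoid are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        # Residuals between observed outcomes and fitted probabilities.
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        # Gradient of the mean log-likelihood with respect to b0 and b1.
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * xi for r, xi in zip(residuals, x)) / n
    return b0, b1


# HYPOTHETICAL data: a score x and a 0/1 outcome y, invented for this sketch.
score = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
outcome = [0, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(score, outcome)
prob_low = sigmoid(b0 + b1 * 1.0)    # fitted probability at the lowest score
prob_high = sigmoid(b0 + b1 * 6.0)   # fitted probability at the highest score
```

Unlike discriminant analysis, nothing here assumes that the predictor is normally distributed within each group; the model only specifies the probability of the outcome given the predictor.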
Conjoint Analysis
It is a technique that analyzes the effect of non-metric independent variables on metric or non-metric dependent variables. It differs from analysis of variance in two respects: the dependent variables can be non-metric, and the values of the non-metric independent variables are set by the analyst. In other disciplines it is known as Design of Experiments. For example, a company wants to design a new product and needs to specify the shape of the container, its price, the content per package and its chemical composition. It presents various combinations of these four factors to 100 customers, who rank the combinations presented. The company wants to determine the optimal values of these four factors.
Interdependence methods
These methods do not distinguish between dependent and independent variables; their objective is to identify which variables are related, how, and why. They can be classified into two groups according to whether the data to be analyzed are metric or non-metric. If the data are metric, the following techniques, among others, can be used: Factor Analysis and Principal Component Analysis, Multidimensional Scaling, and Cluster Analysis.
Factor Analysis and Principal Component Analysis
These techniques are used to analyze the interrelationships among a large number of metric variables, explaining them in terms of a smaller number of variables called factors (if they are unobservable) or principal components (if they are observable). For example, if a financial analyst wants to assess the financial health of a company from a number of financial ratios, by building several numerical indices that summarize its situation, the problem can be solved with Principal Component Analysis. If a psychologist wants to determine the factors that characterize an individual's intelligence from their answers to an IQ test, Factor Analysis can be used.
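The idea of summarizing several ratios in one index can be sketched by extracting the first principal component, that is, the direction of largest variance of the data. The sketch below uses power iteration on the covariance matrix, which is a simple choice made for illustration; the financial ratios are invented and the function names are hypothetical.

```python
import math


def covariance_matrix(rows):
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_principal_component(rows, iterations=200):
    """Return (direction, share of total variance) of the first component.

    Uses power iteration. Assumption: the starting vector is not orthogonal
    to the leading eigenvector (otherwise the iteration breaks down).
    """
    cov = covariance_matrix(rows)
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along the direction v (the leading eigenvalue).
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# HYPOTHETICAL financial ratios for five companies, invented for this sketch.
ratios = [
    [1.2, 0.30, 0.08],
    [1.5, 0.35, 0.10],
    [0.9, 0.20, 0.05],
    [2.0, 0.50, 0.15],
    [1.1, 0.28, 0.07],
]
direction, share = first_principal_component(ratios)
```

The index for each company is the projection of its centered ratios onto the direction found, and the share indicates how much of the total variance that single index retains. In practice the ratios are often standardized first so that no variable dominates because of its scale.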
Multidimensional Scaling
It aims to transform judgments of preference or similarity into distances represented in a multidimensional space. As a result, a map is constructed in which the positions represent the objects compared: similar objects lie close together and far from dissimilar ones. For example, analyzing the perceptions that a group of consumers has of a list of drinks and brands in the soft drinks market, in order to study which subjective factors consumers use when classifying these products.
Cluster Analysis
Its aim is to classify a sample of entities (individuals or variables) into a small number of groups so that the observations belonging to a group are very similar to each other and very dissimilar to the rest. Unlike in discriminant analysis, the number and composition of the groups are unknown. For example, classifying foods (fish, meat, vegetables and milk) into groups according to their nutritional value.
If the data are not metric, the following techniques can be used in addition to multidimensional scaling and cluster analysis: Correspondence Analysis and Log-linear Models.
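Returning to cluster analysis with metric data, the grouping of similar observations can be sketched with k-means, one common clustering technique. This is a simplified illustration: k-means requires the analyst to choose the number of groups and starting centroids, the nutritional values are invented, and the function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iterations=100):
    """Assign each point to the nearest centroid and update the centroids
    until the assignments stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# HYPOTHETICAL (protein, fat) values per 100 g, invented for this sketch.
foods = ["fish", "meat", "vegetables", "milk"]
nutrition = [(20.0, 5.0), (22.0, 12.0), (2.0, 0.3), (3.4, 3.6)]

# Assumption: two groups, started from the first and third observations.
labels, centroids = k_means(nutrition, [nutrition[0], nutrition[2]])
groups = dict(zip(foods, labels))
```

The result depends on the starting centroids, so in practice the procedure is usually repeated from several starting points and the solutions compared.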
Correspondence Analysis
It applies to multidimensional contingency tables and pursues an objective similar to that of multidimensional scaling, but representing the rows and columns of the contingency table simultaneously. For example, analyzing unemployment in Aragon by province, sex, age and educational level of the unemployed.
Log-linear models
They apply to multidimensional contingency tables and model the multidimensional dependencies among the observed variables in order to explain the observed frequencies.
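The simplest log-linear model, for a two-way table, is the independence model, in which the expected frequency of each cell is the product of its row and column totals divided by the sample size. The sketch below computes those expected frequencies and the likelihood-ratio statistic G2 that measures how far the observed frequencies are from them; the counts are invented and the function name is hypothetical.

```python
import math


def independence_fit(table):
    """Expected counts and G2 under the independence log-linear model.

    table: list of rows of observed counts in a two-way contingency table.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under independence: expected = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Likelihood-ratio statistic; cells with zero count contribute nothing.
    g2 = 2 * sum(o * math.log(o / e)
                 for obs_row, exp_row in zip(table, expected)
                 for o, e in zip(obs_row, exp_row) if o > 0)
    return expected, g2


# HYPOTHETICAL counts (rows: sex, columns: employed / unemployed).
observed = [[10, 20], [30, 40]]
expected, g2 = independence_fit(observed)
```

A small G2 indicates that the independence model explains the observed frequencies well; a large value suggests that an association term between the two variables is needed.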
Structural Methods
These methods assume that the variables are divided into two groups: dependent and independent variables. Their objective is to analyze not only how the independent variables affect the dependent variables, but also how the variables of the two groups relate to each other. They analyze the relationships among a group of variables, represented by systems of simultaneous equations in which some of the variables (called constructs) are assumed to be measured with error from other observable variables called indicators. The models therefore consist of two parts: a structural model that specifies the dependency relationships between the latent constructs, and a measurement model that specifies how the indicators are related to their corresponding constructs. For example, analyzing how the level of use of a company's services relates to the perceptions that customers have of it.
Multivariate analysis steps
1. Goals of the analysis
The problem is specified by defining the objectives and the multivariate techniques that will be used. The investigator must establish the problem conceptually, defining the concepts and relationships that are fundamental to the investigation, and must determine whether those relationships are ones of dependence or interdependence. With all of this, the variables to be observed are determined.
2. Design of the analysis
Determine the sample size, the equations to estimate (if applicable), the distances to calculate (if applicable) and the estimation techniques to be employed. Once this has been determined, the data can be collected.
3. Hypotheses of the Analysis
We evaluate the assumptions underlying the multivariate technique. These assumptions may concern normality, linearity, independence, homoscedasticity, etc. We must also decide what to do with missing data.
4. Analytical procedure
We estimate the model and evaluate its fit to the data. In this step, unusual observations (outliers) or influential observations may appear, and their influence on the estimates and on the goodness of fit must be analyzed.
5. Interpretation of the results
Such interpretations can lead to additional specifications of the model or of its variables, in which case we return to steps 3) and 4).
6. Analysis Validation
The aim is to establish the validity of the results by analyzing whether the results obtained with the sample generalize to the population from which it comes. To do so, the sample can be divided into several parts, the model re-estimated on each, and the results compared. Other techniques that can be used here are resampling techniques (jackknife and bootstrap).
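The two resampling techniques mentioned can be sketched as ways of estimating the standard error of a statistic from the sample alone. The jackknife recomputes the statistic leaving out one observation at a time, while the bootstrap recomputes it on samples drawn with replacement. The data, the function names and the default of 2000 bootstrap replicates are assumptions made for this illustration.

```python
import math
import random


def sample_mean(values):
    return sum(values) / len(values)


def jackknife_se(data, statistic):
    """Standard error from leave-one-out recomputations of the statistic."""
    n = len(data)
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in leave_one_out))


def bootstrap_se(data, statistic, replicates=2000, seed=0):
    """Standard error from resampling the data with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = [statistic([rng.choice(data) for _ in range(n)])
                 for _ in range(replicates)]
    center = sum(estimates) / replicates
    return math.sqrt(sum((t - center) ** 2 for t in estimates) / (replicates - 1))


# HYPOTHETICAL sample (e.g. yearly spending per person), invented for this sketch.
spending = [12.0, 30.0, 8.0, 22.0, 15.0, 40.0, 18.0, 25.0]
se_jackknife = jackknife_se(spending, sample_mean)
se_bootstrap = bootstrap_se(spending, sample_mean)
```

The same two functions accept any statistic computed from the sample, for instance an estimated model coefficient, which is how they would be used to check the stability of a fitted multivariate model.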
Example
Which technique are you going to apply? Define what to do in each step.
Goals of the analysis: Predicting the amount of money a person spends on cinema from their income level, education level, gender and age, which would allow us to better understand the behavior patterns of the population.
Goals of the analysis: Detecting whether some pattern of behavior (or interrelation) emerges from the data you have.
MANOVA by hand
Example
Suppose you want to determine whether the brand of laundry detergent used and the temperature affect the amount of dirt removed from your laundry. To this end, you buy two different brands of detergent (Super and Best) and choose three different temperature levels (cold, warm, and hot). Then you divide your laundry randomly into 6r piles of equal size and assign r piles to each combination of detergent (Super and Best) and temperature (cold, warm, and hot). In this example, we are interested in testing the null hypotheses
H0D: The amount of dirt removed does not depend on the type of detergent
H0T: The amount of dirt removed does not depend on the temperature
The experiment is said to have two factors (Factor Detergent, Factor Temperature) at a = 2 (Super and Best) and b = 3 (cold, warm and hot) levels. Thus there are ab = 2 × 3 = 6 different combinations of detergent and temperature. With each combination you wash r = 4 loads; r is called the number of replicates. This sums up to n = abr = 24 loads in total. The amounts Y_ijk of dirt removed when washing sub-pile k (k = 1, 2, 3, 4) with detergent i (i = 1, 2) at temperature j (j = 1, 2, 3) are recorded in the following table.
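The layout just described can be sketched in code as a balanced design with a = 2, b = 3 and r = 4, together with the usual decomposition of the total sum of squares into detergent, temperature, interaction and error parts. The values of Y_ijk below are invented placeholders and are NOT the values of the table; the names dirt and two_way_sums_of_squares are hypothetical.

```python
detergents = ["Super", "Best"]            # a = 2 levels
temperatures = ["cold", "warm", "hot"]    # b = 3 levels

# HYPOTHETICAL amounts of dirt removed, r = 4 replicates per combination.
dirt = {
    ("Super", "cold"): [4.0, 5.0, 6.0, 5.0],
    ("Super", "warm"): [7.0, 9.0, 8.0, 12.0],
    ("Super", "hot"): [10.0, 12.0, 11.0, 9.0],
    ("Best", "cold"): [6.0, 6.0, 4.0, 4.0],
    ("Best", "warm"): [13.0, 15.0, 12.0, 12.0],
    ("Best", "hot"): [12.0, 13.0, 10.0, 13.0],
}


def mean(values):
    return sum(values) / len(values)


def two_way_sums_of_squares(data, levels_a, levels_b):
    """Sums of squares for a balanced two-factor design with replicates."""
    a, b = len(levels_a), len(levels_b)
    r = len(data[(levels_a[0], levels_b[0])])
    all_values = [y for cell in data.values() for y in cell]
    grand = mean(all_values)
    mean_a = {i: mean([y for j in levels_b for y in data[(i, j)]]) for i in levels_a}
    mean_b = {j: mean([y for i in levels_a for y in data[(i, j)]]) for j in levels_b}
    mean_cell = {key: mean(cell) for key, cell in data.items()}
    return {
        "n": a * b * r,
        "factor_a": b * r * sum((mean_a[i] - grand) ** 2 for i in levels_a),
        "factor_b": a * r * sum((mean_b[j] - grand) ** 2 for j in levels_b),
        "interaction": r * sum(
            (mean_cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2
            for i in levels_a for j in levels_b),
        "error": sum((y - mean_cell[key]) ** 2
                     for key, cell in data.items() for y in cell),
        "total": sum((y - grand) ** 2 for y in all_values),
    }


ss = two_way_sums_of_squares(dirt, detergents, temperatures)
```

Because the design is balanced, the four component sums of squares add up to the total, which is a useful check when the same quantities are computed by hand.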