Engineering Data Analysis
• Engineering Data Analysis (EDA) is an indispensable analysis tool that engineering teams across
industries use to analyze processes, integration, and yield (conversion rate) effectively, enhancing the
competitiveness of the company.
• Data analysis is the process of inspecting, cleaning, transforming, and interpreting data to discover
meaningful insights, draw conclusions, and support decision-making. It involves using various
techniques, tools, and methodologies to extract valuable information from raw data, which can be in the
form of numbers, text, images, or any other structured or unstructured format.
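As a brief illustration of the inspecting, cleaning, and transforming steps, the following Python sketch (the language and the sample readings are assumptions, not part of the original notes) turns messy raw entries into clean numeric data ready for analysis:
```python
# Minimal sketch: cleaning and transforming raw data before analysis.
# The raw readings below are hypothetical values, invented for illustration.
raw_readings = ["12.5", "13.1", "n/a", " 12.9 ", "", "14.0"]

# Inspect and clean: strip whitespace and drop entries that are not numeric.
cleaned = []
for value in raw_readings:
    value = value.strip()
    try:
        cleaned.append(float(value))
    except ValueError:
        continue  # discard non-numeric entries such as "n/a" or blanks

print(cleaned)  # [12.5, 13.1, 12.9, 14.0]
```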
• The procedure helps reduce the risks inherent in decision-making by providing useful insights and
statistics, often presented in charts, images, tables, and graphs.
• Qualitative Data Analysis - The qualitative data analysis method derives data from words, symbols,
pictures, and observations; it does not use statistics. Common qualitative methods include content
analysis, narrative analysis, and grounded theory.
• Quantitative Data Analysis - Also known as statistical data analysis, these methods collect raw data
and process it into numerical data. Quantitative analysis methods include the following (see the sketch
after this list):
- Hypothesis Testing assesses the truth of a given hypothesis or theory for a data set or
demographic.
- Mean, or average, indicates a subject’s overall trend and is found by dividing the sum of a list
of numbers by the number of items in the list.
- Sample Size Determination takes a small sample from a larger group, analyzes it, and treats
the results as representative of the entire body.
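A minimal Python sketch of these three methods, assuming SciPy is available; the measurements, the target mean of 12.0, and the sigma and margin values are invented for illustration:
```python
import math
from statistics import mean
from scipy import stats

# Hypothetical measurements, invented for illustration.
data = [12.5, 13.1, 12.9, 14.0, 13.4, 12.7]

# Mean: sum of the values divided by their count.
print(mean(data))  # 13.1

# Hypothesis testing: one-sample t-test of whether the true mean equals 12.0.
t_stat, p_value = stats.ttest_1samp(data, popmean=12.0)
print(t_stat, p_value)

# Sample size determination for estimating a mean: n = (z * sigma / E)^2,
# with z = 1.96 (95% confidence), assumed sigma = 0.5, margin of error E = 0.2.
z, sigma, margin = 1.96, 0.5, 0.2
n = math.ceil((z * sigma / margin) ** 2)
print(n)  # 25
```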
3. Define/explain planning of a survey.
• Planning a survey involves the systematic process of designing and organizing the various aspects of a
survey study to collect accurate, relevant, and meaningful data from a targeted group of individuals or
entities. Proper planning is crucial to ensure that the survey objectives are met and that the collected
data is of high quality and can provide valuable insights.
• Conducting a survey involves collecting data from a group of individuals or entities to gather insights,
opinions, or information about a specific topic.
5. Define/explain Experiments.
• An experiment is a scientific or systematic procedure carried out to investigate, study, and understand the
effects of one or more variables on a phenomenon.
• Experiments are designed to establish cause-and-effect relationships between variables and are
commonly used in various scientific fields, including physics, chemistry, biology, psychology, and social
sciences.
• Experimentation is the systematic process of conducting experiments to explore, test, and gather
empirical evidence about the relationships between variables in controlled settings.
• It involves deliberately manipulating one or more independent variables while observing and
measuring their effects on dependent variables.
• The process of conducting an experiment involves a series of systematic steps designed to investigate
and understand the relationships between variables in a controlled setting (a short analysis sketch
follows the steps below).
1. Identify the Research Question or Objective - Clearly define what you want to study or explore
through the experiment. What specific aspect of the phenomenon do you want to understand?
2. Formulate a Hypothesis- Develop a testable statement that predicts the expected relationship
between the independent and dependent variables. The hypothesis guides the experiment and
provides a basis for comparison.
3. Select Variables- Identify the independent variable(s) that you will manipulate and the
dependent variable(s) that you will measure to assess the effects.
4. Design the Experiment- Determine the overall structure and design of the experiment. This
includes selecting the experimental and control groups, deciding how the independent variable
will be manipulated, and planning the data collection process.
5. Random Assignment- If applicable, randomly assign participants or subjects to the experimental
and control groups. Randomization helps ensure that the groups are comparable and minimizes
bias.
6. Manipulate the Independent Variable- Intentionally change the value or condition of the
independent variable in the experimental group while keeping it constant in the control group.
7. Data Collection- Collect data by measuring the dependent variable(s) in both the experimental
and control groups. Ensure that the data collection process is consistent and accurate.
8. Control Variables- Identify and control any other variables that could potentially affect the
results. These are known as extraneous variables. By keeping them constant or accounting for
them, you can isolate the effects of the independent variable.
9. Implement Data Analysis- Choose appropriate statistical methods to analyze the collected data.
Determine whether there are statistically significant differences between the experimental and
control groups.
10. Interpret Results- Analyze the results in the context of your hypothesis. Determine whether the
observed differences in the dependent variable(s) are due to the manipulation of the
independent variable or if they could have occurred by chance.
11. Draw Conclusions- Based on the analysis, draw conclusions about whether the hypothesis is
supported or rejected. Explain the implications of the findings and how they contribute to the
understanding of the research question.
12. Consider Limitations- Reflect on the limitations of the experiment. Discuss potential sources of
error, the generalizability of the results, and any constraints that might have impacted the study.
13. Replication and Validation- Replicate the experiment with different samples or settings to
validate the findings and enhance the reliability of the results.
14. Report and Communicate- Prepare a comprehensive report that includes the research question,
hypothesis, methods, results, and conclusions. Communicate the findings to the scientific
community through presentations, publications, or other appropriate channels.
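To make steps 7 through 10 concrete, here is a minimal Python sketch (the group measurements are hypothetical values invented for illustration) that compares an experimental group against a control group using an independent two-sample t-test from SciPy:
```python
from scipy import stats

# Step 7: hypothetical dependent-variable measurements from the two groups.
experimental = [23.1, 24.5, 25.0, 24.2, 23.8, 25.3]
control = [21.9, 22.4, 22.0, 23.1, 22.6, 21.8]

# Step 9: independent two-sample t-test comparing the group means.
t_stat, p_value = stats.ttest_ind(experimental, control)

# Step 10: interpret the result at a 0.05 significance level.
if p_value < 0.05:
    print(f"p = {p_value:.4f}: the difference is statistically significant.")
else:
    print(f"p = {p_value:.4f}: the difference could have occurred by chance.")
```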
• Data presentation refers to the process of organizing, visualizing, and communicating data in a
meaningful and understandable manner.
• Effective data presentation enhances the ability to convey insights, trends, and patterns to an
audience, whether they are experts in the field or non-technical individuals.
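As one way of presenting data visually, the sketch below (matplotlib and the defect counts are assumptions chosen for the example, not part of the original notes) draws a simple bar chart:
```python
import matplotlib.pyplot as plt

# Hypothetical defect counts per category, invented for illustration.
categories = ["Scratch", "Dent", "Misalign", "Other"]
counts = [42, 27, 15, 6]

# A bar chart communicates the relative frequencies at a glance.
plt.bar(categories, counts)
plt.xlabel("Defect category")
plt.ylabel("Count")
plt.title("Defects by category")
plt.show()
```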
Grouped Data- involves organizing individual data points into intervals or ranges and then counting the
number of data points that fall within each interval. Grouping data is useful when dealing with a large set
of data to simplify analysis and presentation. Intervals are often chosen to make the data more
manageable and to highlight patterns or trends.
Ungrouped Data- also known as raw data, consists of individual observations or values that have not
been categorized or grouped in any way. Each data point represents a separate piece of information.
Ungrouped data is the most basic form of data and is often collected directly from sources.
• The main difference between grouped and ungrouped data lies in how the data is organized.
Ungrouped data consists of individual, separate data points, while grouped data involves categorizing
data points into intervals or ranges and counting how many fall into each interval. Grouped data is often
used to simplify analysis and presentation when dealing with large datasets.
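A brief Python sketch of the grouping step (the interval width of 10 and the raw values are assumptions for illustration):
```python
# Minimal sketch: grouping ungrouped (raw) data into intervals.
raw = [12, 15, 21, 22, 24, 30, 31, 33, 38, 41, 44, 47]

width = 10  # assumed interval width
counts = {}
for x in raw:
    low = (x // width) * width        # lower bound of the interval
    key = f"{low}-{low + width - 1}"  # e.g. "20-29"
    counts[key] = counts.get(key, 0) + 1

# Print the frequency of each interval.
for interval, count in sorted(counts.items()):
    print(interval, count)
```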
Measures of central tendency are statistical values that provide insight into the center or typical value of
a dataset. They help summarize the data by indicating where the "center" of the distribution lies. The
three main measures of central tendency are the following; a short code sketch follows the list:
1. Mean- also known as the average, is calculated by summing up all the data points in a dataset
and then dividing by the total number of data points.
2. Median- is the middle value when the data is arranged in ascending or descending order. If
there's an odd number of data points, the median is the middle value. If there's an even number,
the median is the average of the two middle values.
3. Mode- is the value that appears most frequently in the dataset.
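All three measures can be computed directly with Python's standard statistics module; the sample values below are an assumption for illustration:
```python
from statistics import mean, median, mode

# Hypothetical dataset, invented for illustration.
data = [4, 7, 7, 9, 10, 12, 15]

print(mean(data))    # 9.142857... : sum of the values / number of values
print(median(data))  # 9 : middle value of the sorted data (7 values, so the 4th)
print(mode(data))    # 7 : the most frequent value
```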