Module 5 Lecture Note
o Characteristics:
▪ Easily searchable.
▪ Often stored in relational databases.
4. Unstructured Data
o Description: Data that does not have a predefined structure.
o Examples: Emails, social media posts, videos.
o Characteristics:
▪ More complex to analyze.
▪ Often requires specialized tools (for example, text mining or image recognition) for processing.
5. Semi-structured Data
o Description: Data that does not conform to a fixed schema but carries some
organizational markers, such as tags or key-value pairs.
o Examples: JSON files, XML documents.
o Characteristics:
▪ Flexible format.
▪ Can be processed using specific tools and techniques.
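As an illustration of the flexible format described above, the following minimal sketch parses a small JSON document in which records share some fields but not others, then maps each record onto a fixed set of columns. The records and field names (id, name, email, tags, address) are hypothetical and chosen only for this example.

```python
import json

# Hypothetical records: the field names are illustrative, not taken from the lecture.
raw = """
[
  {"id": 1, "name": "Ada", "email": "ada@example.com"},
  {"id": 2, "name": "Ben", "tags": ["vip", "newsletter"]},
  {"id": 3, "name": "Chi", "address": {"city": "Lagos"}}
]
"""

records = json.loads(raw)


def flatten(record):
    """Map a loosely structured record onto a fixed set of columns."""
    return {
        "id": record["id"],
        "name": record["name"],
        "email": record.get("email"),                   # optional field
        "city": record.get("address", {}).get("city"),  # optional nested field
        "tag_count": len(record.get("tags", [])),       # optional list
    }


rows = [flatten(r) for r in records]
for row in rows:
    print(row)
```

Missing fields become None (or a count of 0), which is one common way to move semi-structured data into a structured, table-like form.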
Sources of Data
1. Primary Data
o Description: Data collected firsthand for a specific purpose.
o Examples: Surveys, experiments, interviews.
o Advantages:
▪ Specific to the researcher's needs.
▪ Typically more reliable and accurate, because the researcher controls how the data is collected.
o Disadvantages:
▪ Time-consuming and expensive to collect.
2. Secondary Data
o Description: Data collected by someone else for a different purpose.
o Examples: Government reports, academic articles, company records.
o Advantages:
▪ Readily available and less costly.
▪ Can provide a broad perspective.
o Disadvantages:
▪ May not perfectly fit the researcher's needs.
▪ Quality and reliability may vary.
2. Observations
o Description: Gathering data by watching and recording behaviours and events.
o Examples: Ethnographic studies, field notes.
o Advantages:
▪ Provides real-time data.
▪ Useful for understanding context.
o Disadvantages:
▪ Time-consuming and labour-intensive.
▪ Observer bias may affect results.
3. Experiments
o Description: Conducting controlled tests to gather data.
o Examples: Laboratory experiments, clinical trials.
o Advantages:
▪ High level of control over variables.
▪ Can establish cause-and-effect relationships.
o Disadvantages:
▪ May not reflect real-world scenarios.
▪ Can be expensive and complex to set up.
4. Interviews
o Description: Collecting data through direct, face-to-face or virtual conversations.
o Examples: In-depth interviews, focus groups.
o Advantages:
▪ Provides detailed, qualitative insights.
▪ Allows for probing and follow-up questions.
o Disadvantages:
▪ Time-consuming and resource-intensive.
▪ Potential for interviewer bias.
5. Existing Data Sources
o Description: Using previously collected data for analysis.
o Examples: Databases, archives, online resources.
o Advantages:
▪ Saves time and resources.
▪ Provides a wealth of historical data.
o Disadvantages:
▪ May not be specific to current research needs.
▪ Quality and relevance can vary.
Conclusion
• Summary: Data is a vital asset in today’s world, providing the foundation for informed
decision-making, strategic planning, and performance measurement. Understanding the
types, sources, and methods of data collection, as well as the techniques for data analysis,
is essential for leveraging its full potential.
• Final Thought: By effectively managing and analyzing data, individuals and organizations
can gain valuable insights, drive innovation, and achieve their goals more efficiently.
References
• Books and Literature:
o Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of
Winning. Harvard Business Review Press.
o Provost, F., & Fawcett, T. (2013). Data Science for Business: What You Need to Know
about Data Mining and Data-Analytic Thinking. O'Reilly Media.
o McKinney, W. (2017). Python for Data Analysis: Data Wrangling with Pandas, NumPy,
and IPython. O'Reilly Media.
• Academic Sources:
o Babbie, E. R. (2013). The Practice of Social Research. Cengage Learning.
o Creswell, J. W., & Creswell, J. D. (2017). Research Design: Qualitative, Quantitative,
and Mixed Methods Approaches. SAGE Publications.
o Silver, N. (2012). The Signal and the Noise: Why So Many Predictions Fail--but Some
Don't. Penguin Books.
o Public Health
▪ Description: Analyzing health data to identify public health trends and
threats.
▪ Examples: Disease outbreak tracking, health policy formulation.
▪ Benefits: Supports preventive measures and informed policy decisions.
o Medical Research
▪ Description: Utilizing data for clinical trials and medical research.
▪ Examples: Drug efficacy studies, genomics research.
▪ Benefits: Advances medical knowledge and innovation.
3. Education
o Student Performance
▪ Description: Tracking and analyzing student data to improve educational
outcomes.
▪ Examples: Test scores, attendance records.
▪ Benefits: Identifies learning gaps and tailors instructional strategies.
o Institutional Planning
▪ Description: Using data for strategic planning and resource allocation.
▪ Examples: Enrolment trends, faculty performance metrics.
▪ Benefits: Enhances operational efficiency and educational quality.
o Educational Research
▪ Description: Conducting research to improve teaching methods and
curriculum design.
▪ Examples: Pedagogical studies, educational technology assessments.
▪ Benefits: Informs evidence-based educational practices.
4. Government and Public Policy
o Policy Formulation
▪ Description: Using data to develop and evaluate public policies.
▪ Examples: Socioeconomic data analysis, environmental impact
assessments.
▪ Benefits: Supports data-driven decision-making and policy effectiveness.
o Public Administration
▪ Description: Managing public resources and services based on data insights.
▪ Examples: Budget allocation, public service optimization.
▪ Benefits: Enhances transparency and accountability.
o Civic Engagement
▪ Description: Engaging citizens through data-driven platforms and
initiatives.
▪ Examples: Open data portals, participatory budgeting.
▪ Benefits: Promotes informed citizen participation and trust.
5. Science and Technology
o Research and Development
▪ Description: Utilizing data for scientific research and technological
innovation.
▪ Examples: Experimental data analysis, computational simulations.
▪ Benefits: Drives scientific discoveries and technological advancements.
o Environmental Monitoring
▪ Description: Using data to monitor and protect the environment.
▪ Examples: Climate data analysis, biodiversity tracking.
▪ Benefits: Informs conservation efforts and environmental policies.
o Artificial Intelligence and Machine Learning
▪ Description: Leveraging data to train and develop AI and machine learning
models.
▪ Examples: Image recognition, natural language processing.
▪ Benefits: Enhances automation and intelligent decision-making.
6. Finance and Economics
o Financial Analysis
▪ Description: Analyzing financial data to make investment decisions and
manage risks.
▪ Examples: Stock market analysis, credit scoring.
▪ Benefits: Informs investment strategies and risk management.
o Economic Forecasting
▪ Description: Using data to predict economic trends and inform policy
decisions.
▪ Examples: GDP growth projections, inflation rate predictions.
▪ Benefits: Supports economic planning and stability.
o Fraud Detection
▪ Description: Identifying and preventing fraudulent activities through data
analysis.
▪ Examples: Transaction monitoring, anomaly detection.
▪ Benefits: Protects financial assets and maintains trust.
7. Sports and Entertainment
o Performance Analysis
▪ Description: Analyzing athlete data to improve performance and strategy.
▪ Examples: Player statistics, game footage analysis.
▪ Benefits: Enhances training and competitive edge.
o Fan Engagement
▪ Description: Using data to engage fans and enhance their experience.
▪ Examples: Social media analytics, personalized content.
▪ Benefits: Increases fan loyalty and revenue.
o Content Creation
▪ Description: Utilizing data to create and distribute engaging content.
▪ Examples: Viewer preferences, trend analysis.
▪ Benefits: Informs content strategies and maximizes audience reach.
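The anomaly detection example listed under Fraud Detection above can be sketched with a simple z-score rule: flag any transaction whose amount lies more than a chosen number of standard deviations from the mean. This is only one basic technique, not a method prescribed by these notes; the transaction amounts and the threshold of 2.0 are assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=2.0):
    """Return the amounts whose z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = stdev(amounts)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far larger than the rest.
amounts = [42.0, 38.5, 45.0, 40.0, 41.5, 39.0, 43.0, 980.0]
flagged = flag_anomalies(amounts)
print(flagged)
```

In practice, extreme values inflate the mean and standard deviation themselves, so median-based rules are often preferred for small or heavily skewed samples.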
Conclusion
• Data is a powerful resource used across various fields to drive decisions, improve
processes, and innovate. From business and healthcare to education and government, data
plays a crucial role in shaping the modern world.
• Final Thought: Harnessing the full potential of data requires a combination of advanced
techniques, tools, and a deep understanding of its applications. By leveraging data
effectively, individuals and organizations can achieve significant improvements and
breakthroughs.
o Customer satisfaction ratings (Very Unsatisfied, Unsatisfied, Neutral, Satisfied,
Very Satisfied)
3. Interval Scale
The interval scale not only categorizes and orders data but also makes the differences between
values meaningful and equal in size. However, it lacks a true zero point.
• Characteristics:
o Equal intervals between values.
o No true zero point (zero does not indicate the absence of the property).
o Addition and subtraction are meaningful, but multiplication and division are not
(20°C is not "twice as hot" as 10°C).
• Examples:
o Temperature in Celsius or Fahrenheit (e.g., 10°C, 20°C, 30°C)
o IQ scores
o Calendar years (e.g., 2000, 2020)
4. Ratio Scale
The ratio scale is the most informative scale. It includes all the properties of the interval scale, with
the addition of a true zero point, allowing for the full range of arithmetic operations.
• Characteristics:
o Equal intervals between values.
o True zero point (indicating the absence of the property).
o All arithmetic operations are meaningful (addition, subtraction, multiplication,
division).
• Examples:
o Height (e.g., 150 cm, 160 cm)
o Weight (e.g., 50 kg, 70 kg)
o Elapsed time (e.g., 10 seconds, 20 seconds)
o Income (e.g., $30,000, $50,000)
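The difference between the interval and ratio scales can be checked with a short calculation. The sketch below reuses the example values above (10°C and 20°C, 50 kg and 70 kg) and shows that differences survive a change of temperature unit while ratios do not, whereas ratios of weights stay the same in any unit. The helper name celsius_to_kelvin is introduced here only for illustration.

```python
import math


def celsius_to_kelvin(c):
    """Convert Celsius to Kelvin, a temperature scale with a true zero."""
    return c + 273.15


# Interval scale: temperature in Celsius.
t1_c, t2_c = 10.0, 20.0
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)
ratio_c = t2_c / t1_c
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# The difference is the same in both units, so subtraction is meaningful.
print(math.isclose(diff_c, diff_k))
# The ratio depends on where the scale puts zero, so "twice as hot" is not meaningful.
print(round(ratio_c, 3), round(ratio_k, 3))

# Ratio scale: weight has a true zero, so ratios do not depend on the unit.
w1_kg, w2_kg = 50.0, 70.0
ratio_kg = w2_kg / w1_kg
ratio_g = (w2_kg * 1000) / (w1_kg * 1000)
print(math.isclose(ratio_kg, ratio_g))
```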
Conclusion
Understanding the type of measurement scale used is fundamental for selecting appropriate
statistical techniques and interpreting data correctly. Here is a brief comparison:
• Nominal: Categories without order (e.g., blood type).
• Ordinal: Categories with order but unknown or unequal intervals (e.g., education level).
• Interval: Ordered values with equal intervals, no true zero (e.g., temperature in Celsius).
• Ratio: Ordered values with equal intervals and a true zero (e.g., weight).
By correctly identifying the measurement scale, researchers can ensure their analysis methods
align with the nature of the data, leading to more accurate and meaningful results.
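As a rough sketch of how the scale guides the choice of summary statistic, the example below uses the mode for nominal data, the median for ordinal data, the mean for interval data, and a ratio for ratio data. The sample values are made up for illustration, and the pairing of scale and statistic follows common convention rather than a rule stated in these notes.

```python
from statistics import mean, mode

# Nominal: only counting categories makes sense, so use the mode.
blood_types = ["A", "O", "B", "O", "AB", "O"]
most_common_blood_type = mode(blood_types)

# Ordinal: categories have an order, so the median is meaningful.
levels = ["Very Unsatisfied", "Unsatisfied", "Neutral", "Satisfied", "Very Satisfied"]
rank = {level: i for i, level in enumerate(levels)}
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied", "Unsatisfied"]
ordered = sorted(responses, key=lambda r: rank[r])
median_response = ordered[len(ordered) // 2]  # middle item of an odd-length list

# Interval: equal intervals make the mean meaningful.
temps_c = [10.0, 20.0, 30.0]
mean_temp_c = mean(temps_c)

# Ratio: a true zero makes statements such as "1.4 times as heavy" meaningful.
weights_kg = [50.0, 70.0]
weight_ratio = weights_kg[1] / weights_kg[0]

print(most_common_blood_type, median_response, mean_temp_c, weight_ratio)
```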