

Big Data

A Data Collection, Structuring and Processing Project, With

Accompanying Analytical Report BY

STUDENT ID

STUDENT NAME:

TABLE OF CONTENTS

ABSTRACT

1. INTRODUCTION

2. DATA ACQUISITION

3. DATA STORAGE

4. DATA ANALYSIS

   • Description of Dataset
   • Data Wrangling
   • Descriptive Analysis
   • Predictive Analysis
   • Diagnostic Analysis

5. RECOMMENDATIONS

6. CONCLUSION

REFERENCES

ABSTRACT

Data is important to many businesses, affecting a wide range of activities and procedures. The notion of "Big Data" arose from the exponential growth in data volume over time. Big Data comprises data that is frequently created in real time and in far greater volumes than typical datasets. As traditional techniques of data analysis and management fail to keep up with the sheer quantity and velocity of Big Data, the collection and storage of such large and complex data have become major challenges in recent years. Data wrangling, which entails addressing issues of data quality, completeness, and compatibility, is one area of emphasis within Big Data analysis. In this report, we apply Big Data techniques to a loan dataset in order to gather insightful knowledge and identify significant trends in loan defaults. Many operations, such as the analysis of sizable transactional databases and large lending portfolios, have been transformed by the arrival of Big Data. As a result, the use of Big Data technologies and approaches within enterprises has significantly increased. By harnessing the power of Big Data, organizations can discover subtle patterns and gain valuable insights that were previously unavailable. However, it is important to recognize that, alongside these benefits, Big Data also presents substantial computational obstacles that must be overcome.



1. INTRODUCTION

Big Data has changed the way corporations handle and analyze massive quantities of data, transforming a number of sectors. Its capacity to acquire, store, and analyze enormous volumes of information has created new opportunities for obtaining previously unattainable insights, enabling informed decision-making, and fostering innovation. This report examines how Big Data is affecting many businesses while stressing both its transformational potential and the difficulties it poses (Samaranayake, 2018). Modern tools and procedures for data management and analysis have been developed in response to the exponential rise of data in recent years. The sheer volume, velocity, and variety of data created from many sources, including social media, sensors, and online transactions, overwhelm traditional approaches. In contrast, Big Data solutions offer scalable infrastructure, distributed computing frameworks, and advanced algorithms to maximize the value of data (Gray, 2018). While Big Data presents immense opportunities, it also poses challenges that need to be addressed:

Data privacy and security: Handling vast amounts of sensitive data requires robust security

measures to protect against breaches and unauthorized access.

Data quality and integration: Ensuring data accuracy, completeness, and compatibility across

diverse sources remains a critical challenge in Big Data analytics.

Scalability and infrastructure: Managing the infrastructure to store, process, and analyze massive

datasets requires substantial computational resources and scalable architectures.

Finally, big data has emerged as a transformative force across industries, empowering

organizations to extract valuable insights, optimize processes, and make data-driven decisions.

By leveraging advanced technologies, such as distributed computing, machine learning, and data

visualization, organizations can unlock the full potential of Big Data. However, addressing

challenges related to data privacy, quality, and infrastructure remains crucial for successful

implementation. As technology continues to evolve, Big Data will continue to reshape industries,

leading to improved efficiencies, enhanced customer experiences, and innovative solutions.

2. DATA ACQUISITION

Data acquisition refers to the process of collecting data from various sources to prepare a big

data dataset. In its raw form, data from the real world cannot be easily understood by computer

systems. Therefore, data acquisition involves converting the physical parametric data into a

digital format that computers can comprehend (Ahlburg, Arfaoui, Arling, Augustine, Barney,

Benoit & Wieduwilt, 2020). This digital format typically utilizes integers to represent the data.

Traditionally, organizations primarily focused on internal data sources for information. However,

with the advent of big data analytics and predictive analysis, organizations have realized the

value of incorporating external data to facilitate digital transformation. This necessitates the

development of processes for identifying, sourcing, understanding, evaluating, and ingesting

such external data. It is important to note that "data acquisition" is sometimes mistakenly used to

refer only to the data generated within the organization, which is a misconception since internal

data is already acquired.

The dataset used for Loan Default Analysis and Prediction was obtained from Kaggle's Lending

Club Python dataset repository. The data source provides comprehensive information necessary

for analyzing and predicting loan defaults. More details about the dataset can be accessed

through the following link: https://www.kaggle.com/code/jkashish18/lending-club-python/data.
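As an illustration of how the acquired dataset can be ingested for analysis, the sketch below loads the downloaded CSV into a pandas DataFrame; the file name is an assumption and should be replaced with the actual name of the downloaded file.

    import pandas as pd

    # Load the Lending Club CSV downloaded from Kaggle.
    # "lending_club.csv" is an assumed file name used for illustration.
    df = pd.read_csv("lending_club.csv", low_memory=False)

    # Quick sanity checks on what was acquired
    print(df.shape)          # the sample used in this report has 202 rows and 142 columns
    print(df.dtypes.head())  # mix of numerical and categorical (object) columns
    print(df.head())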



3. DATA STORAGE

Currently, the dataset is stored in a local resource, which may have limitations in terms of

scalability, accessibility, and data security. However, considering a "What-if?" scenario, where

there is a need to store the data in a cloud-based system or data warehouse, several steps and

considerations would come into play.

• Data Assessment: Evaluate the size and structure of the dataset to estimate the storage

requirements and ensure compatibility with cloud-based systems or data warehouses.

Consider factors such as data volume, frequency of updates, and data retention policies.

• Cloud Provider Selection: Choose a reliable cloud service provider that offers scalable

storage solutions, robust data management capabilities, and appropriate security

measures. Consider providers such as Amazon Web Services (AWS), Microsoft Azure,

or Google Cloud Platform.

• Data Transfer: Transfer the dataset from the local resource to the cloud-based system or

data warehouse. This may involve uploading the dataset to the cloud storage solution,

ensuring secure data transfer protocols, and optimizing the transfer process to minimize

downtime.

• Data Modeling and Schema Design: Design a suitable data model and schema for the

cloud-based system or data warehouse. This may involve defining tables, relationships,

and data organization structures based on the specific requirements of the analysis or

reporting tasks.

• Data Integration and ETL (Extract, Transform, Load): Implement an Extract, Transform,

Load (ETL) process to migrate and integrate the data into the cloud-based system or data

warehouse. This involves data cleansing, transformation, and loading procedures to

ensure data consistency and quality.

• Security and Access Controls: Establish appropriate security measures to protect the data

stored in the cloud-based system or data warehouse. Implement encryption, access

controls, and authentication mechanisms to safeguard the data against unauthorized

access or breaches.

• Scalability and Performance Optimization: Leverage the scalability features of the cloud-

based system or data warehouse to accommodate future growth in data volume and user

demands. Optimize the data storage and retrieval processes to ensure efficient

performance, considering factors such as indexing, partitioning, and data caching.

• Backup and Disaster Recovery: Implement backup and disaster recovery mechanisms to

ensure data resilience and business continuity. Regularly back up the data stored in the

cloud-based system or data warehouse and establish procedures for restoring data in case

of any unforeseen events or data loss incidents.

In summary, transitioning from local storage to a cloud-based system or data warehouse involves

assessing the dataset, selecting a suitable cloud provider, transferring the data, designing

appropriate data models and schemas, implementing data integration processes, ensuring security

measures, optimizing performance, and establishing backup and disaster recovery mechanisms.

These steps enable organizations to leverage the scalability, accessibility, and security benefits

offered by cloud-based storage solutions for their data analysis and reporting needs.
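To make the data transfer and security steps above more concrete, the following is a minimal sketch of uploading the dataset to cloud object storage, assuming AWS S3 accessed through the boto3 library; the bucket name and object key are hypothetical.

    import boto3

    BUCKET = "loan-analytics-data"   # hypothetical bucket name
    KEY = "raw/lending_club.csv"     # hypothetical object key

    s3 = boto3.client("s3")

    # Upload the local file and request server-side encryption so the data
    # is encrypted at rest; access controls themselves are managed via IAM.
    s3.upload_file(
        "lending_club.csv",
        BUCKET,
        KEY,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )

    # Confirm the object was stored
    response = s3.head_object(Bucket=BUCKET, Key=KEY)
    print("Stored object size (bytes):", response["ContentLength"])

In a production setting, a managed transfer service or a scheduled ETL pipeline would typically replace this manual upload.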

4. DATA ANALYSIS

4.1 Data Description

The dataset contains 202 entries and consists of 142 columns. The columns represent various

attributes related to loan default analysis and prediction. Some of the columns have missing

values. The dataset includes information such as loan amount, interest rate, employment details,

home ownership, annual income, credit history, payment details, and loan status. It also provides

data on borrower demographics, credit scores, and financial indicators. The dataset includes a

mix of numerical and categorical data types. Further analysis and processing can be performed to

gain insights and develop models for loan default prediction based on this dataset.

4.2 Data Wrangling and Cleaning

Data wrangling and cleaning are crucial steps in the data preparation process. Wrangling

involves transforming and reshaping the data to make it suitable for analysis. Cleaning refers to

identifying and dealing with missing, erroneous, or inconsistent data. In the given code snippet,

df_filtered is created by dropping columns that have all missing values using dropna(axis=1,

how='all'). Then, two specific columns, 'mths_since_last_delinq' and 'mths_since_last_record',

are dropped using drop(). Finally, dropna() with inplace=True is used to remove any remaining

rows with missing values. These steps ensure that the resulting DataFrame, df_filtered, is cleaned

and ready for further analysis.
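A minimal reconstruction of the cleaning steps described above, assuming the raw data has been loaded into a pandas DataFrame named df:

    import pandas as pd

    # As loaded in the acquisition step (illustrative file name)
    df = pd.read_csv("lending_club.csv", low_memory=False)

    # 1. Drop columns in which every value is missing
    df_filtered = df.dropna(axis=1, how="all")

    # 2. Drop two sparsely populated columns
    df_filtered = df_filtered.drop(["mths_since_last_delinq", "mths_since_last_record"], axis=1)

    # 3. Remove any remaining rows that still contain missing values
    df_filtered.dropna(inplace=True)

    print(df_filtered.shape)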

4.3 Descriptive Analysis

Descriptive statistics provide a statistical summary of the data, offering an overview of the

dataset. It includes measures such as the mean, median, and mode, which collectively represent

the central tendencies of the data. While descriptive statistics provide a broad overview of the

data, they do not uncover deeper insights. Descriptive statistics fall into different groups of measures. Measures of dispersion, such as the variance, standard deviation, and quartiles, indicate the spread of the data and help in understanding how it is distributed. Measures of central tendency focus on the center of the data distribution and include the mode, mean, and median.
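The numerical and categorical summaries reported in the next two subsections can be generated directly with pandas; a brief sketch, assuming the cleaned DataFrame df_filtered from the previous step:

    # Numerical columns: count, mean, std, min, quartiles, max
    numeric_summary = df_filtered.describe().T
    print(numeric_summary)

    # Categorical (object) columns: count, unique, top, freq
    categorical_summary = df_filtered.describe(include="object").T
    print(categorical_summary)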

Numerical Statistical Descriptive Analysis

Attribute                      count       mean        std       min        25%        50%        75%        max
id                             202      1064982    18424.02   822464    1067027    1068109    1069071    1077501
loan_amnt                      202     11546.66    6635.52      1000       7000      10000      15000      35000
funded_amnt                    202     11269.93    6288.17      1000       7000      10000      14000      35000
funded_amnt_inv                202     11161.21    6224.62      1000       7000      10000   13993.75      35000
installment                    202       344.20     190.45     35.31     208.04     321.52     431.17    1140.07
annual_inc                     202     59000.83   30785.30     12000      39340      50002      74750     225000
dti                            201        14.71       6.31         1      10.04         15         20      29.85
delinq_2yrs                    201         0.08       0.36         0          0          0          0          3
fico_range_low                 201       705.22      28.68       660        685        700        725        790
fico_range_high                201       709.22      28.68       664        689        704        729        794
inq_last_6mths                 201         0.84       0.99         0          0          1          1          5
open_acc                       201         8.82       3.31         2          7          8         11         20
pub_rec                        201         0.02       0.16         0          0          0          0          1
revol_bal                      201     13217.32   10289.30         0       6842      11095      16576      74351
total_acc                      201        19.48       9.17         3         12         18         26         51
out_prncp                      201            0          0         0          0          0          0          0
out_prncp_inv                  201            0          0         0          0          0          0          0
total_pymnt                    201     12198.01    7734.64         0    6858.70   10904.84   15451.16   40009.01
total_pymnt_inv                201     12067.64    7581.40         0    6858.70   10904.84   15352.48   40009.01
total_rec_prncp                201      9856.56    6456.93         0    5495.38       9000      12800      35000
total_rec_int                  201      2227.43    1976.09         0     858.70    1553.74    2949.90   10085.08
total_rec_late_fee             201         1.16       5.29         0          0          0          0      36.25
recoveries                     201       112.86     473.00         0          0          0          0    3874.79
collection_recovery_fee        201        12.89      80.21         0          0          0          0     670.82
last_pymnt_amnt                201      2795.83    4393.42         0     240.64     536.81    3946.24   28412.43
last_fico_range_high           201       668.43      76.69       499        614        679        719        839
last_fico_range_low            201       649.65     134.33         0        610        675        715        835
collections_12_mths_ex_med     201            0          0         0          0          0          0          0
policy_code                    201            1          0         1          1          1          1          1
acc_now_delinq                 201            0          0         0          0          0          0          0
chargeoff_within_12_mths       201            0          0         0          0          0          0          0
delinq_amnt                    201            0          0         0          0          0          0          0
pub_rec_bankruptcies           201         0.02       0.14         0          0          0          0          1
tax_liens                      201            0          0         0          0          0          0          0

The provided data consists of a summary statistics table for various attributes related to loans.

Here is a brief report on the statistics:

• Count: The count column represents the number of observations available for each attribute. Attributes such as id, loan_amnt, installment, and annual_inc have 202 observations, while dti and the remaining numerical attributes have 201 observations.

• Mean: The mean column represents the average value for each attribute across the

available observations. For example, the average loan amount is approximately

$11,546.66, the average annual income is around $59,000.83, and the average installment

amount is approximately $344.20.

• Standard Deviation (Std): The standard deviation column measures the dispersion or

variability of the data points around the mean. It provides information about the spread of

the data. Higher standard deviations indicate greater variability. For instance, the standard

deviation for the loan amount is approximately $6,635.52, indicating a significant

variation in loan amounts.

• Minimum (Min): The minimum column represents the smallest value observed for each

attribute. It gives an idea about the lower boundary of the data. For example, the

minimum loan amount is $1,000, and the minimum FICO score range is 660.

• 25th Percentile (25%): The 25th percentile column represents the value below which 25%

of the data falls. It provides information about the distribution of the data and is also

known as the first quartile. For instance, 25% of the loan amounts are below $7,000, and

25% of the FICO scores are below 685.

• 50th Percentile (50%): The 50th percentile column represents the median value, which is

the middle value of the data. It indicates the point below which 50% of the data falls. For

example, the median loan amount is $10,000, and the median FICO score is 700.

• 75th Percentile (75%): The 75th percentile column represents the value below which 75%

of the data falls. It provides information about the distribution of the data and is also

known as the third quartile. For instance, 75% of the loan amounts are below $15,000,

and 75% of the FICO scores are below 725.

• Maximum (Max): The maximum column represents the largest value observed for each

attribute. It gives an idea about the upper boundary of the data. For example, the

maximum loan amount is $35,000, and the maximum FICO score is 790.

This summary statistics table provides a quick overview of the distribution and variation of the

loan numerical attributes. It can be helpful in understanding the range of values and identifying

potential patterns or outliers in the data.

Categorical Summary Analysis

Attribute               count   unique   top                                                                freq
term                    190     2        36 months                                                          143
int_rate                190     28       9.91%                                                              18
grade                   190     6        B                                                                  79
sub_grade               190     28       B1                                                                 18
emp_title               190     188      American Airlines                                                  2
emp_length              190     11       10+ years                                                          41
home_ownership          190     3        RENT                                                               129
verification_status     190     3        Not Verified                                                       75
issue_d                 190     1        Dec-11                                                             190
loan_status             190     2        Fully Paid                                                         154
pymnt_plan              190     1        n                                                                  190
url                     190     190      https://lendingclub.com/browse/loanDetail.action?loan_id=1077430   1
purpose                 190     12       debt_consolidation                                                 105
title                   190     129      Debt Consolidation Loan                                            19
zip_code                190     139      921xx                                                              4
addr_state              190     35       CA                                                                 46
earliest_cr_line        190     132      Sep-98                                                             4
revol_util              190     164      29.30%                                                             3
initial_list_status     190     1        f                                                                  190
last_pymnt_d            190     46       Jan-15                                                             54
last_credit_pull_d      190     71       May-20                                                             25
application_type        190     1        Individual                                                         190
hardship_flag           190     1        N                                                                  190
debt_settlement_flag    190     2        N                                                                  188

This summary statistics table provides a quick overview of the distribution and variation of the

loan categorical attributes. It can be helpful in understanding the range of values and identifying

potential patterns or outliers in the data.



4.4 Predictive Analysis



Predictive analysis, also known as predictive modeling, is a branch of data analytics that aims to

forecast or predict future outcomes based on historical data. It involves using statistical or

machine learning models to make predictions or classifications about unknown or future events.

In the context of the provided information, a predictive analysis was conducted using a decision

tree classifier model. Decision trees are a popular machine learning algorithm that uses a tree-

like structure to make decisions based on features or attributes of the data. The reported accuracy

of 100% indicates that the model predicted the outcomes perfectly for the given dataset.

Accuracy is a metric that measures the overall correctness of the model's predictions compared to

the actual outcomes. An accuracy of 100% suggests that the model classified all instances

correctly.

Precision and recall are performance metrics used for binary classification problems. Precision

measures the proportion of correctly predicted positive instances out of all instances predicted as

positive. Recall, also known as sensitivity, measures the proportion of correctly predicted

positive instances out of all actual positive instances. The reported precision of 100% suggests

that all instances predicted as positive were indeed positive. Similarly, the reported recall of

100% indicates that the model correctly identified all positive instances in the dataset. Attaining

100% accuracy, precision, and recall with a decision tree classifier is quite rare and may indicate

either an overfitting issue or potential data quality or sampling bias. It's important to carefully

evaluate the data, model, and evaluation process to ensure that the results are reliable and

generalizable to new or unseen data. Additionally, it is recommended to validate the model's



performance on a separate test dataset or using cross-validation techniques to obtain a more

robust assessment of its predictive capabilities.
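The modelling code itself is not shown in the report; the sketch below illustrates how a decision tree classifier with a held-out test set and cross-validation could be set up in scikit-learn. The target encoding and the use of the numerical columns as features are assumptions made for illustration.

    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Encode the target: 1 = Fully Paid, 0 = otherwise (encoding is an assumption)
    y = (df_filtered["loan_status"] == "Fully Paid").astype(int)

    # Use the numerical columns as features for this sketch. Note that repayment
    # columns such as total_rec_prncp can leak the outcome and inflate the scores.
    X = df_filtered.select_dtypes(include="number").drop(columns=["id"], errors="ignore")

    # Hold out a test set so metrics are not computed on the training data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)

    print("Accuracy :", accuracy_score(y_test, pred))
    print("Precision:", precision_score(y_test, pred))
    print("Recall   :", recall_score(y_test, pred))

    # 5-fold cross-validation gives a more robust estimate than a single split
    print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())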

4.5 Diagnostic Analysis

Diagnostic analysis is the examination and evaluation of data and results to identify patterns,

relationships, and potential issues, aiming to gain insights and make informed decisions based on

the findings.

Based on the given diagnostic analysis results, here is a summary of the findings:

Dataset Summary:

The dataset contains information related to loan applications, including various categorical and

numerical variables.

The analysis is based on 190 instances.



Categorical Variables:

There are several categorical variables in the dataset, including term, grade, sub_grade,

emp_title, emp_length, home_ownership, verification_status, issue_d, loan_status, pymnt_plan,

url, purpose, title, zip_code, addr_state, earliest_cr_line, revol_util, initial_list_status,

last_pymnt_d, last_credit_pull_d, application_type, hardship_flag, and debt_settlement_flag.

Each categorical variable has different levels of uniqueness, with varying top values and

frequencies.

Loan Status:

The variable loan_status indicates the status of the loan.

The analysis shows that out of the 190 instances, 154 loans are labeled as "Fully Paid" and the

remaining loans have a different status.
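These counts can be checked directly from the data; a brief sketch, assuming the cleaned DataFrame df_filtered:

    # Frequency of each loan status among the analysed instances
    print(df_filtered["loan_status"].value_counts())
    # Based on the summary above, 154 loans are "Fully Paid" and the
    # remaining 36 carry a different status.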

Model Performance:

The predictive model, trained using a decision tree classifier, achieved perfect performance on

the given dataset.

The accuracy, precision, and recall metrics all have a value of 100%, indicating that the model accurately predicted the loan status for all instances in the dataset.

5. RECOMMENDATIONS

• Validate the Model: Although the model has shown excellent performance on the current

dataset, it is important to validate its performance on new, unseen data. This can be done

by splitting the dataset into training and testing sets or using cross-validation techniques

to evaluate its generalizability.



• Feature Importance: Determine the most important features that contribute to the accurate prediction of loan status. This can help in understanding the factors that significantly influence loan outcomes and provide insights for decision-making (see the sketch after this list).

• Model Interpretability: Decision tree classifiers are inherently interpretable models,

which means it is possible to understand the decision-making process of the model.

Explore the decision tree structure to gain insights into the criteria used by the model to

classify loans. This can help in explaining the factors that contribute to loan approval or

rejection.

• Data Quality and Reliability: Ensure the quality and reliability of the data used for

training the model. Clean and preprocess the data, handle missing values, outliers, and

ensure that the dataset is representative of the real-world scenario. High-quality and

reliable data are crucial for accurate predictions.

• Continuous Model Monitoring: As loan data and trends change over time, it is important

to continuously monitor the model's performance. Regularly update the model with new

data and evaluate its performance metrics. If the accuracy, precision, or recall starts to

decline, re-evaluate and update the model accordingly.
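As referenced in the Feature Importance recommendation above, the sketch below shows one way to inspect feature importances and the learned decision rules with scikit-learn, assuming the fitted decision tree model and feature matrix X from the predictive-analysis sketch:

    import pandas as pd
    from sklearn.tree import export_text

    # Rank features by their contribution to the tree's splits
    importances = pd.Series(model.feature_importances_, index=X.columns)
    print(importances.sort_values(ascending=False).head(10))

    # Print the learned decision rules to see how loans are classified
    print(export_text(model, feature_names=list(X.columns)))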

6. CONCLUSION

Based on the current analysis, the decision tree classifier has shown exceptional performance in

predicting loan status. However, it is important to note that the evaluation metrics alone

(accuracy, precision, recall) may not provide a complete picture of the model's performance. It is

necessary to consider other factors such as the dataset's representativeness, potential bias, and the

business context to make informed decisions. Further analysis, model validation, and ongoing

monitoring are recommended to ensure the model's reliability and effectiveness in real-world

scenarios. Additionally, domain expertise and collaboration with experts in the lending industry

can provide valuable insights and enhance the accuracy and interpretability of the model.

REFERENCES

[1] Ahlburg, P., Arfaoui, S., Arling, J.H., Augustin, H., Barney, D., Benoit, M., Bisanz, T., Corrin, E., Cussans, D., Dannheim, D. and Dreyling-Eschweiler, J., 2020. EUDAQ—A data acquisition software framework for common beam telescopes. Journal of Instrumentation, 15(01), p.P01038.

[2] Azeroual, O., 2020. Data Wrangling in Database Systems: Purging of Dirty Data. Data, 5(2), p.50.

[3] Bathla, G., Rani, R. and Aggarwal, H., 2018. Comparative study of NoSQL databases for big data storage. International Journal of Engineering & Technology, 7(2.6), pp.83-87.

[4] Cartledge, C., 2018. ODU Big Data, Data Wrangling Boot Camp Software Overview, and Design.

[5] Clark, E.L., Resasco, J., Landers, A., Lin, J., Chung, L.T., Walton, A., Hahn, C., Jaramillo, T.F. and Bell, A.T., 2018. Standards and protocols for data acquisition and reporting for studies of the electrochemical reduction of carbon dioxide. ACS Catalysis, 8(7), pp.6560-6570.

[6] Contreras-Ochando, L., Ferri, C., Hernández-Orallo, J., Martínez-Plumed, F., Ramírez-Quintana, M.J. and Katayama, S., 2017. Domain specific induction for data wrangling automation. AutoML@ICML, Sydney, Australia, August, 10.

[7] de Jesús Ramírez-Rivera, E., Díaz-Rivera, P., Ramón-Canul, L.G., Juárez-Barrientos, J.M., Rodríguez-Miranda, J., Herman-Lara, E., Prinyawiwatkul, W. and Herrera-Corredor, J.A., 2018. Comparison of performance and quantitative descriptive analysis sensory profiling and its relationship to consumer liking between the artisanal cheese producers panel and the descriptive trained panel. Journal of Dairy Science, 101(7), pp.5851-5864.

[8] Fleuren, L.M., Klausch, T.L., Zwager, C.L., Schoonmade, L.J., Guo, T., Roggeveen, L.F., Swart, E.L., Girbes, A.R., Thoral, P., Ercole, A. and Hoogendoorn, M., 2020. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Medicine, 46(3), pp.383-400.

[9] Alma Digit, S.R.L. A Cloud-Based System for Improving Retention Marketing Loyalty Programs in Industry 4.0: a Study on Big Data Storage Implications.

[10] Kim, D.W., Jang, H.Y., Kim, K.W., Shin, Y. and Park, S.H., 2019. Design characteristics of studies reporting the performance of artificial intelligence algorithms for diagnostic analysis of medical images: results from recently published papers. Korean Journal of Radiology, 20(3), p.405.

[11] Loeb, S., Dynarski, S., McFarland, D., Morris, P., Reardon, S. and Reber, S., 2017. Descriptive Analysis in Education: A Guide for Researchers. NCEE 2017-4023. National Center for Education Evaluation and Regional Assistance.

[12] MacAvaney, S., Yates, A., Feldman, S., Downey, D., Cohan, A. and Goharian, N., 2021. Simplified Data Wrangling with ir_datasets. arXiv preprint arXiv:2103.02280.

[13] Mazumdar, S., Seybold, D., Kritikos, K. and Verginadis, Y., 2019. A survey on data storage and placement methodologies for cloud-big data ecosystem. Journal of Big Data, 6(1), pp.1-37.

[14] McInnes, M.D., Moher, D., Thombs, B.D., McGrath, T.A., Bossuyt, P.M., Clifford, T., Cohen, J.F., Deeks, J.J., Gatsonis, C., Hooft, L. and Hunt, H.A., 2018. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA, 319(4), pp.388-396.

[15] Samaranayake, L., 2018. Big data is or big data are. British Dental Journal, 224, 916. https://doi.org/10.1038/sj.bdj.2018.486

[16] Gray, M., 2022. Context for Practice: The Power of "Big Data". Journal of Wound, Ostomy and Continence Nursing, 49(1), p.11. DOI: 10.1097/WON.0000000000000845
