

Methodology to build labeled corpora and classification models to assess technological readiness: a case study with defense technologies described by texts in pt-br

José Voltan¹, Romullo Girardi¹, Thassia Santos², Daniel Abreu¹, Júlio de Farias³, Diogo Salazar¹, Víctor Cruz⁴, and Ronaldo Goldschmidt¹

¹ Military Institute of Engineering, Rio de Janeiro, Brazil
² Mackenzie Presbyterian University, São Paulo, Brazil
³ Aeronautics Technological Institute, São José dos Campos, Brazil
⁴ University of Brasília, Brasília, Brazil

April 21, 2024

Abstract
Purpose: This work aimed to develop a methodology encompassing the collec-
tion and labeling of documents within the context of the Technology Readiness Lev-
els (TRL) scale. The proposed methodology was applied to a case study in order to
demonstrate its utility.
Design/methodology/approach: The proposed methodology spans from the selection of labelers, through document collection, to labeling. It includes training materials for the labelers and labeling supported by a questionnaire. This proposal
was applied to a case study, in which 187 documents were collected, including scien-
tific journal articles, symposium papers, and news. This collection was based on three
sources: the Brazilian Army News, the Symposium on Operational Applications in Defense
Areas, and the Military Science and Technology Magazine.
The labeling team consisted of six engineers from various fields. Each document
was labeled by two specialists. Analyses were conducted on the constructed corpus, and a proposal for automating document classification into three TRL ranges was evaluated. For this proposal, the sparse vector representation technique TF-IDF was combined with six Machine Learning algorithms for classification.
Findings: The proposed methodology allowed the creation of a labeled corpus according to the TRL scale and the conduct of experiments aimed at building classification models, which achieved an accuracy of 72.6% (Random Forest) and an F1-score of 58.7% (KNN).
Research limitations: Manual labeling relies on the prior knowledge of the experts.
In this research, each document was labeled by two specialists, aiming to increase the
reliability of the results.
Practical implications: This study presents a proposed methodology for building
labeled corpora in the context of the TRL scale. By applying the proposed methodol-
ogy, it is possible to mitigate the practical problem of the lack of public corpora that
enable research in automating the identification of TRL levels of technologies and/or
products described through textual documents.
Originality/value: This study not only proposed a methodology but also applied
it to a case study and evaluated the performance of an automated approach for docu-
ment classification within the scope of the TRL scale.
Keywords: TRL; TRA; Corpus.
JEL Classification: L6; O3; C8.

1 Introduction
The development of products or critical technologies within the context of a complex
project typically involves the elaboration of different subsystems by different teams. In
this process, some issues may arise, such as communication difficulties among teams re-
garding the technological development status and reliability of the involved subsystems.
Additionally, another issue is the need for efficient resource allocation among the different
project components (J. C. Mankins, 2009) (Girardi, França, & Galdino, 2022).
In this context, National Aeronautics and Space Administration (NASA) employee Stan Sadin proposed the Technology Readiness Levels (TRL) scale in the 1970s. Its purpose was to enable the assessment of the maturity and development of a technology or product, facilitating communication and resource management in complex projects (J. C. Mankins, 2009). Mankins later introduced several modifications, such as increasing the number of levels and providing more precise definitions (J. Mankins, 1995). Currently, the scale comprises nine levels, as illustrated in Figure 1. As the TRL level increases, so does the maturity of the technology or product: lower levels indicate lower maturity, while higher levels indicate higher maturity (J. C. Mankins, 2009).

Figure 1: Technology Readiness Levels. Adapted from J. C. Mankins (2009) and Girardi et
al. (2022)

From the 2000s, the TRL scale began to be adopted by various organizations, such as
the United States Department of Defense (USDOD) and the European Space Agency (ESA)
(J. C. Mankins, 2009). In Brazil, the Ministry of Science, Technology, and Innovation (MCTI)
has implemented several initiatives for the utilization of the TRL scale, including the in-
corporation of its concepts into the analysis process of the ’Lei do Bem’ (Law of Good), an
important tool for stimulating investments in Research and Development (R&D).
The process of assessing the maturity of a technology is known as Technology Readi-
ness Assessment (TRA). The TRA can be conducted by a team of experts with extensive
knowledge of the TRL scale and the technology (Britt, Berry, Browne, Merrell, & Kolpack,
2008). However, the process with a team of experts is considered expensive and slow, as
well as associated with a certain degree of subjectivity (Lezama-Nicolás, Rodriguez, Rio-Belver, & Bildosola, 2018).
As a way to mitigate these issues, some studies, e.g., Britt et al. (2008) and Hardiyati et al. (2018), point to automated solutions, which employ, for example, text mining in the analysis of textual documents containing descriptions of technology development. In this scenario, Silalahi et al. (2018) investigated the classification of scientific publications in Indonesia in the context of biomedicine. To achieve this, they employed a maturity scale with four levels, derived from the TRL scale. However, the corpus used in the study was not made available. As far as could be observed, this work illustrates a recurring gap in other scientific reports on the same subject: the unavailability of labeled corpora that allow the replication of experiments and the comparison of different classification algorithms. Similarly, the mentioned studies do not describe how the documents were labeled.
Therefore, this paper aims to present a methodology for building corpora in the context
of the TRL scale. The methodology encompasses everything from document collection to
selection and labeling by domain experts. In order to illustrate its feasibility, results of a case study comparing different classification algorithms applied to a corpus generated and made available through this methodology are also presented.
The remainder of this paper is organized as follows: Section 2 presents the proposed methodology for corpus construction. In Section 3, the proposed methodology is applied to a case study. Finally, Section 4 presents the concluding remarks, highlighting the expected contributions and the direction of ongoing work.

2 Methodology for Corpus Construction


This section presents the proposed methodology for constructing labeled corpora in the context of TRA, along with considerations about document classification according to the TRL scale. The methodology is illustrated graphically in Figure 2, and its application begins after defining the problem and the analysis domain. Its steps are explained below.

Figure 2: Methodology for Corpus Construction

The first step of the proposed methodology involves selecting a team R = {r1, ..., r|R|}¹ of experts in the application domain who will support the document labeling process. Ideally, this group should consist of professionals who not only understand the domain of the documents to be labeled but also know TRA and TRL. However, this dual expertise is challenging to find in a single individual, so selected domain experts who are not familiar with the process of technological readiness assessment should participate in the stage of the methodology focused on training in TRA and the TRL scale.

¹ The notation |X| represents the cardinality, i.e., the number of elements, of an arbitrary set X.
This training stage comprises videos, meetings, and the provision of documents for individual study. In this context, for this work, a series of 10 videos on the topic, totaling 71 minutes, was recorded. They can be watched by labelers unfamiliar with TRA/TRL. These videos are available at https://bit.ly/videoTRL. In addition to explaining concepts about the scale and TRA, the videos also introduce the standardization of terms and concepts to be considered by labelers during the labeling process. Besides the videos, this stage involves holding meetings to clarify doubts, as well as providing articles on TRL/TRA for study, such as (J. C. Mankins, 2009), (Girardi et al., 2022), and (NBR ISO 16290:2015, 2015).
In parallel with the training stage, the proposed methodology includes a step focused
on identifying sources of relevant documents, i.e., databases where documents on the ap-
plication domain will be collected.
The main sources include databases containing scientific articles, technical reports, test results, requirement specifications, news, patents, and industrial property records. In order to facilitate the identification of a set of relevant document sources F = {f1, f2, ..., f|F|} to be considered, interviews with domain experts should be conducted, and the most prevalent sources indicated by the experts should be prioritized.
From this set of sources F , the collection of documents related to the analyzed tech-
nology or product is carried out. To collect from public databases with large volumes of
documents, a scalable solution is the application of automated techniques, such as Web
Scraping with filters. This allows an initial identification of the documents. Subsequently, it is necessary to manually apply previously established inclusion and exclusion criteria. At the end of this stage, we have a collection D = {d1, ..., d|D|}, whose elements need to be labeled by R.
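As an illustration of this collection step, the sketch below shows how a page from a public source could be downloaded and filtered. The URL, keyword list, and HTML structure are assumptions made for illustration, not the actual crawler used in this work.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical inclusion filter: keep only pages mentioning at least one keyword.
KEYWORDS = {"radar", "sensor", "software"}

def collect_page(url: str) -> dict | None:
    """Download one page from a public source and apply a simple keyword filter."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else ""
    body = " ".join(p.get_text(" ", strip=True) for p in soup.find_all("p"))
    if not any(keyword in body.lower() for keyword in KEYWORDS):
        return None  # does not satisfy the inclusion criteria
    return {"url": url, "title": title, "text": body}

# Example call with a placeholder URL:
# document = collect_page("https://example.org/news/hypothetical-article")
```

Documents surviving the automated filter would still go through the manual application of the inclusion and exclusion criteria described above.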
In the labeling process, the documents are distributed among the labelers, as illustrated in Figure 3 and in the assignment sketch that follows the figure. This process should take into consideration the affinities between labeler and technology. During the selection process and the meetings held in the previous stages, each labeler must present a set of words related to the technologies and products with which they are most familiar. This way, a specific dictionary is compiled for each labeler. The distribution involves matching this dictionary with the keywords of the documents, while also considering a uniform distribution of documents among the labelers. Thus, for a document di, a set of labelers Si is assigned such that Si ⊂ R. These individuals must independently label the document di. To ensure more reliable labeling, it is advisable to adopt |Si| ≥ 2. The proposed labeling process is supported by the use of a questionnaire to guide and direct the identification of the most suitable TRL level for the document. Thus, the labeling process is conducted through questions, mostly pre-established, to which the labeler responds after reading the document. Based on the answer to a question, the labeler is directed to another question, in a structure similar to a decision tree. The last question serves as a confirmation step, allowing the labeler to indicate another TRL level, different from the one indicated by the questionnaire, if deemed appropriate.

Figure 3: Document labeling
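A minimal sketch of the distribution step is given below: each document is assigned to the labelers whose dictionaries best match its keywords, with the current workload as a tie-breaker to keep the distribution roughly uniform. The data structures and the greedy strategy are illustrative assumptions, not the exact procedure used in the case study.

```python
from collections import Counter

def assign_labelers(doc_keywords, labeler_dictionaries, per_doc=2):
    """Assign each document to `per_doc` labelers by keyword affinity.

    doc_keywords: {doc_id: set of document keywords}
    labeler_dictionaries: {labeler_id: set of words the labeler is familiar with}
    """
    load = Counter()          # how many documents each labeler has received so far
    assignments = {}
    for doc_id, keywords in doc_keywords.items():
        affinity = {
            labeler: len(keywords & vocabulary)
            for labeler, vocabulary in labeler_dictionaries.items()
        }
        # Highest affinity first; among ties, prefer the least-loaded labeler.
        ranked = sorted(affinity, key=lambda labeler: (-affinity[labeler], load[labeler]))
        chosen = ranked[:per_doc]
        assignments[doc_id] = chosen
        load.update(chosen)
    return assignments
```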

Thus, let rk(di) be the label assigned by labeler rk to document di. After the labeling process, each document di will have a set of labels R(di) = {rj(di) | rj ∈ Si}. If there is a disagreement in labeling, i.e., there exist r′(di), r′′(di) ∈ R(di) such that r′(di) ≠ r′′(di), it should be evaluated whether the labelers can reach a consensus on the TRL level through meetings, or whether di should be disregarded. An alternative to this procedure, which can be slow and costly, is the use of Cohen's Kappa (κ) agreement coefficient as a filter for meetings. This coefficient is used to assess the level of agreement among evaluators/judges: values greater than or equal to 0.61 indicate substantial agreement, and values greater than or equal to 0.81 indicate almost perfect agreement (Landis & Koch, 1977). Thus, documents receiving different labels from labelers with κ ≥ 0.61 can be immediately discarded, because their classification is ambiguous in the face of otherwise strong agreement between those labelers.
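As a concrete illustration of this filter, the sketch below computes Cohen's κ for a pair of labelers over the documents they share, using scikit-learn; the label values shown are hypothetical.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels assigned by two labelers to the same five documents.
labels_r1 = ["TRL 2", "TRL 4", "TRL 2", "TRL 8", "TRL 5"]
labels_r2 = ["TRL 2", "TRL 4", "TRL 3", "TRL 8", "TRL 5"]

kappa = cohen_kappa_score(labels_r1, labels_r2)

if kappa >= 0.81:
    interpretation = "almost perfect agreement"
elif kappa >= 0.61:
    interpretation = "substantial agreement (disagreements may simply be discarded)"
else:
    interpretation = "weak agreement (consensus meetings recommended)"

print(f"kappa = {kappa:.2f}: {interpretation}")
```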
After consolidating the labels, the documents should be stored in JSON (JavaScript
Object Notation) files, which offer the advantage of allowing key-value association. Thus,
each document should be recorded with the fields ID, title, author, URL (Uniform Resource
Locator), document type, label (assigned TRL), abstract, and text. Such organization facil-
itates the training and evaluation of Machine Learning (ML) models and enables storage in document-oriented databases, such as MongoDB.
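A minimal sketch of the record format described above follows; the field values are placeholders, not entries from the actual corpus.

```python
import json

# Hypothetical record using the fields listed above; all values are placeholders.
document = {
    "id": "doc-001",
    "title": "Placeholder title",
    "author": "Placeholder author",
    "url": "https://example.org/doc-001",
    "document type": "journal article",
    "label": "TRL 4",                      # label consolidated after the labeling process
    "abstract": "Short abstract of the document.",
    "text": "Full text extracted from the document.",
}

with open("doc-001.json", "w", encoding="utf-8") as handle:
    json.dump(document, handle, ensure_ascii=False, indent=2)
```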

3 Case Study
The proposed methodology was applied to a case study focused on the defense industry
domain, restricted to the areas of computer engineering, electronics, and telecommunica-
tions. The defense industry is an area known for producing cutting-edge technology at the
forefront of knowledge, which later spills over into other fields, including civilian applica-
tions (de Freitas Querino, 2022). There are numerous examples supporting this statement,
such as the internet, which originated from the military network known as the Advanced
Research Projects Agency Network (ARPANET), as well as microwave ovens, discovered
as a result of experiments during World War II involving radars (Bueno, 2022).

The case study was restricted to technologies related to the Brazilian Army, with docu-
ments originally written in the Portuguese language (PT-BR).
As presented in Figure 2, the first step is the selection of labelers. Thus, a team of spe-
cialists composed of three computer engineers, two electronic engineers, and one telecom-
munications engineer was selected. All of them were linked to the defense area and had 5
or more years of experience. Additionally, 83.33% of them were either currently enrolled in or had already completed stricto sensu graduate programs (master's or doctoral) related to their fields of study.
With the aim of further enhancing the team’s understanding of TRL and TRA, as well
as standardizing some procedures during labeling, a training session was conducted using
the videos and articles mentioned in the previous section, over six meetings with the team.
For the identification of relevant public document sources, in addition to the team of labelers, four other domain experts were interviewed. In these interviews, the set of sources F = {Brazilian Army News (Noticiário do Exército Brasileiro)², Symposium on Operational Applications in Defense Areas (Simpósio de Aplicações Operacionais em Áreas de Defesa, SIGE)³, Military Science and Technology Magazine (Revista Militar de Ciência e Tecnologia, RMCT)⁴} was identified.
The Brazilian Army News is intended for the publication of news of interest to the Army, such as achievements in military operations, receipt of equipment, tests and trials conducted on products, as well as military ceremonies and other events. The Symposium on Operational Applications in Defense Areas is held annually by the Aeronautics Institute of Technology (ITA) and organized by students of the Postgraduate Program in Operational Applications (PPGAO). The event has had 25 editions so far, with a total audience exceeding 12,000 people (SIGE, 2023). Finally, the Military Science and Technology Magazine (RMCT), currently published quarterly, is produced by the Military Institute of Engineering (IME). It is a scientific journal with over 40 years of existence, focused on Science and Technology in the field of Defense, covering areas such as Computer Science, Electrical Engineering, and Defense Engineering, among others (EB Revistas, 2023). All of these sources are published in Portuguese.
In this case study, a grouping of TRL levels was adopted. This simplification is aligned with the technological foresight approach, as in Lezama-Nicolás et al. (2018), Silalahi et al. (2018), and Hardiyati et al. (2018). Following Girardi et al. (2022), three ranges (classes) were adopted: Range 1 (TRL 1 to 3), Range 2 (TRL 4 to 6), and Range 3 (TRL 7 to 9).
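This grouping can be expressed as a simple mapping, sketched below for reference.

```python
def trl_to_range(trl_level: int) -> str:
    """Map a TRL level (1-9) to one of the three ranges adopted in the case study."""
    if not 1 <= trl_level <= 9:
        raise ValueError("TRL level must be between 1 and 9")
    if trl_level <= 3:
        return "Range 1"   # TRL 1 to 3
    if trl_level <= 6:
        return "Range 2"   # TRL 4 to 6
    return "Range 3"       # TRL 7 to 9
```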
Having made these considerations, after collecting the documents, they were distributed
among the labelers, taking into account the affinity between the technology addressed in
the documents and each labeler’s technical background. After distribution, each labeler
read and analyzed each assigned document.
To assist in the labeling task, the questionnaire available at https://bit.ly/TRASurvey was used. It employed a structure similar to a decision tree, where the response to one question directed the labeler to complementary questions, so that in the end, the corresponding TRL range for the document was reached. At the end, the labeler had the opportunity to insert comments about the document. Thus, after answering the questionnaire, the TRL range assigned by the labeler for that document was indicated, or the suggestion to discard it. Each document was analyzed by two labelers (i.e., |Si| = 2).

² https://www.eb.mil.br/web/noticias/noticiario-do-exercito
³ https://www.sige.ita.br/
⁴ https://www.ebrevistas.eb.mil.br/CT
In cases where the two labelers initially did not converge to the same TRL range, meet-
ings were held. From these meetings, either a common range was reached or the decision
was made to discard the document.
Finally, the text was extracted from the documents and stored in JSON files using a
key-value structure with keys: “id”, “title”, “author”, “url”, “document type”, “label”,
“abstract”, and “text”. Figure 4 illustrates this storage, highlighting the document parts
and their corresponding key-value pairs in the JSON file.

Figure 4: Data extraction

3.1 Analysis of the corpus

Applying the proposed corpus construction methodology to the present case study, 187 documents were collected. Table 1 summarizes the distribution among the classes, disregarding the eliminated documents. The constructed corpus is available at https://bit.ly/datasetTRL.
Table 1: Distribution of documents in the generated corpus, by TRL range

TRL Range                              Number of documents    %
Range 1 (TRL 1, 2, 3)                  94                     55.95
Range 2 (TRL 4, 5, 6)                  41                     24.40
Range 3 (TRL 7, 8, 9)                  33                     19.64
Eliminated                             19                     -
Total without eliminated documents     168                    100

Range 1 has the largest number of documents. One possible explanation for this is related to the innovation "funnel": at the funnel's entrance there are many TRL 1 technologies, but few of them will actually reach TRL 9 (Harmsen, 2014). Another point worth mentioning refers to the eliminated documents. In some cases, it was not clear which maturity range would be the best fit; for example, in some cases a product with laboratory characteristics was tested in an operational environment.


It is possible to perform some quantitative and qualitative analyses on the constructed corpus. Table 2 presents some metrics regarding the number of words in the "text" key, grouped by document type. This allows some inferences about the types of documents analyzed: news items are shorter texts, symposium papers are intermediate, and journal articles are longer.

Table 2: Number of words per document type (key "text").

Type                  Average    Median    Minimum    Maximum
Journal Article       3938       3976      1281       6778
Symposium Article     2560       2504      826        4946
News                  298        252       171        723
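The figures in Table 2 can be reproduced with a few lines of code over the stored JSON files; the sketch below assumes a local folder named corpus containing one JSON file per document, which is an assumption about how the released corpus is organized.

```python
import json
import statistics
from collections import defaultdict
from pathlib import Path

word_counts = defaultdict(list)
for path in Path("corpus").glob("*.json"):          # hypothetical corpus folder
    record = json.loads(path.read_text(encoding="utf-8"))
    word_counts[record["document type"]].append(len(record["text"].split()))

# Print average, median, minimum, and maximum word counts per document type.
for doc_type, counts in word_counts.items():
    print(doc_type,
          round(statistics.mean(counts)),
          statistics.median(counts),
          min(counts),
          max(counts))
```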

The news published in the Brazilian Army News are concise and objective, hence the lower word count. They fulfill their role of informing the general public; however, they may not provide technical details about the activities involved. The articles (journal and symposium papers) are more technical texts intended for a specialized audience. They typically follow a similar structure: introduction, theoretical framework, experiments, results, and conclusion.
Another interesting analysis is the construction of word clouds for each TRL range⁵. These word clouds can provide insights into whether some words are more common in one range than in another. For the construction of Figures 5, 6, and 7, a preprocessing step was applied, consisting of removing stopwords and non-alphabetic characters, as well as converting the text to lowercase. After that, the frequency of words in the text set of each TRL range was counted, and the 100 most frequent words were considered. These words were translated from Portuguese to English, resulting in the word clouds below.
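The code actually used for the clouds is available at the link given in footnote 5; a minimal sketch of the frequency counting described above, assuming NLTK's Portuguese stopword list, is shown here.

```python
import re
from collections import Counter

import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOPWORDS = set(stopwords.words("portuguese"))

def top_words(texts, n=100):
    """Count word frequencies after lowercasing, keeping only alphabetic tokens
    and dropping Portuguese stopwords; return the n most frequent words."""
    counter = Counter()
    for text in texts:
        tokens = re.findall(r"[a-záéíóúâêôãõç]+", text.lower())
        counter.update(token for token in tokens if token not in STOPWORDS)
    return counter.most_common(n)

# range_texts would hold the "text" field of every document in one TRL range;
# the resulting frequencies can then be fed into a word cloud library.
# frequencies = top_words(range_texts)
```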
From these word clouds, it is possible to verify that some words are common to all TRL ranges, such as "system", "signal", and "data". Indeed, they are associated with the context of the case study, i.e., computer engineering, electronics, and telecommunications. It can also be observed that TRL Ranges 1 and 2 share a higher number of words. In Range 3, words such as "army" and "tests" stand out. The appearance of the word "army" may be associated with the innovation "funnel": the documents in this range carry words associated with their application and operational environment. A technology at lower TRL levels presents itself more generically; as it becomes more mature, its use becomes more specific. The word "army" alludes to this, pointing to applications in the defense sector.

⁵ The codes are available at: https://bit.ly/TRLclouds

Figure 5: Word cloud Range 1 (TRL 1-3)

Figure 6: Word cloud Range 2 (TRL 4-6)

The generated corpus can be used to train machine learning models that combine vector representation techniques with classification models, thus automating the process. To demonstrate this automation, we used the corpus to train and assess models, combining the TF-IDF technique with classical ML classification algorithms for document classification into the TRL ranges.
The code used is available at https://bit.ly/TRLmodels. Figure 8 depicts the steps of the proposed experiment; each step is explained in detail below. Initially, each document undergoes preprocessing, as illustrated in Figure 9. The following main operations were applied: removal of stopwords, i.e., common words that typically do not contribute to the meaning of the text; removal of non-alphabetic characters; conversion of letters to lowercase; removal of accents from words; and stemming, i.e., reduction of words to their root form by removing prefixes and suffixes. These procedures aim to reduce the dimensionality of the text (Jurafsky & Martin, 2022).
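A minimal sketch of this preprocessing pipeline is shown below, assuming NLTK's Portuguese stopword list and RSLP stemmer; it illustrates the operations listed above and is not the exact code released with the paper.

```python
import re
import unicodedata

import nltk
from nltk.corpus import stopwords
from nltk.stem import RSLPStemmer

nltk.download("stopwords", quiet=True)
nltk.download("rslp", quiet=True)                     # data for the Portuguese stemmer
STOPWORDS = set(stopwords.words("portuguese"))
STEMMER = RSLPStemmer()

def preprocess(text: str) -> str:
    """Lowercase, keep alphabetic tokens, drop stopwords, stem, and strip accents."""
    tokens = re.findall(r"[a-záéíóúâêôãõç]+", text.lower())
    tokens = [token for token in tokens if token not in STOPWORDS]
    stems = (STEMMER.stem(token) for token in tokens)
    # Remove accents so that spelling variants collapse to the same feature.
    return " ".join(
        unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
        for stem in stems
    )
```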
For the vector representation of words, the Term Frequency-Inverse Document Frequency (TF-IDF) technique was used. TF-IDF is a variation of Bag-of-Words (BoW) that considers the product of two terms, tf_{t,d} and idf_t. The first refers to the frequency of a word t in a document d; the second gives greater importance to words that occur in few documents, avoiding assigning importance to common words in the corpus that provide little information (Jurafsky & Martin, 2022).
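In a common formulation (scaling variants exist, such as logarithmic term frequency or smoothed idf), the weight of a term t in a document d is the product below, where N is the number of documents in the corpus and df_t is the number of documents containing t.

```latex
w_{t,d} = \mathrm{tf}_{t,d} \times \mathrm{idf}_t,
\qquad
\mathrm{idf}_t = \log\frac{N}{\mathrm{df}_t}
```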

Figure 7: Word cloud Range 3 (TRL 7-9)

Figure 8: Steps of the proposed experiment.

For classification, six traditional classification algorithms were used: Multinomial Naive
Bayes (MNB) (Jurafsky & Martin, 2022), Complement Naive Bayes Classifier (CNB) (1.9.
Naive Bayes, 2023), K-Nearest Neighbors (KNN) (Martins et al., 2020), Support Vector Ma-
chine (SVM) (Andrade, 2015), Random Forest (RF) (Andrade, 2015) and AdaBoost (Adb)
(Faceli, Lorena, Gama, Almeida, & Carvalho, 2021).
For training and evaluating these six classification algorithms, k-fold cross-validation with k = 5 was used. In this way, in each iteration (round), four folds were used for training the models, and one fold was used for evaluation. This process was repeated five times.
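A minimal sketch of this evaluation, using scikit-learn pipelines and two of the six classifiers as examples, is shown below; hyperparameters and the exact configuration used in the paper are not reproduced here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

def evaluate(texts, labels):
    """5-fold cross-validation of TF-IDF + classifier pipelines.

    texts: list of preprocessed document strings
    labels: list of range labels ("Range 1", "Range 2", "Range 3")
    """
    classifiers = {
        "KNN": KNeighborsClassifier(),
        "RF": RandomForestClassifier(random_state=42),
    }
    scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
    results = {}
    for name, classifier in classifiers.items():
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        scores = cross_validate(pipeline, texts, labels, cv=5, scoring=scoring)
        results[name] = {
            metric: (scores[f"test_{metric}"].mean(), scores[f"test_{metric}"].std())
            for metric in scoring
        }
    return results
```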
Table 3 presents the results of the cross-validation, displaying the average accuracy, precision, recall, and F1-score (macro). Each mean µ presented in the table is followed by its standard deviation σ, i.e., µ ± σ. For each performance metric, the highest mean corresponds to the algorithm that provided the classification model with the best performance for that metric. The last line (Random) considers the scenario in which the classifier assigns the majority class (i.e., Range 1) to all instances in its predictions.
The results of the experiments showed that the Random Forest algorithm achieved the highest accuracy, while KNN performed better on all other metrics. It can also be observed that, except for the accuracy of AdaBoost, all algorithms outperformed the random baseline. From the analysis of the experiments, it was possible to observe that the models performed well in labeling Range 1, but poorly in Ranges 2 and 3. This may have occurred due to the larger number of documents in Range 1 (Table 1).

Figure 9: Text preprocessing.

Table 3: Results of the 5-fold cross-validation (mean ± standard deviation).

Algorithm    Accuracy         Precision        Recall           F1-score
MNB          0.684 ± 0.029    0.502 ± 0.040    0.544 ± 0.046    0.502 ± 0.029
CNB          0.720 ± 0.031    0.509 ± 0.057    0.608 ± 0.052    0.549 ± 0.053
KNN          0.702 ± 0.059    0.664 ± 0.040    0.635 ± 0.034    0.587 ± 0.075
SVM          0.708 ± 0.038    0.602 ± 0.158    0.603 ± 0.065    0.489 ± 0.139
RF           0.726 ± 0.037    0.614 ± 0.172    0.626 ± 0.070    0.549 ± 0.051
Adb          0.522 ± 0.158    0.593 ± 0.089    0.474 ± 0.149    0.427 ± 0.097
Random       0.560 ± 0.011    0.187 ± 0.003    0.333 ± 0.000    0.240 ± 0.002

4 Final Considerations
The TRL scale has gained international prominence, both in the public and private sectors,
as a way to track the maturity of a technology development project or product, or even in
technological prospecting, monitoring the development of new technologies. In this sense,
several authors point to the need for automated solutions to make the evaluation process
scalable.
In this regard, an important gap is the lack of labeled corpora and of a standardized methodology for building them. This work presented a methodology for creating labeled corpora according to the TRL scale. It includes the selection of labelers, the identification of relevant document sources, data collection, the application of inclusion and/or exclusion criteria, the distribution of documents among labelers, and the labeling itself. Additionally, within this scope, educational material was prepared to instruct the labelers about TRL and TRA.
This methodology was applied in a specific domain, resulting in a corpus consisting of
168 documents in Portuguese, with different technologies and products within the scope
of defense, in the areas of computer engineering, electronics, and telecommunications. In
this case study, the nine TRL levels were divided into three ranges. It is expected that
the mentioned corpus will evolve to fill an important gap in the research area related to
automating classification on the TRL scale.
The constructed labeled corpus was further used to evaluate a combined approach of TF-IDF with classical ML algorithms, achieving a maximum accuracy of 72.6% with Random Forest. The KNN algorithm achieved the best results in the other three metrics (precision, recall, and F1-score).
These promising results indicate that solutions for document classification on the TRL scale may be feasible. Therefore, future work includes the use of other language models for the vector representation of documents (e.g., BERT and GPT) and of other classification algorithms (e.g., recurrent deep neural networks). Another research opportunity is the construction of a larger corpus. Additionally, to mitigate the class prevalence problem, Large Language Models (LLMs) can be used to enrich the produced corpus by generating new documents based on the existing ones.

References

1.9. Naive Bayes. (2023). Retrieved from https://scikit-learn.org/stable/modules/naive_bayes.html

ABNT. (2015). NBR ISO 16290:2015: Sistemas espaciais — definição dos níveis de maturidade da tecnologia (TRL) e de seus critérios de avaliação.

Andrade, P. H. M. A. d. (2015). Aplicação de técnicas de mineração de textos para classificação de documentos: um estudo da automatização da triagem de denúncias na CGU (Mestrado Profissional em Computação Aplicada). Universidade de Brasília, Brasília.

Britt, B. L., Berry, M. W., Browne, M., Merrell, M. A., & Kolpack, J. (2008). Document classification techniques for automated technology readiness level analysis. Journal of the American Society for Information Science and Technology, 59(4), 675-680. doi: 10.1002/asi.20770

Bueno, C. (2022). Ciência para a guerra e para a paz: uso militar ajudou a ciência a avançar, mas o papel da ciência na busca pela paz é fundamental. Ciência e Cultura, 74. doi: 10.5935/2317-6660.20220074

de Freitas Querino, L. (2022). Movimentos societários da indústria de defesa brasileira. In C. E. F. Azevedo & C. E. D. F. Ramos (Eds.), Estudos de defesa: inovação, estratégia e desenvolvimento industrial (pp. 136-168). Editora FGV.

EB Revistas. (2023). Sobre a revista — Revista Militar de Ciência e Tecnologia. Retrieved 12 July 2023, from http://www.ebrevistas.eb.mil.br/CT/about

Faceli, K., Lorena, A., Gama, J., Almeida, T., & Carvalho, A. (2021). Inteligência artificial: uma abordagem de aprendizado de máquina (2nd ed.). Rio de Janeiro: LTC.

Girardi, R., França, A., & Galdino, J. (2022). A customização de processos de avaliação de prontidão tecnológica baseados na escala TRL: desenvolvimento de uma metodologia para o Exército Brasileiro. Coleção Meira Mattos, 16, 491-527. doi: 10.52781/cmm.a084

Hardiyati, R., Silalahi, M., Amelia, M., Nadhiroh, I. M., Rahmaida, R., & Handayani, T. (2018). A conceptual model for classification of biomedicine research. IOP Conference Series: Earth and Environmental Science, 197(1), 012006. doi: 10.1088/1755-1315/197/1/012006

Harmsen, J. (2014). Novel sustainable industrial processes: from idea to commercial scale implementation. Green Processing and Synthesis, 3(3), 189-193. doi: 10.1515/gps-2013-0102

Jurafsky, D., & Martin, J. H. (2022). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (3rd ed.). USA: in press.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159. doi: 10.2307/2529310

Lezama-Nicolás, R., Rodriguez, M., Rio-Belver, R., & Bildosola, I. (2018). A bibliometric method for assessing technological maturity: the case of additive manufacturing. Scientometrics, 117. doi: 10.1007/s11192-018-2941-1

Mankins, J. (1995). Technology readiness level – a white paper.

Mankins, J. C. (2009). Technology readiness assessments: A retrospective. Acta Astronautica, 65(9), 1216-1223. doi: 10.1016/j.actaastro.2009.03.058

Martins, J. S., Lenz, M. L., da Silva, M. B. F., de Oliveira, R. A., Pichetti, R. F., Mariano, D. C. B., ... Bezerra, W. R. (2020). Processamentos de linguagem natural (1st ed.). Porto Alegre: SAGAH.

SIGE. (2023). SIGE — Simpósio de Aplicações Operacionais em Áreas de Defesa. Retrieved 12 July 2023, from https://www.sige.ita.br/

Silalahi, M., Hardiyati, R., Nadhiroh, I. M., Handayani, T., Amelia, M., & Rahmaida, R. (2018). A text classification on the downstreaming potential of biomedicine publications in Indonesia. In 2018 ICOIACT (pp. 515-519). doi: 10.1109/ICOIACT.2018.8350778

