Approach To Textual Data Analysis
Received 25th Aug 2023, Accepted 26th Sep 2023, Online 27th Oct 2023
Abstract: This manuscript presents approaches to processing textual data, on the basis of which models and algorithms for the classification and analysis of textual data are proposed. The developed algorithms serve to improve the efficiency of classification and analysis of textual data. A core algorithm for analyzing textual documents, a modification of a dictionary search algorithm, and algorithms A1, A2, and A3 for classification and analysis have been developed. The software developed on the basis of these algorithms was evaluated through experimental research, in which a training sample of 2000 words was used. The knowledge base is dynamic and expands during the training process.
_____________________________________________________________________________________________________
Introduction.
Currently, the amount of information of different categories and types is growing rapidly in the ocean of data. Because the volume of data is so large, it becomes increasingly difficult for users to extract the information they need. To search for and extract the necessary information, the data must be processed and analyzed, or more precisely, the necessary pieces must be extracted from it. This statement of the problem shows that, to identify the structure of data and the previously unknown relationships and regularities within it, intelligent analysis is more appropriate than the traditional methods of data analysis, which are mainly focused on testing pre-existing hypotheses about the data [1-4].
Data collection varies according to the purpose of use and the type of storage. Different types of data require different approaches: a uniform approach may give good processing results for one category of data and poor results for another. The very large volumes of data available today make their processing especially difficult [5-7].
The Big Data phenomenon has a significant impact on data processing technology [8, 9]. Studies by leading research institutions projected that by 2020 the world's data volume would exceed 40 zettabytes (40 trillion GB) [10, 11].
Before information is consumed in the form of knowledge (metadata), it is processed in the form of simple information, but the growth of the information flow requires the improvement of
recording and storage technologies [12]. Initial data are not always complete and may contain errors that violate the assumptions of the mathematical apparatus; in such cases, the accuracy of solutions obtained with the traditional mathematical apparatus, in particular the methods of mathematical statistics, decreases significantly. Human consumption of processed data has led to a rapid increase in the volume and flow of information. Any organization (commercial, manufacturing, medical, scientific, etc.) depends on the correct organization of calculations and records covering the entire process of its activities. The question arises as to what to do with the resulting information array. Proper processing of the data and information array simplifies its form and structure, making it easier to use. Mathematical statistics, long considered the main tool of data analysis, has lost its leading role due to the growing complexity of data structures. The main reason is the concept of sample-based approximation, which leads to operations on spurious quantities (such as the average temperature of patients in a hospital or the average height of buildings). In such cases, the methods of mathematical statistics remain useful for testing pre-specified hypotheses and for rough exploratory analysis based on rapid data inspection [13-15].
Intelligent Analysis of Data (IAD) technology combines rigorously formalized methods with informal methods of analysis. IAD methods and algorithms include the following: artificial neural networks, decision trees, symbolic rules, the support vector method, Bayesian networks, linear regression, correlation-regression analysis, hierarchical and non-hierarchical methods of cluster analysis, the Apriori algorithm, the bounded search method, evolutionary programming and genetic algorithms, various data visualization methods, and other sets of methods [16-18].
The continuous growth of data volume and information, the limitations of primary data, the complexity of their structure, uncertainty, and the non-stationarity of parameters all require the development of new data analysis methods and algorithms. IAD develops technological indicators and complexes based on the conceptual principles of extracting new, previously unknown knowledge: hidden properties and laws, interdependencies, and the features of random and transient processes that characterize non-stationary objects in technical, economic, social, and monitoring systems [19, 20].
Data preprocessing serves to efficiently implement the classification problem, one of the main problems of IAD. This problem requires determining whether an incoming object belongs to one of the specified classes (Si) on the basis of the objects of a training sample described by features Xi. Various models have been proposed for data classification. A decision tree forms a hierarchical model of the training data, and a path through the tree is used to classify each incoming object; each path in the tree can reasonably be viewed as a rule used to classify an incoming entity. Rule-based classifiers can be thought of as generalized decision trees in which the data need not be represented hierarchically, so multiple, possibly conflicting rules can cover the same training or test case. Probabilistic classifiers assign probability values to the features of the training sample; a simple Bayes rule or Boolean function is used to estimate these probabilities efficiently. Support Vector Machines (SVM) and neural networks increase the effectiveness of their objective functions in different ways: SVM uses the maximum margin principle, while neural networks minimize a least-squares error over the predicted probabilities. Instance-based classifiers defer the work of generalization until classification time; the simplest form of instance-based learning is the nearest-neighbor classification algorithm, and many more complex variants can be obtained by applying different distance functions and centroid-based models [21-27].
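To make the contrast between these classifier families concrete, here is a minimal sketch, assuming scikit-learn is available; the four-document toy corpus and labels are invented for illustration and are not from the study:

```python
# Illustrative comparison of the classifier families discussed above;
# scikit-learn and the toy corpus are assumptions for this sketch.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier

texts = ["payment due invoice", "meeting agenda notes",
         "invoice overdue payment", "notes from the meeting"]
labels = ["finance", "office", "finance", "office"]

X = TfidfVectorizer().fit_transform(texts)  # vectorize once, reuse for all

for clf in (DecisionTreeClassifier(),       # hierarchical rules
            MultinomialNB(),                 # probabilistic (Bayes)
            LinearSVC(),                     # maximum-margin SVM
            KNeighborsClassifier(n_neighbors=1)):  # instance-based
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))
```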
Theoretical structure of textual data processing. In general, it is appropriate to treat the automatic (or learned) classification of natural-language texts on the basis of classification features through direct preliminary processing and classification of the textual data. The classification problem implies the creation of some form of metadata, that is, the emergence of knowledge by revealing the hidden laws of the data. Text analysis is essentially an intelligent analysis of data performed by extracting useful concepts from the text or by determining class membership on the basis of various algorithms. The expansion of the text data segment can be seen in the rapid development of several relatively new areas, including text data on the web, social networks, e-mail, digital libraries, and communication sites. In these areas, the problems of generating metadata are effectively solved by means of intelligent data analysis. Over time, various methods of data processing have been developed, chiefly because the data arrays that need to be processed combine the characteristics of different types and large volumes. Information resources with a complicated structure reduce the effectiveness of individual approaches. In this case, it is appropriate to propose an approach that distinguishes the internal structure of the problem and uses different methods. Classification of text documents with this unified approach is carried out in three stages:
In stage 1, the incoming data are normalized. Here a stemmer algorithm is used, specifically the Ripple-Down Rules approach, and this stage generates a table of keywords based on the rules. The resulting table forms the basis for the analysis of the texts. The unstructured text is converted into a structured representation X, and the classification (analysis) process works directly with X.
In stage 2, the structured text X is analyzed against a dictionary in order to eliminate corrupted structures. Information that appears corrupted does not always fit the structure of a topical search in the dictionary. In this case, it is advisable to use a modified dictionary search approach, with the help of which the dictionary is automatically filled with broken words; this approach serves to increase the accuracy of the final results. On completion of stage 2, if such a word or some modification of it is found in the dictionary, processing of the information is terminated and the text structure is classified as corrupted. Otherwise, text analysis is performed in stage 3 using machine-learning-based methods. A skeletal sketch of this flow is given below.
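As a hedged illustration of the three-stage flow, the following Python sketch shows how the stages fit together; the dictionary of broken words, the tokenizer, and the stand-in machine-learning step are all assumptions for the example, not the paper's implementation:

```python
# Hypothetical sketch of the three-stage combined flow; the dictionary,
# tokenizer, and stand-in ML step are illustrative assumptions.
import re

CORRUPTED = {"fr33", "w1n", "l0gin"}   # assumed dictionary of broken words

def normalize(text):
    """Stage 1: lowercase and tokenize, yielding the structured view X."""
    return re.findall(r"\w+", text.lower())

def classify(text, ml_classify):
    x = normalize(text)                # Stage 1: normalization
    for word in x:                     # Stage 2: dictionary search
        if word in CORRUPTED:
            return "corrupted"         # processing terminates here
    return ml_classify(x)              # Stage 3: ML-based analysis

# A lambda stands in for the machine-learning stage.
print(classify("FR33 prizes, w1n now!", lambda x: "normal"))   # -> corrupted
print(classify("Ordinary meeting notes", lambda x: "normal"))  # -> normal
```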
In the three-stage combined method, the text classification problem rests on the accuracy with which words are matched against the indicators of corruption, depending on the conditions of the problem, or on some probability of matching one or more of these indicators.
Algorithms based on the text processing model. Any information that does not have a fixed structure needs preliminary processing. The main complication of text analysis is the large number of words in the analyzed text, not all of which obey the laws of the natural language. A particular problem is that the incoming data flow does not follow a fixed pattern, which increases both the running time of the algorithm and the resulting errors. For this reason, a preliminary processing step is carried out, that is, the process of bringing the incoming data stream into a normal form (Fig. 1).
The stemming (coring) algorithm performs a sequence of steps tailored to the problem:
Step 1. Non-alphabetic characters are eliminated by deleting numbers, punctuation marks, and other special characters. This allows us to create a vector or matrix that can be manipulated.
Step 2. Case homogenization means that all letters are represented in either upper or lower case. For example, texts of the form "Text", "TEXT", and "teXt" are normalized by being expressed in lower case: "text".
Step 3. Stop words are removed by deleting auxiliary words that do not affect the content of the text. These can include the following parts of speech: particles, adverbs, adjectives, conjunctions, pronouns, etc. A list of auxiliary words is formed, and the cuts are then made on the basis of this list.
Step 4. Character search and replacement substitutes some letters with close equivalents, for example replacing the letter "h" with "x", to reduce the time spent processing word variants.
Step 5. Texts of certain sizes are extracted, for example by counting characters or words.
Step 6. Word stems are extracted by splitting off the word suffixes present in the list and shortening by comparison.
Step 7. The result is presented in natural-language or graphical form. A minimal sketch of these steps follows.
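As an illustration, the following minimal Python sketch implements Steps 1-4 and 6; the stop-word list, suffix list, and the h-to-x substitution are invented stand-ins for the paper's actual tables:

```python
# Minimal sketch of the preprocessing steps above; the stop-word and
# suffix lists are illustrative stand-ins, not the paper's actual tables.
import re

STOP_WORDS = {"va", "bilan", "uchun", "ham"}   # assumed auxiliary words
SUFFIXES = ("lar", "ning", "dan", "ga")        # assumed suffix list

def preprocess(text):
    text = re.sub(r"[^a-zA-Z\s]", " ", text)   # Step 1: drop non-letters
    words = text.lower().split()               # Step 2: homogeneous case
    words = [w for w in words if w not in STOP_WORDS]  # Step 3: stop words
    words = [w.replace("h", "x") for w in words]       # Step 4: h -> x
    stems = []
    for w in words:                            # Step 6: strip suffixes
        for s in SUFFIXES:
            if w.endswith(s) and len(w) > len(s) + 2:
                w = w[: -len(s)]
                break
        stems.append(w)
    return stems

print(preprocess("Matnlar tahlili uchun misol!"))  # -> ['matn', 'taxlili', 'misol']
```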
The following forms of stemming are used in text preprocessing:
a search algorithm (full enumeration);
suffix reduction (forming the word stem on the basis of rules);
lemmatization (reducing words to their original dictionary form);
stochastic algorithms that determine word stems;
statistical algorithms such as N-gram or comparison algorithms.
Normalization allows us to reduce the size of the feature space (the set of characters). It is therefore important, when analyzing text, to reduce the character set so that only words of significant value remain. Reducing the dimensionality increases the precision of the process.
In this section, the incoming text, which does not have a fixed structure, is converted into the structured representation X. After the normalization step is performed, the text is analyzed for the presence of indicators of missing structure.
Taking the above into account, a simple classification model can be implemented in the form of the following scheme (Fig. 3):
Step 4. Calculating the index P of the degree of relevance of the analyzed words to the unstructured (corrupted) indicators.
Step 6. Comparing the index P with the threshold value and classifying the text X: if P exceeds the threshold, the text is assigned to the corrupted class; such a classification is carried out only when the threshold condition holds.
Step 7. Forming the set of words ∆. Once the threshold value has been set, the words satisfying it are identified in the analyzed text. In some cases, for example when the set of found words is not sufficient, it is extended with the unique words of the text X.
Step 8. The found words are checked for mutual compatibility and, as necessary, entered into the dictionary with an associated probability. These probability values are adjusted in each text analysis session, so the classifier improves as it works. The creation of new benchmarks and the growth of the training sample increase the probability of a qualitative classification of the incoming object. A hedged sketch of these steps is given below.
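The following sketch illustrates Steps 4-8; the scoring rule, the threshold value, and the probability update increment are assumptions for the example:

```python
# Hedged sketch of Steps 4-8; the scoring rule, threshold P0, and the
# probability update increment are assumptions for the example.
corrupted_dict = {"fr33": 0.9, "w1n": 0.8}   # word -> match probability
P0 = 0.5                                      # assumed threshold value

def relevance(tokens):
    """Step 4: average match probability of the tokens (index P)."""
    return sum(corrupted_dict.get(t, 0.0) for t in tokens) / max(len(tokens), 1)

def classify_and_update(tokens):
    p = relevance(tokens)                     # Step 4: relevance index P
    if p < P0:                                # Step 6: compare P with threshold
        return "normal"
    delta = [t for t in tokens if t in corrupted_dict]   # Step 7: set of words
    for w in delta:                           # Step 8: adjust probabilities
        corrupted_dict[w] = min(1.0, corrupted_dict[w] + 0.05)
    return "corrupted"

print(classify_and_update(["fr33", "w1n"]))       # -> corrupted
print(classify_and_update(["ordinary", "text"]))  # -> normal
```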
We perform this process based on the A2 (Bayes classifier) algorithm.
Algorithm A2:
Step 1. Determine the degree of relevance of the words in class $S_i$:
$$P(w_k \mid S_i) = \frac{n_{ik}}{N_i},$$
where $n_{ik}$ is the number of occurrences of word $w_k$ in texts of class $S_i$ and $N_i$ is the total number of word occurrences in the texts belonging to class $S_i$.
Step 2. Apply Laplace smoothing to the estimates $P(w_k \mid S_i)$:
$$P(w_k \mid S_i) = \frac{n_{ik} + 1}{N_i + V},$$
where $V$ is the number of important (unique) words in the training sample.
Step 3. Determine the relevance index of the text $X = (w_1, \ldots, w_m)$:
$$P(S_i \mid X) \propto P(S_i) \prod_{k=1}^{m} P(w_k \mid S_i).$$
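A minimal sketch of this classifier follows; the training documents are invented for illustration, and a uniform class prior $P(S_i)$ is assumed:

```python
# Minimal naive Bayes classifier with Laplace smoothing, mirroring
# Steps 1-3 of algorithm A2; training data are invented for illustration
# and a uniform class prior P(S_i) is assumed.
import math
from collections import Counter

def train(docs):
    """docs: list of (tokens, label) pairs. Returns per-class counts and vocab."""
    classes, vocab = {}, set()
    for tokens, label in docs:
        classes.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return classes, vocab

def log_score(tokens, counts, vocab_size):
    """Steps 1-2: sum of log P(w_k | S_i) with add-one (Laplace) smoothing."""
    total = sum(counts.values())       # N_i: word occurrences in the class
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)

docs = [(["invoice", "payment"], "finance"),
        (["meeting", "notes"], "office")]
classes, vocab = train(docs)

text = ["payment", "invoice", "notes"]
best = max(classes, key=lambda c: log_score(text, classes[c], len(vocab)))
print(best)  # Step 3: class with the highest relevance index -> "finance"
```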
Results of experimental research. Using the developed software tool, incoming data were classified on the basis of a training sample of 2000 words, and the text data were analyzed. For comparison, the results obtained from several algorithms were studied.
Table 1. Accuracy of applying text analysis models
At the same time, an evaluation of the placement of words according to content was calculated; in the analysis of the experimental data, this evaluation was expressed on a five-point scale. From this view, the tone of the words used in the text correspondence can be extracted, which gives us an opportunity to determine the emotional tone of the textual information, that is, the (negative or positive) evaluation of the correspondence.
More than 2,000 pieces of correspondence obtained in the experiment were processed in this way, and the accuracy of the A1-A3 algorithms was 88-89% across the different tools.
Conclusion. The theoretical research conducted has shown that forming an adaptive mechanism in the models built for analyzing textual data and identifying the emotional shades in them is of great importance. Depending on the conditions of the problem, the use of algorithms with a logical-heuristic approach is appropriate. The high results obtained show that coordinating different approaches is effective in the analysis of textual data (documents) in the Uzbek language collected for the experimental research.
References
1. Xin-She Yang. Introduction to Algorithms for Data Mining and Machine Learning. Academic Press, 2019, ISBN: 978-0-12-817216-2, 171 p.
2. Hemlata Sahu, Shalini Sharma, Seema Gondhalakar. A Brief Overview on Data Mining Survey // International Journal of Computer Technology and Electronics Engineering (IJCTEE), 2013, Vol. 1, Issue 3; P. IndiraPriya, D. K. Ghosh. A Survey on Different Clustering Algorithms in Data Mining Technique // International Journal of Modern Engineering Research (IJMER), www.ijmer.com, Vol. 3, Issue 1, Jan-Feb. 2013, pp. 267-274.
3. M. A. Deshmukh, R. A. Gulhane. Importance of Clustering in Data Mining // International Journal of Scientific & Engineering Research, Vol. 7, Issue 2, February 2016.
4. Jaro M. A. Advances in record linkage methodology as applied to the 1985 census of Tampa, Florida // Journal of the American Statistical Association, 1989, 84(406), pp. 414-420. DOI: 10.1080/01621459.1989.10478785.
5. Rassel S. Iskusstvenniy intellekt. Sovremenniy podxod [Artificial intelligence. Modern approach] / S.
Rassel, P. Norvig, 2-ye izd.: Per. s angl. – M.: Izdatelskiy dom «Vilyams», 2006. – 1408 s.
6. Feldman R. The text mining handbook: advanced approaches in analyzing unstructured data [Tekst] /
R. Feldman, J. Sanger. – Cambridge University Press, 2007. – 410 p.
7. Moyotl-Hernandez E. An Analysis on Frequency of Terms for Text Categorization [Tekst] / E.
Moyotl-Hernandez, H. Jimenez-Salazar // Procesamiento del lenguaje natural. – 2004. – Vol. 33. – P.
141-146.
8. Moyotl-Hernandez E. Some Tests in Text Categorization using Term Selection by DTP [Tekst] / E.
Moyotl-Hernandez, H. Jimenez-Salazar // Proceedings of the Fifth Mexican International Conference
on Computer Science ENC'04. – Colima. – 2004. – P. 161-167.
9. Bolshakova Ye., Lukashevich N., Nokel M. Izvlechenie odnoslovnix terminov iz tekstovix kolleksiy
na osnove metodov mashinnogo obucheniya [Extracting single-word terms from text collections based
on machine learning methods] // Informatsionnie texnologii. — 2013. — S. 31—37
10. Usama F., Smyth P., Piatetsky-Shapiro G. From Data Mining to Knowledge Discovery in Databases // Artificial Intelligence Magazine, 1996, 17(3), pp. 34-54.
11. Gmurman V. Ye. Teoriya veroyatnostey i matematicheskaya statistika [Theory of Probability and
Mathematical Statistics]. — Moskva : Visshaya shkola, 2013. — 479 s.
12. Roussopoulos N. Conceptual Modeling: Past, Present and the Continuum of the Future // Conceptual Modeling: Foundations and Applications, 2009, pp. 139-152.
13. Hutchins J. ALPAC: The (In)Famous Report // Readings in machine translation. 2003. Vol. 14. P.
131–135.
14. Manning K. D., Ragxavan P., Shyutse X. Vvedenie v informatsionniy poisk [Introduction to
Information Retrieval]. : Per. s angl. / Pod red. P. I. Braslavskogo, D. A. Klyushina, I. V. Segalovicha.
M.: OOO «I.D. Vilyams», 2011. 528 s.
15. Lukashevich N. V. Tezaurusi v zadachax informatsionnogo poiska [Thesauruses in information
retrieval tasks]. M.: Izd-vo Moskovskogo universiteta, 2011. 512 s.
16. Deliyanni A., Kowalski R. A. Logic and Semantic Networks // Communications of the ACM. 1979.
Vol. 22, no. 3. P. 184–192.
17. Shapiro S. C. Encyclopedia of Artificial Intelligence. 2nd edition. New York, NY, USA: John Wiley
& Sons, Inc., 1992. 1724 pp.
18. Gavrilova T. A., Xoroshevskiy V. F. Bazi znaniy intellektualnix sistem [Intelligent systems
knowledge bases]. SPb: Piter, 2000. 384 s.
19. Apresyan Yu. D., Boguslovskiy I. M., Iomdin L. L. i dr. Lingvisticheskiy protsessor dlya slojnix informatsionnix sistem [Linguistic processor for complex information systems]. M.: Nauka, 1992. 256 s.
20. Osipov G. S. Metodi iskusstvennogo intellekta [Artificial Intelligence Methods]. FIZMATLIT, 2011.
21. Osipov G., Smirnov I., Tikhomirov I. Relational-situational method for text search and analysis and its applications // Scientific and Technical Information Processing, 2010, vol. 37, no. 6, pp. 432-437.
22. O. J. Babomuradov, N. S. Mamatov, L. B. Boboev, B. I. Otaxonova. "Text documents classification in Uzbek language" // International Journal of Recent Technology and Engineering, vol. 8, no. 2, pp. 3787-3789, 2019.
23. Y. Du, J. Liu, W. Ke, and X. Gong. "Hierarchy construction and text classification based on the relaxation strategy and least information model" // Expert Systems with Applications, vol. 100, pp. 157-164, 2018.
24. G. Vinodhini and R. M. Chandrasekaran. "A comparative performance evaluation of neural network based approach for sentiment classification of online reviews" // Journal of King Saud University - Computer and Information Sciences, vol. 28, no. 1, pp. 2-12, 2016; A. Abbasi, H. Chen, and A. Salem. "Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums" // ACM Transactions on Information Systems, vol. 26, no. 3, p. 12, 2008.