Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
Ebook · 164 pages · 2 hours


About this ebook

What Is Group Method of Data Handling


The Group Method of Data Handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that incorporates fully automatic structural and parametric optimization of models.


How You Will Benefit


(I) Insights and validations about the following topics:


Chapter 1: Group Method of Data Handling


Chapter 2: Supervised Learning


Chapter 3: Artificial Neural Network


Chapter 4: Machine Learning


Chapter 5: Perceptron


Chapter 6: Alexey Ivakhnenko


Chapter 7: Multilayer Perceptron


Chapter 8: Minimum Description Length


Chapter 9: Nonlinear System Identification


Chapter 10: Types of Artificial Neural Networks


(II) Answers to the public's top questions about the group method of data handling.


(III) Real-world examples of the use of the group method of data handling in many fields.


Who This Book Is For


Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of the group method of data handling.


What Is Artificial Intelligence Series


The artificial intelligence book series provides comprehensive coverage in over 200 topics. Each ebook covers a specific Artificial Intelligence topic in depth, written by experts in the field. The series aims to give readers a thorough understanding of the concepts, techniques, history and applications of artificial intelligence. Topics covered include machine learning, deep learning, neural networks, computer vision, natural language processing, robotics, ethics and more. The ebooks are written for professionals, students, and anyone interested in learning about the latest developments in this rapidly advancing field.
The artificial intelligence book series provides an in-depth yet accessible exploration, from the fundamental concepts to the state-of-the-art research. With over 200 volumes, readers gain a thorough grounding in all aspects of Artificial Intelligence. The ebooks are designed to build knowledge systematically, with later volumes building on the foundations laid by earlier ones. This comprehensive series is an indispensable resource for anyone seeking to develop expertise in artificial intelligence.

Language: English
Release date: Jun 21, 2023


    Book preview

    Group Method of Data Handling - Fouad Sabry

    Chapter 1: Group method of data handling

    The Group Method of Data Handling, or GMDH, is a family of inductive methods for computer-based mathematical modeling of multi-parametric datasets. It incorporates fully automatic structural and parametric optimization of models and was developed by Alexey G. Ivakhnenko.

    Data mining, knowledge discovery, prediction, modeling of complex systems, optimization, and pattern recognition are just a few of the applications that make use of GMDH. GMDH algorithms are characterized by an inductive procedure that sorts out gradually more complicated polynomial models and selects the best solution by means of an external criterion.

    A GMDH model with multiple inputs and one output is a subset of components of the base function (1):

    Y(x_1, \dots, x_n) = a_0 + \sum_{i=1}^{m} a_i f_i

    where f_i are elementary functions dependent on different sets of inputs, a_i are coefficients, and m is the number of base function components.

    To find the best solution, GMDH algorithms consider various subsets of components of the base function (1), called partial models. The coefficients of these models are estimated by the least-squares method. GMDH algorithms gradually increase the number of partial model components and find a model structure of optimal complexity, indicated by the minimum value of an external criterion. This process is called the self-organization of models.
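
    To make these steps concrete, the following minimal Python sketch (illustrative only, not taken from the book) fits one partial model by least squares on a training sample A and evaluates a regularity-style external criterion on a held-out sample B; the helper names fit_partial_model and external_criterion are assumptions introduced for this example.

    import numpy as np

    def fit_partial_model(XA, yA, subset):
        # Design matrix for the partial model: intercept plus the selected inputs.
        DA = np.column_stack([np.ones(len(XA)), XA[:, subset]])
        coeffs, *_ = np.linalg.lstsq(DA, yA, rcond=None)
        return coeffs

    def external_criterion(XB, yB, subset, coeffs):
        # Mean squared prediction error on the held-out sample B.
        DB = np.column_stack([np.ones(len(XB)), XB[:, subset]])
        return np.mean((DB @ coeffs - yB) ** 2)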

    The original base function used in GMDH is the gradually complicated Kolmogorov–Gabor polynomial (2):

    Y(x_1, \dots, x_n) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n} \sum_{j=i}^{n} a_{ij} x_i x_j + \sum_{i=1}^{n} \sum_{j=i}^{n} \sum_{k=j}^{n} a_{ijk} x_i x_j x_k + \cdots

    In most cases, simpler partial models with functions of up to the second degree are used.
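
    For concreteness, a second-degree partial model on a pair of inputs x_i and x_j uses the terms a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2. The short Python sketch below builds that design matrix; it is illustrative only, and the helper name quadratic_terms is an assumption for this example.

    import numpy as np

    def quadratic_terms(xi, xj):
        # Second-degree terms of a typical GMDH partial model for inputs xi, xj:
        # one row per observation, columns [1, xi, xj, xi*xj, xi^2, xj^2].
        ones = np.ones_like(xi)
        return np.column_stack([ones, xi, xj, xi * xj, xi ** 2, xj ** 2])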

    Professor Alexey G. Ivakhnenko of the Institute of Cybernetics in Kyiv developed the approach in 1968. Because this inductive technique was a computer-based method from the start, the key practical results built on the new theoretical principles were a collection of computer programs and algorithms. The author's policy of free code sharing contributed to the rapid adoption of the approach by a large number of research laboratories around the world. Since most routine work is transferred to the computer, the influence of human input on the final result is greatly reduced. In fact, this approach can be regarded as one of the implementations of the Artificial Intelligence thesis, which states that a computer can act as a powerful advisor to human beings.

    The development of GMDH is a synthesis of ideas from a variety of scientific fields, including the cybernetic concept of the black box, the principle of successive genetic selection of pairwise features, Gödel's incompleteness theorems, and Gabor's principle of freedom of choice of decisions. The use of two or more subsets of a data sample is required before optimal models can be selected and compared. This makes it possible to avoid preliminary assumptions, because during the automatic construction of the optimal model the division of the sample implicitly accounts for various kinds of uncertainty.

    During its development, an organic analogy was established between the problem of constructing models from noisy data and the problem of signal transmission through a channel with noise. The main result of this theory is that the complexity of the optimal predictive model depends on the level of uncertainty in the data: the higher that level (for example, due to noise), the simpler the optimal model must be, with fewer estimated parameters. This initiated the development of GMDH theory as an inductive method for automatically adapting optimal model complexity to the level of noise variation in fuzzy data. GMDH is therefore often regarded as the original information technology for knowledge extraction from experimental data.

    From 1968 to 1971, only the regularity criterion was applied to solve problems of identification, pattern recognition, and short-term forecasting. Polynomials, logical nets, Zadeh's fuzzy sets, and Bayes probability formulas were used as reference functions. The very high accuracy of the forecasts produced by the new approach encouraged the authors. Noise immunity was not investigated.

    Period 1972–1975. The problem of modeling noisy data and working with incomplete information was solved. Multicriteria selection and the use of additional a priori information were proposed to increase noise immunity. The best experiments showed that, with an extended definition of the optimal model by an additional criterion, the noise level can be ten times greater than the signal. Improvements were then made using Shannon's theory of general communication.

    Period 1976–1979. The convergence of multilayered GMDH algorithms was investigated. It was shown that some multilayered algorithms have a multilayerness error, analogous to the static error of control systems. In 1977, a solution to the problem of objective systems analysis by multilayered GMDH algorithms was proposed. It turned out that sorting by an ensemble of criteria finds the only optimal system of equations, and therefore reveals the elements of complex objects and their main input and output variables.

    Period 1980–1988. Many important theoretical results were obtained. It became clear that full physical models cannot be used for long-term forecasting. It was proved that non-physical GMDH models are more accurate for approximation and forecasting than physical models of regression analysis. Two-level algorithms that use two different time scales for modeling were developed.

    Since 1989, new algorithms (AC, OCC, PF) for the non-parametric modeling of fuzzy objects, and SLP for expert systems, have been developed and investigated. The current stage of GMDH development can be described as the flourishing of deep learning neural networks and parallel inductive algorithms for multiprocessor computers.

    One of the most important aspects of GMDH is the use of an external criterion. The criterion describes the requirements the model must satisfy, such as the minimization of least squares. It is always calculated on a separate part of the data sample that has not been used to estimate the coefficients. This makes it possible to select a model of optimal complexity according to the level of uncertainty in the input data. Some widely used external criteria are:

    Criterion of Regularity (CR): the least-squares error of the model calculated on sample B.

    Criterion of Minimum Bias or Consistency: the squared error of the difference between the estimated outputs (or coefficient vectors) of two models developed on the basis of two distinct samples A and B, divided by the squared output estimated on sample B. Comparing models with this criterion makes it possible to obtain consistent models and to recover a hidden physical law from noisy data (a short code sketch of this criterion follows the list).

    Cross-validation criterion.
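
    As noted above, the consistency (minimum-bias) criterion can be sketched as follows: the same partial-model structure is fitted separately on samples A and B, and the two sets of predictions are compared over a common design matrix. The Python fragment below is illustrative only, and its function name and signature are assumptions for this example.

    import numpy as np

    def minimum_bias(D, coeffs_A, coeffs_B):
        # D: design matrix over the full data set; coeffs_A and coeffs_B are
        # least-squares coefficients of the same model structure estimated on
        # samples A and B respectively.
        pred_A = D @ coeffs_A
        pred_B = D @ coeffs_B
        # Squared difference of outputs, normalized by the squared output
        # estimated on sample B.
        return np.sum((pred_A - pred_B) ** 2) / np.sum(pred_B ** 2)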

    When using GMDH for modeling, the only pre-selected parameters are the selection criterion and the maximum model complexity. The design process then starts from the first layer and proceeds layer by layer. The number of layers, the neurons in hidden layers, and the model structure are determined automatically. All possible combinations of allowable inputs (all possible neurons) can be considered. The polynomial coefficients are then determined by one of the available minimization methods, such as singular value decomposition, using the training data. Neurons with the best external criterion values are retained, and the rest are removed. If the external criterion of the layer's best neuron reaches a minimum or satisfies the stopping criterion, the network design is complete and the polynomial expression of the best neuron of the last layer is taken as the mathematical prediction function; otherwise, the next layer is generated and the process continues.
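
    The layer-by-layer procedure just described can be sketched in a few lines of Python. The sketch below is illustrative only (it is not the book's implementation): each candidate neuron is a second-degree partial model on a pair of current features, the regularity criterion on sample B ranks the candidates, the best ones survive to the next layer, and the process stops when the best criterion no longer improves. It reuses the hypothetical quadratic_terms helper from the earlier sketch.

    import numpy as np
    from itertools import combinations

    def gmdh_layerwise(XA, yA, XB, yB, keep=8, max_layers=5):
        feats_A, feats_B = XA, XB
        best_so_far = np.inf
        for _ in range(max_layers):
            candidates = []
            for i, j in combinations(range(feats_A.shape[1]), 2):
                DA = quadratic_terms(feats_A[:, i], feats_A[:, j])
                DB = quadratic_terms(feats_B[:, i], feats_B[:, j])
                coeffs, *_ = np.linalg.lstsq(DA, yA, rcond=None)
                crit = np.mean((DB @ coeffs - yB) ** 2)  # regularity on sample B
                candidates.append((crit, DA @ coeffs, DB @ coeffs))
            candidates.sort(key=lambda c: c[0])
            if candidates[0][0] >= best_so_far:
                break  # stopping rule: the layer's best neuron no longer improves
            best_so_far = candidates[0][0]
            survivors = candidates[:keep]
            # Outputs of the surviving neurons become the inputs of the next layer.
            feats_A = np.column_stack([c[1] for c in survivors])
            feats_B = np.column_stack([c[2] for c in survivors])
        return best_so_far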

    The order in which partial models are considered can be chosen in several ways. The first order of consideration used in GMDH, originally called the multilayered inductive procedure, remains the most popular. It sorts gradually more complex models generated from a base function, and the best model is indicated by the minimum of the external criterion characteristic. The multilayered procedure is equivalent to an artificial neural network with a polynomial activation function of neurons; the algorithm that follows this approach is therefore usually referred to as a GMDH-type neural network or a polynomial neural network. Li showed that GMDH-type neural networks perform better than classical forecasting algorithms such as single exponential smoothing, double exponential smoothing, ARIMA, and back-propagation neural networks.

    Another important approach to the consideration of partial models, which is becoming increasingly popular, is a combinatorial search that is either limited or full. This approach has some advantages over polynomial neural networks, but it requires considerable computational power and is therefore not practical for objects with a large number of inputs. An important achievement of combinatorial GMDH is that it fully outperforms the linear regression approach whenever the noise level in the input data is greater than zero. It guarantees that the best model will be found by exhaustive sorting.

    The basic combinatorial algorithm proceeds through the following steps (a minimal code sketch follows the list):

    Divides the data sample into at least two samples, A and B.

    Generates subsamples from A according to partial models of steadily increasing complexity.

    Estimates the coefficients of the partial models at each layer of model complexity.

    Calculates the value of the external criterion for the models on sample B.

    Chooses the best model (or set of models) indicated by the minimal value of the external criterion.
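
    The steps above can be sketched as an exhaustive search over subsets of inputs. The Python fragment below is illustrative only and reuses the hypothetical fit_partial_model and external_criterion helpers from the earlier sketch; every subset of up to max_terms inputs defines a linear partial model fitted on sample A and scored on sample B.

    import numpy as np
    from itertools import combinations

    def combinatorial_gmdh(XA, yA, XB, yB, max_terms=3):
        best = (np.inf, None)
        n_inputs = XA.shape[1]
        for k in range(1, max_terms + 1):
            for subset in combinations(range(n_inputs), k):
                coeffs = fit_partial_model(XA, yA, list(subset))
                crit = external_criterion(XB, yB, list(subset), coeffs)
                if crit < best[0]:
                    best = (crit, (subset, coeffs))
        # Returns the minimal external-criterion value and the winning model.
        return best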
