Practical Implementation of Random Forest-Based Mineral Potential Mapping
https://doi.org/10.1007/s11053-019-09598-y
Original Paper
Arianne Ford
With the increasing use of machine learning for big data analytics, several methods have
been implemented for the purpose of exploration targeting using mineral potential mapping
in a GIS environment. Random forests (RF) have been successfully applied to data-driven
mineral potential mapping using relatively small numbers of input maps that have typically
been pre-classified by a geologist familiar with the mineral system being targeted. However,
it is useful to understand how well RF perform for mineral potential mapping when a large
number of multi-class categorical or non-thresholded numeric input maps are used in the
classification or when weighted or ranked training data are used. Four different
implementations of RF are presented to examine how the results vary depending on the degree of
intervention from an expert in the modeling process. A case study has been devised using
data from the eastern Lachlan Orogen in New South Wales (Australia) for the purposes of
targeting porphyry Cu–Au mineralization related to the Macquarie Arc. The results
demonstrate that the use of a large number of multi-class categorical or non-thresholded
numeric predictive input maps results in a poor mineral potential map outcome. An expert
review to determine reclassifications or thresholds that produce geologically meaningful
maps as proxies for the mineral system being targeted results in more effective RF-based
mineral potential maps being produced. Weighting or ranking the deposits used as training
data produces more narrowly defined prospective areas that may assist with targeting
tier-one economic deposits. Comparison of the RF results to a standard weights of evidence
analysis highlighted some significant differences in which predictive maps should be
considered important for modeling, and in the extent of prospective area delineated from each
output mineral potential map.
KEY WORDS: Lachlan Orogen, Machine learning, Mineral potential mapping, Porphyry copper–gold,
Random forests, Weights of evidence.
criteria represented by predictive maps as spatial proxies for the mineral system of interest.
Mineral potential mapping using data-driven methods includes semi-automated techniques such as
weights of evidence (WofE) (e.g., Bonham-Carter 1994; Singer and Kouda 1999; Partington 2010;
Joly et al. 2012; Ford et al. 2015, 2019b), logistic regression (e.g., Harris et al. 2003;
Fallon et al. 2010; Agterberg 2011), and evidential belief functions (e.g., Carranza and Hale
2002; Carranza et al. 2005; Carranza 2014) that require expert input, as well as machine
learning methods such as neural networks (e.g., Brown et al. 2000; Bougrain et al. 2003; Porwal
et al. 2003; Fung et al. 2005) and random forests (RF) (e.g., Rodriguez-Galiano et al. 2014;
Carranza and Laborte 2015a, b, 2016; Rodriguez-Galiano et al. 2015; McKay and Harris 2016;
Zhang et al. 2016; Hariharan et al. 2017; Wang et al. 2019). Two key benefits of machine
learning methods are that they are designed to handle a large number of feature vectors (i.e.,
predictive maps) and that they are not susceptible to conditional dependence as in WofE
(Agterberg and Cheng 2002), which can make them more appropriate methods for mineral potential
mapping depending on the particular case study being evaluated.

RF represents an advanced machine learning implementation of a decision-tree algorithm (Reddy
and Bonham-Carter 1991). A RF is a set of decision trees that are organized hierarchically by a
set of rules governing which decision is made between a parent node and its child nodes in
order to make predictions about training data characteristics (Breiman 2001; Carranza and
Laborte 2015b). For the purposes of mineral potential mapping, the aim is for the RF to decide
whether a given predictive map predicts a known deposit location or not (e.g., Fig. 1).

Studies have shown that the selection of a particular method for data-driven mineral potential
mapping is largely dependent on the availability of a sufficient amount of training data with
which the model can be trained as well as the number of predictive maps with missing data
(e.g., Brown et al. 2003; Carranza and Laborte 2015b; Wang et al. 2019). Carranza and Laborte
(2015b) show that both WofE and RF mineral potential mapping can be successfully implemented
with a relatively low number of training points. However, this study only incorporated ten
predictive maps into the analysis.

To date, no study has examined the impact of predictive map selection or how the weighting of
training points impacts the outputs of a relatively simple implementation of RF for mineral
potential mapping. Although the aim of machine learning is often to find patterns in “big data”
(Sagiroglu and Sinanc 2013), and the aim of mineral potential mapping is to effectively target
large mineral deposits for mineral exploration purposes (Hronsky and Kreuzer 2019; Yousefi
et al. 2019), most of the published mineral potential mapping studies that utilize machine
learning to date seem to use relatively few input maps and treat the deposit training data
equally (e.g., Singer and Kouda 1999; Porwal et al. 2003; Fung et al. 2005; Carranza and
Laborte 2015a, b; Carranza and Laborte 2016; Zhang et al. 2016; Wang et al. 2019). Similar
questions relating to input map selection and training point weighting were addressed by Harris
et al. (2006) and Harris and Sanborn-Barrie (2006) for gold mineralization in Ontario; however,
at most, only 27 input predictive maps were used in the models. These studies recognized that
weighting of training data was limited by the sparsity of economic deposits. However, it is
noted that Nykänen (2008) weighted training data for a neural network implementation. Xiong
et al. (2018) examined 42 input maps for “big data” mineral potential mapping using a deep
autoencoder network. However, the methodology suggests that thresholds were applied, or
reclassified maps used as input.

Using maps produced as part of a regional-scale mineral potential mapping study using WofE for
porphyry Cu–Au mineralization in the Macquarie Arc of the eastern Lachlan Orogen in NSW,
Australia (Ford et al. 2019b; Fig. 2), a number of approaches have been evaluated for creating
the input maps to be classified using RF.

The main aim of this study is to evaluate whether robust and reliable mineral potential maps
can be created using RF by using a large number of multi-class categorical or non-thresholded
numeric predictive maps relevant to the mineral system as input for classification. In this
study, a multi-class categorical or non-thresholded numeric map is defined as a predictive map
that has had no favorability criteria applied. This result is then compared to: (1) RF results
using a large number of binary predictive maps that have had statistically valid and
geologically meaningful thresholds determined through WofE analysis and expert review; (2) RF
results using a subset of the binary predictive maps that were used in the WofE posterior
probability map; and (3) the WofE posterior probability map. In addition, the deposits used as
training data for the
Figure 1. A simple RF example for mapping the mineral potential for a porphyry Cu–Au system based on five predictive maps (magma
fertility, shoshonitic volcanics, contraction faults, reactivity contrast, and anomalous Cu–Au in drillholes). (a), (b) and (c) show three
random decision trees with a tree depth of 3, where the decision at each branch is whether the presence or absence of a given feature
predicts the location of a deposit or not. (d) shows the average of all the results from the decision trees, which are aggregated, based on the
majority vote.
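The voting scheme illustrated in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration only: the three trees are hand-written rule sets rather than fitted models, and the cell values and function names (`tree_a`, `tree_b`, `tree_c`, `forest_vote`) are hypothetical. The five map names follow the figure caption.

```python
from collections import Counter

# Hypothetical binary predictive-map values for one grid cell (1 = feature present).
cell = {
    "magma_fertility": 1,
    "shoshonitic_volcanics": 0,
    "contraction_faults": 1,
    "reactivity_contrast": 0,
    "cu_au_drillhole_anomaly": 1,
}

# Each "tree" is a small hand-written set of nested rules (illustrative, not fitted).
def tree_a(c):
    if c["magma_fertility"]:
        return 1 if c["cu_au_drillhole_anomaly"] else 0
    return 0

def tree_b(c):
    if c["shoshonitic_volcanics"]:
        return 1
    return 1 if c["contraction_faults"] and c["reactivity_contrast"] else 0

def tree_c(c):
    if c["cu_au_drillhole_anomaly"]:
        return 1 if c["contraction_faults"] else 0
    return 0

def forest_vote(c, trees):
    """Return the majority class (1 = deposit, 0 = non-deposit) and the raw votes."""
    votes = [tree(c) for tree in trees]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes

label, votes = forest_vote(cell, [tree_a, tree_b, tree_c])
print(votes, "->", "prospective" if label == 1 else "unprospective")
```

In a fitted forest, each tree would be grown on a bootstrap sample of the training points with a random subset of maps considered at each split; only the aggregation step is shown here.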
Figure 2. Macquarie Arc study area and training points. Coordinates shown in MGA Zone 55.
tism have been recognized that relate to the evolution of the Macquarie Arc and its associated
mineral systems (Crawford et al. 2007; Glen et al. 2007, 2011). Phase 1 is non-mineralized.
Phases 2 and 3 are restricted to the mid-late Ordovician (≥ 445 Ma) and are genetically
associated with porphyry Cu–Au mineralization at Cowal, Peak Hill, and Copper Hill, although
modern mining in these districts has been of associated epithermal and skarn deposits rather
than the porphyries. Evidence for phases 2 and 3 based on petrology, geochemistry and
radiogenic isotopes is consistent with calc-alkaline magmatism in a primitive
subduction-related back-arc setting (Ford et al. 2019b).

Phase 4 magmatism commenced around 460 Ma, with the latter stages involving arc accretion with
the Gondwanan margin and restriction of volcanism (Ford et al. 2019b). Porphyry mineralization
occurred between 446 and 434 Ma and resulted in the formation of the very large to giant
Northparkes and Cadia Au–Cu systems, along with Kaiser and other deposits in the northern
Molong Volcanic Belt (Ford et al. 2019b). Notably, these porphyry systems are K-rich compared
with older (phases 1–3) parts of the Macquarie Arc and with calc-alkaline terranes worldwide.
It has been suggested that phase 4 magmas may have tapped a long-lived, LILE-enriched,
mantle-like reservoir (Blevin 2002; Forster et al. 2011).

Emplacement of these porphyries at the regional-scale was controlled by pre-existing,
arc-oblique to arc-transverse master faults, controlling grabens, and releasing bends and jogs.
At Cadia, such an arc-transverse graben focused the mineralized corridor (Fox et al. 2015).

The porphyry Cu–Au ore bodies are typically pipe-like (Northparkes, E22, E26) or tabular and
steeply dipping developed about apophyses, while others such as Cadia Hill are hosted by
carapace phases of the intrusive complex (Ford et al. 2019b). Stockwork and sheeted vein
deposits with K-rich selvages and hydrothermal breccias developed above and proximal to
porphyritic intrusive dykes and pipe-like intrusions. Disseminated Cu–Au occurs within deeper
porphyry stocks (e.g., Cowal).

Low total sulfides in K-rich examples have typical porphyry-like zonation (inwards) from py
only → py > cp → cp → py → cp > bn with a barren sodic core in some systems. Late-stage Au–Zn
bearing phyllic–pyrite zones are narrow in phase 4 systems, which is a function of the alkaline
nature of the alteration chemistry (Ford et al. 2019b).

Both high and low sulfidation epithermal Au deposits associated with the porphyry mineral
system can be offset by thrust faulting from the associated porphyry (e.g., Peak Hill,
Gidginbung, Cowal; Ford et al. 2019b).

Based on this understanding of the porphyry Cu–Au mineral system, predictive variables that
represent spatial proxies for four key mineral system components were determined for: (1)
source, (2) transport, (3) trap, and (4) deposition/preservation (Knox-Robinson and Wyborn
1997; Ford et al. 2019b). Table 1 shows predictive variables for the mineral system, which
include, but are not limited to (Ford et al. 2019b): calc-alkaline magmas of Ordovician to
early Silurian age within the Macquarie Arc, association with skarn and epithermal mineral
occurrences, large elliptical magnetic highs and smaller anomalies with an inner “donut” low
indicative of porphyry signatures at depth, regional-scale and cross-volcanic belt structures,
dilational fault bends and jogs, veining, and geochemical anomalies with elevated Cu, Au, Ag,
and Mo.

DATA AND METHODOLOGY

Data

Available geoscience data were compiled from the Geological Survey of New South Wales
(Table 2) and were reviewed, analyzed, and reclassified in accordance with the porphyry Cu–Au
mineral system previously described (Ford et al. 2019b). Processing of the data included
classifying and attributing rock units, creating derivative datasets from fault data, and
determining anomalous thresholds for the geochemistry data relevant to the mineral system.

For the WofE and RF mineral potential mapping, training data were chosen from the MetIndEx
mineral occurrence database (Geological Survey of New South Wales 2019). There were 14 deposit
training points selected for their relevance to the Benambran Cycle porphyry Cu–Au
mineralizing event in the Macquarie Arc (Fig. 2). A separate subset of 211 porphyry Cu–Au
occurrences was created for validation purposes. An equal number of non-deposit locations were
randomly mapped from areas that had a strong likelihood of not being undiscovered deposit
locations. The criteria used for mapping non-deposit locations were to create random points in
areas with geology that is unrelated to
Table 1. Predictive variables related to the porphyry Cu–Au mineral system processes in the Macquarie Arc (Ford et al. 2019b)
Source
    Calc-alkaline magmas of Ordovician to early Silurian age within the Macquarie Arc
    Intrusions which are oxidized magnetite-series diorite to quartz monzonite–syenite and pegmatitic phases
    Shoshonitic (high-K) subaqueous volcanics
    Association with skarn and epithermal mineral occurrences
    Large elliptical magnetic highs and smaller anomalies with an inner “donut” low indicative of porphyry signatures at depth
Transport
    Regional structures
    NW or WNW trending cross-volcanic belt structures
Trap
    Graben structures
    Dilational fault bends and jogs
    Pipe-like, finger-like, and dyke-like complexes emplaced near the base of K-rich volcanics
    Veining and propylitization
Deposition/Preservation
    Known Ag–Au-base metal epithermal mineralization
    Mesothermal carbonate-Au–As-base-metal mineralization
    Au–Zn bearing phyllic-pyrite zones
    High Au/Cu ratios in relatively restricted late phyllic and silicic (cap) alteration zones
    Cu, Au, Ag, and Mo geochemical anomalies with peripheral Pb and Zn
    Elevated Ti, V, P, F, Ba, Sr, Rb, Nb, Te, Pb, Zn, and PGE assays
Table 2. Data used in the WofE and RF mineral potential mapping in the eastern Lachlan Orogen
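The non-deposit sampling rule used in this study (random points at least 1 km from the nearest Ordovician–Silurian magmatic unit, with a minimum spacing of 100 m between points) can be sketched as simple rejection sampling. The sketch below makes several assumptions that are not from the source: magmatic units are reduced to point locations rather than polygons, coordinates are in metres in a projected system, and the function name `sample_non_deposits` and its parameters are hypothetical. The manual review of spatial distribution described in the text is not reproduced.

```python
import math
import random

def sample_non_deposits(n_points, bounds, excluded_sites,
                        min_dist_to_excluded=1000.0, min_spacing=100.0,
                        seed=0, max_tries=100_000):
    """Draw random (x, y) points that respect both distance constraints.

    bounds is (xmin, ymin, xmax, ymax); excluded_sites is a list of (x, y)
    locations standing in for the magmatic units (a simplifying assumption).
    """
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    points = []
    for _ in range(max_tries):
        if len(points) == n_points:
            break
        candidate = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        # Reject candidates too close to any excluded (magmatic) location.
        if any(math.dist(candidate, s) < min_dist_to_excluded for s in excluded_sites):
            continue
        # Reject candidates too close to an already accepted point.
        if any(math.dist(candidate, p) < min_spacing for p in points):
            continue
        points.append(candidate)
    if len(points) < n_points:
        raise RuntimeError("could not place the requested number of points")
    return points

# Hypothetical 20 km x 20 km area with one magmatic unit at its centre.
magmatic = [(10_000.0, 10_000.0)]
non_deposits = sample_non_deposits(14, (0.0, 0.0, 20_000.0, 20_000.0), magmatic)
```

With real data, the distance test would be made against the unit polygons in a GIS rather than against single points.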
phase 4 Macquarie Arc porphyry Cu–Au mineralization. This was achieved by generating the random
points at least 1 km away from the nearest Ordovician–Silurian magmatic unit with a minimum
spacing between points of 100 m. These randomly generated points were then reviewed to ensure
they had a good spatial distribution across the study area.

Due to the established constraints on the permissive host rocks for the Ordovician–Silurian
porphyry Cu–Au mineral system in the eastern Lachlan Orogen, the study area for all models was
constrained to the Macquarie Arc. The extents of the Macquarie Arc were derived from the mapped
seamless basement geology and a geophysical interpretation under cover (Ford et al. 2019b;
Fig. 2).

Weights of Evidence

The WofE workflow used in this study is comprehensively reviewed in Ford et al. (2019a). A
detailed list of predictive maps generated for the
porphyry Cu–Au mineral system in the eastern Lachlan Orogen is available in the spatial data
table within the digital data package published by the Geological Survey of New South Wales
(Ford et al. 2019b). The WofE analysis was implemented using the ArcSDM toolbox for ArcGIS
(https://github.com/gtkfi/ArcSDM).

Using the detailed mineral system model and data outlined in Ford et al. (2019b), 197
multi-class categorical or non-thresholded numeric maps were made for the porphyry Cu–Au
mineral system in the eastern Lachlan Orogen and the spatial statistics reviewed to determine
the relevant threshold for each. The thresholds are required to meet three criteria: (1) the
threshold must be statistically valid (i.e., meet a minimum studentized contrast value;
Bonham-Carter 1994); (2) the threshold must be geologically meaningful (e.g., not suggest an
association with K-depleted magmas when the porphyry Cu–Au mineralization is related to K-rich
magmas); and (3) the threshold must correspond to an area that reduces the mineral exploration
search space.

The strength of the spatial relationship between the training data and predictive maps is
quantified by the contrast value (C), and the confidence in the contrast is measured by the
studentized contrast (StudC; Bonham-Carter 1994). In this study, statistical significance was
defined as having a StudC value > 0.5 for the relationship between the training points and a
given predictive map. Each map that had a statistically significant relationship with the
training data then had its corresponding numerical threshold or categorical classes evaluated
to determine whether they made geological sense in the context of the porphyry Cu–Au mineral
system, and whether these thresholds or classes equated to a favorable area that was less than
20% of the overall study area in order to reduce the exploration search space. This resulted in
164 binary favorable and unfavorable predictive maps being produced for the mineral system.

These binary maps were then re-analyzed using WofE, and the maps that had the best regional
coverage, significant spatial associations with the training points, and minimal duplication of
map patterns were selected as inputs for the mineral potential map (Ford et al. 2019a). A
subset of nine predictive maps was then integrated to produce a posterior probability map
(model WofE9) that represents the relative geological potential of each grid cell in the study
area for hosting the relevant mineralization. The prospective area, where the posterior
probability is greater than the prior probability, accounts for 15.2% of the Macquarie Arc
study area (Fig. 3) and predicted the location of 13 of the 14 training points (Kaiser is
located in an unprospective area based on its posterior probability value).

The posterior probability map was then validated by calculating the efficiency of
classification using area–frequency analysis (Chung and Fabbri 2003; Ford et al. 2019a). This
measures how well the relevant mineralization is classified by the mineral potential map. To
avoid using the same deposits for both training the WofE model and validating it, a separate
subset of 211 porphyry Cu–Au mineral occurrences located within the study area was used for
validation using the area–frequency analysis, resulting in an efficiency of classification of
91.2%. Conditional independence was tested using the omnibus test (Agterberg and Cheng 2002),
and the porphyry Cu–Au posterior probability map was found to fail the test. However, lack of
conditional independence is typical in mineral potential mapping as the physical and chemical
processes that form the mineral system are rarely independent of each other (e.g., existing
faults controlling the emplacement of fertile magmas). Lack of conditional independence is
known to cause the posterior probability values to be overestimated (Agterberg and Cheng 2002),
and as such, the posterior probability values should be considered a relative measure of the
mineral potential rather than an absolute measure of the probability of finding mineralization
at any given location in the study area (Ford et al. 2019a). Machine learning methods such as
RF or neural networks are designed to handle large numbers of predictive maps and do not have
problems with conditional independence, which can potentially make them more appropriate than
WofE.

Random Forests

A RF is a set of randomly selected decision trees based on a set of rules that make a decision
between a parent node and its child nodes based on some decision criteria in order to make a
prediction about some particular characteristic of the training data (Breiman 2001; Carranza
and Laborte 2015b). In mineral potential mapping, the aim is for the decision between the
parent and child nodes in each tree in the RF to determine whether a given predictive map value
predicts a known deposit location or not (Fig. 1). The votes for each predictive map
Figure 3. WofE mineral potential map for (a) the eastern Lachlan Orogen constrained to the Macquarie Arc, and (b) zoomed into the
highly prospective area over the Cadia-Ridgeway porphyry Cu–Au system. The map has been converted to binary prospective/
unprospective for ease of comparison with the RF results, with the threshold for prospectivity being set to posterior probability values
greater than the prior probability. Coordinates shown in MGA Zone 55.
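The contrast (C) and studentized contrast (StudC) used to screen predictive maps in the WofE analysis can be sketched for a single binary map as follows. The formulas follow the standard weights of evidence formulation (Bonham-Carter 1994), with the variance of each weight approximated from cell counts. The cell counts in the example are hypothetical and are not taken from this study, and the function name `wofe_contrast` is an assumption for illustration.

```python
import math

def wofe_contrast(n_bd, n_b, n_d, n_total):
    """Weights, contrast and studentized contrast for one binary predictive map.

    n_bd: unit cells with the favorable class AND a training deposit
    n_b: unit cells with the favorable class
    n_d: unit cells containing a training deposit
    n_total: all unit cells in the study area
    """
    n_b_nd = n_b - n_bd                   # favorable, no deposit
    n_nb_d = n_d - n_bd                   # unfavorable, deposit
    n_nb_nd = n_total - n_b - n_nb_d      # unfavorable, no deposit
    n_nd = n_total - n_d                  # all cells without a deposit

    w_plus = math.log((n_bd / n_d) / (n_b_nd / n_nd))
    w_minus = math.log((n_nb_d / n_d) / (n_nb_nd / n_nd))
    contrast = w_plus - w_minus

    # Standard deviation of the contrast from the approximate weight variances.
    s_contrast = math.sqrt(1 / n_bd + 1 / n_b_nd + 1 / n_nb_d + 1 / n_nb_nd)
    return w_plus, w_minus, contrast, contrast / s_contrast

# Hypothetical counts: 14 deposits, 10 of them inside a class covering 15% of 10,000 cells.
w_plus, w_minus, contrast, stud_c = wofe_contrast(n_bd=10, n_b=1500, n_d=14, n_total=10_000)
passes_screen = stud_c > 0.5  # significance cut-off stated in the text
```

The counts must all be non-zero for the logarithms and variances to be defined, so a map whose favorable class contains every training point (or none) needs separate handling.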
value are tallied across all the randomly generated trees in the forest, and the majority vote
for the most popular predictive map value wins. These decisions are then aggregated to classify
the combination of all input predictive maps as prospective or unprospective (or however many
target classes are defined in the training data). Comprehensive reviews of using RF for mineral
potential mapping are provided in Rodriguez-Galiano et al. (2014), Carranza and Laborte
(2015a, b), Carranza and Laborte (2016) and McKay and Harris (2016). Based on these approaches,
RF classification and prediction were implemented using R.

The implementation for this study set the following key parameters: (1) a multi-band raster
image that includes the set of predictive maps to be classified, with each raster image band
representing one predictive map; (2) training data that represent the training points to be
used with class fields that represent the integer target class value (i.e., “deposit” = 1,
“non-deposit” = 0); (3) maximum number of trees in the forest; and (4) maximum tree depth for
each tree in the forest (i.e., the maximum number of rules each tree is allowed to create to
come to a decision). The training accuracy (based on the confusion matrix) and variable
importance were evaluated for each model run. The output from the RF training was then used as
input for classifying the input raster into the required number of target classes.

Three different approaches for creating the multi-band image to be classified using the RF
algorithm were evaluated in this study in order to determine whether robust and geologically
meaningful results can be obtained from the machine learning algorithm with differing levels of
interven-
tion from an expert, with a fourth approach used to examine variation of the training data. The
training data used to train the RF classifier were the 14 deposits used as training points in
the WofE analysis (class “deposit” = 1) and 14 randomly generated points derived from areas
that had a strong likelihood of not being undiscovered deposit locations (class “non-deposit”
= 0). The out of bag (OOB) errors are shown for each implementation in Figure 4. The maximum
number of trees in the forest and the maximum tree depth were set to 501 and 5, respectively,
for the RF models. The OOB error plots in Figure 4 were assessed to ascertain whether the
errors had leveled off or whether further improvement was likely if the number of trees were to
be increased, and the R default of 500 trees was increased by 1 to give an odd number to ensure
there would be no ties in the majority vote for binary classification models. The tree depth
was set to 5 after tests using a tree depth of 10 were found to produce larger OOB errors.

The first approach was to use the same subset of nine binary predictive maps used to produce
the posterior probability map in the WofE analysis. A 9-band raster was created as input for
the RF classifier (model Bin9). The training accuracy of the model was 85.7% and resulted in a
mineral potential map that accurately predicted all training points within the correct output
class and produced a prospective area that accounted for 23.4% of the study area (Fig. 5a).
Using the additional 211 porphyry Cu–Au occurrences for validation, model Bin9 predicted 93.5%
of the validation points in the prospective area.

The second approach was to use the binary classified maps from the WofE analysis (164 maps)
that had already been reviewed by an expert in order to determine whether the thresholds were
statistically valid, geologically meaningful, and whether the map was practically useful in
helping to reduce the exploration search space. All of the binary predictive maps that met
these criteria were combined to produce a 164-band raster for input into the RF classifier
(model Bin164). The training accuracy of the model was 89.3%, and the resulting mineral
potential map accurately predicted all of the training data in the correct class. The
prospective area is just 6.5% of the study area (Fig. 5b). Using the additional 211 porphyry
Cu–Au occurrences for validation resulted in model Bin164 accurately predicting 88.1% of the
validation points in the prospective area.

The third approach was to use the 197 multi-class categorical or non-thresholded numeric maps
produced for the WofE analysis that had not had any favorability criteria applied. These 197
maps were used to produce a 197-band raster (model MC197), which was used as input for the RF
classifier. The aim of this was to let the RF algorithm determine the most appropriate
thresholds and the subsequent variable importance for each map, and to use the output from the
classifier to classify the multi-band raster. The training accuracy of the model was 89.3%, and
the resulting mineral potential map accurately predicted all of the training data in the
correct class. However, the prospective area covers 47.7% of the Macquarie Arc study area
(Fig. 5c). Using the additional 211 porphyry Cu–Au occurrences resulted in model MC197
accurately predicting 97.0% of the validation points in the prospective area.

In order to test the impact of “weighting” the training data, the 14 deposit training points
for the porphyry Cu–Au mineral system model were merged with a subset of 14 validation points
(from the total 211) used in the previous WofE analysis. The 14 deposit training points were
assigned a value of 2 (target class “highly prospective”), as these were considered to be more
representative evidence of the mineral system. The 14 validation training points were assigned
a value of 1 (target class “prospective”), as they were still considered to be representative
of the porphyry Cu–Au mineral system, but not as important as the initial 14 training points.
Using the same method as previously described, 14 non-deposit training points were created. The
non-deposit locations retained the value of 0 previously assigned for training (target class
“unprospective”). This resulted in a training dataset with 42 total points. Using the same
9-band predictive map raster as in model Bin9, the RF classifier was rerun with the weighted
training data (model Bin9Wt). The training accuracy of the model was 69% and resulted in an
output map that accurately predicted all of the training points within the correct output
class. The highly prospective area accounted for 11.7% of the study area, and the prospective
area (cumulative highly prospective and prospective) accounted for 40.4% (Fig. 5d). Using the
additional 197 porphyry Cu–Au occurrences (as 14 were filtered out for use as additional
training data) resulted in model Bin9Wt accurately predicting 43.3% of the validation points in
the highly prospective area and
Figure 4. Out of bag error rates from training 4 RF models. Bin9 = 9-band binary input raster trained on unranked training data;
Bin164 = 164-band binary input raster trained on unranked training data; MC197 = 197-band multi-class categorical or non-thresholded
numeric input raster trained on unranked training data; Bin9Wt = 9-band binary input raster trained on ranked training data.
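The OOB error rates in Figure 4 depend on the bootstrap step of the RF algorithm (Breiman 2001): each tree is grown on a sample drawn with replacement from the training points, and the points that were not drawn are "out of bag" for that tree and can be used to test it. The sketch below estimates what fraction of the training set is out of bag per tree, assuming standard bootstrap sampling with replacement, the 28 training points (14 deposit, 14 non-deposit) of the unweighted models, and 501 trees. The function names are hypothetical and no trees are actually fitted.

```python
import random

def expected_oob_fraction(n_samples):
    """Probability that a given point is never drawn in n_samples draws with replacement."""
    return (1.0 - 1.0 / n_samples) ** n_samples

def simulated_oob_fraction(n_samples, n_trees, seed=0):
    """Average share of training points left out of the bootstrap sample, over all trees."""
    rng = random.Random(seed)
    total_oob = 0
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        total_oob += n_samples - len(in_bag)
    return total_oob / (n_trees * n_samples)

analytic = expected_oob_fraction(28)          # roughly 0.36
simulated = simulated_oob_fraction(28, 501)   # close to the analytic value
print(f"analytic={analytic:.3f} simulated={simulated:.3f}")
```

With only 28 training points, roughly ten points are out of bag for any one tree, so each point's OOB prediction is an average over only the subset of trees that did not see it. This is one reason to check that the OOB error curve has leveled off before fixing the number of trees.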
54.2% in the prospective area, for an overall 97.5% efficiency of classification.

DISCUSSION

The results of the RF mineral potential mapping highlight a number of interesting outcomes. The
two RF mineral potential maps generated using the 9-band input raster images both highlight the
importance of the high Au–Cu–Ag–Zn mineral occurrence density based on the variable importance,
while much lower variable importance was assigned to the Ordovician–Silurian intrusions and
shoshonitic or high-K volcanics (Table 3). The drillhole-rock Cu–Au anomalies and oxidized and
K-enriched magma both had a high variable importance when the unweighted training data were
used in model Bin9, but a lower importance when the weighted training data were used in model
Bin9Wt. This is an interesting result from the perspective of targeting porphyry Cu–Au
mineralization in the Macquarie Arc, as these are all maps that would typically be considered
critical for targeting the mineral system. Comparison with the contrast values for the 9 maps
selected as inputs for the WofE posterior probability map highlights a number of differences
(Table 3), with the Ordovician–Silurian intrusions having the 2nd highest contrast value, and
the oxidized and K-enriched magmas having the 2nd lowest out of the 9 input maps in the WofE
analysis.

The RF classification for the binary 164-band raster in model Bin164 highlighted the importance
of the presence of geochemical anomalies, porphyry-related mineral occurrences, and K-enriched
and oxidized magmas, which confirms ideas relating to the porphyry Cu–Au mineral system in the
eastern Lachlan Orogen (cf. Ford et al. 2019b). In addition, the favorable host stratigraphy
(Lake Cowal Volcanic Complex, Gidginbung Volcanics, Forest Reef Volcanics, Cheesemans Creek
Formation, Kenyu Formation, Unassigned Ordovician Intrusions – dacite, Unassigned Ordovician
Intrusions – Copper Hill Intrusives, Goonumbla Monzonite 26, Mingelo Volcanics) and dominant
Figure 5. RF mineral potential maps using (a) a 9-band binary input raster trained on unranked training data (model Bin9), (b) a 164-band
binary input raster trained on unranked training data (model Bin164), (c) a 197-band multi-class categorical or non-thresholded numeric
input raster trained on unranked training data (model MC197), and (d) a 9-band binary input raster trained on ranked training data (model
Bin9Wt).
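The trade-off between prospective area and validation performance across the four RF maps in Figure 5 can be tabulated from the values reported in the text. The ratio computed below (percentage of validation points captured per percentage of study area flagged) is an illustrative summary introduced here, not a metric used in the study, and the names `reported` and `capture_per_area` are hypothetical. Note that model Bin9Wt was validated on 197 occurrences rather than 211.

```python
# Reported values: (prospective area as % of study area, % of validation points captured).
reported = {
    "Bin9": (23.4, 93.5),
    "Bin164": (6.5, 88.1),
    "MC197": (47.7, 97.0),
    "Bin9Wt (highly prospective)": (11.7, 43.3),
    "Bin9Wt (cumulative)": (40.4, 97.5),
}

def capture_per_area(area_pct, captured_pct):
    """Validation points captured per unit of area flagged (illustrative ratio only)."""
    return captured_pct / area_pct

ratios = {model: round(capture_per_area(area, captured), 2)
          for model, (area, captured) in reported.items()}

# Models ordered from most to least area-efficient under this ratio.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for model in ranked:
    area, captured = reported[model]
    print(f"{model:30s} area={area:5.1f}%  captured={captured:5.1f}%  ratio={ratios[model]:.2f}")
```

Under this ratio, model Bin164 captures the most validation points per unit of area and model MC197 the fewest, which matches the observation that MC197 achieves its high validation score largely by flagging almost half of the study area.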
Table 3. Rankings for nine input maps used in WofE and RF mineral potential maps
WofE rankings are based on each map's contrast value, and RF rankings are based on each map's variable importance
lithology (andesite, intermediate composition igneous rock, mafic igneous rock, dacite, mudstone, tonalite, monzonite) based on WofE reclassifications had high variable rankings. Interestingly, the shoshonitic or high-K volcanics ranked 54th, while the Ordovician–Silurian intrusions ranked only 160th out of 164.

Classification of the 197-band multi-class categorical or non-thresholded numeric raster in model MC197 also highlighted the importance of the presence of geochemical anomalies, porphyry-related mineral occurrences, and fertile magmas, as well as Silurian basins over the Macquarie Arc. However, the magma fertility parameters such as average Mg*, average Ce, and average Na2O/K2O notably ranked higher than K-enrichment or oxidized magmas, which would typically be considered critical targeting criteria for the Macquarie Arc porphyry Cu–Au mineral system. As with the 164-band binary raster, the 197-band raster confirmed that the Ordovician–Silurian intrusions and the shoshonitic or high-K volcanics ranked relatively poorly (69th and 173rd out of 197, respectively).

The RF-based mineral potential models using the binary inputs (models Bin9, Bin164, and Bin9Wt) each highlight the importance of key geochemical and magma fertility predictors that are relevant to the porphyry Cu–Au mineral system in the Macquarie Arc based on the variable importance rankings. This result confirms ideas relating to key targeting criteria for the mineral system, such as the presence of oxidized and K-enriched magmas, Cu–Au anomalies in geochemistry, and proximity to porphyry-related mineral occurrences such as skarns and epithermals. Notably, however, several predictive maps that were expected to rank highly due to their relevance to the mineral system had relatively poor variable importance rankings. In particular, the Ordovician–Silurian intrusions and the shoshonitic or high-K volcanics, which are understood to be key targeting criteria (Ford et al. 2019b), ranked relatively poorly in all RF models. Although surprising, this result does not appear to have negatively impacted the mineral potential maps, due to the high variable importance assigned to more detailed magma fertility maps.

Notably, as the complexity of the RF classification increased, either through an increased number of binary predictive maps or weighting of the training data, the predictive capacity of the mineral potential map improved. While the WofE analysis produced a prospective area of 15.2%, the prospective area was 23.4% for model Bin9, 6.5% for model Bin164, and 11.7% for model Bin9Wt. The mineral potential map for model MC197 in Figure 5c, with a prospective area of 47.7%, effectively just maps the distribution of the Benambran Cycle magmatic units, regardless of their magma fertility. It is unsuccessful at narrowing down the exploration search space, as the models with binary input maps do, despite the five highest variable importance values for model MC197 being attributed to Mo drillhole-rock chip anomalies, Au mineral occurrence density, Ag drillhole-rock chip anomalies, Silurian basins over the Macquarie Arc, and Cu–Au drillhole-rock chip anomalies.

Validation of all mineral potential maps was undertaken using a separate subset of porphyry Cu–Au occurrences that were not used to train either the WofE or RF models. Each map showed a strong predictive efficiency determined by how well the map classified the validation points as prospective or not, with the strongest results being achieved by models MC197 and Bin9Wt. However, it must be noted that the extent of prospective area in each of these two models is more than double that of any of the other models run. When only the highly
prospective area for model Bin9Wt is considered, its predictive efficiency is much lower.

All RF models have mapped discrete prospective areas away from the known porphyry Cu–Au deposits and occurrences used as training data as well as reducing the exploration search space (Fig. 6). However, it is acknowledged that this reduction in the prospective area may be at least in part due to over-fitting to the known deposits used as training data in the RF, although how much effect it has cannot be readily quantified. Over-fitting using WofE is typically limited by an expert carefully selecting input maps that minimize duplication of map patterns when producing the posterior probability map (Ford et al. 2019a). However, using a large number of inputs in RF, in order to let the machine learning determine the variable importance for each predictive map and better utilize all of the data rather than just a small subset, can still result in over-fitting because of the limited amount of training data used in mineral potential mapping and the “curse of dimensionality” (Rodriguez-Galiano et al. 2015). This is in contrast to Breiman (2001), who suggests that as the number of trees in the forest increases, the OOB error converges even without pruning the trees in the forest and that over-fitting is not a problem due to the ensemble learning and randomization used in the method. The diverging opinions on RF over-fitting in published literature may be theoretical versus practical in nature, with over-fitting being a practical challenge due to the scarcity of training data rather than a theoretical problem when an ideal number of training data are available. Over-fitting is an issue that is common in high-dimensional feature space with limited training data, for example, where the problem involves a large number of predictive maps with each map having a range of possible values, as in mineral potential mapping. It has been suggested that, ideally, several training samples should exist for each unique combination of values; however, as this is often impractical, at least five training samples should exist for each dimension (Koutroumbas and Theodoridis 2008). For a 9-band binary input map, there may be up to 512 (2^9) unique combinations present, so ideally around 1500 training points should be used. However, taking the suggestion of five training examples per dimension (i.e., per raster band), only 45 training samples would be required. For the 197-band multi-class categorical or non-thresholded numeric input map, that would require a minimum of 985 training locations. Although Carranza and Laborte (2015b) show that mineral potential mapping using RF can be successfully implemented using even fewer training points, it is a challenge to quantify the dimensionality of the input map up to which the reduction in prospective area remains meaningful for mineral exploration targeting, and beyond which it simply becomes an artifact of over-fitting due to insufficient training data. One way to overcome the “curse of dimensionality” may be to reduce the dimension of the problem by combining multiple predictive maps (e.g., with principal component analysis or expert advice). However, this can only reduce the problem's dimensionality so far before combining maps becomes geologically nonsensical and risks losing detailed information that may help characterize the mineralization (cf. Porwal et al. 2001). Since economically significant mineral deposits are rare, even in world-class metallogenic districts, producing a training dataset that contains a sufficient number of economic deposits to overcome this “curse of dimensionality” appears to represent a constraint on using machine learning on a large set of predictive maps for the purposes of mineral potential mapping.

The relatively poor training accuracy for the weighted or ranked training data in model Bin9Wt (69%) is interpreted to be caused by the misclassification of the validation data in the confusion matrix. Only half of the 14 validation training points were correctly classified as “prospective” during training, with the others being classified as “highly prospective”. This suggests that some of the validation training data may be more important representations of the porphyry Cu–Au mineral system in the Macquarie Arc than is currently understood.

The analysis clearly demonstrated that the input file size and time taken to produce each of the mineral potential maps using WofE and RF increased significantly with the complexity of the input to be classified. The RF in this study was successfully implemented on a desktop PC using R. However, initial attempts were made to undertake the process using inbuilt functionality in both ArcGIS (ESRI 2019) and a standalone Python script (it is noted that the ArcGIS RF implementation is Python under the hood). Both of these RF implementations failed when either the 164-band binary raster image or the 197-band multi-class categorical or non-thresholded numeric raster image was used as input in a desktop PC processing environment. More success may be achieved using these Python-based RF
implementations in a supercomputing or cloud-based processing environment, which are not commonly available to the mineral exploration industry.

Figure 6. RF mineral potential maps zoomed into the highly prospective Cadia-Ridgeway porphyry Cu–Au system highlighting the reduction in prospective area using (a) a 9-band binary input raster trained on unranked training data (model Bin9), (b) a 164-band binary input raster trained on unranked training data (model Bin164), (c) a 197-band multi-class categorical or non-thresholded numeric input raster trained on unranked training data (model MC197), and (d) a 9-band binary input raster trained on ranked training data (model Bin9Wt).

The results of the different RF implementations indicate that more robust results are obtained when binary classified maps that have been reviewed by an expert are used as input, in agreement with the conclusions reached by Harris et al. (2006) and Harris and Sanborn-Barrie (2006). While RF allows the user to evaluate the relative importance of each predictive map in the multi-band raster through the variable importance, the use of the multi-class categorical or non-thresholded numeric maps as inputs means that, although a particular threshold or classification may produce the best result based on the majority vote, it may not be geologically meaningful or help to reduce the exploration search space. This is analogous to the problem in WofE where the best statistically derived threshold based on the contrast and/or studentized contrast values may not make geological sense in the context of the mineral system being modeled (Ford et al. 2019a).

Although only the RF machine learning method was implemented in this study, mineral potential mapping using neural networks faces the same limitations. However, the main advantage of using RF over neural networks is the capacity to deal with missing data, and the ability of an expert to review the variable importance for each predictive map included in the analysis (Carranza and Laborte 2015b).

Based on the results of this study, it is suggested that robust implementations of the RF algorithm for mineral potential mapping should use (re)classified predictive maps rather than multi-class categorical or non-thresholded numeric maps as inputs. This minimizes the risk of producing output mineral potential maps that do not make geological sense or that do not provide a practically useful result for exploration targeting. Different approaches could be used to define the classification thresholds for the predictive maps. For example, as in this study, the thresholds or classifications could be determined through WofE based on contrast and/or studentized contrast values to produce binary maps. An alternative method may be to define thresholds or classifications subjectively based on expert knowledge of the mineral system (e.g., predictive maps with three classes, namely favorable/permissive/unfavorable).

Further implementations of the RF for mineral potential mapping may include more effectively weighting or ranking the training data by size or resources where a sufficient number of training points are available to represent each size or resource class for training the RF classifier. The challenge with weighting or ranking the training data and creating separate target classes based on deposit size or resources is that there are typically very few large economically significant deposits and many smaller ones (cf. Guj et al. 2011; Harris et al. 2006). The lack of a sufficient number of larger, more economically significant deposits presents a challenge for training the RF model, particularly when the mineral potential mapping is undertaken at a regional to district scale. However, this may be more practical at a continental scale. This would provide additional benefit in using RF rather than WofE, which treats all training data equally, and produce a mineral potential map that may more effectively target the larger deposits for mineral exploration purposes (cf. Hronsky and Kreuzer 2019; Yousefi et al. 2019).

CONCLUSIONS

Different implementations of RF confirm that increasing complexity of the inputs improved the predictive capacity of the mineral potential maps for porphyry Cu–Au mineralization in the Macquarie Arc of the eastern Lachlan Orogen when expert review was used to ascertain meaningful thresholds and classifications for the input predictive maps. However, the results also highlight that the main limitation of using machine learning for mineral potential mapping is the lack of sufficient numbers of economically significant deposits with which to train a large number of input predictive maps, likely resulting in over-fitting of the model to at least some degree. The use of multi-class categorical or non-thresholded numeric predictive maps that had no favorability criteria applied did not produce a useful output for mineral exploration targeting.
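The training-data limitation noted above can be put in numbers using the rules of thumb quoted in the discussion: up to 2^9 unique value combinations for a 9-band binary input, and at least five training samples per dimension (Koutroumbas and Theodoridis 2008). The following is a minimal sketch only; the function names are hypothetical, and the figure for the 164-band input is derived here from the same rule rather than reported in the study.

```python
def unique_combinations(n_bands, classes_per_band=2):
    """Upper bound on distinct value combinations across all bands."""
    return classes_per_band ** n_bands


def samples_per_dimension_rule(n_bands, per_dimension=5):
    """Minimum training samples under the five-per-dimension rule of thumb."""
    return per_dimension * n_bands


# Band counts of the three input rasters described in the paper.
band_counts = {"Bin9": 9, "Bin164": 164, "MC197": 197}

for model, n_bands in band_counts.items():
    print(model, samples_per_dimension_rule(n_bands))  # 45, 820, 985

print(unique_combinations(9))  # 512
```

Even under the more lenient per-dimension rule, the larger rasters call for several hundred training locations, which is far more than the number of economically significant deposits typically available at regional to district scale.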
Using WofE analysis to determine statistically valid, geologically meaningful, and practically useful thresholds and reclassifications for the input maps, combined with RF to produce the output mineral potential map, has produced results that can be used effectively for mineral exploration targeting. The results of the study clearly demonstrate that an exploration geologist should review all of the outputs at each possible stage of a data-driven mineral potential mapping analysis in order to ensure the results make geological sense.

ACKNOWLEDGMENTS

The author would like to thank the Geological Survey of New South Wales for financial and in-kind support for the WofE mineral potential mapping presented in this paper. Thanks also go to the many geologists in industry, government, and academia who have contributed to the data collection and mineral system ideas used in this study. Two anonymous reviewers and the editor are also thanked for their comments, which helped to improve the quality of the manuscript.

REFERENCES

Agterberg, F. P. (2011). A modified weights-of-evidence method for regional mineral resource estimation. Natural Resources Research, 20, 95–101.
Agterberg, F. P., & Cheng, Q. (2002). Conditional independence test for weights-of-evidence modeling. Natural Resources Research, 11, 249–255.
Blevin, P. L. (2002). The petrographic and compositional character of variably K-enriched magmatic suites associated with Ordovician porphyry Cu–Au mineralisation in the Lachlan Fold Belt, Australia. Mineralium Deposita, 37, 87–99.
Bonham-Carter, G. (1994). Geographic information systems for geoscientists: Modelling with GIS. Oxford: Pergamon Press.
Bougrain, L., Gonzalez, M., Bouchot, V., Cassard, D., Lips, A. L. W., Alexandre, F., et al. (2003). Knowledge recovery for continental-scale mineral exploration by neural networks. Natural Resources Research, 12, 173–181.
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.
Brown, W. M., Gedeon, T. D., & Groves, D. I. (2003). Use of noise to augment training data: A neural network method of mineral-potential mapping in regions with limited known deposit examples. Natural Resources Research, 12, 141–152.
Brown, W. M., Gedeon, T. D., Groves, D., & Barnes, R. G. (2000). Artificial neural networks: A new method for mineral prospectivity mapping. Australian Journal of Earth Sciences, 47, 757–770.
Carranza, E. J. M. (2014). Data-driven evidential belief modeling of mineral potential using few prospects and evidence with missing values. Natural Resources Research, 24, 291–304.
Carranza, E. J. M., & Hale, M. (2002). Evidential belief functions for data-driven geologically constrained mapping of gold potential, Baguio district, Philippines. Ore Geology Reviews, 22, 117–132.
Carranza, E. J. M., & Laborte, A. G. (2015a). Data-driven predictive mapping of gold prospectivity, Baguio district, Philippines: Application of random forests algorithm. Ore Geology Reviews, 71, 777–787.
Carranza, E. J. M., & Laborte, A. G. (2015b). Random forest predictive modeling of mineral prospectivity with small number of prospects and data with missing values in Abra (Philippines). Computers and Geosciences, 74, 60–70.
Carranza, E. J. M., & Laborte, A. G. (2016). Data-driven predictive modeling of mineral prospectivity using random forests: A case study in Catanduanes Island (Philippines). Natural Resources Research, 25, 35–50.
Carranza, E. J. M., Woldai, T., & Chikambwe, E. M. (2005). Application of data-driven evidential belief functions to prospectivity mapping for aquamarine-bearing pegmatites, Lundazi District, Zambia. Natural Resources Research, 14, 47–63.
Chung, C.-J. F., & Fabbri, A. G. (2003). Validation of spatial prediction models for landslide hazard mapping. Natural Hazards, 30, 451–472.
Crawford, A. J., Meffre, S., Squire, R. J., Barron, L. M., & Falloon, T. J. (2007). Middle and Late Ordovician magmatic evolution of the Macquarie Arc, Lachlan Orogen, New South Wales. Australian Journal of Earth Sciences, 54, 181–214.
ESRI. (2019). ArcGIS Pro Release 2.4. Redlands, California.
Fallon, M., Porwal, A., & Guj, P. (2010). Prospectivity analysis of the plutonic Marymia greenstone belt, Western Australia. Ore Geology Reviews, 38, 208–218.
Ford, A., & Hart, C. J. R. (2013). Mineral potential mapping in frontier regions: A Mongolian case study. Ore Geology Reviews, 51, 15–26.
Ford, A., Miller, J. M., & Mol, A. G. (2015). A comparative analysis of weights of evidence, evidential belief functions, and fuzzy logic for mineral potential mapping using incomplete data at the scale of investigation. Natural Resources Research, 25, 19–33.
Ford, A., Peters, K., Greenfield, J., Blevin, P., Downes, P., Fitzherbert, J., & Simpson, B. (2019b). Eastern Lachlan Orogen mineral potential data package first edition [Digital Dataset]. Geological Survey of New South Wales, Maitland. https://search.geoscience.nsw.gov.au/product/9253. Accessed 17 Oct 2019.
Ford, A., Peters, K. J., Partington, G. A., Blevin, P. L., Downes, P. M., Fitzherbert, J. A., et al. (2019a). Translating expressions of intrusion-related mineral systems into mappable spatial proxies for mineral potential mapping: Case studies from the Southern New England Orogen, Australia. Ore Geology Reviews, 111, 102943.
Forster, D. B., Carr, G. A., & Downes, P. M. (2011). Lead isotope systematics of ore systems of the Macquarie Arc – Implications for arc substrate. Gondwana Research, 19, 686–705.
Fox, N., Cooke, D. R., Harris, A., Collett, D., & Eastwood, G. (2015). Porphyry Au–Cu mineralization controlled by reactivation of an arc-transverse volcanosedimentary subbasin. Geology, 43, 811–814.
Fung, C. C., Iyer, V., Brown, W., & Wong, K. W. (2005). Comparing the performance of different neural networks architectures for the prediction of mineral prospectivity. In Proceedings of the fourth international conference on machine learning and cybernetics, Guangzhou (pp. 394–398).
Geological Survey of New South Wales. (2019). NSW MetIndEx (metallic, industrial mineral and exploration discoveries) database [Digital Dataset]. Maitland: Geological Survey of New South Wales.
Glen, R. A., Crawford, A. J., & Cooke, D. R. (2007). Tectonic setting of porphyry Cu–Au mineralization in the Ordovician–Early Silurian Macquarie Arc, Eastern Lachlan Orogen, New South Wales. Australian Journal of Earth Sciences, 54, 465–479.
Glen, R. A., Saeed, A., Quinn, C. D., & Griffin, W. L. (2011). U–Pb and Hf isotope data from zircons in the Macquarie Arc, Lachlan Orogen: Implications for arc evolution and Ordovician palaeogeography along part of the east Gondwana margin. Gondwana Research, 19, 670–685.
Guj, P., Fallon, M., McCuaig, T. C., & Fagan, R. (2011). A time-series audit of Zipf's Law as a measure of terrane endowment and maturity in mineral exploration. Economic Geology, 106, 241–259.
Hariharan, S., Tirodkar, S., Porwal, A., Battacharya, A., & Joly, A. (2017). Random forest-based prospectivity modelling of greenfield terrains using sparse deposit data: An example from the Tanami Region, Western Australia. Natural Resources Research, 26, 489–507.
Harris, J. R., & Sanborn-Barrie, M. (2006). Mineral potential mapping: Examples from the Red Lake greenstone belt, northwest Ontario. In J. R. Harris (Ed.), GIS for the earth sciences (pp. 1–21). London: Geological Association of Canada Special Publication 44.
Harris, J. R., Sanborn-Barrie, M., Panagapko, D. A., Skulski, T., & Parker, J. R. (2006). Gold prospectivity maps of the Red Lake greenstone belt: Application of GIS technology. Canadian Journal of Earth Sciences, 43, 865–893.
Harris, D., Zurcher, L., Stanley, M., Marlow, J., & Pan, G. (2003). A comparative analysis of favorability mappings by weights of evidence, probabilistic neural networks, discriminant analysis, and logistic regression. Natural Resources Research, 12, 241–255.
Hronsky, J. M. A., & Kreuzer, O. P. (2019). Applying spatial prospectivity mapping to exploration targeting: Fundamental practical issues and suggested solutions for the future. Ore Geology Reviews, 107, 647–653.
Joly, A., Porwal, A., & McCuaig, T. C. (2012). Exploration targeting for orogenic gold deposits in the Granites-Tanami Orogen: Mineral system analysis, targeting model and prospectivity analysis. Ore Geology Reviews, 48, 349–383.
Knox-Robinson, C. M., & Wyborn, L. A. I. (1997). Towards a holistic exploration strategy: Using geographic information systems as a tool to enhance exploration. Australian Journal of Earth Sciences, 44, 453–463.
Koutroumbas, K., & Theodoridis, S. (2008). Pattern recognition. Amsterdam: Elsevier.
McKay, G., & Harris, J. R. (2016). Comparison of the data-driven random forests model and a knowledge-driven method for mineral prospectivity mapping: A case study for gold deposits around the Huritz Group and Nueltin Suite, Nunavut, Canada. Natural Resources Research, 25, 125–143.
Nykänen, V. (2008). Radial basis functional link nets used as a prospectivity mapping tool for orogenic gold deposits within the Central Lapland Greenstone Belt, Northern Fennoscandian Shield. Natural Resources Research, 17, 29–48.
Partington, G. (2010). Developing models using GIS to assess geological and economic risk: An example from VMS copper gold mineral exploration in Oman. Ore Geology Reviews, 38, 197–207.
Porwal, A., Carranza, E. J. M., & Hale, M. (2001). Extended weights-of-evidence modelling for predictive mapping of base metal deposit potential in Aravalli Province, Western India. Exploration and Mining Geology, 10, 273–287.
Porwal, A., Carranza, E. J. M., & Hale, M. (2003). Artificial neural networks for mineral-potential mapping: A case study from Aravalli Province, Western India. Natural Resources Research, 12, 155–171.
Reddy, R. K. T., & Bonham-Carter, G. F. (1991). A decision-tree approach to mineral potential mapping in Snow Lake area, Manitoba. Canadian Journal of Remote Sensing, 17, 191–200.
Rodriguez-Galiano, V. F., Chica-Olmo, M., & Chica-Rivas, M. (2014). Predictive modelling of gold potential with the integration of multisource information based on random forest: A case study on the Rodalquilar area, Southern Spain. International Journal of Geographical Information Science, 28, 1336–1354.
Rodriguez-Galiano, V., Sanchez-Castillo, M., Chica-Olmo, M., & Chica-Rivas, M. (2015). Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews, 71, 804–818.
Sagiroglu, S., & Sinanc, D. (2013). Big data: A review. In Proceedings of the international conference on collaboration technologies and systems, San Diego, California (pp. 42–47).
Singer, D. A., & Kouda, R. (1999). A Comparison of the weights of evidence method and probabilistic neural networks. Natural Resources Research, 8, 287–298.
Wang, J., Zuo, R., & Xiong, Y. (2019). Mapping mineral prospectivity via semi-supervised random forest. Natural Resources Research. https://doi.org/10.1007/s11053-019-09510-8.
Xiong, Y., Zuo, R., & Carranza, E. J. M. (2018). Mapping mineral prospectivity through big data analytics and a deep learning algorithm. Ore Geology Reviews, 102, 811–817.
Yousefi, M., Kreuzer, O. P., Nykänen, V., & Hronsky, J. M. A. (2019). Exploration information systems – A proposal for the future use of GIS in mineral exploration targeting. Ore Geology Reviews, 111, 103005.
Zhang, Z., Zuo, R., & Xiong, Y. (2016). A comparative study of fuzzy weights of evidence and random forests for mapping mineral prospectivity for skarn-type Fe deposits in the southwestern Fujian metallogenic belt, China. Science China Earth Sciences, 59, 556–572.