Knowledge Discovery from Geochemical Data with Supervised and Unsupervised Methods
Cevik, S. Ilkay
Olivo, Gema Ribeiro
Ortiz, Julian M.
As mineral exploration targets progressively deeper deposits, the cost per discovery is rising. Effective use of all available data is therefore critical to the decision-making process. In recent years, several machine learning algorithms (MLAs) have emerged and been adopted by mineral explorers because of their ability to identify multidimensional relationships between evidential features. The Random Forests (RF) algorithm is an MLA that many studies present as a practicable technique for classification, mostly because its internal decision-making criteria are comparatively simple to interpret (i.e., through the identification of feature importance) and because of its ability to handle missing data and to resist overfitting. RF is an ensemble of decision trees that has demonstrated good performance compared with other classification and regression methods. In this paper, we present an exploratory study in which RF is used in both supervised and unsupervised modes, along with principal component analysis, on a dataset of geochemical analyses to provide insights into the mineralization potential of the Vazante District in Brazil. The paper aims to demonstrate a workflow for applying advanced statistical methods and MLAs to rock geochemistry data for classification purposes, and to provide geological insights into the underlying processes, which may help to identify similar deposits in other parts of the world.
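The three ingredients named in the abstract (principal component analysis, supervised RF, and unsupervised RF) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses purely synthetic numbers in place of the Vazante geochemical data, and the element names, class labels, and tree counts are hypothetical choices made for illustration. The unsupervised step uses one common formulation, in which a forest is trained to separate the real samples from a column-wise permuted copy and sample similarity is read from how often two samples share a leaf; the paper may use a different variant.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a geochemical table (NOT real data).
elements = ["Zn", "Pb", "Fe", "Mg"]   # hypothetical column names
n = 200
labels = rng.integers(0, 2, size=n)   # hypothetical classes: 0 = barren, 1 = mineralized
X = rng.normal(size=(n, len(elements)))
X[:, 0] += 2.0 * labels               # make "Zn" strongly informative
X[:, 1] += 1.0 * labels               # make "Pb" weakly informative

# 1) PCA on standardized values to summarize multivariate structure.
X_std = StandardScaler().fit_transform(X)
scores = PCA(n_components=2).fit_transform(X_std)

# 2) Supervised RF: classify samples and rank features by importance.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, labels)
importance = dict(zip(elements, rf.feature_importances_))

# 3) Unsupervised RF: permute each column independently to build a
#    synthetic contrast class, then train a forest to tell real from synthetic.
X_synth = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
X_all = np.vstack([X, X_synth])
y_all = np.r_[np.ones(n), np.zeros(n)]
urf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_all, y_all)

# Proximity of two real samples = fraction of trees in which they share a leaf.
leaves = urf.apply(X)                 # shape: (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

print("Out-of-bag accuracy:", round(rf.oob_score_, 3))
print("Feature importance:", {k: round(v, 3) for k, v in importance.items()})
```

In a sketch like this, `1 - proximity` can serve as a dissimilarity matrix for clustering or ordination, while `importance` indicates which elements drive the supervised classification.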