Which method to use? An assessment of data mining methods in Environmental Data Science

Data Mining (DM) is a fundamental component of the Data Science process. In recent years a large library of DM algorithms has been developed to tackle a variety of problems in fields such as medical imaging and traffic analysis. Many DM techniques are far more flexible than classical numerical simulation or statistical modelling approaches, and could be usefully applied to data-rich environmental problems. Certain techniques, such as artificial neural networks, clustering, case-based reasoning and Bayesian networks, have been applied in environmental modelling, while other methods, such as support vector machines, have yet to be taken up on a wide scale. There is greater scope for many lesser-known techniques to be applied in environmental research, with the potential to help address some of the current open environmental challenges. However, selecting the best DM technique for a given environmental problem is not a simple decision, and there is a lack of guidelines and criteria to help data scientists and environmental scientists ensure effective knowledge extraction from data. This paper provides a broad introduction to the use of DM in Data Science processes for environmental researchers. The Data Science process comprises three main steps: pre-processing, data mining and post-processing. This paper provides a conceptualization of Environmental Systems and a conceptualization of DM methods, which form the core step of the Data Science process. These two elements define a conceptual framework that underpins a new methodology for relating the characteristics of a given environmental problem to a family of Data Mining methods. The paper gives non-expert users a general overview of DM techniques, together with guidelines that support them in deciding which technique is most suitable for the problem at hand.
The decision is based on the two-dimensional relationship between the type of environmental system and the type of DM method. An illustrative two-way table containing references for each Environmental System/Data Mining method pair is presented and discussed. Some examples of how the proposed methodology supports DM method selection are also presented, and challenges and future trends are identified.
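As an illustration only, the three Data Science steps named in the abstract (pre-processing, data mining, post-processing) can be chained in a small script. This is a minimal sketch under stated assumptions, not the methodology proposed in the paper: it assumes purely numeric records, uses z-score standardization as the pre-processing step, plain k-means (one example of the clustering family the abstract mentions) as the data mining step, and per-cluster summaries as post-processing. The function names and the water-quality sample values are hypothetical.

```python
from statistics import mean, pstdev


def preprocess(rows):
    # Pre-processing: drop records with missing values, then z-score each variable.
    complete = [r for r in rows if all(v is not None for v in r)]
    columns = list(zip(*complete))
    mus = [mean(c) for c in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [pstdev(c) or 1.0 for c in columns]
    scaled = [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in complete]
    return complete, scaled


def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    # Data mining: Lloyd's k-means, seeded deterministically with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def postprocess(records, labels, k):
    # Post-processing: size and per-variable mean of each cluster, in original units.
    profiles = []
    for j in range(k):
        members = [r for r, lab in zip(records, labels) if lab == j]
        profiles.append({
            "size": len(members),
            "means": [mean(col) for col in zip(*members)] if members else [],
        })
    return profiles


def run_pipeline(rows, k):
    records, scaled = preprocess(rows)
    labels = kmeans(scaled, k)
    return labels, postprocess(records, labels, k)


# Hypothetical samples: (temperature in degrees C, dissolved oxygen in mg/L).
samples = [
    (12.0, 9.8), (12.5, 9.6), (13.0, 9.5),
    (24.0, 5.1), (25.5, 4.8), (26.0, 4.6),
    (None, 7.0),  # incomplete record, removed during pre-processing
]

labels, profiles = run_pipeline(samples, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
for profile in profiles:
    print(profile)
```

Seeding k-means with the first k points keeps the example deterministic. In practice a randomized or k-means++ initialization, or a library implementation, would usually be preferred, and the choice of method for the data mining step is exactly the decision the paper aims to support.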

Additional information

  • Year: 2018
  • Authors: Gibert, K., Izquierdo, J., Sànchez-Marrè, M., Hamilton, S.H., Rodríguez-Roda, I., Holmes, G.
  • Reference: Environmental Modelling and Software, Volume 110, December 2018, pages 3-27

Laboratori d’Enginyeria Química i Ambiental

Institut de Medi Ambient
Universitat de Girona
Campus Montilivi
17003 Girona

Parc Científic i Tecnològic de la UdG
Edifici Jaume Casademont, Porta B
Pic de Peguera, 15
17003 Girona
Tel. +34 972 41 98 59
info.lequia@udg.edu