
Advances in Heap Leach Pad Surface Moisture Mapping

using Unmanned Aerial Vehicle Technology

and Aerial Remote Sensing Imagery

by

Mingliang Tang

A thesis submitted in conformity with the requirements

for the degree of Master of Applied Science

Graduate Department of Civil and Mineral Engineering

University of Toronto

© Copyright by Mingliang Tang 2020


Advances in Heap Leach Pad Surface Moisture Mapping

using Unmanned Aerial Vehicle Technology

and Aerial Remote Sensing Imagery

Mingliang Tang

Master of Applied Science

Graduate Department of Civil and Mineral Engineering

University of Toronto

2020

Abstract

As easily accessible high-grade mineral reserves are being depleted, heap leaching (HL) is gaining

increased interest in the mining industry due to its economic feasibility for processing low-grade

ores. For HL operations, monitoring heap leach pad (HLP) surface moisture distribution is

essential to ensure optimal leaching conditions and to achieve a high metal recovery. Conventional

monitoring methods rely on manual sampling and naked-eye observation by technical staff, which

are labour-intensive and frequently expose personnel to hazardous leaching reagents. To

complement the conventional approaches, unmanned aerial vehicles (UAVs) combined

with aerial imaging techniques can acquire representative data depicting the moisture status across

the HLP surface. This thesis presents a practical framework for HLP surface moisture monitoring,

consisting of UAV-based data collection and advanced data analytics to generate HLP surface

moisture maps, which provide direct visualization of the surface moisture distribution and are

effective tools to streamline the HLP monitoring process.


Acknowledgments

The work presented in this thesis would not have been possible without the effort and support of

many brilliant and generous individuals. First, I would like to thank my supervisor, Professor

Kamran Esmaeili, for the constructive guidance and encouragement. Kamran, thank you so much

for granting me the opportunity to work on this exciting and meaningful project while giving me

the freedom in conducting my research. I have learned much from you and been inspired by your

high standards and professional integrity. To my co-supervisor, Professor Angela Schoellig, thank

you for sharing your laboratory for my experiments and for providing all the insightful and

helpful comments and suggestions. Special thanks to my colleagues, Thomas Bamford and Filip

Medinac, who have provided tremendous assistance and support throughout my project. I am also

thankful to other members in the Mine Modeling & Analytics Lab and Dynamic Systems Lab for

providing ideas, support, and discussion.

I am grateful to McEwen Mining Inc. for supporting the project and making the site available for

field experiment and data collection. I would also like to acknowledge the financial support

provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), the University of Toronto,

and Vector Institute.

Last but not least, I sincerely appreciate and thank all the love, encouragement, and limitless

patience from my family and friends. Thank You!


Table of Contents

Acknowledgments

Table of Contents

List of Tables

List of Figures

List of Abbreviations

Chapter 1 Introduction

Introduction and Motivation

1.1 Research Objectives

1.2 Thesis Outline

Chapter 2 Literature Review

Background Information and Literature Review

2.1 Heap Leaching

2.2 Data Acquisition Using Unmanned Aerial Vehicle in Mining Environments

2.3 Moisture Estimation Using Remote Sensing

2.4 Thermal Infrared Remote Sensing

2.5 Deep Learning and Convolutional Neural Networks

2.6 Convolutional Neural Network Based Surface Water and Moisture Recognition and Monitoring

Chapter 3 Field Data Collection

Field Experiment and Data Acquisition

3.1 Site Information

3.2 Equipment

3.3 Field Experiment and Data Collection

Chapter 4 Surface Moisture Mapping Based on Thermal Imaging

Mapping Heap Leach Pad Surface Moisture Distribution Based on Thermal Imaging

4.1 Overview

4.2 Data Preprocessing

4.3 Linear Regression Model Development

4.4 Orthomosaics Generation

4.5 Moisture Maps Generation

4.6 Discussion and Conclusion

Chapter 5 Surface Moisture Mapping Using Convolutional Neural Networks

Mapping Heap Leach Pad Surface Moisture Distribution Using Convolutional Neural Networks

5.1 Overview and Methodology

5.2 Data Preparation

5.3 Classification-Based Heap Leach Pad Surface Moisture Mapping

5.4 Segmentation-Based Heap Leach Pad Surface Moisture Mapping

5.5 Discussion and Conclusion

Chapter 6 Conclusion

Conclusion, Recommendation, and Future Work

6.1 Major Contributions

6.2 Future Work

Bibliography


List of Tables

Table 3-1: Thermal and digital camera specifications

Table 3-2: Details of flight missions for phase two of the field experiment

Table 3-3: The number of colour and thermal images collected during the field experiment*

Table 5-1: The number of remote sensing data collected during the field experiment*

Table 5-2: The number of tiles generated from each overview raster*

Table 5-3: Summary of dataset statistics for the classification task

Table 5-4: Summary of dataset statistics for the segmentation task

Table 5-5: Frequency and percentage of the number of classes contained per segmentation example*

Table 5-6: Architecture of ResNet50

Table 5-7: The modified MobileNetV2 architecture employed in this study

Table 5-8: Comparison of computer specifications

Table 5-9: Network architecture of MobileNetV2 A*

Table 5-10: Network architecture of MobileNetV2 B*

Table 5-11: Evaluation results of the final classification models on the test set

Table 5-12: Performance of the modified U-Net models on the segmentation dataset

Table 5-13: Evaluation results of the final segmentation model on the test set


List of Figures

Figure 2-1: Illustration of a typical heap leach flow sheet.

Figure 2-2: Three main types of heap leach pad configurations.

Figure 2-3: Illustration of overlaps and flight lines for heap leach pad photogrammetry data collection.

Figure 2-4: Blackbody radiation curves at various temperatures.

Figure 2-5: Spectral radiant exitance of (a) water, (b) granite, and (c) dunite in the 0-25 μm region at 350 K compared to a blackbody at the same temperature.

Figure 2-6: Atmospheric absorption effect in the 0-15 μm region of the electromagnetic spectrum. Notice the existence of atmospheric windows in the 3-5 μm and 8-14 μm regions.

Figure 2-7: Illustration of thermal crossovers and relative diurnal radiant temperature of water versus dry soils and rocks.

Figure 2-8: Typical relationship between model capacity and error.

Figure 2-9: Summary of the development of a deep learning model using supervised learning.

Figure 2-10: Illustration of a one-hidden-layer multilayer perceptron as a directed acyclic graph.

Figure 2-11: Illustration of a one-hidden-layer MLP with four units in the hidden layer.

Figure 2-12: Illustration of the identity, rectified linear unit (ReLU), and leaky rectified linear unit (LReLU, α = 0.1) activation functions.

Figure 2-13: Illustration of a typical convolutional neural network (CNN) architecture.

Figure 2-14: An example of 2D convolution followed by a nonlinear ReLU activation function.

Figure 2-15: Comparison of the number of connections between a convolutional layer (top) and a fully connected layer (bottom) with the same input and output dimensions.

Figure 2-16: Illustration of spatial max pooling and average pooling.

Figure 2-17: Illustration of global minimum, local minimum and saddle point.

Figure 2-18: Illustration of the forward propagation through a feedforward network using dropout.

Figure 3-1: Location of the El Gallo mine.

Figure 3-2: Material particle size distribution of the studied heap leach pad.

Figure 3-3: Equipment used during the field experiment.

Figure 3-4: Flight mission 2 and locations of ground control points with respect to the heap leach pad.

Figure 4-1: General workflow of the data processing and moisture map generation.

Figure 4-2: Visual comparison example between initial and processed thermal images.

Figure 4-3: Determination of the remotely sensed surface temperature at a sampling location.

Figure 4-4: (a) Empirically derived univariate linear regression between gravimetric moisture and remotely sensed surface temperature; (b) Predicted vs. measured gravimetric moisture content (%).

Figure 4-5: Generated orthomosaics of the HLP by using the acquired thermal image datasets.

Figure 4-6: Generated moisture maps by using the orthomosaics and the linear regression model.

Figure 4-7: Illustration of the Sun's positions relative to the HLP (not to scale).

Figure 5-1: Schematic illustration of the moisture map generation workflow by using a classification model (upper) and a segmentation model (lower).

Figure 5-2: (a) The generated point cloud without GPS information was not adequately oriented. (b) The generated point cloud with GPS information was appropriately positioned.

Figure 5-3: Generated colour orthomosaics for the top two lifts of the HLP by using the acquired visible-light image datasets.

Figure 5-4: Generated colour orthomosaics for the whole HLP by using the visible-light image datasets.

Figure 5-5: Generated thermal orthomosaics for the top two lifts of the HLP by using the acquired thermal image datasets.

Figure 5-6: Illustration of the feature correspondences over the colour and thermal orthomosaics.

Figure 5-7: Generation of a four-channel raster by overlaying a colour orthomosaic over a remotely sensed surface temperature map of the heap leach pad.

Figure 5-8: The three steps of the deep learning datasets construction process.

Figure 5-9: The modified AlexNet architecture employed in this study.

Figure 5-10: (a) A plain convolutional (Conv) block with two Conv layers. (b) A basic building block of residual learning.

Figure 5-11: (a) An original residual block. (b) A bottleneck residual block.

Figure 5-12: Illustration of the differences between a classical bottleneck residual block and an inverted residual block with linear bottleneck.

Figure 5-13: Comparison between regular, depthwise, and pointwise convolution.

Figure 5-14: Inner structure of the inverted residual blocks.

Figure 5-15: Training curves of the modified AlexNet, ResNet50, and modified MobileNetV2.

Figure 5-16: Comparison of learning performance of the modified MobileNetV2 (red), MobileNetV2 A (magenta), and MobileNetV2 B (cyan) on the training and validation sets.

Figure 5-17: Validation accuracy of the three employed architectures.

Figure 5-18: Moisture map generation using a convolutional neural network (CNN) classifier.

Figure 5-19: A comparison of the generated moisture maps using the modified AlexNet, ResNet50, and modified MobileNetV2 moisture classifiers.

Figure 5-20: The modified U-Net architecture employed in this study.

Figure 5-21: Moisture map generation using CNN-based semantic segmentation.

Figure 5-22: A comparison example between our generated moisture maps and the ground truth.

Figure 5-23: Comparison examples between the HLP moisture maps generated by using classification and segmentation CNN models.


List of Abbreviations

BLS Barren Leach Solution

BN Batch Normalization

CNN Convolutional Neural Network

CONV Convolutional

CP Control Point

CRS Coordinate Reference System

DL Deep Learning

ELU Exponential Linear Unit

EM Electromagnetic

ESA European Space Agency

FC Fully-Connected

FN False Negative

FP False Positive

GCP Ground Control Point

GD Gradient Descent

GPR Ground Penetrating Radar

GPS Global Positioning System

GSD Ground Sampling Distance

HDPE High-Density Polyethylene

HL Heap Leaching

HLP Heap Leach Pad

IFOV Instantaneous Field Of View

KNN K-Nearest Neighbours

LReLU Leaky Rectified Linear Unit

MIoU Mean Intersection Over Union

MLP Multilayer Perceptron

NN Neural Network

PGMs Platinum Group Metals

PLS Pregnant Leach Solution

PSD Particle Size Distribution

ReLU Rectified Linear Unit

RF Random Forest

RGB Red, Green, Blue

RMSE Root Mean Square Error

ROI Region Of Interest

ROM Run-Of-Mine

RS Remote Sensing

SGD Stochastic Gradient Descent

SIFT Scale-Invariant Feature Transform

SMOS Soil Moisture And Ocean Salinity

SSM Surface Soil Moisture

SVM Support Vector Machine

SVR Support Vector Regression

TF TensorFlow 2

TIR Thermal Infrared

TP True Positive

UAV Unmanned Aerial Vehicle


Chapter 1 Introduction

Introduction and Motivation

Depletion of high-grade ore reserves has led to an increasing interest in the extractive

hydrometallurgical technologies that are suitable for low-grade ore deposits. Heap leaching, as a

prominent option for processing low-grade ores, has been widely adopted in recent years due to

its easy implementation and high economic feasibility (Ghorbani et al., 2016). For heap leaching

operations, a high metal recovery requires a uniform leach solution coverage over the surface of

the heap leach pad (the facility that contains the ore material) because an uneven distribution of

moisture can lead to suboptimal leaching conditions and challenging operational problems

(Lankenau and Lake, 1973; Roman and Poruk, 1996). As heap leaching (HL) is a continuous

operation, monitoring plays a critical role in optimizing the production process and

providing sufficient feedback to the decision makers. Appropriate monitoring of HL operations

relies on the collection of high-quality data and the generation of informative analysis results based

on the acquired measurements. Hence, it is essential to have an efficient data collection routine

and advanced data analytics to optimize productivity and resolve technical challenges.

A good understanding of the spatial and temporal variations of surface moisture content over a

heap leach pad (HLP) is essential for HL production and to achieve a high metal recovery.

Therefore, a fundamental task in HL production optimization is to collect representative data from

the HLP to monitor production performance. However, the conventional data collection method

relies on manual sampling and naked-eye observation of the HLP by technical staff, which exposes

the personnel to the hazardous leaching reagent (e.g., cyanide solution) (Pyke, 1994). Moreover,

this labour-intensive method provides data with low spatial and temporal resolutions, resulting in

inefficient data analysis of the manually collected samples due to the cumbersome laboratory

experiment procedures. In contrast, using unmanned aerial vehicles (UAVs) combined with aerial

imaging techniques to obtain image data remotely can significantly improve the data acquisition

process in terms of time efficiency, data quality and quantity. The UAV-based approach is fast,

on-demand, and automated. It can also collect data with high temporal and spatial resolution. With

this approach, regions inaccessible to human operators can be covered, and the obtained

images become a permanent record of the field conditions at a specific point in time, which can


be revisited in the future for various monitoring applications. In this work, a UAV platform

equipped with one digital camera and one thermal camera was used to acquire colour and thermal

images simultaneously over an HLP. The collected data were used to perform spatial analyses of

the moisture distribution over the HLP surface by using thermal remote sensing methods and

advanced computer vision techniques.

Thermal remote sensing has been widely utilized for terrestrial surface moisture estimation in a

vast variety of studies (Zhang and Zhou, 2016). It has been shown that a strong relationship between

thermal measurements and material moisture content generally exists, and such a relationship can

be exploited to effectively estimate ground surface moisture (Kuenzer and Dech, 2013; Liang et

al., 2012). Among the various analytic methods, empirically derived correlations between

temperature measurements and material moisture content can be used to generate surface moisture

maps with high spatial resolution and adequate accuracy (Sugiura et al., 2007). The generated

moisture maps provide direct visualization of surface moisture variation over the surveyed area,

and such graphical results are effective tools for inspecting HL operations. From an HLP

monitoring perspective, surface moisture maps are useful for illustrating the moisture coverage

over the HLP surface and can be used in the irrigation optimization process to quantitatively

compare the performance of different solution application strategies. Therefore, a framework

for generating HLP surface moisture maps based on thermal remote sensing data is introduced in

this thesis, and the proposed method can be utilized to streamline the HLP monitoring process.
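
To make the empirical approach concrete, the following minimal sketch (in Python, with entirely hypothetical calibration values; the actual model and data are presented in Chapter 4) fits a univariate linear regression between remotely sensed surface temperature and gravimetric moisture content and applies it pixel-wise to a thermal raster:

```python
import numpy as np

# Hypothetical calibration data: remotely sensed surface temperature (deg C)
# at sampling locations, and the corresponding lab-measured gravimetric
# moisture content (%). The real model and data appear in Chapter 4.
temperature = np.array([18.2, 21.5, 24.1, 27.8, 30.3, 33.6])
moisture = np.array([14.5, 12.1, 10.8, 8.2, 6.9, 4.7])

# Fit a univariate linear regression, moisture = a * temperature + b,
# by ordinary least squares.
a, b = np.polyfit(temperature, moisture, deg=1)

# Applying the model pixel-wise to a thermal orthomosaic yields a surface
# moisture map (a random raster stands in for real thermal data here).
thermal_orthomosaic = np.random.uniform(18.0, 34.0, size=(512, 512))
moisture_map = a * thermal_orthomosaic + b

# Goodness of fit on the calibration points.
rmse = np.sqrt(np.mean((a * temperature + b - moisture) ** 2))
print(f"slope = {a:.3f} %/degC, intercept = {b:.2f} %, RMSE = {rmse:.2f} %")
```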

Recent advances in deep learning-based computer vision techniques have shown promising

performance in a broad range of applications, including terrestrial surface moisture estimation

based on remote sensing imagery (Ge et al., 2018; Sobayo et al., 2018). One particular type of

deep learning (DL) model that has shown remarkable performance in processing image data is

the convolutional neural network (CNN) (LeCun, 1989). CNN models have the capacity of

accommodating inputs with different modalities (e.g., images taken by different types of cameras),

and the models can extract latent information contained in the sensor data. This property allows

the models to learn complex functions automatically without the need for feature engineering and

variable selection. To leverage the power of CNN models, this thesis proposes two CNN-based

moisture map generation approaches in which the acquired thermal and colour image data are used

as input simultaneously. Moisture maps are generated in an end-to-end fashion, and the proposed

methods have the potential to be further developed towards a fully automated data analysis process.
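
As an illustration of the kind of model this entails, the sketch below (TensorFlow 2; the tile size, class labels, and layer sizes are illustrative assumptions, not the architectures evaluated in Chapter 5) builds a small CNN classifier that accepts four-channel tiles combining colour (RGB) and thermal data:

```python
import tensorflow as tf

# Minimal sketch of a CNN moisture classifier ingesting four-channel tiles
# (R, G, B plus a remotely sensed surface temperature channel). Tile size,
# class labels, and layer sizes are illustrative assumptions only; the
# architectures actually evaluated are described in Chapter 5.
NUM_CLASSES = 3  # e.g., dry / moist / wet (assumed moisture classes)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 4)),          # RGB + thermal channels
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```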


1.1 Research Objectives

The main goal of this thesis is to develop a practical and effective HLP surface moisture monitoring

workflow, starting from UAV-based data collection, followed by off-line data processing, and

ending with surface moisture map generation. To achieve this goal, the specific research objectives

include:

1) Designing and implementing a UAV-based data acquisition method to collect field data

from a heap leaching operation;

2) Conducting appropriate data preprocessing and preparation for moisture map generation;

3) Exploring a correlation between aerial thermal measurements and HLP surface material

moisture content;

4) Mapping heap leach pad surface moisture distribution using a thermal remote sensing

method; and

5) Developing frameworks that incorporate convolutional neural networks for generating

heap leach pad surface moisture maps.

1.2 Thesis Outline

This thesis consists of six chapters:

• Chapter 1: introduces the project motivation, research objectives, and thesis structure.

• Chapter 2: provides background information and literature review on the theory, concepts,

and recent applications relating to the data analyses performed in this work.

• Chapter 3: describes the field experiment conducted and the mine site where the data

was collected. The details of surveying schedule, equipment specification, and data

collection campaigns are provided.

• Chapter 4: elaborates on the process of empirical model development and moisture map

generation based on the acquired thermal images and in-situ moisture measurements. An


in-depth discussion about the advantages, limitations, and possible improvement of the

proposed method is included.

• Chapter 5: presents a thorough description of the two CNN-based moisture map

generation approaches. The explanation of both methods starts with data preparation,

followed by network construction and model training, and ends with model evaluation and

moisture map generation. A discussion comparing the two methods is included, and the

possible improvement of the proposed approaches is also outlined.

• Chapter 6: provides a summary of the thesis and outlines recommendations for future

work.


Chapter 2 Literature Review

Background Information and Literature Review

This chapter provides a review of the background information and related work associated with

the experiment and data analyses presented in this thesis. Section 2.1 outlines a brief review of the

heap leaching technology, followed by Section 2.2 in which the use of unmanned aerial vehicles

for data acquisition in mining environments is discussed. Section 2.3 presents a high-level

overview of the different remote sensing methods for soil moisture estimation, and Section 2.4

includes an explanation of the thermal remote sensing principles and concepts related to this work.

As convolutional neural networks (CNNs) are used for processing the collected data, Section

2.5 summarizes the key deep learning theory and concepts, whilst Section 2.6 provides a review

of the CNN-based moisture recognition and monitoring applications presented in the literature.

2.1 Heap Leaching

Heap leaching (HL) is a mineral extraction technology that has been widely adopted in recent years

due to its high economic feasibility. HL operation is a hydrometallurgical recovery process where

metal-bearing ore is piled on an impermeable pad (i.e., an engineered liner), and a water-based

lixiviant, or leaching reagent, is irrigated on top of the heap surface (Ghorbani et al., 2016). The

leach solution flows through the pile and contact with the ore, such that the metal or mineral of

interest is extracted from the rock and dissolved into the solution (Kappes, 2002). Solution exits

the base of the heap through slotted pipes and a gravel drainage layer located above the liner (Pyper

et al., 2019). The metal-bearing pregnant leach solution (PLS) is collected in the PLS pond (i.e.,

pregnant pond) and then pumped to the processing facility for recovery of the extracted metal

(Ghorbani et al., 2016; Pyper et al., 2019). After the valuable metal is recovered from the PLS, the

barren leach solution (BLS) is pumped to the barren solution pond and reapplied to the heap after

refortifying with lixiviant chemicals (Pyper et al., 2019; Watling, 2006). A typical heap leaching

circuit is illustrated in Figure 2-1.

As described above, the technology of HL encompasses multiple scientific disciplines, including

physics, hydrology, geology, chemistry and biology (Bhappu et al., 1969; Pyper et al., 2019), and

there is a vast number of topics involved in the study area. In this section, we briefly introduce


several topics that are related to our experiment; Ghorbani et al. (2016) provided an in-depth

and comprehensive review of the heap leaching technology, and Pyper et al. (2019) presented a

thorough introduction to the different operational components of dump and heap leaching. In the

literature, heap leaching can sometimes refer to both run-of-mine (ROM) dump leaching and

crushed ore heap leaching. In this section, we use the term to mean crushed ore heap leaching, and our

emphasis is on HL of gold-bearing ore.

In practice, HL is most commonly used for low-grade ore deposits, although it is sometimes

applied to small high-grade deposits to control capital cost in higher-risk jurisdictions (Ghorbani

et al., 2016). Several advantages of HL as compared to milling ores include: low capital and

operating costs, quick up-front construction and installation, simple equipment and operation, no

liquid/solid separation step, less water requirement compared to flotation, no tailing disposal, and

most importantly, practical and effective for processing low-grade deposits (Kappes, 2002;

Ghorbani et al., 2016). Thanks to its high practicality and economic feasibility, HL has been

applied to extract a wide range of metals, such as gold, copper, silver, uranium, zinc, nickel, cobalt,

and platinum group metals (PGMs) (Mwase et al., 2012; Padilla et al., 2008; Pyper et al., 2019).

According to Marsden and House (2006), 10% of the world’s gold production was produced from

heap leaching in 2006, and HL is gaining increased interest in the mining industry nowadays

(Ghorbani et al., 2016).

Figure 2-1: Illustration of a typical heap leach flow sheet. Extracted from Pyper et al. (2019).


In order to successfully extract the valuable metals from the stacked ore, the applied lixiviant

should first diffuse within the heap leach pad (HLP) and then chemically react with the target

mineral. The reaction should allow the solution to dissolve valuable metal while minimally

dissolve gangue material (Pyper et al., 2019). The metal-rich solution should then diffuse away

from the reaction site and finally percolate out from the bottom of the heap (Kappes, 2002).

However, this process is highly affected by the permeability within the HLP. Since different

regions within the heap may have different permeabilities, if a regional solution application rate

surpasses the permeability of the area, the solution will travel horizontally until a more permeable

zone is reached. Significant flow channelling can occur if large variations in permeability act in

concert with excessive solution application. The channelling of solution will result in unleached

areas within the heap and diluted PLS grades (Bouffard and Dixon, 2000; Ghorbani et al., 2016).

Moreover, solution over the impermeable zones tends to build up, resulting in surface ponding or a

perched water table. If a large volume of solution is retained near the edge of the HLP, the solution

can blow out the heap slope, leading to potential stability issues (Pyper et al., 2019). Therefore,

heap permeability is crucial, and it is affected by the material particle size distribution (PSD) as

well as the ore preparation and stacking process.

2.1.1 Ore Preparation and Stacking

Ore preparation is often conducted before the placement of material onto the HLP. Several

common preparation steps for gold ore include: crushing of ROM, addition of lime for pH

adjustment, and agglomeration of the crushed rock. For crushed rock heap leaching, size reduction

of ROM is generally carried out through crushing at which the target mineral is liberated for leach

extraction (Pyper et al., 2019). The top sizes of the crushed rock usually range from 10 to 40 mm,

where a P80 is often desired to be greater than 6 mm to avoid permeability issues (Brierley and

Brierley, 2001; Ghorbani et al., 2016). The addition of lime or other pH modifiers is performed

during either crushing/stacking or agglomeration (Pyper et al., 2019). The preferred level of pH

for cyanide gold leaching is 9.5 to 11 because operating below this range can increase cyanide

consumption, while operating above it can decrease metal recovery (Ghorbani et al.,

2016). Although agglomeration is not always required, it can be used to mitigate segregation of

fines and reduce the chance of blinding (i.e., solution cannot flow downwards) (Pyper et al., 2019).

The purpose of agglomeration is to adhere the fines to each other or to larger particles so that a

more uniform heap results (Lewandowski and Kawatra, 2009; Velarde, 2007).


There are two principal methods for ore stacking: truck stacking and conveyor stacking

(Ghorbani et al., 2016). Although some operations may use excavator stacking when the other two

options are not applicable due to accessibility issues (Pyper et al., 2019), it is less commonly used

than the other two approaches. Truck stacking is often used with competent ores with low

clay content. The heaps are generally constructed using the same techniques as waste dump

construction and maintenance (Pyper et al., 2019). The advantage of truck stacking is that it is

usually more flexible than conveyor stacking (Kappes, 2002). Nevertheless, the major

disadvantage of truck stacking is the compaction of ore due to the truck loads (Kappes, 2002).

Therefore, ripping is typically carried out to mitigate compaction prior to leaching (Pyper et al.,

2019).

A conveyor stacking system is commonly used for handling a large quantity of ore material, and it

can lead to a more uniform PSD across the heap (Ghorbani et al., 2016). In a typical conveyor

stacking system, one or more overland conveyors are used to connect the preparation plant (e.g.,

crushing plant) to the HLP. Multiple grasshopper conveyors are included across the active heap

area to feed a radial stacker conveyor, where the grasshopper conveyors are connected to the

overland conveyor through a tripper conveyor (Pyper et al., 2019). A stacker-follower conveyor

and a transverse conveyor, or horizontal indexing conveyor, are often involved in the system to

facilitate the material handling (Kappes, 2002; Pyper et al., 2019). One advantage of using a

conveyor stacking system is that it allows gentle placement of ore, which reduces the amount of

compaction and segregation (Ghorbani et al., 2016).

2.1.2 Heap Leach Pad Configurations

Overall, there are three main types of HLP configurations (Figure 2-2): standard pad, valley fill

pad, and on/off pad (Ghorbani et al., 2016; Thiel and Smith, 2004). The selection of HLP

configuration has profound influences on capital and operation costs, leaching solution application

and collection, recovery plant sizing, stacking method, and heap closure (Pyper et al., 2019). An

HLP can consist of either one or multiple lifts, where a typical lift height ranges from 2 to 15 m

(John, 2011).

Standard pads (also referred to as traditional, conventional, flat, or expanding pads in the literature)

require large ground areas for the pad construction and expansion (Lupo, 2010; Pyper et al., 2019;

Thiel and Smith, 2004). The ideal construction condition is on a flat topography with a slight slope


(e.g., 1-3% slope), although a pad can also be built in rougher terrain (Pyper et al., 2019). In

general, a standard pad requires low initial capital cost, and it is suitable for various ore types and

leach cycle time (Thiel and Smith, 2004). The construction requires relatively simple liner system,

and the pad offers flexibility for incremental pad expansion (Lupo, 2010; Pyper et al., 2019).

Figure 2-2: Three main types of heap leach pad configurations: (a) Standard pad; (b) Valley fill pad; (c)

On/off pad. Extracted from Lupo (2010).


Valley fill pads are constructed in steep topography (e.g., valleys, basins), where the foundation

slope can often reach 40-50% (Ghorbani et al., 2016). A valley fill pad can often accommodate

variable ore production and leach cycle time, and it is suitable for hard and durable ores (Pyper et

al., 2019). Since a valley fill pad is constructed in steep terrain, a retaining structure (e.g., a dam)

is often required to be developed, and the cost of installation and construction is more expensive

than the other pad configurations. A valley fill pad generally has an internal solution storage pond

(Figure 2-2b), where leak detection and pumping systems are usually required for the internal pond

(Ghorbani et al., 2016).

On/off pads are often used to process soft ores that cannot be stacked to a large heap height

(Ghorbani et al., 2016; Thiel and Smith, 2004). The ore material is loaded and leached, followed

by removal at the end of the leach cycle. The pad is then recharged with fresh ore, and the spent

ore (ripios) is either abandoned or sent to a secondary leach pad for continued leaching (Pyper

et al., 2019). An on/off pad is generally less expensive to construct compared to the other pad

configurations, but it has a higher operational cost due to the double handling of material (Ghorbani

et al., 2016). The leach cycle of ores in an on/off pad is relatively short (30 days or less), and the

configuration is useful in regions with limited ground areas (Pyper et al., 2019). Several

disadvantages of on/off pads include high maintenance cost, severe liner damage due to frequent

material handling, and requirement of multiple cells (at least three) for continuous operation

(Ghorbani et al., 2016).

2.1.3 Leaching

Following ore preparation and heap construction, leaching is conducted by applying a water-based

lixiviant over the heap surface. The leach solution application should have uniform surface

coverage because a maximal metal extraction requires optimum wetting uniformity (Pyper et al.,

2019). In addition, the solution application rate should be slower than the hydraulic conductivity

of the ore to prevent surface ponding. The existence of solution ponds on the heap surface can

become a threat to wildlife and pose a risk of heap stability issues (Franson, 2017; Marsden,

2019). According to Pyper et al. (2019), the solution application rates in practice vary from 2.4 to

19.6 L/h/m², where a typical range is 8-12 L/h/m².

Although there are various solution spreading devices employed in practice (e.g., wobbler

sprinklers, rotating impact sprinklers, D-ring sprinklers, misters, pressure drip emitters), irrigation


systems can be generally classified into sprinklers or drip emitters (Ghorbani et al., 2016). Dripper

lines are commonly made of high-density polyethylene (HDPE), and sprinkler systems are often

constructed using polyvinyl chloride or HDPE (Pyper et al., 2019). In general, a drip emitter

system can result in a gentle and precise solution application while diminishing evaporation losses

of solution and reagents. It has the advantages of easy installation and applicability to a wide range

of pressure conditions (Ghorbani et al., 2016; Pyper et al., 2019). However, drip emitters often

have small effective flow areas and do not provide continuous drip coverage, while channelling

and plugging problems make it difficult to achieve sufficient solution/ore contact,

especially for the top one meter of the heap (Ghorbani et al., 2016; Kappes, 2002). In contrast,

sprinklers are easy to maintain, simple to inspect visually, and convenient for flow rate adjustment

while providing a uniform solution distribution pattern over the HLP surface (Ghorbani et al.,

2016). Nevertheless, sprinkler systems can increase the evaporation loss of reagents and might

lead to environmental and health hazards, especially in windy conditions (Ghorbani et al., 2016;

Pyper et al., 2019). Despite the pros and cons of sprinklers and drip emitters, both kinds of systems

have been successfully deployed in HL operations worldwide (Marsden, 2019).

In gold heap leaching, the commonly used water-based lixiviant is dilute cyanide solution. The

cyanidation process is proven to be effective for gold extraction, and cyanide is considered an

environmentally acceptable reagent among other alternatives (e.g., bromide, thiocyanate,

thiosulfate, iodide solutions) (Ghorbani et al., 2016; Grosse et al., 2003; Srithammavut, 2008).

Metals like gold and silver can be dissolved by a dilute alkaline sodium cyanide (NaCN) solution

at very low concentration (Marsden, 2019; Ghorbani et al., 2016), and the general reaction for gold

dissolution is expressed as:

$$4\mathrm{Au} + 8\mathrm{CN^-} + \mathrm{O_2} + 2\mathrm{H_2O} = 4\mathrm{Au(CN)_2^-} + 4\mathrm{OH^-} \tag{2.1}$$

The gold dissolution rate is affected by the NaCN concentration and alkalinity of the solution. The

desired range of solution pH is 9.5 to 11 (Ghorbani et al., 2016), and alkali may be added to the

leach solution for pH modification and control (Marsden, 2019). A typical cyanide level

within the heap ranges from 100 to 600 mg/L (or ppm) NaCN, and a maximized gold dissolution

rate may be achieved by maintaining the HLP runoff solution at a concentration of

approximately 50-100 mg/L NaCN (Marsden, 2019; Ghorbani et al., 2016). Overall, the leaching

efficiency of an HLP is affected by several factors, including the chemistry of the applied solution,


the degree of gold liberation in the crushed ore material, the efficiency of ore-solution interaction,

and the amount of time allowed for the leaching reaction (Marsden, 2019). Precise control of the

abovementioned factors is hardly achievable in practice, but HL performance may be tracked

by carefully monitoring the gold and cyanide concentration, pH, dissolved oxygen concentration,

and temperature of the process solutions (Marsden, 2019). In addition, maintaining the uniformity

of solution distribution across the HLP surface remains a critical monitoring task to ensure

sufficient contact between ore and leach solution while preventing surface ponding issues

(Marsden, 2019; Ghorbani et al., 2016).
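
For a rough sense of scale, the figures quoted above translate directly into reagent throughput per unit of heap area; the short sketch below (Python; the chosen rate and concentration are just typical values from the cited ranges, not site data) performs this arithmetic:

```python
# Back-of-the-envelope reagent throughput implied by the ranges above.
# The chosen rate and concentration are typical literature values quoted
# in the text, not measurements from any particular site.
application_rate = 10.0     # L/h/m^2, within the typical 8-12 L/h/m^2 range
nacn_concentration = 300.0  # mg/L NaCN, within the 100-600 mg/L range

solution_per_day = application_rate * 24                       # L/m^2/day
nacn_per_day = solution_per_day * nacn_concentration / 1000.0  # g/m^2/day

print(f"Solution applied: {solution_per_day:.0f} L per m^2 per day")
print(f"NaCN delivered:   {nacn_per_day:.0f} g per m^2 per day")
# At 10 L/h/m^2 and 300 mg/L: 240 L and 72 g of NaCN per m^2 per day.
```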

In practice, leaching the side slopes of an HLP is considered a challenging operational task (Pyper et

al., 2019). Neither sprinklers nor drip emitters provide promising options for addressing the

problem (Pyper et al., 2019), while the monitoring of side slope leaching also remains difficult due

to these areas being inaccessible to humans. Some operations have found that small sprinklers with gentle

spraying patterns offer a reasonable compromise for side slope leaching (Pyper et al., 2019). In

this study, we propose to use an unmanned aerial vehicle equipped with remote sensing sensors

as an effective and efficient option for HLP monitoring, even for areas of the HLP that are

inaccessible to humans.

2.2 Data Acquisition Using Unmanned Aerial Vehicle in Mining

Environments

Data acquisition using unmanned aerial vehicle (UAV) platforms has been adopted in almost every

study area that requires observed data from top or oblique views (Yao et al., 2019). Many studies

in mining (Bamford et al., 2020; Medinac et al., 2020), agriculture (Ivushkin et al., 2019), forestry

(Wallace et al., 2016), and construction inspection (Lee et al., 2016) have demonstrated the practicality

and effectiveness of employing UAVs to perform various surveying and monitoring tasks.

Recently in the mining industry, Bamford et al. (2020) employed UAV systems to monitor blasting

process in four open pit mines, where visual data were collected during the pre-blasting, blasting

and post-blasting stages; Medinac et al. (2020) used a UAV system to perform haul road

monitoring to assess road conditions in an open pit mine; and Medinac and Esmaeili (2020)

collected UAV data to perform pit wall structural mapping and a design compliance audit of the

pit slope. Several other applications of UAVs in mining environments include dust monitoring

(Alvarado et al., 2015; Zwissler, 2016), drillhole alignment assessment (Valencia et al., 2019), pit


wall mapping (Francioni et al., 2015), particle size segregation analysis (Zhang and Liu, 2017),

and rock fragmentation analysis (Bamford et al., 2017a). However, little attention in the

literature has been paid to leveraging the power of UAV and sensing technologies to perform heap

leach pad monitoring, especially HLP surface moisture mapping.

There are several advantages of using UAVs to conduct data acquisition in mining environments.

The data collected using UAV platforms generally have high spatial and temporal resolutions,

which are hardly achievable by conventional point-measurement methods or even satellite-based

approaches. Meanwhile, UAV-based data acquisition reduces the time spent on data collection and

increases the safety of personnel. The use of UAVs can survey a large field area within a short

duration while reducing the frequency of exposing technical staff to ongoing production

operations. Regions inaccessible to human operators can be covered, and the need for

personnel to collect data in hazardous environments (e.g., over an HLP with cyanide leaching) can

be diminished. In addition, if one or more imaging sensors are mounted on a UAV, the obtained

images with respect to the mining facility (e.g., a pit or an HLP) would become a permanent record

of the field conditions at a specific point in time, which can be revisited in the future as required

(Bamford et al., 2020). This is very useful for tasks like design compliance audit and change

detection. Also, many practitioners have devoted effort to developing real-time monitoring techniques

by incorporating computational devices or resources (e.g., onboard computer or cloud computing)

with UAVs, and the successful deployment of such systems can carry out real-time and on-demand

monitoring of production operations, which will be beneficial for timely decision making.

However, UAV-based data collection methods have their limitations. Different jurisdictions may

have different regulatory requirements, which can limit the use of UAVs in mining

environments (Bamford et al., 2020). Moreover, weather and environmental conditions have a

significant impact on both the data obtained by the UAV system and on the UAV itself. The

variations of lighting conditions and cloud shadowing often have a large influence on the quality

of images. UAV platforms are generally not able to operate in extreme weather, such as rain,

snow, and storms. Also, consistently exposing a UAV system to a dusty and hot environment can

damage the UAV and wear the onboard sensors (Bamford et al., 2020). Therefore, appropriate

cleaning and maintenance of the UAV system after each data collection campaign is always

recommended to improve equipment durability.


The tremendous success and advantage of applying UAVs to conduct surveying and monitoring

tasks have contributed to the rapid development of UAV and sensing technologies in recent years

(Pajares, 2015). Various types of UAVs and sensors have been manufactured and commercialized

nowadays, which significantly advance the use of UAVs in different industries and working

environments. Overall, there are several categorization schemes for UAVs, and each

categorization method is based on one or multiple design attributes, including payload, endurance,

range, drone weight, flight speed, wing configurations, and flying altitude (Valavanis and

Vachtsevanos, 2015; Korchenko and Illyash, 2013; Yao et al., 2019). The data collection

conducted in our experiment was performed using a hexacopter with a maximum gross takeoff

weight of approximately 15 kg (35 lbs). The detailed specification of our UAV system is described

in Chapter 3.

Although RGB cameras are the most commonly used onboard sensors for UAV systems, there are other

imaging sensors that have been adopted for both academic and commercial applications, such as

multispectral, hyperspectral and thermal infrared cameras (Yao et al., 2019). For RGB cameras,

there are numerous options in the market, and some important specification parameters include

camera lens, resolution, and sensor chip quality. Cameras with better lenses and sensor chips can

result in fewer geometric distortions and higher signal-to-noise ratios than lower-quality ones (Yao et

al., 2019). A few RGB camera selection guidelines have been provided by Nex and Remondino

(2014), and Colomina and Molina (2014). Thermal infrared cameras are commonly used for

obtaining surface temperature and thermal emission measurements (Yao et al., 2019). These

measurements can be further processed to retrieve soil properties as well as material surface

moisture content (Ivushkin et al., 2019; Sobayo et al., 2018). Due to the payload limitation of

common commercial drones, UAV-based thermal cameras generally lack cooled detectors,

which results in lower sensitivity, spatial resolution, and capture rates than RGB cameras (Yao et

al., 2019). However, with properly designed flight height and image acquisition rate, the images

collected by a thermal camera can be integrated with data recorded in other spectral wavelengths

(e.g., RGB) to perform data analysis (see Chapter 5). Multispectral cameras are often

used for vegetation-related tasks as well as farming and hydrological applications (Calderón et al.,

2014; Candiago et al., 2015; Kemker et al., 2018; Kislik et al., 2018). As more and more data

processing packages and algorithms become available, data acquisition using UAV-based

multispectral cameras may become more common in the future (Yao et al., 2019). Although

lightweight hyperspectral cameras (e.g., Burkart et al., 2014; Suomalainen et al., 2014) are able to

capture images with a large number of narrow bands (e.g., a few hundred or even more than a

thousand bands with 5-10 nm bandwidth), they are usually expensive and not as mature as the

other camera sensors nowadays. Nevertheless, as the sensing technology is growing rapidly while

more and more data-driven algorithms are proposed in the literature (e.g., deep learning

techniques), the ability to capture a large amount of data by a single sensor within one flight can

become very appealing in the near future. Besides the abovementioned sensors, Colomina

and Molina (2014) provided a review on the light-weight sensors that are available for low-payload

aerial platforms, and Pajares (2015) presented a thorough review on a wide range of sensors (e.g.,

camera, LiDAR, radar, sonar, gas detector) used for UAV-based data collection.

2.2.1 UAV Flight Planning

Despite the remarkable success of using UAV systems to acquire remote sensing data, there is no

universal guideline for UAV-based data collection. Data acquisition practices can vary

significantly even for the same or similar application, where different practitioners may develop

disparate practices through a learning-by-doing approach (Yao et al., 2019). One reason for this

phenomenon is that different combinations of UAVs and sensors add flexibility and

complexity to the data acquisition process.

One practical flight planning method of UAV digital photogrammetry for geological surveys was

outlined by Tziavou et al. (2018), where the method was implemented and elaborated by Bamford

et al. (2020) for applications in the mining context. Bamford et al. (2020) adopted and applied the

method to collect photogrammetry data in multiple mining operations, demonstrating the

effectiveness and practicality of the approach in generating UAV flight plans. In this study, the

flight plans were generated following the practices employed by Tziavou et al. (2018) and Bamford

et al. (2020), and the implementation steps are described below.

Several factors should be considered to create a flight plan for photogrammetry data collection,

including image/photo overlaps, target distance, lighting and weather conditions, and camera’s

resolution, focal length and field of view (Bamford et al., 2020). In this study, the data collection

was performed by observing the HLP from a top-down view (i.e., the camera was tilted down to

the nadir), and the distance between the HLP surface and UAV system was considered the main

controllable parameter. To determine the appropriate distance from the HLP surface (i.e., flight


altitude in our case), the first step is to obtain knowledge about the dimension of the minimum

measurement target. For instance, in our experiment, the sprinkler spacing over the HLP was 3 m.

We decided to set the desired minimum measurement target to be approximately 1.5 m (i.e., half

of the sprinkler spacing), and this value was used to determine the objective ground sampling

distance (GSD). The GSD is defined as the ground distance covered between two adjacent pixel

centers. Bamford et al. (2020) suggested that the GSD should be at least an order of magnitude

smaller than the minimum measurement target, and thus we adopted a GSD varying from 10

cm/pixel to 15 cm/pixel. After determining the GSD, the flight altitude can be calculated by:

$$z = \sqrt{\frac{\mathrm{GSD}^2 \, i_w \, i_h}{4 \tan\left(\frac{f_h}{2}\right) \tan\left(\frac{f_v}{2}\right)}} \tag{2.2}$$

where $i_w$ and $i_h$ are the image width and height in pixels, respectively; $f_v$ and $f_h$ are the lens vertical and horizontal angles of view, respectively; GSD is the ground sampling distance in meters per pixel; and $z$ is the flight altitude in meters (Bamford et al., 2020; Langford et al., 2010).
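
As a quick sanity check of Equation (2.2), the following sketch (Python; the sensor resolution and lens angles are assumed for illustration, and the cameras actually flown are specified in Chapter 3) computes the flight altitude required for a target GSD:

```python
import math

def flight_altitude(gsd, image_w, image_h, fov_h, fov_v):
    """Flight altitude z in meters for a target GSD (m/pixel), per Eq. (2.2).

    image_w, image_h: image width and height in pixels.
    fov_h, fov_v: horizontal and vertical lens angles of view in radians.
    """
    return math.sqrt((gsd ** 2 * image_w * image_h)
                     / (4 * math.tan(fov_h / 2) * math.tan(fov_v / 2)))

# Illustrative numbers only (a 4000 x 3000 sensor with an assumed lens);
# the cameras actually flown are specified in Chapter 3.
z = flight_altitude(gsd=0.10, image_w=4000, image_h=3000,
                    fov_h=math.radians(84), fov_v=math.radians(62))
print(f"Required flight altitude: {z:.1f} m")  # ~235 m for these inputs
```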

Once the flight altitude is determined, the side and front spacing as well as the flight velocity can

be calculated as:

$$s = 2z \tan\left(\frac{f_h}{2}\right)\left(1 - \mathrm{overlap_{side}}\right) \tag{2.3}$$

$$f = 2z \tan\left(\frac{f_v}{2}\right)\left(1 - \mathrm{overlap_{front}}\right) \tag{2.4}$$

$$v_f = \frac{f}{t_p} \tag{2.5}$$

where $s$ is the side spacing between pictures in meters; $f$ is the front spacing between pictures in meters; $t_p$ is the time between taking images (shutter interval) in seconds; $v_f$ is the flight velocity in meters per second; $f_v$ and $f_h$ are the lens vertical and horizontal angles of view, respectively; and $\mathrm{overlap_{side}}$ and $\mathrm{overlap_{front}}$ are the fractions of side and front overlap between images, respectively (Bamford et al., 2020; Langford et al., 2010). In the literature, some studies

recommend the front overlap to be within the range of 30% to 85%, and 70% to 85% for side

overlap (Bamford et al., 2017; Dash et al., 2017; Francioni et al., 2015; Salvini et al., 2017; Tziavou


et al., 2018). Figure 2-3 schematically illustrates the concepts of front and side overlaps, where the

side spacing is the distance between the two flight lines, and the front spacing is the distance

between the two image centers on the same flight line (Bamford et al., 2020). In our field

experiment, the front and side overlaps were designed to be 85% and 70%, respectively; the detailed flight plans are described in Chapter 3.
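To make these calculations concrete, the following short Python sketch implements equations (2.2) through (2.5). The camera specifications (a 640 × 512 pixel thermal sensor with 45° × 37° angles of view and a 2 s shutter interval) are illustrative assumptions for this sketch, not the specifications of the system used in this study.

```python
import math

def flight_plan(gsd, image_w, image_h, fov_h, fov_v,
                overlap_side, overlap_front, shutter_interval):
    """Compute flight altitude, spacings, and velocity per equations (2.2)-(2.5)."""
    # Equation (2.2): flight altitude that achieves the objective GSD
    z = math.sqrt(gsd ** 2 * image_w * image_h /
                  (4.0 * math.tan(fov_h / 2.0) * math.tan(fov_v / 2.0)))
    # Equations (2.3) and (2.4): side and front spacing between pictures
    s = 2.0 * z * math.tan(fov_h / 2.0) * (1.0 - overlap_side)
    f = 2.0 * z * math.tan(fov_v / 2.0) * (1.0 - overlap_front)
    # Equation (2.5): flight velocity from the camera shutter interval
    return z, s, f, f / shutter_interval

# Hypothetical 640 x 512 thermal camera with the overlaps adopted in this study
z, s, f, v_f = flight_plan(gsd=0.10, image_w=640, image_h=512,
                           fov_h=math.radians(45.0), fov_v=math.radians(37.0),
                           overlap_side=0.70, overlap_front=0.85,
                           shutter_interval=2.0)
print(f"altitude {z:.0f} m, side spacing {s:.1f} m, "
      f"front spacing {f:.1f} m, velocity {v_f:.2f} m/s")
# -> altitude 77 m, side spacing 19.1 m, front spacing 7.7 m, velocity 3.86 m/s
```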

Beyond the creation of flight plans, image georeferencing accuracy is also critical when the

acquired images are used to generate orthomosaics (also called true orthophotos, which are

generated based on an orthorectification process). Although some of the images captured by the

UAV system were georeferenced using an onboard global positioning system (GPS), the GPS

coordinates recorded in the air are sometimes not as reliable as measurements made on the ground

(Bamford et al., 2020). In such cases, ground control points (GCPs) are often used to obtain better

positioning information. A GCP is an object or point in an image whose real-world

coordinates are known (Linder, 2013). In this study, GCPs were placed over the HLP during the

field experiment, and the GPS coordinates of each GCP were measured using a portable GPS

device. The recorded positioning information was used to facilitate the data analyses described in

Chapter 4 and Chapter 5.

Figure 2-3: Illustration of overlaps and flight lines for heap leach pad photogrammetry data collection


2.3 Moisture Estimation Using Remote Sensing

This section presents a brief review of the different remote sensing methods for soil moisture

estimation proposed in the literature. The content covered in this section is largely extracted from

Tang and Esmaeili (2020).

In practice, two types of sensors are employed by remote sensing systems, namely, passive and

active sensors (Liang et al., 2012). A passive remote sensing system collects data using one or multiple passive sensors (e.g., digital cameras, thermal cameras, spectroradiometers), which

detect electromagnetic (EM) radiation that is either emitted or reflected from the target (Khorram

et al., 2012). In contrast, an active remote sensing system employs active sensors, such as radars,

to proactively release EM energy toward the target and record the amount of radiant flux scattered

back to the system (Jensen, 2009).

In remote sensing, a number of methods have been studied and applied to estimate surface soil

moisture (SSM) (Campbell and Wynne, 2011). Petropoulos et al. (2015) provided an in-depth

review of the principal foundations, advantages, drawbacks and current applications of different

soil moisture retrieval methods. According to Petropoulos et al. (2015), remote sensing-based SSM

retrieval methods can be grouped into three categories: microwave remote sensing, optical remote

sensing and synergistic methods, where synergistic methods are essentially data fusion techniques

developed to manage the complementarity between various types of data. Each of these categories

uses either one portion or multiple regions of the EM radiation spectrum as input to estimate SSM.

In microwave remote sensing, the methods can be divided into passive and active microwave

sensing. Passive microwave sensors are designed to measure naturally emitted microwave radiation, with wavelengths ranging from 1 to 30 cm. The emitted EM signal at these wavelengths

is related to the soil dielectric properties closely associated with SSM (Chen et al., 2012). The

advantages of this method are that the data acquisition is not limited to daytime conditions, and

atmospheric effects become less significant when the detected EM wavelength is above 5 cm

(Petropoulos et al., 2015). However, the measurements from passive microwave systems are often influenced by factors such as soil surface roughness and soil texture (Chai et al., 2010), and they offer a coarser spatial resolution than other methods (Petropoulos et al., 2015).

Recent studies using passive microwave remote sensing to estimate SSM are commonly based on satellite measurements. The European Space Agency's (ESA) Soil Moisture

and Ocean Salinity (SMOS) mission is a well-known program that uses on-board passive

microwave sensors to collect global-scale data. Similar to passive microwave remote sensing, the

measurements generated by active microwave sensors are related to SSM through the dielectric

properties of soil, and the measurement readings they produce are sensitive to soil surface

roughness. Unlike passive sensors, however, active microwave instruments proactively release EM energy towards

the target surface, and the difference between the transmitted and received EM radiation,

commonly referred to as the backscatter coefficient, is subsequently measured (Petropoulos et al.,

2015). There are various empirical, semi-empirical and physically-based models that relate the

SSM to the backscatter coefficient. For example, Zribi and Dechambre (2002) developed an

empirical model to estimate the SSM by using the C-band radar measurements; Oh (2004)

proposed a semi-empirical model to directly retrieve both SSM and soil roughness using the

multipolarized radar measurements; and Shi et al. (1997) proposed a physically-based algorithm to estimate SSM and soil roughness using L-band radar data. As compared to

passive microwave methods, the active methods can generate higher spatial resolution results, and

can thus be used in field experiments. Many investigations have been carried out to use ground

penetrating radar (GPR) to estimate soil moisture in both laboratory and field settings. For

instance, Ercoli et al. (2018) conducted both laboratory and field experiments to evaluate the

feasibility of using GPR to obtain SSM information for engineering and hydrogeological

applications; and Lunt et al. (2005) used GPR to estimate changes in soil moisture content under

different soil saturation conditions at a winery.

For optical remote sensing, Zhang and Zhou (2016) provided a review on the principal

foundations, advantages, limitations and practicalities of the existing optical methods. These

methods are categorized as optical because they utilize the properties of the optical wavelengths

of the EM spectrum, which extend from 0.3 to 15 μm, to estimate soil moisture (Swain and Davis,

1978). According to Petropoulos et al. (2015) and Zhang and Zhou (2016), optical remote sensing

methods can be further divided into reflectance-based and thermal infrared-based methods. The

wavelengths used by the reflectance-based methods include the reflective region of the EM

spectrum ranging from 0.4 to 3.0 μm, which covers the visible, near-infrared and shortwave

infrared wavelength regions (Jensen, 2009; Lillesand et al., 2015; Swain and Davis, 1978). These

methods relate the reflected EM radiation from the soil surface to SSM. It has been demonstrated


that surface reflectance decreases as SSM increases, and various relationships have been developed

to correlate soil surface reflectance to SSM (Liu et al., 2002; Wang et al., 2010). There are also a

large number of studies that correlate soil surface reflectance to SSM by using different types of

vegetation indices (Gao, 1996; Heim, 2002). In general, most of these correlations are empirically derived, and such empirical relationships often suffer from low generality, a need for site-specific fine-tuning, and a limited ability to describe the underlying physical processes. In addition, reflectance-based methods

are also influenced by numerous factors such as surface roughness, color of target surface, and

angles of measurement and incidence. Yet, these approaches are typically based on mature

instruments and technologies, and they can provide SSM estimates at a high spatial resolution

(Petropoulos et al., 2015).

In contrast, the thermal infrared (TIR) approaches estimate SSM through measuring the emitted

EM radiation from the soil surface with wavelengths ranging from 7 to 15 μm. These wavelengths

are commonly known as the thermal infrared region or, less commonly, the far-infrared region of the

EM spectrum (Jensen, 2009; Swain and Davis, 1978). The measurements made by TIR sensors

can either directly provide an approximation to the soil surface temperature or be processed to

calculate the soil surface thermal properties. In this way, the TIR methods can be divided into two

groups. The first group refers to thermal inertia methods. Thermal inertia is a soil physical

property, defined by soil thermal conductivity, specific heat capacity, and soil bulk density, that

determines the resistance of soil to temperature variations (Minacapilli et al., 2012). The rationale

for thermal inertia methods is that SSM can affect the soil surface heating process by influencing

the thermal inertia (Zhao and Li, 2013); that is, an increase in SSM can result in an increase

in thermal inertia, and thus, lessen the diurnal temperature variation. Through this characteristic,

SSM can be estimated by measuring the diurnal temperature change, followed by solving a

relationship between SSM and temperature variation (Petropoulos et al., 2015). Applications of thermal inertia methods to SSM estimation have shown promising results in both laboratory

experiments (Minacapilli et al. 2012) and satellite-based remote sensing studies (Maltese et al.,

2013; Veroustraete et al., 2012; Verstraeten et al., 2006). Nevertheless, thermal inertia methods

often require ancillary data or up-front understanding of the soil properties (e.g., meteorological

factors or soil bulk density), which are sometimes difficult to obtain in practice (Zhang and Zhou,

2016). Besides these practicality challenges, thermal inertia methods are commonly unable to provide on-demand SSM estimation and are often limited to one estimate per day.


The second group of TIR methods employed in practice to estimate SSM is based on empirically

derived correlations between the remotely sensed soil surface temperature and SSM. Many studies have empirically demonstrated that strong correlations exist between moisture content and

surface temperatures, and these methods are often easy to implement while providing high spatial

and temporal resolution estimates (Petropoulos et al., 2015; Zhang and Zhou, 2016). Even though these methods share the common drawbacks of empirical approaches, they often perform well within the conditions for which they have been calibrated

(Petropoulos et al., 2015). In recent years, a number of applications have been carried out to use

UAV-based TIR methods to perform SSM retrievals in agriculture and mine tailing impoundment

monitoring. For instance, Chang and Hsu (2018) equipped a UAV with a thermal camera to

perform data acquisition over farm fields. Thermal images were taken during the field experiments,

and empirical relationships were employed to estimate SSM based on the remotely sensed TIR

data. Zwissler et al. (2016, 2017) conducted both laboratory and field studies to examine the

feasibility of SSM monitoring for mine tailings. In Zwissler’s studies, two empirical regression

models for two different types of tailings were developed using the TIR data collected in laboratory conditions. The performance of the models was tested in field experiments, and the results provided meaningful insights.

In this study, a UAV-based passive remote sensing system equipped with one thermal and one

RGB camera was used to capture the emitted and reflected radiation from the heap surface. The

collected data can reveal thermal properties of the leach pad material which can be further used to

estimate the distribution of surface moisture over the HLP. Further details about the data acquisition process are provided in Chapter 3, and the data analyses are elaborated in Chapter 4 and

Chapter 5.


2.4 Thermal Infrared Remote Sensing

As thermal images were acquired and used for the data analyses described in Chapter 4 and Chapter

5, it is beneficial to briefly review some fundamentals of thermal infrared remote sensing. This

section covers the basic concepts and principles related to our experiment; readers seeking further information about the subject may refer to Jensen (2009), Kuenzer and Dech (2013), and Lillesand et al. (2015).

2.4.1 Thermal Infrared Radiation Principles and Concepts

As stated by Prakash (2000): “thermal remote sensing is the branch of remote sensing that deals

with the acquisition, processing and interpretation of data acquired primarily in the thermal

infrared (TIR) region of the electromagnetic (EM) spectrum.” Any object that has a temperature

greater than absolute zero (0 K) emits EM energy. Therefore, all terrestrial features, such as rock,

water, soil, and vegetation, emit TIR radiation in the 3.0-14 𝜇m portion of the EM spectrum

(Jensen, 2009). Human eyes are not sensitive to TIR radiation, and we normally experience thermal

energy through the sense of touch (Jensen, 2009; Lillesand et al., 2015). However, it is possible to

design and engineer TIR sensors (e.g., infrared radiometer, thermal camera or imager) whose

detectors can capture and record the TIR energy, allowing humans to sense the radiation

(Kuenzer and Dech, 2013; Lillesand et al., 2015). For thermal cameras, there is no “natural” way

to represent thermal images because TIR radiation is not naturally visible to human eyes. A

common representation of thermal images is in grayscale, although one may use different colour

schemes (e.g., from red to blue) to deliver a feeling of hot and cold (Lillesand et al., 2015). Since

real-world objects continuously emit TIR radiation, thermal cameras can be operated at any time

of the day and night to obtain thermal images without the need of external light sources (Lillesand

et al., 2015; Prakash, 2000). The magnitude of TIR radiation emitted by an object is a function of

its temperature, and the measurements recorded by a thermal sensor with respect to the object depend on multiple factors, which are discussed below.

Kinetic and Radiant Temperature

According to Lillesand et al. (2015), “kinetic temperature is an ‘internal’ manifestation of the

average translational energy of the molecules constituting a body.” It is the value measured by

using a thermometer in direct physical contact with an object. In contrast, the energy emitted from


an object is an “external” manifestation of its energy state (Lillesand et al., 2015). The emitted EM

radiation from the object is called radiant flux, and the concentration of the radiant flux’s

magnitude is known as the object’s radiant temperature (Jensen, 2009). Since kinetic and radiant

temperatures are positively interrelated for most ground objects, it is possible to use thermal

sensors, such as infrared radiometers and thermal imagers, to first sense the radiant temperature

remotely and then relate the measurements back to the object’s kinetic temperature (Kuenzer and

Dech, 2013; Jensen, 2009). The concepts and principles introduced in the remainder of this section explain the interrelationship between kinetic and radiant temperatures and how an object's temperature can be determined from the measurements of a thermal system.

Blackbody Radiation

A blackbody is a theoretical body that absorbs and reemits all energy incident upon it (Kuenzer

and Dech, 2013). The amount of energy that a blackbody radiates (i.e., radiant exitance) is a

function of its surface temperature, and the mathematical expression is given by the Stefan-

Boltzmann law (Jensen, 2009),

$$M_{\mathrm{black}} = \sigma T^4 \qquad (2.6)$$

where $\sigma$ is the Stefan-Boltzmann constant of $5.6697 \times 10^{-8}$ W m$^{-2}$ K$^{-4}$; $T$ is absolute temperature (K); and $M_{\mathrm{black}}$ is the total radiant exitance from the surface of a blackbody (W m$^{-2}$). In addition

to radiant exitance, the spectral distribution of the emitted energy also varies with temperature

(Lillesand et al., 2015). Figure 2-4 illustrates the blackbody radiation curves at different

temperatures, where the area under a particular curve is equal to the total radiant exitance coming

from the surface of a blackbody at that specific temperature (Jensen, 2009). In this way, the

expanded form of the Stefan-Boltzmann law is defined as (Lillesand et al., 2015):

$$M_{\mathrm{black}} = \int_0^{\infty} M_{\mathrm{black}}(\lambda)\, d\lambda = \sigma T^4 \qquad (2.7)$$

where $M_{\mathrm{black}}(\lambda)$ is the spectral radiant exitance at wavelength $\lambda$ (W m$^{-2}$ $\mu$m$^{-1}$); and the other terms

have the same definitions as in equation (2.6). The above mathematical expressions imply that the

higher the blackbody’s temperature, the greater the total amount of emitted radiation. This property

can be easily observed by comparing the radiation curves shown in Figure 2-4. Moreover, the

radiation curves also show that the dominant wavelength, which is the peak of a radiation


distribution, will shift towards a shorter wavelength as the blackbody’s temperature increases. The

determination of the dominant wavelength for a blackbody at a particular temperature is defined

by Wien’s displacement law,

$$\lambda_{\max} = \frac{A}{T} \qquad (2.8)$$

where $A$ is a constant of 2898 $\mu$m K; $T$ is absolute temperature (K); and $\lambda_{\max}$ ($\mu$m) is the

dominant wavelength at which the maximum spectral radiant exitance occurs (Jensen, 2009;

Lillesand et al., 2015). Wien’s displacement law can be used to determine the wavelength at which

the most information can be captured by a sensor with respect to an object. For instance, the

temperature of surface materials on the earth, such as rock, soil, and water, is approximately 300

K (Lillesand et al., 2015). Based on equation (2.8), the dominant wavelength from earth features

is at approximately 9.7 𝜇m. Therefore, a TIR sensor operating in the 8-14 𝜇m region can be used

to detect the radiation emitted from the earth surface with strong responses (Jensen, 2009).
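As a minimal numerical sketch of equations (2.6) and (2.8), the Python snippet below reproduces the worked example above for a terrestrial surface at approximately 300 K; the temperature value is only illustrative.

```python
SIGMA = 5.6697e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)
WIEN_A = 2898.0    # constant in Wien's displacement law (um K)

def radiant_exitance(t_kelvin):
    """Total radiant exitance of a blackbody, equation (2.6)."""
    return SIGMA * t_kelvin ** 4

def dominant_wavelength(t_kelvin):
    """Wavelength of peak spectral radiant exitance, equation (2.8)."""
    return WIEN_A / t_kelvin

t = 300.0  # approximate temperature of earth surface materials (K)
print(f"radiant exitance: {radiant_exitance(t):.0f} W/m^2")     # ~459 W/m^2
print(f"dominant wavelength: {dominant_wavelength(t):.1f} um")  # ~9.7 um
```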

Figure 2-4: Blackbody radiation curves at various temperatures. The area under each curve is the total

radiant exitance of a blackbody at that specific temperature. Extracted from Lillesand et al. (2015).


Emissivity

Although the notion of blackbody is convenient for describing radiation principles, there is no

natural object on the earth that acts as a perfect blackbody (Lillesand et al., 2015). Ground

materials are selectively radiating bodies, or selective radiators, where the amount of radiated

energy is always less than the energy emitted from a blackbody at the equivalent temperature

(Jensen, 2009; Lillesand et al., 2015). To measure the “emitting ability” of a real-world material,

emissivity (𝜀) is defined as (Lillesand et al., 2015):

𝜀(𝜆) =

radiant exitance of an object at a given temperature

radiant exitance of a blackbody at the same temperature (2.9)

The value of emissivity ranges between zero and one, and different materials have distinct

emissivities at different wavelengths (Jensen, 2009). Figure 2-5 provides an example depicting the

behaviours of three selective radiators in the 0-25 𝜇m region of the EM spectrum. As shown in

Figure 2-5, water behaves similarly to a blackbody in the 0-25 𝜇m region, whereas Granite and Dunite have varying spectral radiant exitance across wavelengths (Jensen, 2009). In general,

the emissivity of an object is influenced by several factors, including colour, compaction,

wavelength, chemical composition, moisture content, observation angle, surface roughness, and

field of view (Jensen, 2009; Schmugge et al., 2002; Weng et al., 2004). Salisbury and D’Aria

(1992) provided a list of emissivities of various terrestrial materials in the 8-14 𝜇m region, while

Jensen (2009), Kuenzer and Dech (2013), and Lillesand et al. (2015) summarized the emissivity

values of some typical materials.

Atmospheric Effects

The atmosphere directly determines what infrared energy can be transmitted from the terrain to a

thermal remote sensing system (Jensen, 2009). Energy at certain wavelengths is strongly absorbed by the atmosphere, and these regions of the EM spectrum are called absorption

bands (Jensen, 2009). Conversely, the regions that are less affected by the atmosphere are called

atmospheric windows (Jensen, 2009). Figure 2-6 illustrates the atmospheric absorption effect in

the 0-15 𝜇m region of the EM spectrum, where the grayish areas in the figure indicate the

atmosphere “closes down” the energy transmission (Lillesand et al., 2015). As shown in Figure

2-6, two typical atmospheric windows of the TIR regions are 3-5 𝜇m and 8-14 𝜇m. The other


wavelengths within the TIR region are significantly absorbed by carbon dioxide, water vapour,

and ozone contained in the atmosphere (Jensen, 2009).

In addition to the absorption effect, many other atmospheric constituents can significantly

influence the thermal remote sensing measurements (Lillesand et al., 2015). For instance,

suspended particles can scatter EM radiation resulting in an attenuation of the signal magnitude;

gases in the atmosphere can emit their own radiation, which adds to the energy detected by the sensor; and various environmental and weather conditions, such as aerosols, clouds, dust, fog, smoke, and water droplets, can all introduce noise and complexity into the data acquisition process

(Lillesand et al., 2015). Therefore, the data interpretation should take atmospheric effects into

account, and certain data cleaning and compensation strategies may be performed before the data

analysis (Jensen, 2009; Kuenzer and Dech, 2013; Lillesand et al., 2015).

Figure 2-5: Spectral radiant exitance of a) water, b) Granite, and c) Dunite in 0-25 𝜇m region at 350 K

temperature compared to a blackbody at the same temperature. Extracted from Jensen (2009).


Kirchhoff’s Radiation Law

The EM energy radiated from a terrain feature is often the result of the energy incident upon it

(Lillesand et al., 2015). There are three possible interactions between the object and the incident

energy, which are reflection, absorption, and transmission (Kuenzer and Dech, 2013). By using

the principle of conservation of energy, the relationship can be stated as

$$E_I(\lambda) = E_A(\lambda) + E_R(\lambda) + E_T(\lambda) \qquad (2.10)$$

where $E_I(\lambda)$ is the energy incident on the object surface; and $E_A(\lambda)$, $E_R(\lambda)$, and $E_T(\lambda)$ are the energy components absorbed, reflected, and transmitted by the object, respectively (Lillesand et al., 2015). Equation (2.10) can be modified through dividing both sides by $E_I(\lambda)$, which gives

$$\frac{E_I(\lambda)}{E_I(\lambda)} = \frac{E_A(\lambda)}{E_I(\lambda)} + \frac{E_R(\lambda)}{E_I(\lambda)} + \frac{E_T(\lambda)}{E_I(\lambda)} \qquad (2.11)$$

To simplify the notation in equation (2.11), we can further define the following:

$$\alpha(\lambda) = \frac{E_A(\lambda)}{E_I(\lambda)}, \quad r(\lambda) = \frac{E_R(\lambda)}{E_I(\lambda)}, \quad \tau(\lambda) = \frac{E_T(\lambda)}{E_I(\lambda)} \qquad (2.12)$$

where $\alpha(\lambda)$, $r(\lambda)$, and $\tau(\lambda)$ are absorptance, reflectance, and transmittance of the object, respectively (Jensen, 2009; Lillesand et al., 2015).

Figure 2-6: Atmospheric absorption effect in the 0-15 𝜇m region of the electromagnetic spectrum. Notice the existence of atmospheric windows in the 3-5 𝜇m and 8-14 𝜇m regions. Extracted from Lillesand et al. (2015).

By substituting equation (2.12) into equation (2.11), the relationship becomes

$$1 = \alpha(\lambda) + r(\lambda) + \tau(\lambda) \qquad (2.13)$$

which defines the absorbing, reflecting, and transmitting properties of an object under the principle

of conservation of energy (Lillesand et al., 2015; Slater, 1980).

According to Kirchhoff's radiation law, the spectral emissivity of an object equals its spectral absorptance at thermal equilibrium:

$$\alpha(\lambda) = \varepsilon(\lambda) \qquad (2.14)$$

This relationship holds true in most conditions, and it is often phrased as “good absorbers are good emitters” (Gupta, 2017; Jensen, 2009; Kuenzer and Dech, 2013; Lillesand et al., 2015).

Furthermore, real-world materials in remote sensing applications are usually assumed to be opaque

to TIR radiation, meaning that the radiant flux exiting from the other side of the object is negligible,

i.e., 𝜏(𝜆) = 0 (Jensen, 2009; Lillesand et al., 2015). Hence, we can substitute equation (2.14) into

equation (2.13) and set the transmittance term to zero, resulting in

$$1 = \varepsilon(\lambda) + r(\lambda) \qquad (2.15)$$

Equation (2.15) describes the important relationship between an object’s emissivity and

reflectance in the infrared region of the EM spectrum, where the higher the emissivity, the lower

the reflectance, and vice versa. For instance, water is a substance that has an emissivity close to

one, thus it absorbs almost all the incident energy and reflects very little back to the surroundings

(Jensen, 2009). Conversely, many metallic materials (e.g., aluminum foil) often have a low

emissivity, which means they absorb little and reflect most of the incident thermal energy

(Lillesand et al., 2015).

Knowing the emissivity of an object has a significant implication for relating the object’s radiant

temperature to its kinetic temperature. Recall that the Stefan-Boltzmann law, equation (2.6), states that $M = \sigma T^4$. When we point a thermal sensor at a real object, the measurement we obtain is the

total radiant exitance from the surface of the object (i.e., M in the Stefan-Boltzmann law). This

measurement is made based on the object's radiant temperature ($T_{\mathrm{rad}}$) because the radiant


temperature is the “external” manifestation of the object’s energy state (Jensen, 2009). The remote

sensor can only detect the external manifestation because it is not in direct contact with the

substance. In this way, we have

$$M_{\mathrm{sensor}} = \sigma T_{\mathrm{rad}}^4 \qquad (2.16)$$

where $\sigma$ is the Stefan-Boltzmann constant of $5.6697 \times 10^{-8}$ W m$^{-2}$ K$^{-4}$; $T_{\mathrm{rad}}$ is the radiant temperature of the object (K); and $M_{\mathrm{sensor}}$ is the total radiant exitance measured by the sensor (W m$^{-2}$) (Jensen, 2009). We can determine the object's radiant temperature, $T_{\mathrm{rad}}$, by inverting equation

(2.16). However, the determined $T_{\mathrm{rad}}$ is not equal to the object's kinetic temperature, $T_{\mathrm{kin}}$, mainly

due to the effect of emissivity (Lillesand et al., 2015). Therefore, we can modify the Stefan-

Boltzmann law by incorporating the emissivity of the object to the following form (Jensen, 2009;

Kuenzer and Dech, 2013; Lillesand et al., 2015):

$$M_{\mathrm{object}} = \varepsilon \sigma T_{\mathrm{kin}}^4 \qquad (2.17)$$

where $\varepsilon$ is the object's emissivity; $\sigma$ is the Stefan-Boltzmann constant; $T_{\mathrm{kin}}$ is the kinetic temperature of the object (K); and $M_{\mathrm{object}}$ is the total radiant exitance from the surface of the object (W m$^{-2}$). It is often assumed that the incorporation of emissivity can lead to equality between

equation (2.16) and equation (2.17), hence the relationship between the object’s kinetic

temperature and radiant temperature is given as (Gupta, 2017; Jensen, 2009; Kuenzer and Dech,

2013; Lillesand et al., 2015; Sabins, 1987):

$$T_{\mathrm{rad}} = \varepsilon^{1/4}\, T_{\mathrm{kin}} \qquad (2.18)$$

This relationship demonstrates that an object’s radiant temperature reported by a remote sensor is

always less than the substance’s kinetic temperature due to the effect of emissivity (Lillesand et

al., 2015). Many thermal infrared cameras nowadays allow users to explicitly enter the material’s

emissivity to account for the abovementioned discrepancy. In this study, a typical emissivity value

of 0.95 for a wet soil surface was set in the thermal system during the data acquisition; further information about the data collection process is provided in Chapter 3.
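As an illustrative sketch of equation (2.18), the snippet below inverts the relationship to recover a kinetic temperature from a remotely sensed radiant temperature, using the emissivity of 0.95 adopted in this study; the 300 K radiant temperature is an arbitrary example value.

```python
def kinetic_from_radiant(t_rad, emissivity):
    """Invert equation (2.18): T_kin = T_rad / emissivity**(1/4)."""
    return t_rad / emissivity ** 0.25

# With emissivity 0.95, a sensed radiant temperature of 300 K corresponds
# to a kinetic temperature of roughly 303.9 K.
print(f"{kinetic_from_radiant(300.0, 0.95):.1f} K")
```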

2.4.2 Considerations for Thermal Imagery Collection and Interpretation

Although thermal cameras can be operated at any time of the day and night to obtain thermal

images, the selection of optimal times for field data acquisition should consider various factors.


One critical element that should be considered is the diurnal temperature cycle of ground materials.

Figure 2-7 illustrates a typical 24-hour diurnal temperature variation of water and dry soil/rock

(Lillesand et al., 2015). Where the relative temperature curves intersect, no radiant temperature difference exists between the materials, which results in minimal contrast in the acquired thermal imagery (Jensen, 2009). These points are called thermal

crossovers, and there are two time periods within a day (shortly after dawn and around sunset)

when several ground materials, such as soil, water, rock, and vegetation, have similar radiant temperatures (Jensen, 2009; Lillesand et al., 2015). In general, obtaining thermal infrared data at

the thermal crossovers should be avoided because the collected data provide limited information

about the different types and conditions of the objects.

In contrast, there are two favourable times for field TIR data acquisition within a diurnal cycle.

The first time period is the predawn hours. As shown in Figure 2-7, the terrestrial materials have

relatively stable temperatures during this period (4 - 5 a.m.), where the change of terrain

temperature is approaching zero (Jensen, 2009). It is often considered that a quasi-equilibrium

condition is reached during this time period, where the slopes of the radiant temperature curves

are relatively flat (Lillesand et al., 2015). In addition to the predawn hours, another preferable time for

thermal imagery acquisition is the early afternoon (around 2 - 3 p.m.). As shown in Figure 2-7, the

two material types have distinct behaviours under solar heating. Water has a small temperature

fluctuation across the 24-hour period, whereas dry soils and rocks have a large temperature

difference between the afternoon and predawn. The solar radiation can warm up bare soils and

rocks significantly, and the maximum temperature is often reached in the early afternoon. If a

terrain mainly consists of soil, rock, and water, the maximum scene contrast normally occurs in

these hours, and the data obtained in the early afternoon can furnish significant information for

distinguishing different materials (Lillesand et al., 2015).

Besides the diurnal effects, some other factors should also be taken into consideration during the data collection and interpretation. UAV-based thermal imagery acquisition, in particular, is influenced by various meteorological and logistical elements. For instance, the early afternoon is generally windier than other times of the day (Gupta, 2017), and UAV data collection can suffer from inaccurate flight lines and platform instability. Also, the effects of heat dissipation and moisture

evaporation caused by the high wind speed can introduce uncertainties in the captured data (Gupta,

2017). For the predawn hours, although the convective wind currents are usually gentle, UAV navigation over a large area is difficult during periods of darkness, especially when

ground features are not physically visible to the pilot (Lillesand et al., 2015). Beyond the above-

mentioned factors, the acquired TIR data can also be affected by geographical and topographical

factors. For example, in the Northern Hemisphere, the south-facing slopes of topographic features

often experience more solar radiation than the north-facing slopes, resulting in differential heating

of the terrain (Lillesand et al., 2015). The south-facing slopes can appear hotter than the north-facing ones in the acquired images, which may lead to biased results in the data processing. Overall,

mission planning should be conducted by considering the project objectives as well as various

environmental and logistical factors to obtain high-quality thermal infrared data, while the

topographical and meteorological effects should not be ignored during the imagery interpretation.

Figure 2-7: Illustration of thermal crossovers and relative diurnal radiant temperature of water versus

dry soils and rocks. Modified based on Lillesand et al. (2015).


2.5 Deep Learning and Convolutional Neural Networks

This section provides a brief overview of the core concepts in deep learning and convolutional

neural networks that are related to the data analyses elaborated in Chapter 5 of this thesis. The field

of deep learning is a young subfield of machine learning, which is rapidly evolving thanks to the emergence of large datasets, efficient algorithms and powerful computational hardware in recent years (Rawat and Wang, 2017; Lateef and Ruichek, 2019). The ever-changing state of the deep learning field makes it difficult to keep up with its pace of evolution, and thus, only the topics and methods that are relevant to the conducted experiments are covered and explained. Interested

readers may refer to Goodfellow et al. (2016) and Aggarwal (2018) for a more thorough discussion

of deep learning theory and practice.

Over the past several years, deep learning (DL) has accomplished tremendous success in a vast

variety of application domains including computer vision (Rawat and Wang, 2017), autonomous

robotics (Pierson and Gashler, 2017), agriculture sciences (Kamilaris and Prenafeta-boldú, 2018),

medical sciences (Shen et al., 2017), remote sensing (Zhu et al., 2017), and mining (Zhang and

Liu, 2017), to name a few. It grew out of a subfield of machine learning called Neural Networks (NNs) (Alom et al., 2019), and a deep learning model consists of a NN with multiple layers

(Aggarwal, 2018). The term deep often refers to the number of layers involved in state-of-the-art

models, and neural originates from a loose biological analogy to the human nervous system

(Aggarwal, 2018). One of the key reasons for DL to become popular is its ability to effectively

decompose latent information contained in data into a hierarchical structure, where a hierarchy of

features or patterns with different levels of abstractions can be learned at different layers of a NN

(Peretroukhin, 2020). For instance, an image of a cat may have local textures (e.g., fur) that

compose primitives (e.g., a tail), which belong to a semantic object (i.e., the cat). The layers that are close to the input have been empirically shown to learn low-level features (e.g., local textures), and layers

that are close to the output have the capability to capture high-level abstractions (e.g., semantics)

(Zeiler and Fergus, 2014). In this way, many practical tasks, such as image classification and object detection, can be handled by NNs, and many state-of-the-art DL models have surpassed human-

level performance in real-world applications (e.g., He et al., 2015; Silver et al., 2017).

Depending on the specific task to be addressed, DL approaches employed in an application can

often be categorized into one of the following types: Supervised, Semi-supervised, Unsupervised


and Reinforcement Learning (Alom et al., 2019). Among the abovementioned categories, the

Supervised Learning technique has been widely used to develop models (algorithms) for tasks

involving perception and recognition (e.g., moisture detection, image classification), where

models are trained to learn how to associate an input with an output, given a set of examples of

inputs and labelled outputs (Goodfellow et al., 2016). The labelled outputs, 𝐲train∗ (also known as

targets), and the input examples, 𝐱train , together compose a training set (i.e., 𝒮train ≜

{𝐱train, 𝐲train∗ }). To train a DL model using supervised learning, we iteratively feed the model with

𝐱train and obtain estimated targets 𝐲train . For every iteration, we compute difference measure

between 𝐲train and 𝐲train∗ , which is called the training loss. The objective is to successively reduce

this training loss so that a mapping function, 𝐟(∙), between the input and output is learned by the

model. This process is called optimization. Ideally, we want this learned function to generalize to

new data that are not included in 𝒮train. However, this is usually not true in practice, where the

model tends to perform well on the data that it has seen during training, but not necessarily on the

previously unseen inputs. Such a phenomenon is called overfitting, and the ability for a model to

perform well on previously unobserved inputs is called generalization (Goodfellow et al., 2016).

In order to keep track of the generalization ability of the model, we need to incorporate a validation set ($\mathcal{S}_{\mathrm{val}} \triangleq \{\mathbf{x}_{\mathrm{val}}, \mathbf{y}^*_{\mathrm{val}}\}$) of examples that the model does not experience during training. This

validation set should be periodically visited by the model at training time, so that a validation loss can be computed. Note that the validation set is not used for training the model, but for

assessing how well the model performs when encountering points outside the training set.

Typically, both the training and validation losses decrease at the early stage of a training process,

until a critical stage is reached. After this optimal point, the training loss keeps decreasing, while the validation loss starts increasing (Goodfellow et al., 2016). Figure 2-8 graphically

depicts this relationship. One may stop training the model when the loss on the validation set

reaches its minimum, and this type of training strategy is known as early stopping. A formal

definition of early stopping and some other training strategies that can be used to prevent

overfitting are described by Goodfellow et al. (2016) and Aggarwal (2018).
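The sketch below illustrates the early stopping strategy on a synthetic pair of loss curves; the loss values are fabricated for illustration (a monotonically decreasing training loss and a U-shaped validation loss) and do not come from any model trained in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_with_early_stopping(epochs=100, patience=5):
    """Stop once the validation loss has not improved for `patience` epochs."""
    best_val, best_epoch, stall = float("inf"), 0, 0
    for epoch in range(epochs):
        train_loss = 1.0 / (epoch + 1.0)              # keeps decreasing throughout
        val_loss = (0.5 + 2e-4 * (epoch - 30) ** 2    # minimum near epoch 30
                    + rng.normal(0.0, 0.003))         # small measurement noise
        if val_loss < best_val:
            best_val, best_epoch, stall = val_loss, epoch, 0  # checkpoint here
        else:
            stall += 1
            if stall >= patience:                     # validation loss has stalled
                break
    return best_epoch, best_val

print(train_with_early_stopping())  # stops shortly after the validation minimum
```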

After the training has completed, the final performance of the trained model should be evaluated

by using a held-out test set ($\mathcal{S}_{\mathrm{test}} \triangleq \{\mathbf{x}_{\mathrm{test}}, \mathbf{y}^*_{\mathrm{test}}\}$). The test set shall never be used in any way during the learning process, and the test set should not contain any entry repeated from the


training and validation sets (Goodfellow et al., 2016). Figure 2-9 summarizes the relationships

between the three sets of data (i.e., training, validation, and test sets) and provides a roadmap of

the development process for DL models, which are designed for image classification and

segmentation tasks, using supervised learning. The topics within the green blocks are discussed in

this section, and the remaining topics are covered in later chapters of this thesis. All models developed in this study follow the procedure depicted in Figure 2-9. The data preparation, network construction, model training and evaluation, and moisture map generation processes are elaborated in Chapter 5.

Figure 2-8: Typical relationship between model capacity and error. After reaching the optimal capacity, the generalization error starts increasing, while the training error keeps decreasing. Modified based on Goodfellow et al. (2016).

Figure 2-9: Summary of the development workflow of a deep learning model using supervised learning.

2.5.1 Deep Feedforward Networks

Deep feedforward networks, also known as feedforward neural networks, or multilayer

perceptrons (MLPs), are a typical type of deep learning model, which are designed to approximate

an arbitrary function 𝐟∗ by defining a parameterized mapping 𝐲 = 𝐟(𝐱; 𝜽) (Clement, 2020). The

values of parameters, 𝜽, are learned from data with an intent to result in the best approximation of

𝐟∗ (Goodfellow et al., 2016). The term feedforward refers to the direction of information flow

inside the model, where an input, $\mathbf{x} \in \mathbb{R}^M$, flows through the intermediate computations toward the corresponding output, $\mathbf{y} \in \mathbb{R}^N$, without any feedback connections during the forward

propagation of information. Feedforward neural networks are called networks because multiple

functions are chained together in the intermediate computations, and the length of the chain defines

the depth of the network (Goodfellow et al., 2016). For instance, a two-layer network may include

two functions, 𝐟1 and 𝐟2, chained together such that 𝐲 = 𝐟2 ∘ 𝐟1(𝐱) = 𝐟2(𝐟1(𝐱)). Equivalently, the

computation can be expressed as 𝐡 = 𝐟1(𝐱), and 𝐲 = 𝐟2(𝐡). In this expression, 𝐟1(𝐱) is called a

hidden layer, where 𝐡 is an intermediate result which would not be reported as an output by the

network. In contrast, 𝐟2(𝐡) is called the output layer because it is the last layer of the network, and

the output y is what would be obtained by a user. Therefore, we can call the network a two-layer

feedforward network, or a one-hidden-layer MLP (Goodfellow et al., 2016). Figure 2-10 illustrates

this one-hidden-layer MLP as a directed acyclic graph.

Figure 2-10: Illustration of a one-hidden-layer multilayer perceptrons (MLP) as a directed acyclic graph

describing the mapping 𝐲 = 𝐟2 ∘ 𝐟1(𝐱) , where 𝐱 and 𝐲 are the input and output of the network,

respectively; 𝐡 is the hidden layer; and 𝐟1 and 𝐟2 are two functions mapping a layer onto the next.


As mentioned previously, the functionality of a MLP is to define a parameterized function 𝐟, which

is used to approximate a complex, and often arithmetically unknown, mapping 𝐟∗ between an input

space and an output space. Training is the process of driving an estimated output (𝐲 = 𝐟(𝐱; 𝜽)) to

match the true target (𝐲∗ = 𝐟∗(𝐱)) by successively updating the network parameters 𝜽 so that the

model can give 𝐲 ≈ 𝐲∗ by the end of the training process. According to Goodfellow et al. (2016),

the training examples fed into the network only prescribe the desired behavior of the output layer,

whereas the behaviors of the intermediate layers are not directly specified by the training data. It

is the learning algorithm that decides how to use the hidden layers to best approximate 𝐟∗. This is

the reason why the intermediate layers are called hidden layers. Within each hidden layer of a

MLP, there are many hidden units (or neurons) that act in parallel. The number of hidden units

within a layer determines the dimensionality of that layer, and the dimensionality of the widest

hidden layer defines the width of the network (Clement, 2020).

To explain the computational mechanism of a MLP, we continue using the one-hidden-layer MLP

introduced above as an example. In a MLP model, each two adjacent layers are fully connected to

each other (so called fully connected layers), where every unit in the succeeding layer is a function

of all components in the preceding layer (Clement, 2020). Figure 2-11 depicts this relationship

using the one-hidden-layer MLP with four neurons in the hidden layer. In this case, the network

can be specified as

$$\mathbf{h} = \mathbf{f}_1(\mathbf{x}; \boldsymbol{\theta}_1) = \mathbf{g}_1(\mathbf{W}_1\mathbf{x} + \mathbf{b}_1), \qquad (2.19)$$

$$\mathbf{y} = \mathbf{f}_2(\mathbf{h}; \boldsymbol{\theta}_2) = \mathbf{g}_2(\mathbf{W}_2\mathbf{h} + \mathbf{b}_2), \qquad (2.20)$$

Figure 2-11: Illustration of a one-hidden-layer multilayer perceptrons (MLP) with four units in the

hidden layer. Left: directed acyclic graph describing the mapping $\mathbf{y} = \mathbf{f}_2 \circ \mathbf{f}_1(\mathbf{x})$; Right: inner structure of the MLP. In this case, $\mathbf{x} \in \mathbb{R}^3$, $\mathbf{y} \in \mathbb{R}^2$, and $\mathbf{h} \in \mathbb{R}^4$.


where $\boldsymbol{\theta}_1 = \{\mathbf{W}_1, \mathbf{b}_1\}$ and $\boldsymbol{\theta}_2 = \{\mathbf{W}_2, \mathbf{b}_2\}$ are parameters of the hidden layer and output layer, respectively; $\mathbf{W}_1 \in \mathbb{R}^{4\times 3}$ and $\mathbf{W}_2 \in \mathbb{R}^{2\times 4}$ are weight matrices; $\mathbf{b}_1 \in \mathbb{R}^4$ and $\mathbf{b}_2 \in \mathbb{R}^2$ are bias vectors; and $\mathbf{g}_1(\cdot)$ and $\mathbf{g}_2(\cdot)$ are element-wise activation functions. It is important to note that if

𝐠𝟏(∙) and 𝐠2(∙) are both linear functions, then the composition of 𝐟2 ∘ 𝐟1 is also linear, which

means the model would not be able to approximate any nonlinear mapping. Therefore, at least one

of the two activation functions must be nonlinear in order to introduce some nonlinearity into the

model. As a common practice, we often choose the activation function of the hidden layer(s) (i.e.,

$\mathbf{g}_1(\cdot)$ in this case) to be nonlinear and leave the output layer linear. By doing so, the MLP is capable of learning a nonlinear mapping between the input, $\mathbf{x}$, and the output, $\mathbf{y}$.
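As a minimal numerical sketch of equations (2.19) and (2.20), the following code implements the forward pass of the one-hidden-layer MLP of Figure 2-11, with randomly initialized (untrained) parameters standing in for values that a training process would learn.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions matching Figure 2-11: x in R^3, h in R^4, y in R^2
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)  # hidden-layer parameters
W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)  # output-layer parameters

def relu(z):
    """Nonlinear hidden activation (rectified linear unit)."""
    return np.maximum(0.0, z)

def mlp(x):
    h = relu(W1 @ x + b1)  # hidden layer, equation (2.19)
    return W2 @ h + b2     # linear output layer, equation (2.20)

print(mlp(np.array([0.5, -1.0, 2.0])))  # a 2-dimensional output vector
```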

Activation Functions

The selection of activation function is an area of active research. There are a number of nonlinear

activation functions which have been proposed in the literature and implemented in state-of-the-

art models. It has been shown that the choice of activation function has significant influence on

both the performance and required training time of a NN (Glorot et al. 2011; Krizhevsky et al.,

2012). Thorough reviews on activation functions employed in deep learning models are provided

by Rawat and Wang (2017), and Nwankpa et al. (2018).

Recent practices in deep learning often employ the rectified linear unit (ReLU) introduced by Nair

and Hinton (2010) at the hidden layers of feedforward networks (Goodfellow et al., 2016). The

mathematical expression of ReLU is defined as

$$g(z) = \max(0, z), \qquad (2.21)$$

which retains only the positive part of the activation and sets the negative part to zero. Some

researchers also proposed several generalized forms of rectified linear unit such as leaky rectified

linear unit (LReLU) by Maas et al. (2013), parametric rectified linear unit (PReLU) by He et al.

(2015), and exponential linear unit (ELU) by Clevert et al. (2016). The expressions are given as

$$g(z) = \begin{cases} \max(0, z) + \alpha \min(0, z) & \text{LReLU} \\ \max(0, z_k) + \alpha_k \min(0, z_k) & \text{PReLU} \\ \max(0, z) + \min\left(0, \alpha(e^z - 1)\right) & \text{ELU} \end{cases} \qquad (2.22)$$


where 𝛼’s are adjustable parameters, which are used to control the shapes of the functions. One

common property of these rectified linear unit variants is that they allow the activations to take negative values so that the robustness of the models is improved (Rawat and Wang, 2017).

As mentioned in the one-hidden-layer MLP example, the output layer is often left as linear, which

means an identity activation is employed. The identity activation is defined as

$$g(z) = z. \qquad (2.23)$$

Figure 2-12 provides an illustration of the identity, ReLU and LReLU activation functions.
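For reference, the activations in equations (2.21) and (2.22) can be implemented in a few lines of NumPy; the $\alpha$ values used below are illustrative defaults, not recommended settings.

```python
import numpy as np

def relu(z):
    """Rectified linear unit, equation (2.21)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.1):
    """LReLU variant from equation (2.22)."""
    return np.maximum(0.0, z) + alpha * np.minimum(0.0, z)

def elu(z, alpha=1.0):
    """ELU variant from equation (2.22)."""
    return np.maximum(0.0, z) + np.minimum(0.0, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))        # negative inputs clipped to zero
print(leaky_relu(z))  # negative inputs scaled by alpha
print(elu(z))         # smooth exponential saturation for negative inputs
```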

2.5.2 Convolutional Neural Networks

Convolutional networks (LeCun, 1989), or convolutional neural networks (CNNs), are a particular

form of feedforward network that is commonly used for processing data with a grid-like topology,

such as images and time-series data (Goodfellow et al., 2016). The key distinction between CNNs

and MLPs is that typical CNNs employ not only fully connected layers in their architectures, but

also convolutional and pooling (or subsampling) layers. The convolutional and pooling layers are

often grouped into modules, and multiple modules are stacked together followed by one or more

fully connected layers to form a deep model. Figure 2-13 graphically illustrates a typical CNN

architecture, where the depicted model is designed for an image classification task (Rawat and

Wang, 2017). In the remainder of this section, the image classification task is used as an example to facilitate the explanation of key concepts in CNNs.

Figure 2-12: Illustration of the identity, rectified linear unit (ReLU), and leaky rectified linear unit

(LReLU, 𝛼 = 0.1) activation functions.


Fully Connected Layers

Fully connected layers are usually placed at the end of a CNN architecture to perform the function

of high-level reasoning such as identifying the corresponding class of an input image (Rawat and

Wang, 2017). As introduced in Section 2.5.1, neurons between two adjacent layers are pairwise

connected to each other, while there is no connection among units within a single layer. The

computation of a fully connected layer can be interpreted as a dense matrix-vector multiplication

followed by passing through an element-wise activation function. The computational results

contained in the output layer are often taken to represent the class scores (e.g., in classification) or

some real-valued targets (e.g., in regression).

Convolutional Layers

The convolutional (CONV) layer is the core building block of CNNs. The CONV layers serve as

feature extractors, which learn the feature representations of an input and arrange the extracted

information into feature maps (Rawat and Wang, 2017). The process of generating feature maps

is called convolution, which is an operation of using kernels to convolve over the input. Because

convolution is a linear operation, some additional nonlinearity would be required to enable a

CONV layer to learn a nonlinear mapping between the input and the output. This is achieved by

Figure 2-13: Illustration of a typical convolutional neural network (CNN) architecture designed for an

image classification task, which includes convolutional layers, pooling layers and fully connected

layers. Extracted from Rawat and Wang (2017).


including a nonlinear activation function (e.g., ReLU) after each convolution operation (Clement,

2020). To graphically depict this process, Figure 2-14 provides an example of a 2D convolution

followed by a ReLU nonlinearity. In Figure 2-14, each pixel (unit) in the feature map is associated

with a 2 × 2 area of the input. Each of these 2 × 2 areas is called the receptive field of its

corresponding pixel in the feature map. The size of the receptive field reflects the amount of

information that is used to obtain a result. In this case, the generated feature map has a size of 2 ×

3, which is dependent on how much the kernel moves at each step. The spatial interval of the kernel

movement is called the stride. In general, the greater the stride, the smaller the feature map would

be, if a padding strategy is not employed to extend the input data (Aggarwal, 2018).

In the example shown in Figure 2-14, the input to the CONV layer is a set of discrete data, which

are represented as a two-dimensional array. This kind of input is very common in deep learning

applications, where images are prominent examples. When working with such input, the

convolution operation can be expressed as

$$\mathbf{S}(i, j) = (\mathbf{I} * \mathbf{K})(i, j) = \sum_{m}\sum_{n} \mathbf{I}(i - m, j - n)\, \mathbf{K}(m, n) \qquad (2.24)$$

Figure 2-14: An example of 2D convolution followed by a nonlinear ReLU activation function. The

kernel is restricted to lie within the input, which is called a “valid” convolution in some contexts. The

kernel size is 2 × 2, and the input can be thought of as a 3 × 4 single-channel image. The dimension of

the generated feature map is 2 × 3, which is smaller than the input. In practice, some padding strategies

may be used to force the output to match the input’s dimension. Modified based on Clement (2020).


where 𝐈 is the 2D discrete input data; 𝐊 is a discrete 2D kernel; 𝐒 is the generated feature map;

symbol ∗ denotes the convolution operation; 𝑖 and 𝑗 are indices indicating the pixel location inside

the 2D input; and 𝑚 and 𝑛 are indices specifying a position within the kernel 𝐊 (Goodfellow et al.,

2016). In equation (2.24), only the weights in kernel 𝐊 are learnable entries, while the others are

either fixed values, or results computed based on 𝐊. Hence, the goal of the learning algorithm is

to search for the optimal values of kernel 𝐊 for all the convolutional layers. An in-depth explanation of convolution arithmetic is given by Dumoulin and Visin (2016).
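A minimal NumPy sketch of the operation in Figure 2-14 is given below. Note that, like most deep learning libraries, it implements cross-correlation (the kernel is not flipped as in equation (2.24)); the two are equivalent in practice because the kernel values are learned.

```python
import numpy as np

def conv2d_valid_relu(image, kernel):
    """'Valid' 2D cross-correlation with a stride of one, followed by ReLU."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.empty((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # each output unit sees a kh x kw receptive field of the input
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return np.maximum(0.0, out)  # nonlinearity after the linear convolution

image = np.arange(12, dtype=float).reshape(3, 4)  # 3 x 4 single-channel input
kernel = np.array([[1.0, -1.0],
                   [0.5, 0.5]])                   # one learnable 2 x 2 kernel
print(conv2d_valid_relu(image, kernel))           # a 2 x 3 feature map
```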

Overall, there are three factors that have contributed to the extensive use of convolutional layers

in CNNs, namely sparse interactions, parameter sharing and equivariant representations

(Goodfellow et al., 2016). To visualize the concept of sparse interactions (also referred to as sparse

connectivity), Figure 2-15 provides a comparison between a CONV layer and a fully connected

layer with the same input and output dimensions. As demonstrated in Figure 2-15, the CONV layer

Figure 2-15: Comparison of the number of connections between a convolutional layer (top) and a fully

connected layer (bottom) with the same input and output dimensions. The convolutional layer

significantly reduces the number of connections. Top: each color of the connections represents one

kernel, and thus, three kernels are employed (i.e., orange, purple and green). Each input unit only

connects to three output units, hence a sparse connectivity. Bottom: every input unit has pairwise

connections to all the output units.


has sparse interactions between the input and output as compared to the fully connected layer. The

CONV layer significantly reduces the number of connections, which means it requires much fewer

computational operations to obtain the output. In addition, the CONV layer has fewer parameters,

and thus, lower memory requirements. The importance of these properties becomes prominent when

the inputs are large images with millions of pixels. The concept of parameter sharing is shown in

Figure 2-14, where one single convolutional kernel is used to compute the entire feature map. The

parameters contained in the kernel are shared across all spatial locations of the input data. This

property of CONV layer also contributes to its small memory requirement and efficient

computation. Finally, the property of equivariant representations is actually a consequence of

parameter sharing. Due to the fact that the kernels are learned to pick up local features of the data,

and the parameters are shared across the spatial locations of the input, the same set of features can

be extracted even if the input has undergone some translation (i.e., shifting). The generated feature

maps would be translated by the same amount due to the equivariance to translation property,

which adds robustness to the feature extractors (Goodfellow et al., 2016).
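A quick back-of-the-envelope comparison illustrates the savings from sparse connectivity and parameter sharing; the layer sizes below are hypothetical and chosen only for illustration.

```python
# Hypothetical sizes: a 224 x 224 RGB input mapped to 32 same-size feature maps.
in_h, in_w, in_c = 224, 224, 3
out_maps = 32

# Fully connected: every input unit connects to every output unit.
dense_weights = (in_h * in_w * in_c) * (in_h * in_w * out_maps)

# Convolutional: one shared 3 x 3 kernel per (input channel, output map) pair.
conv_weights = 3 * 3 * in_c * out_maps

print(f"fully connected: {dense_weights:,} weights")  # ~2.4 x 10^11
print(f"convolutional:   {conv_weights:,} weights")   # 864
```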

Pooling Layers

A pooling layer is often located after a composition of one or multiple CONV layers to reduce the

spatial resolution of the generated feature maps. By performing the pooling operation, the resultant output can achieve spatial invariance to input distortions and translations (Rawat and Wang, 2017).

Such a property is accomplished because the pooling layer summarizes the input data and outputs

a summary statistic of the input representations (Goodfellow et al., 2016). To visualize this

process, Figure 2-16 depicts two types of pooling, namely max pooling and average pooling,

applied to a 4 × 4 two-dimensional array. In the example shown in Figure 2-16, the pooling

operations are implemented over non-overlapping 2 × 2 areas (labelled with dotted lines) with

stride of two, and the output dimension is 2 × 2 for both cases. In max pooling, for instance, the

maximum value within each non-overlapping area is determined and reported, and the output can

be seen as a recapitulation of the input data.
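A compact NumPy sketch of the two pooling operations in Figure 2-16 follows, using non-overlapping 2 × 2 windows with a stride of two.

```python
import numpy as np

def pool2d(x, mode="max"):
    """2 x 2 pooling with a stride of two over non-overlapping windows."""
    h, w = x.shape
    windows = x.reshape(h // 2, 2, w // 2, 2)  # group the 2 x 2 blocks
    if mode == "max":
        return windows.max(axis=(1, 3))        # report the maximum of each block
    return windows.mean(axis=(1, 3))           # or the average of each block

x = np.array([[1.0, 3.0, 2.0, 4.0],
              [5.0, 7.0, 6.0, 8.0],
              [9.0, 2.0, 1.0, 3.0],
              [4.0, 6.0, 5.0, 7.0]])
print(pool2d(x, "max"))   # [[7. 8.] [9. 7.]]
print(pool2d(x, "mean"))  # [[4.   5.  ] [5.25 4.  ]]
```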

Pooling layers that perform max pooling operations are commonly used in recent practice to

diminish computational burden and memory requirement of a CNN model (Rawat and Wang,

2017). Some studies have shown that the max pooling operation can help improve model

generalization and increase convergence speed when compared to conventional subsampling


techniques, such as average pooling, which suffers from cancellation effects between neighboring

outputs (Jarrett et al., 2009; Scherer et al., 2010; Rawat and Wang, 2017). However, one of the

biggest drawbacks of max pooling and many other pooling techniques is that the downsampling

nature of a pooling operation results in a loss of spatial information about input features. Such

spatial information becomes important when precise localizations of features are required, such as

in image segmentation and object detection tasks. One alternative to pooling layers is strided

convolution, which can preserve spatial context when downsampling features (Clement, 2020).

Although max pooling is the most commonly used pooling technique in practice (Goodfellow et

al., 2016), there are many other pooling strategies which have been used and proposed in the

literature, such as Lp pooling (Sermanet et al., 2012), stochastic pooling (Zeiler and Fergus, 2013),

fractional max pooling (Graham, 2014), mixed pooling (Yu et al., 2014), spectral pooling (Rippel

et al., 2015), and transformation invariant pooling (Laptev et al., 2016). Each of these pooling

strategies has strengths and inadequacies, and additional details of different pooling methods are

discussed by Rawat and Wang (2017).

Figure 2-16: Illustration of spatial max pooling and average pooling applied to a 4 × 4 two-dimensional

array. In this example, both pooling techniques are operated over non-overlapping 2 × 2 windows

(labelled with dotted lines) with stride of two, and thus, the output dimension is 2 × 2 in both cases.


2.5.3 Supervised Training

Given a NN with a set of parameters 𝜽 (which may include, but is not limited to, kernel parameters, 𝐊, and weight and bias parameters, 𝐖 and 𝐛), the objective of the training process is to find an

optimal parameter set such that a task-specific loss function is minimized. The process of

minimizing the loss function is called optimization, and it is achieved by iteratively updating the

model parameters. In recent practice, NNs are commonly trained using gradient-based

optimization algorithms (e.g., gradient descent or its variants), accompanied by the batch

normalization technique during the training process. In order to increase the generalization ability

of the model, some regularization mechanisms are often adopted in the learning algorithm to add

robustness to the trained model. It has also been found that an appropriate parameter initialization

scheme can help increase the success rate of network convergence and reduce the required training

time. In this subsection, we briefly discuss each of these supervision components; further details can be found in Goodfellow et al. (2016), and Rawat and Wang (2017).

Loss Function

A loss function measures the magnitude of error between the true target and the result estimated

by a given model. The choice of loss function determines how estimation error is penalized, and

thus, influences how the parameters are updated at each optimization step. Loss functions are often

task-specific and can have many different forms. For multi-class image classification, for instance,

the Categorical Cross-Entropy loss (a.k.a. Softmax loss) is the most commonly used thanks to its

simplicity and probabilistic interpretation. There are some other loss functions such as the Hinge

loss and the Triplet Ranking loss, which are also suitable for an image classification task depending

on the problem setup (Rawat and Wang, 2017). In this thesis, the Softmax loss is employed for the

image classification task described in Chapter 5, and the Softmax loss is written as

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i = \frac{1}{N}\sum_{i=1}^{N} -\log\!\left(\frac{e^{s_{y_i}}}{\sum_{j} e^{s_j}}\right), \qquad (2.25)$$

where $L$ is a scalar representing the full loss for the entire dataset; $N$ is the number of examples contained in the dataset; index $i$ indicates the $i$th example of the data; $L_i$ is a scalar representing the loss for the $i$th example; $y_i$ represents the true label of the $i$th example; $\mathbf{s} \in \mathbb{R}^C$ ($C$ is the number of classes) is a vector containing the class scores for each output class; and $s_j$ denotes the $j$th element ($j \in [1, C]$) of the vector of class scores $\mathbf{s}$. In equation (2.25), the vector of class scores $\mathbf{s}$ is usually the output of a fully connected layer, which has weight matrix $\mathbf{W}$ and bias vector $\mathbf{b}$, as shown in Figure 2-13. Therefore, $s_{y_i}$ can be written as $s_{y_i} = \mathbf{W}_{y_i}^{T}\mathbf{x}_i + b_{y_i}$, where $\mathbf{x}_i$ is a vector representing the input feature associated with the $i$th example; $\mathbf{W}_{y_i}^{T}$ is a vector denoting the transpose of the $y_i$th column of $\mathbf{W}$; and $b_{y_i}$ is a scalar indicating the $y_i$th element of vector $\mathbf{b}$.

Further details about other loss functions can be found in Goodfellow et al. (2016) and Aggarwal (2018).
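As an illustration of equation (2.25), the following NumPy sketch (not from the thesis experiments) computes the Softmax loss for a small batch of class scores; the score values and labels are arbitrary placeholders.

```python
import numpy as np

def softmax_loss(scores, labels):
    """Mean categorical cross-entropy (Softmax) loss, equation (2.25).

    scores: (N, C) array of class scores s, one row per example.
    labels: (N,) array of true class indices y_i.
    """
    # Shift the scores for numerical stability; this does not change the result.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # L_i = -log(e^{s_{y_i}} / sum_j e^{s_j}); average over the N examples.
    return -log_probs[np.arange(len(labels)), labels].mean()

scores = np.array([[2.0, 1.0, 0.1],   # hypothetical class scores (N = 2, C = 3)
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])             # hypothetical true labels y_i
print(softmax_loss(scores, labels))
```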

Optimization

In the context of training a deep learning model, optimization refers to the process of minimizing

a loss function by altering the model parameters (Goodfellow et al., 2016). Modern optimization

approaches are often built on top of the gradient descent algorithm, which involves computing the

gradient of the loss function with respect to the model parameters. Gradient, in this context, is the

generalized notion of derivative, which is a vector that contains the partial derivatives of the loss

function with respect to every model parameter, denoted by ∇𝜽𝐿(𝜽) (Goodfellow et al., 2016). The

computed gradient points to the direction of steepest ascent, and thus, the negative gradient is

pointing toward the steepest descent direction. One can take a small step each time along the

negative gradient to move toward a critical point (i.e., ∇𝜽𝐿(𝜽) = 0) in the parameter space. This

process can be expressed as

$$\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} - \eta \cdot \nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}), \qquad (2.26)$$

where $\eta$ is the step size, which is commonly called the learning rate, and the symbol "$\leftarrow$" denotes the operation of updating the parameters with a set of new values. In this way, equation (2.26) can be applied iteratively to the parameters until the updates become sufficiently small (i.e., $\eta \cdot \nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) \approx 0$). It is important to note that the gradient descent algorithm is not guaranteed to find a set of parameters that reaches the absolute lowest value (i.e., the global minimum) of the loss

function. The optimization is typically performed in a high-dimensional parameter space, which may have many local minima and saddle points surrounded by flat regions (Goodfellow et al., 2016). Therefore, the training is often stopped when the parameter configuration has led to a small loss value, though not necessarily the minimum. Figure 2-17 provides an illustration of the three types of critical points, namely global minimum, local minimum, and saddle point.

Figure 2-17: Illustration of global minimum, local minimum and saddle point in an optimization problem. An optimization algorithm is not guaranteed to reach the global minimum in many cases because of the existence of local minima and saddle points. It is often acceptable when the algorithm results in a reasonably small loss value. Reproduced from Goodfellow et al. (2016).
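The update rule in equation (2.26) can be demonstrated on a simple one-dimensional quadratic loss; the function, starting point, and learning rate in this sketch are illustrative choices only.

```python
import numpy as np

# Minimize L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = np.array([0.0])   # initial parameter value (arbitrary)
eta = 0.1                 # learning rate

for step in range(100):
    grad = 2.0 * (theta - 3.0)           # gradient of the loss w.r.t. theta
    theta = theta - eta * grad           # equation (2.26): theta <- theta - eta * grad
    if np.abs(eta * grad).max() < 1e-8:  # stop once the update is sufficiently small
        break

print(theta)  # converges toward the minimizer theta = 3
```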


In order to perform parameter updates, the back-propagation (often simply called backprop)

algorithm is employed to flow information backward through the network to compute the gradients

with respect to the model parameters by using the chain rule of calculus. For instance, if a system

has $\mathbf{h} = \boldsymbol{\psi}(\mathbf{x})$, $\mathbf{s} = \mathbf{g}(\mathbf{h})$, and $\mathbf{z} = \boldsymbol{\phi}(\mathbf{s}) = \boldsymbol{\phi}(\mathbf{g}(\mathbf{h})) = \boldsymbol{\phi}(\mathbf{g}(\boldsymbol{\psi}(\mathbf{x})))$, where $\mathbf{z}$ can be either a scalar or a vector; $\mathbf{x}$, $\mathbf{h}$, and $\mathbf{s}$ are vectors; and $\boldsymbol{\psi}(\cdot)$, $\mathbf{g}(\cdot)$ and $\boldsymbol{\phi}(\cdot)$ are functions operating on vectors and matrices, then the chain rule states that

$$\nabla_{\mathbf{h}}\mathbf{z} = \left(\frac{\partial \mathbf{s}}{\partial \mathbf{h}}\right)^{T} \nabla_{\mathbf{s}}\mathbf{z}, \qquad \nabla_{\mathbf{x}}\mathbf{z} = \left(\frac{\partial \mathbf{h}}{\partial \mathbf{x}}\right)^{T} \nabla_{\mathbf{h}}\mathbf{z}. \qquad (2.27)$$

Equation (2.27) implies that the gradient of the output $\mathbf{z}$ with respect to the input $\mathbf{x}$ can be obtained simply by multiplying the gradient of the succeeding layer, $\nabla_{\mathbf{h}}\mathbf{z}$, by the transposed Jacobian matrix of the succeeding result $\mathbf{h}$ with respect to $\mathbf{x}$, i.e., $(\partial\mathbf{h}/\partial\mathbf{x})^{T}$. In this way, for each step of the backward gradient computation, we only need to compute one new component, i.e., the Jacobian matrix of the current layer. The gradient of the current layer can then be easily computed by multiplying the newly computed Jacobian by the already computed gradient from the previous step. This process is performed recursively until the gradients for all of the layers are determined. Further details about the backprop algorithm can be found in Rumelhart et al. (1986), and Goodfellow et al. (2016).
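The recursive structure of equation (2.27) can be traced in the following sketch of a hypothetical two-layer linear network with a quadratic output, where each backward step multiplies the gradient from the layer above by the transposed Jacobian of the current layer; the network sizes and data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(3)            # input vector
W1 = rng.standard_normal((4, 3))      # first-layer weights:  h = psi(x) = W1 x
W2 = rng.standard_normal((1, 4))      # second-layer weights: s = g(h) = W2 h

# Forward pass: z = phi(s) = 0.5 * s^T s (a scalar loss for illustration).
h = W1 @ x
s = W2 @ h
z = 0.5 * float(s @ s)

# Backward pass, applying equation (2.27) layer by layer.
grad_s = s                    # dz/ds for z = 0.5 * s^T s
grad_h = W2.T @ grad_s        # grad_h z = (ds/dh)^T grad_s z; Jacobian ds/dh = W2
grad_x = W1.T @ grad_h        # grad_x z = (dh/dx)^T grad_h z; Jacobian dh/dx = W1
print(grad_x)
```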

At this point, it is possible to use the gradient descent approach to update the model parameters

and implement the back-propagation algorithm to efficiently compute the gradients, but there is

another challenge that needs to be addressed. In practice, the number of examples contained

in a training set is usually large. Benchmark datasets often contain several thousands to several

millions of training examples (Everingham et al., 2010; Geiger et al., 2012; Lin et al., 2014; Zhou

et al., 2019), and even many custom datasets can contain a tremendous number of data points. It

is often impossible for present computational hardware to perform training over the whole training

set due to the limited memory and computational power. To address this challenge, the strategy

is to divide the entire dataset into non-overlapping minibatches. Each minibatch is a small subset

randomly sampled from the dataset without replacement, and only one minibatch is sent to the

model for each iteration of training. The model parameters are updated based on an incoming

minibatch at every iteration, and the method of updating parameters is called minibatch stochastic

gradient descent (SGD) if equation (2.26) is used. Each complete pass through the training set constitutes one epoch of training. One issue associated with this stochastic optimization process is that the minibatches sampled from the training set may have different statistics than the dataset as a whole. This can cause the computed gradients to become noisy, which may increase the possibility of model divergence. In order to increase the success

rate of model convergence and to add robustness to the optimization process, many improved

optimization algorithms have been proposed, such as RMSprop (Hinton et al., 2012b),

ADADELTA (Zeiler, 2012), and Adam (Kingma and Ba, 2014). Further information about the frequently used optimization algorithms for machine learning can be found in Sun et al. (2019).
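The minibatch sampling scheme described above can be sketched for a simple linear least-squares model as follows; the synthetic data, batch size, and learning rate are illustrative assumptions rather than values used in this thesis.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((1000, 5))             # synthetic training inputs
y = X @ np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # synthetic targets
w = np.zeros(5)                                # model parameters
eta, batch_size = 0.05, 32

for epoch in range(20):                        # one epoch = one full pass over the data
    order = rng.permutation(len(X))            # shuffle, then split into minibatches
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # sample a minibatch without replacement
        Xb, yb = X[idx], y[idx]
        grad = 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)  # gradient of mean squared error
        w -= eta * grad                        # minibatch SGD update, equation (2.26)

print(w)  # approaches the true coefficients
```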

Batch Normalization

Batch Normalization (BN), introduced by Ioffe and Szegedy (2015), is an adaptive reparameterization method that is commonly used in practice. One significant reason for BN's popularity is that it effectively tackles one of the fundamental challenges of training NNs

using gradient descent (GD). As mentioned above, the model parameters are updated based on the

update directions (i.e., gradients) determined by the back-propagation algorithm. The gradient of

a parameter computed by using backprop indicates the direction of steepest ascent when all other

parameters stay unchanged. Nevertheless, in practice, all the model parameters are updated

simultaneously at every iteration, and unexpected results may be observed due to the simultaneous

changes of many interdependent parameters (Clement, 2020). Batch normalization tackles this

problem by reparametrizing some layers of the model to always have standardized outputs with

zero mean and unit standard deviation (Goodfellow et al., 2016). In this way, the BN technique

reduces the severe consequences of internal covariate shift (i.e., parameter changes in early layers

result in changes of input distributions for later layers) (Rawat and Wang, 2017).

To show the reparameterization of BN, let matrix 𝐇 contain a minibatch of inputs with a

dimension of 𝑀 × 𝑁, where each row of 𝐇 represents one training example, and the number of

columns denotes the dimensionality of the inputs. To normalize the minibatch, we replace the

entries in 𝐇 by

$$H_{ij} \leftarrow \frac{H_{ij} - \mu_j}{\sigma_j}, \qquad (2.28)$$

where $i$ is the row index of $\mathbf{H}$ indicating the $i$th input in the minibatch ($i \in [1, M]$); $j$ is the column index of $\mathbf{H}$ indicating the $j$th dimension of the input ($j \in [1, N]$); $\mu_j$ is a scalar representing the mean of the $j$th column; $\sigma_j$ is a scalar denoting the standard deviation of the $j$th column; and $H_{ij}$ is a scalar indicating the entry of $\mathbf{H}$ located at the $i$th row and the $j$th column. It is important to

note that the definitions of 𝜇𝑗 and 𝜎𝑗 are different at training time and at test time. At training time,

the column mean and standard deviation are given by

$$\mu_j = \frac{1}{M}\sum_{i=1}^{M} H_{ij}, \qquad (2.29)$$

$$\sigma_j = \sqrt{\frac{1}{M}\sum_{i=1}^{M}\left(H_{ij} - \mu_j\right)^{2} + \epsilon}, \qquad (2.30)$$


where $\epsilon$ is a small scalar (e.g., $10^{-5}$) added to avoid the denominator in equation (2.28) being zero.

The running averages of 𝜇𝑗 and 𝜎𝑗 are recorded during training, and the values are locked once the

training is finished. At test time, the locked values of 𝜇𝑗 and 𝜎𝑗 are applied to the test input in the

same fashion as in equation (2.28), which allows the model to be evaluated at individual data points

(Clement, 2020).

As mentioned by Goodfellow et al. (2016), and Ioffe and Szegedy (2015), simply normalizing the

data to have zero mean and unit standard deviation can reduce the expressive power of the network.

Therefore, it is common to use an additional linear transformation to enhance the representation

such that

$$H_{ij} \leftarrow \gamma_j\!\left(\frac{H_{ij} - \mu_j}{\sigma_j}\right) + \beta_j, \qquad (2.31)$$

where 𝛾𝑗 and 𝛽𝑗 (𝑗 ∈ [1, 𝑁]) are learnable parameters that are used to scale and shift the output.

Batch Normalization has been successfully applied in many state-of-the-art deep learning models

to accelerate the training process and increase the model performance (Santurkar et al., 2018).

Besides BN, there are also other normalization techniques that are proposed in the literature such

as Layer Normalization by Ba et al. (2016), Instance Normalization by Ulyanov et al. (2016), and

Group Normalization by Wu and He (2018). Each of these techniques has its strengths and

appropriate use cases, and many models developed with these methods have shown promising

results.
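A minimal sketch of the training-time BN transform in equations (2.28) to (2.31) is given below; the minibatch values and the fixed choices of 𝛾 and 𝛽 are illustrative only, and the running averages kept for test time are omitted.

```python
import numpy as np

def batch_norm_train(H, gamma, beta, eps=1e-5):
    """Training-time batch normalization, equations (2.28)-(2.31).

    H: (M, N) minibatch, one training example per row.
    gamma, beta: (N,) learnable scale and shift parameters.
    """
    mu = H.mean(axis=0)                                  # column means, equation (2.29)
    sigma = np.sqrt(((H - mu) ** 2).mean(axis=0) + eps)  # column std, equation (2.30)
    H_hat = (H - mu) / sigma                             # normalize, equation (2.28)
    return gamma * H_hat + beta                          # scale and shift, equation (2.31)

rng = np.random.default_rng(0)
H = rng.normal(loc=5.0, scale=2.0, size=(8, 3))          # a hypothetical minibatch
out = batch_norm_train(H, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))                 # approximately 0 and 1 per column
```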

Regularization

Regularization refers to a broad class of strategies that are used to increase the generalization

ability of machine learning models. In practice, regularization techniques can be applied at every

stage of the model development process, including data preparation (e.g., data augmentation

techniques), model construction (e.g., pooling layers, restrictions on the parameter values), model

training (e.g., dropout, loss penalties), and even prediction stage (e.g., ensemble methods). In this

section, we focus on a regularization technique that is commonly used during the training process

of deep models, called dropout (Hinton et al., 2012a; Srivastava et al., 2014). Further details about

other frequently used regularization strategies for deep models can be found in the survey by Moradi et al. (2019).


The key idea of dropout is that at each training iteration, individual units in one or more layers are

either dropped out of the model with some probability 𝑝, or retained in the network with probability 1 − 𝑝. In this way, the incoming and outgoing connections of the dropped units are also removed for that iteration. Model parameters are then updated only for the retained units. This

process can be thought of as training a random subnetwork of the base network at each iteration

by stochastically dropping some of the computational units. It is important to note that the

subnetworks sampled from the base model are not independent because they share the parameters.

Each of the subnetworks has a reduced capacity as compared to the base network, where the base

model after training is similar to an ensemble of all the sub-models (Goodfellow et al., 2016).

Figure 2-18 provides an illustration of dropout applied to a feedforward network during training.

At test time (i.e., after the training process), the entire base network is used, and there is no dropout

implemented.

Dropout has been shown to significantly reduce overfitting by preventing feature co-adaptation, and it possesses an implicit bagging effect (Rawat and Wang, 2017). It can be easily integrated with

other regularization techniques such as weight decay and early stopping. Many studies have

provided profound explanations and analyses on the rationale and mechanism of dropout, and

several variants of dropout have been proposed in the literature to suit different problem setups

(Baldi and Sadowski, 2014; Goodfellow et al., 2016; Rawat and Wang, 2017; Moradi et al., 2019).

Figure 2-18: Illustration of the forward propagation through a feedforward network using dropout.

During training, dropout can be interpreted as sampling a random subnetwork from the base network.

Left: the forward pass of the base network; Right: dropout applied to the network at training time.
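The sketch below uses the commonly adopted "inverted" formulation of dropout, in which the retained activations are rescaled during training so that the full network can be used unchanged at test time; the layer size and drop probability are arbitrary, and this formulation is named here because it is not spelled out in the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p, training=True):
    """Inverted dropout: drop units with probability p during training.

    Scaling the retained units by 1 / (1 - p) keeps the expected activation
    unchanged, so no rescaling is needed at test time.
    """
    if not training:
        return h                       # the entire base network is used at test time
    mask = rng.random(h.shape) >= p    # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)

h = np.ones(10)                        # hypothetical layer activations
print(dropout(h, p=0.5))               # roughly half the units zeroed, rest scaled by 2
```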


Parameter Initialization

Parameter initialization plays a critical role in the training process of NNs. Although it occurs only

once per network at the beginning of learning, a poor initialization can lead to problems like

vanishing/exploding gradient, which can significantly hinder network convergence (Bengio et al.,

1994). In general, there are two groups of initialization approaches. The first group refers to

transfer learning, where the parameter values of a successful model developed for a proxy

application are used as the starting point for a model designed for another application. For example,

a CNN that is designed to classify images of cats and dogs may use the parameters from an already

trained animal classifier as its starting point. A secondary training stage can then be performed to

fine-tune the model based on a specialized dataset, which mainly contains photos of cats and dogs

in this case. Numerous studies have shown that transfer learning can be used to resolve the problem

of insufficient training data, increase the success rate of model convergence and shorten the

required training time of NNs (e.g., Huang et al., 2017; Tan et al., 2018). Thanks to the existence

of many benchmark datasets for general vision tasks, a lot of successful models have been

developed, and the model parameters are made available to be used (Lin et al., 2014; Everingham

et al., 2010).

Many research-oriented tasks need to train models from scratch, and the second group of

initialization approaches aims to provide robust initialization schemes for deep NNs. As shown in equations (2.19) and (2.20), the parameters of a network often consist of weights and biases, as well

as some operation-specific parameters (e.g., learnable parameters in batch normalization and

PReLU). Most of the studies regarding parameter initialization focus on initialization of weights,

where biases are commonly set to zeros at the beginning of the training process. Among the

proposed initialization methods in the literature, two techniques are popular and often used in

practice, namely Glorot initialization (a.k.a. Xavier initialization, Glorot and Bengio, 2010) and

He initialization (a.k.a. Kaiming initialization, He et al., 2015). In Xavier initialization, the initial

weights of a layer are drawn from either a uniform distribution or a normal distribution. In the case

of uniform distribution, each entry of the weight matrix would have

$$W_{ij}^{l} \sim \mathcal{U}\!\left[-\sqrt{\frac{6}{n_l + n_{l+1}}},\ \sqrt{\frac{6}{n_l + n_{l+1}}}\right], \qquad (2.32)$$


where $\mathcal{U}[-a, a]$ is the uniform distribution over the interval $(-a, a)$; $l$ is an index indicating the $l$th layer of the network; $W_{ij}^{l}$ denotes the entry located at the $i$th row and the $j$th column of the weight matrix for the $l$th layer; and $n_l$ and $n_{l+1}$ are called fan-in and fan-out, which are the numbers of neurons within the $l$th and the $(l+1)$th layer, respectively. In contrast, if a normal distribution is

employed, the weights should follow

$$W_{ij}^{l} \sim \mathcal{N}\!\left(0,\ \frac{2}{n_l + n_{l+1}}\right), \qquad (2.33)$$

where $\mathcal{N}(0, \sigma^2)$ is a zero-mean normal distribution with a variance of $\sigma^2$. It is important to note

that all the weights are independent and identically distributed regardless of the distribution from

which they are drawn. It has been shown that Xavier initialization can increase convergence speed

and reduce the risk of vanishing gradient when training NNs (Glorot and Bengio, 2010). However,

the main limitation of Xavier initialization is that its derivation is based on a linear activation,

which might not be optimal when used jointly with nonlinear activation functions (Rawat and

Wang, 2017).

To improve Xavier initialization, He et al. (2015) derived a theoretically sound initialization,

known as Kaiming initialization, which is compatible with the commonly used ReLU and PReLU

nonlinear activation functions. The initial weights under Kaiming initialization follow

$$W_{ij}^{l} \sim \mathcal{N}\!\left(0,\ \frac{2}{n_l}\right), \qquad (2.34)$$

where all the terms have the same definitions as in Xavier initialization. It can be easily observed

that Kaiming initialization is similar to the Xavier method, except that the weights are drawn from

a normal distribution whose variance is calculated based on only the number of neurons within the

current layer. Due to the extensive use of ReLU activation in modern deep learning models,

Kaiming initialization has been widely adopted in the literature, and it has shown its suitability for

training extremely deep networks (He et al., 2015; Rawat and Wang, 2017).
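Both schemes can be sketched in a few lines of NumPy; the layer sizes below are arbitrary, the weight matrices are assumed to be shaped (fan-out, fan-in), and biases are set to zero as mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_uniform(n_in, n_out):
    """Xavier (Glorot) initialization, equation (2.32)."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

def kaiming_normal(n_in, n_out):
    """Kaiming (He) initialization, equation (2.34), suited to ReLU layers."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))

W1 = xavier_uniform(256, 128)   # weights for a hypothetical 256 -> 128 layer
W2 = kaiming_normal(128, 64)    # weights for a hypothetical 128 -> 64 layer
b1 = np.zeros(128)              # biases are commonly initialized to zero
print(W1.std(), W2.std())
```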


2.6 Convolutional Neural Network Based Surface Water and Moisture

Recognition and Monitoring

To the best of our knowledge, little attention in the literature has been paid to utilizing convolutional neural networks (CNNs) for heap leach pad (HLP) monitoring, especially HLP

surface moisture mapping. However, CNNs have been widely adopted in agriculture, remote

sensing, and several other study areas thanks to their ability to generate accurate and precise

predictions (Ball et al., 2017; Kamilaris and Prenafeta-boldú, 2018a; Li et al., 2018; Ma et al.,

2019; Zhu et al., 2017).

CNNs are most commonly used to process data with grid-like topology (e.g., images, video and

time-series data), where the multilayer structure allows the models to extract low-, mid- and high-

level features from the data and provide a hierarchical representation of the model input (Garcia-

Garcia et al., 2017; Goodfellow et al., 2016; Kamilaris and Prenafeta-boldú, 2018a). The key

advantage of CNN is that the extracted features are learned automatically from the data through

the training process. No hand-engineered features are required, which significantly improves

the models’ generalization ability when applied to data within the same problem domain but

previously unobserved by the models (Goodfellow et al., 2016; Pan and Yang, 2010; Weiss et al.,

2016; Zhu et al., 2017). Beyond the capacity of learning features automatically, CNNs are often

robust against challenging image conditions, including complex backgrounds, varying resolutions,

illumination variation, and orientation changes (Amara et al., 2017). In contrast, many traditional

signal and image processing approaches rely on hand-engineered features, where the feature

engineering is not only time-consuming and tedious but also dataset-specific, which is often subject

to generalization issues (Kamilaris and Prenafeta-boldú, 2018a; Li et al., 2018). Xia et al. (2017)

and Li et al. (2018) conducted comparative studies regarding the performance of different image

classification methods, which use low-level (e.g., SIFT), mid-level (e.g., Bag of Visual Words),

and high-level (i.e., CNN) image features, on multiple datasets, such as RSSCN7 (Zou et al., 2015),

AID (Xia et al., 2017), and UC-Merced (Yang and Newsam, 2010). The experimental results

demonstrated that CNN models, as high-level feature extractors, surpass the performance of

traditional handcrafted feature-based methods by a significant margin (Xia et al., 2017).

In addition, CNNs have been shown to achieve inference performance that is comparable to, and usually more efficient and accurate than, other machine learning algorithms, such as support vector machines


(SVM), k-nearest neighbours (KNN), and random forest (Kemker et al., 2018; Paisitkriangkrai et

al., 2016; Zhang et al., 2018). Kemker et al. (2018) introduced a semantic segmentation dataset

(called RIT-18) and compared the performance of two CNNs and several machine learning

algorithms (e.g., KNN, SVM, multiscale independent component analysis) on RIT-18. The

experimental results showed that the CNN models significantly outperformed the other algorithms

in terms of prediction accuracy, and Kemker et al. (2018) mentioned that the deep features learned

by CNNs can generalize better than handcrafted features across different datasets. Kussul et al.

(2017) carried out an application of land cover and crop type classification using one multilayer

perceptron (MLP), one random forest (RF), and two CNN models based on satellite imagery. The

two CNN models consistently accomplished higher accuracy in classifying water bodies and the

various crop types than the RF and MLP models. Kussul et al. (2017) concluded that the CNN

models were able to build a hierarchy of sparse and local features, which contributed to their better

performance over the other two methods in their experiment.

There are many studies that utilize CNNs for surface water and moisture recognition. Isikdogan et

al. (2017) proposed a CNN named DeepWaterMap, which was used to generate surface water

maps on a global scale based on Landsat satellite imagery. The experiment was formulated as a

semantic segmentation task (known as image classification in remote sensing), and the network

architecture of DeepWaterMap had a typical encoder-decoder structure (Hinton and

Salakhutdinov, 2006). Isikdogan et al. (2017) stated that the trained DeepWaterMap model was

capable of learning the shape, texture and spectral response of surface water, cloud, and several

other land cover types, where the generated results from their experiment demonstrated the

model’s ability to discriminate water from surrounding land cover. Rather than using satellite

imagery, Fu et al. (2018) performed a land use classification application based on remotely sensed

visible-light (RGB) images acquired by aerial platforms. The three datasets employed in the study

covered agricultural and urban areas, and they proposed a block-based, object-based image

classification method that embedded a CNN model to determine the types of land use (e.g., water,

building, road). Fu et al. (2018) highlighted that the high-level features extracted by the CNN

model were effective for complex image pattern descriptions, which facilitated their method to

achieve an end-to-end land use classification without the time-consuming design process of

handcrafted features. Moreover, CNNs can be also applied to grid-like data derived from other

types of signals. Wang et al. (2018) collected soil echoes using an ultra-wideband radar sensor


over an approximately 50 m2 bare soil field. The recorded soil echoes were transformed into time-

frequency distribution patterns, and two CNN architectures, AlexNet (Krizhevsky et al., 2012) and

Visual Geometry Group (Simonyan and Zisserman, 2014), were employed to classify the time-

frequency patterns with different moisture levels. Besides the applications in remote sensing and

agriculture, Zhao et al. (2020) applied a CNN model to detect potential water leakage in a metro

tunnel by locating moisture marks on the shield tunnel lining. They mentioned that moisture marks are caused by water ingress through cracks in concrete, and early detection of such defects is important for avoiding ground failure. They acquired RGB images through a platform called

Moving Tunnel Inspection equipment, and a CNN architecture, Mask R-CNN (He et al., 2017),

was employed to perform instance segmentation on their obtained images. The experimental

results demonstrated that the trained model was able to locate and segment the moisture marks in

the images accurately and efficiently (Zhao et al., 2020).

In addition to recognizing the existence of surface water and moisture, a number of studies in the

literature have applied CNNs to perform soil moisture estimation. Ge et al. (2018) adopted a CNN

model with 1-D convolutions to estimate soil moisture on a global scale based on satellite data.

They combined information from four different sources to generate input vectors and compared

the model’s performance against a fully connected feedforward NN. The experimental results

indicated that the CNN model produced reasonable moisture estimates and achieved a better

performance than the NN on their custom dataset (Ge et al., 2018). Similarly, Hu et al. (2018)

employed a CNN model to retrieve global soil moisture based on passive microwave satellite

imagery. They used the Advanced Microwave Scanning Radiometer - Earth Observing System

(AMSR-E) brightness temperatures as input data and compared the performance of a CNN model

(with a linear output layer) against a support vector regression (SVR) model. The experimental

results showed that the CNN model could produce more accurate moisture estimates than their

SVR model. Besides using satellite data to estimate soil moisture on a large scale, Sobayo et al.

(2018) collected in-situ moisture measurements in three farm areas and acquired thermal images

by attaching a thermal infrared camera to an unmanned aerial vehicle. They trained a CNN-based

regression model to learn the correlation between the remotely sensed soil temperature and the in-

situ moisture measurements. The experimental results demonstrated the effectiveness of using the

CNN model to generate relatively accurate soil moisture estimates based on the custom dataset

(Sobayo et al., 2018).


Chapter 3 Field Data Collection

Field Experiment and Data Acquisition

Field experiment and data acquisition were conducted over a sprinkler-irrigated heap leach pad at

McEwen Mining’s El Gallo gold mine located in Sinaloa State, Mexico, from March 5th to 8th,

2019. This chapter provides an overview of the mine site where the data were collected, the

equipment used, and the methodology for collecting the data. The content covered in the chapter

is largely reproduced from Section 3 of the author’s conference publication (Tang and Esmaeili,

2020).

3.1 Site Information

McEwen Mining’s El Gallo gold mine is located in Sinaloa State, Mexico, approximately 350

km northwest of Mazatlán, 100 km northwest of Culiacan, and 40 km northeast of Guamúchil

(Figure 3-1). The mine is situated in the Lower Volcanic Series of the Sierra Madre Occidental,

dominated by rocks of andesitic composition (Medinac, 2019). The mineralization is hosted within

a northeast structural trend with numerous sub-structures (Bamford et al., 2020). Gold was the

primary metal produced, and heap leaching (HL) was used to extract the metal from the crushed

ores.

Figure 3-1: Location of the El Gallo mine.


The heap leach pad (HLP) was located north of the mine site and the footprint of the HLP was

approximately 22 hectares. The HLP adopted a sprinkler irrigation system with a sprinkler spacing

of 3 m, and dilute cyanide solution was applied continuously during the field experiment. The flow

rate of the HLP was 600 m³/hr, and an average irrigation rate of 8 L/hr/m² was used for the

irrigation system. The designed lift height of the HLP was 10 m and the overall heap height was

80 m. The HLP material was crushed gold-bearing rock with a particle size distribution (PSD) as

illustrated in Figure 3-2. The PSD curves in Figure 3-2 were generated from both on-site mesh

sieving and laboratory sieving results. Based on the PSD, the HLP material had an 80% passing

size (P80) of 8 to 10 mm, and the material was treated as coarse-grained soil (ASTM, 2017).

3.2 Equipment

During the field data collection, a commercially available UAV platform, DJI Matrice 600 Pro,

was equipped with one thermal camera, DJI Zenmuse XT 13 mm, and one digital camera, DJI

Zenmuse X5, in a dual gimbal setup. The specifications of the cameras are listed in Table 3-1. The

dual gimbal system consisted of a front gimbal, which was installed with a global positioning

system (GPS), and a bottom gimbal (without a GPS), which was located at the geometric center

of the UAV platform. This dual gimbal setup allowed for the acquisition of both thermal and

regular RGB images simultaneously. The UAV was selected because it had sufficient payload

Figure 3-2: Material particle size distribution of the studied heap leach pad


capacity to carry the dual gimbal system during the experiment. The two cameras were selected

because they could take images with high resolutions while easily integrating with the existing

gimbal system. It is worthwhile to mention that the spectral band captured by the thermal camera

was 7.5–13.5 μm, which is known as the commonly used atmospheric window for aerial sensing

within the TIR region of the EM spectrum (Gupta, 2017). Figure 3-3 shows the equipment used

during the field experiment. The other equipment used included one BIKOTROIC BTH Portable

Sand Moisture Meter, one Dr. Meter LX 1330B digital light meter, and one Protmex MS 6508

Digital Thermo-hygrometer.

Table 3-1: Thermal and digital cameras specifications

Specifications DJI Zenmuse XT 13 mm DJI Zenmuse X5

Dimension 103 mm × 74 mm × 102 mm 120 mm × 135 mm × 140 mm

Weight 270 g 530 g

Maximum Resolution 640 × 512 4608 × 3456

Angle of View 45° × 37° 72°

Gimbal Accuracy ±0.03° ±0.02°

Spectral Band 7.5–13.5 μm RGB

Source: DJI 2019a, 2019b

Figure 3-3: Equipment used during the field experiment


3.3 Field Experiment and Data Collection

The duration of the field experiment was from March 5 to March 8, 2019. There were two phases

of the field experiment. The first phase focused on generating a detailed survey plan and placing

ground control points (GCPs) within the study area. The second phase (Mar. 6 to Mar. 8, 2019)

included data acquisition using the generated survey plan. In phase one, a flight was conducted

using the UAV platform equipped with the dual camera system to acquire images that covered the

entire HLP. An orthomosaic (or true orthophoto, which is generated based on an orthorectification process) of the HLP was then generated by using the OpenDroneMap software to help facilitate

the survey planning. Using the orthomosaic of the HLP, the locations of the GCPs were determined

based on their accessibility over the HLP. A detailed survey plan, which included the mission

count, take-off location, flight altitudes, and flight times and durations, was then generated. In the

second phase of the field experiment, two data collection campaigns were conducted in each

surveying day, one in the morning (10 a.m.) and the other in the afternoon (2 p.m.). This survey

schedule was adopted to comply with the mine site’s shift schedule and safety policies, although

the predawn hours were considered as best for thermal infrared surveys because of the minimal

temperature variation caused by differential solar heating and logistic reasons (Gupta, 2017).

During each data collection campaign, two flight missions were carried out, and Table 3-2

summarizes the details of the flight plans. Twelve GCPs were placed at the designated locations

by on-site technical staff, and the GPS coordinates of the GCPs were recorded by a portable GPS

device. Figure 3-4 illustrates the GCP locations with respect to the HLP. It is worthwhile to note

that five of the twelve GCP locations were selected to be the sampling locations, where samples

were collected near these GCPs.

Table 3-2: Details of flight missions for phase two of the field experiment

Flight Mission Parameters Flight Mission 1 Flight Mission 2

Area of study Top two lifts of the HLP Entire HLP

Footprint of studied area 4 hectares 22 hectares

Flight altitude* 90 m 120 m

Ground sampling distance** 12 cm/pixel 15 cm/pixel

Flight time 7 min/mission 24 min/mission

Number of RGB images 80 images/mission 280 images/mission

Number of thermal images 170 images/mission 620 images/mission

* The flight altitude is with respect to the take-off location
** The ground sampling distance is with respect to the thermal images


For each data collection campaign, five ground samples were collected at the sampling locations

during the time of flights. These samples were sent to the on-site laboratory to measure specific

gravity and gravimetric moisture content. These measurements were used as ground truth to

facilitate and validate the remote sensing results. Care was taken during the sampling process to

collect only the surface material (top 5 to 10 cm) from the HLP.

There were five members involved in each surveying campaign: two technical staff for ground sample collection and three members for UAV data acquisition. Due to the large extent of the

study area, the time spent by the technical staff to collect ground samples at the selected locations

was approximately the same as the total flight time of the two flight missions.

During each UAV data collection campaign, the following setup was used to acquire the aerial

images: the thermal and RGB cameras were operated by using the DJI GO and DJI Ground Station

Pro applications, respectively, where the gimbal pitch angles were both set to 90° downward to

face the HLP surface; the image acquisition rates were set to two seconds per thermal image and

Figure 3-4: Flight mission 2 (light yellow) and locations of ground control points (GCPs) with respect

to the heap leach pad. Five sampling locations are shown as green circles.


five seconds per RGB image, where the thermal image format was set to R-JPEG (Radiometric

JPEG) format, and the colour image was set to JPEG format; and for each pair of adjacent images,

the front and side overlap were designed to be 85% and 70%, respectively. The user-defined

external parameters in the DJI GO application were set based on the field conditions at the time

of image acquisition, where the scene emissivity for all missions was set to 0.95, which is a typical

value for wet soil surface (Jensen, 2009). The atmospheric temperature and relative humidity were

measured using a thermohygrometer, and the corresponding values were inputted into the

application. The parameter of reflected apparent temperature was set to the same as the

atmospheric temperature, and the external optics temperature and transmittance were set to 1.0

(Zwissler, 2016). The acquired images were presented in 8-bit grayscale JPEG files when exported

to computing devices, where the remotely sensed surface temperature at each pixel location was

stored in the metadata of the image. The surface temperature sensed by the thermal camera can

then be extracted by using external software tools such as FLIR Atlas SDK for MATLAB. In this

study, we used the remotely sensed surface temperature (TRS) acquired by the thermal camera as

an approximation of the actual HLP surface temperature to perform data analysis.

By the end of the field experiment, twenty-four sets of data with approximately 6,900 images were

collected in total. This included 12 sets of colour images and 12 sets of thermal images. Table 3-3

summarizes the number of images collected in each flight mission, where the thermal and RGB colour

images are reported separately.

Table 3-3: The number of colour and thermal images collected during the field experiment*

                 March 6, 2019       March 7, 2019       March 8, 2019
                 Morning  Afternoon  Morning  Afternoon  Morning  Afternoon

Whole HLP**      T: 620   T: 618     T: 618   T: 621     T: 618   T: 619
                 C: 273   C: 281     C: 270   C: 289     C: 290   C: 275

Top two lifts**  T: 178   T: 170     T: 174   T: 169     T: 170   T: 173
                 C: 58    C: 74      C: 74    C: 81      C: 73    C: 76

* Overall, there were 24 sets of data collected, where 12 sets were colour images and the other 12 sets were thermal images
** T: number of thermal images; C: number of colour (RGB) images


Chapter 4 Surface Moisture Mapping Based on Thermal Imaging

Mapping Heap Leach Pad Surface Moisture Distribution

Based on Thermal Imaging

Chapter 3 provided a description of the studied heap leach pad (HLP) and depicted the data

acquisition using the UAV system. This chapter outlines how the acquired thermal images are used

to create surface moisture maps and includes a discussion on the effectiveness and limitations of

the proposed method. The data analysis presented in this chapter focuses on analyzing the six sets

of thermal images covering the whole HLP (approximately 3,700 images), and the results of

processing the rest of the data are elaborated in Chapter 5. The material contained in this chapter

is largely reproduced from the author’s paper: “Mapping Surface Moisture of a Gold Heap Leach

Pad at the El Gallo Mine Using an UAV and Thermal Imaging”, which has been submitted to the

Mining, Metallurgy & Exploration Journal for publication.

4.1 Overview

After acquiring data from the field experiment, data processing and moisture map generation were

conducted off-line. As mentioned previously, the heap leach pad material was treated as coarse-grained soil according to its particle size distribution (ASTM, 2017). It was therefore deemed appropriate to apply a remote sensing-based surface soil moisture (SSM) retrieval method to estimate the surface

moisture distribution over the HLP. Therefore, the acquired thermal images and in-situ moisture

measurements from the collected samples were first used to derive an empirical relationship

between the surface moisture content and the remotely sensed surface temperature using linear

regression. Moisture distribution maps were then generated by using the regression model to

visualize the moisture variation over the HLP.

A general workflow of the data analysis process is illustrated in Figure 4-1. The remainder of this

chapter provides the implementation details of each data processing step, and the generated orthomosaics and moisture maps are presented to illustrate the analysis results.

Figure 4-1: General workflow of the data processing and moisture map generation.


4.2 Data Preprocessing

As mentioned above, there were six sets of thermal images with respect to the whole HLP, and the

preprocessing of data was independently performed for each of these six datasets in two different

procedures. The first procedure is referred to as a data cleansing step, where the corrupted and

low-quality (i.e., inappropriately exposed) images were manually removed from each of the

datasets. In the second procedure, an intensity transformation and mapping script written in

MATLAB using the FLIR Atlas SDK was run to first determine the highest and lowest remotely

sensed surface temperature (denoted as Tmax and Tmin, respectively) within each dataset. In this

way, there were six pairs of Tmax and Tmin determined, where each pair was associated with one of

the thermal datasets. By using the maximum and minimum surface temperatures, the pixel

intensity values of the thermal images were then mapped by using equation (4.1).

$$I_{x,y}^{(i)} = \mathrm{round}\!\left(\frac{T_{x,y}^{(i)} - T_{\min}}{T_{\max} - T_{\min}} \times 255\right) \qquad (4.1)$$

In equation (4.1), $\mathrm{round}(\cdot)$ is the rounding operator that returns an integer pixel intensity value ranging from 0 to 255, which is the bit-depth range for an 8-bit image; $T_{\max}$ and $T_{\min}$ are the highest and lowest remotely sensed surface temperatures in degrees Celsius of the current dataset, respectively; the superscript $(i)$ denotes the $i$th image in the current dataset; $T_{x,y}^{(i)}$ is the remotely sensed surface temperature in degrees Celsius at the $x$th row and $y$th column pixel location of the $i$th image; and $I_{x,y}^{(i)}$ is the output pixel intensity value at the $(x, y)$ pixel location of the $i$th image in the current dataset. The outputs of this process were a set of single-channel 8-bit raster data in

matrix format. For each matrix, the single-channel raster data were replicated three times to generate an RGB image with three channels having the same intensity values. By following the

above procedure, six sets of grayscale images were generated with the same number of images as

the input thermal image datasets. Figure 4-2 provides a visual comparison example between the

initial and processed images. The images included in Figure 4-2 are four thermal images taken in succession. As seen in Figure 4-2, the initial images undergo rapid intensity changes due

to the camera’s built-in exposure adjustment. Therefore, this pre-processing step was performed

to ensure that the images used to generate orthomosaics followed a consistent intensity scale.
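A minimal sketch of the intensity mapping in equation (4.1) is given below. The thesis performed this step with a MATLAB script using the FLIR Atlas SDK; the NumPy version here, with a synthetic temperature raster, is only illustrative.

```python
import numpy as np

def to_8bit(T, T_min, T_max):
    """Map remotely sensed temperatures (deg C) to 8-bit intensities, equation (4.1)."""
    return np.round((T - T_min) / (T_max - T_min) * 255).astype(np.uint8)

rng = np.random.default_rng(0)
T = rng.uniform(15.0, 40.0, size=(512, 640))  # synthetic temperature raster (deg C)
T_min, T_max = 15.0, 40.0                     # dataset-wide extremes, as in Section 4.2
I = to_8bit(T, T_min, T_max)                  # single-channel 8-bit raster
rgb = np.stack([I, I, I], axis=-1)            # replicate to three identical channels
print(rgb.shape, rgb.dtype)
```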

4.3 Linear Regression Model Development

A linear relationship between the HLP surface temperature and material gravimetric moisture

content, 𝜔, was derived based on the acquired data. Several assumptions were made during the

data analysis process in order to facilitate the regression model development. It was assumed that

the chemical composition and material roughness were relatively uniform for the top 5 cm of the

HLP surface, and the remotely sensed surface temperature captured by the thermal camera could

be used as an approximation to the HLP surface temperature. It was also assumed that the sensor

noise of the camera consisted of independent and identically distributed (IID) random variables.

Figure 4-2: Visual comparison example between initial and processed thermal images taken in succession: (a) Output images after the preprocessing step; (b) Initial thermal images taken by the thermal camera.

Under these assumptions, the linear regression model was developed based on the remotely sensed

surface temperature values (TRS) and the measured gravimetric moisture contents at the sampling

locations. To determine the TRS values at the sampling locations, the following steps were

employed independently for each of the thermal datasets:

1) The thermal images that covered the sampling locations were manually identified;

2) The pixel coordinates associated with the sampling locations were then pinpointed by using

the GIMP software within the images that were identified in the previous step;

3) After determining the pixel coordinates at which the sampling locations were located, a 5

× 5 average kernel was applied at these pixel locations to calculate the TRS values. An

illustration of the temperature determination step is depicted in Figure 4-3.

Note that a 5 pixels by 5 pixels area on the image plane represents approximately a 75 cm by 75

cm area on the HLP surface. It was assumed that the measured moisture content from the collected

samples represents the average surface moisture of the 75 cm by 75 cm area.

Figure 4-3: Determination of the remotely sensed surface temperature at a sampling location. The pixel coordinate associated with the sampling location (labeled in green) was first pinpointed within the thermal image on the left. The average temperature of a 5 pixels by 5 pixels area (labeled in blue on the right) was then calculated to represent the TRS at this sampling location.

The above process would result in multiple TRS values associated with each of the sampling locations. The average of

the values corresponding to the same location was used to represent the approximate surface

temperature at that sampling point. In this way, 30 pairs of TRS and moisture measurements (i.e.,

five from each dataset) were determined, and a univariate linear regression model was developed

based on these 30 data points. The resultant linear relationship is expressed as equation (4.2).

$$\omega = -0.5103\,T_{RS} + 23.77 \qquad (4.2)$$

In equation (4.2), $\omega$ is the HLP surface material gravimetric moisture content (%), and TRS is the remotely sensed surface temperature (°C). Figure 4-4(a) illustrates the linear relationship between $\omega$ and TRS, and Figure 4-4(b) compares the measured moisture contents to the predicted values calculated using equation (4.2). Overall, the model demonstrates good agreement between the predicted and measured moisture contents, with an R² of 0.7409 and a root mean square error (RMSE) of 1.28%.

Figure 4-4: (a) Empirically derived univariate linear regression between gravimetric moisture and

remotely sensed surface temperature; (b) Predicted vs. measured gravimetric moisture content (%).

There were five samples (one from each sampling location) collected for every data collection campaign.

The surveying team conducted two campaigns per day for three successive days. Therefore, there are

30 data points (i.e., groundtruth samples) involved in both (a) and (b).
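The TRS extraction and regression steps can be sketched as follows; the 5 × 5 window averaging and least-squares fit mirror the procedure described above, but the image, pixel coordinates, and sample values are placeholders rather than the thesis data.

```python
import numpy as np

rng = np.random.default_rng(0)

def window_mean(img, row, col, half=2):
    """Mean of a 5 x 5 pixel window centred at (row, col), as in Figure 4-3."""
    return img[row - half:row + half + 1, col - half:col + half + 1].mean()

# Synthetic thermal image and one hypothetical sampling-location pixel.
thermal = rng.uniform(18.0, 32.0, size=(512, 640))   # temperatures in deg C
t_rs_at_sample = window_mean(thermal, row=100, col=200)

# Placeholder pairs of remotely sensed temperature (deg C) and measured
# gravimetric moisture content (%); the thesis used 30 such pairs.
T_rs = np.array([18.0, 22.5, 25.0, 28.5, 31.0])
omega = np.array([14.5, 12.0, 11.0, 9.5, 8.0])

slope, intercept = np.polyfit(T_rs, omega, deg=1)    # univariate linear regression
pred = slope * T_rs + intercept
rmse = np.sqrt(((pred - omega) ** 2).mean())
print(t_rs_at_sample, slope, intercept, rmse)
```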


4.4 Orthomosaics Generation

Orthomosaics were generated in the Agisoft Metashape software by using the preprocessed

images. Software parameters were set to generate orthomosaics with the highest possible spatial

resolution. The generated orthomosaics had a ground sampling distance of approximately 10

cm/pixel, and the outputs were in 8-bit image format. Overall, six orthomosaics were generated,

where each one was associated with one of the thermal datasets with respect to the entire HLP.

During the orthomosaic generation process, after the importation of the images into the software,

the images were automatically registered based on their georeference information (i.e., GPS

information) from the images’ metadata. The image and camera alignment functions were used to

generate a sparse point cloud, followed by dense point cloud generation. The GCP coordinates were

also imported during the point cloud generation step to increase the accuracy of the dense point

cloud. After generating the dense point cloud, the orthomosaic was generated and exported as an

image in TIFF format. This process was repeated using the same software settings for the six sets

of preprocessed images. The generated orthomosaics from each of the datasets are shown in Figure

4-5.

4.5 Moisture Maps Generation

Moisture maps were generated by using the orthomosaics and the linear regression model. Each

orthomosaic was first imported into the QGIS software, and the “Raster Calculator” function was

used to map the pixel intensity values to surface temperature values through equation (4.3).

$$T_{x,y} = T_{\min} + (T_{\max} - T_{\min})\,\frac{I_{x,y} - I_{\min}}{I_{\max} - I_{\min}} \qquad (4.3)$$

In equation (4.3), $T_{x,y}$ is the remotely sensed surface temperature (°C) at the $x$th row and $y$th column pixel location of the current orthomosaic; $T_{\min}$ and $T_{\max}$ are, respectively, the lowest and highest remotely sensed surface temperatures (°C) of the current thermal dataset, which were determined in the image pre-processing step; $I_{\max}$ and $I_{\min}$ are the maximum and minimum pixel intensity values; and $I_{x,y}$ is the pixel intensity value at the $(x, y)$ pixel location of the current

orthomosaic. By doing so, every pixel location of the orthomosaics would have its corresponding

TRS, and thus, moisture maps can be generated by applying equation (4.2) to each pixel in the

orthomosaics. The generated moisture maps for the thermal datasets are illustrated in Figure 4-6.
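Although this step was performed in the thesis with the QGIS Raster Calculator, the per-pixel computation of equations (4.3) and (4.2) can be sketched in NumPy as follows, using a synthetic orthomosaic and placeholder dataset extremes.

```python
import numpy as np

rng = np.random.default_rng(0)
I = rng.integers(0, 256, size=(1000, 1200)).astype(float)  # synthetic 8-bit orthomosaic
T_min, T_max = 15.0, 40.0          # dataset extremes from the pre-processing step (assumed)
I_min, I_max = I.min(), I.max()

# Equation (4.3): map pixel intensities back to remotely sensed temperatures.
T = T_min + (T_max - T_min) * (I - I_min) / (I_max - I_min)

# Equation (4.2): convert temperature to gravimetric moisture content (%).
omega = -0.5103 * T + 23.77
print(omega.min(), omega.max())
```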


Figure 4-5: Generated orthomosaics of the HLP by using the acquired thermal image datasets.


Figure 4-6: Generated moisture maps of the HLP by using the orthomosaics and the linear regression

model.


4.6 Discussion and Conclusion

The results shown in Figure 4-5 and Figure 4-6 demonstrate the feasibility of mapping HLP surface

moisture distribution using remotely sensed thermal images. The generated HLP surface moisture

maps have a temporal and spatial resolution hardly achievable by conventional point-measurement

methods. In addition, the proposed method is highly practical, which makes it beneficial to HL

monitoring.

The proposed method is practical because of its efficient map generation process and the adequate accuracy of the generated results. As mentioned previously, the linear model directly relates

material surface moisture content to remotely sensed surface temperature, which does not require

further effort to collect additional ancillary data. As soon as the model is developed, a moisture

map can be generated within an hour of the data acquisition, and the spatial distribution of the

material moisture over the HLP can be intuitively visualized.

From the results shown in Figure 4-6, we can directly acquire an understanding of the spatial

distribution of surface moisture over the HLP. This allows a mine manager to make decisions

based on the generated results, and the moisture maps can be used as a guide to evaluate the

performance of the irrigation system. In our case, we can conclude that the solution application in

the west is more abundant than in the east. A relatively poor moisture coverage can be consistently

observed in the southeast area of the HLP, and particular attention should be paid to identify the

possible operational issues in this region. One may also notice that there is one region in the north

showing less moisture coverage as compared to the surrounding area (i.e., the orange stripe

surrounded by blue colour at the upper part of each moisture map). This region was actually the

toe of the ore pile, and there was no sprinkler installed at that location; thus, the dryness was

expected. In addition to providing overviews of the HLP surface, the spatial resolution of the

moisture maps is high enough to show even the performance of individual sprinklers. By looking

at the center portion of the HLP, it can be found that several sprinklers result in smaller areas of

influence as compared to the others. This may indicate a requirement for sprinkler replacement or

descaling.

From an HLP monitoring perspective, it would be difficult for the HLP manager to precisely

control the surface wetness for every inch of the pile. Instead, a more practical strategy is to

maintain the majority of the surface moisture within an acceptable range (this range should


be defined based on site-specific operational practice), while avoiding the creation of solution

ponds and extremely dry areas over the HLP. In our case, if we define the area with a moisture

level below 3% as dry and above 9% as wet, we can quickly pinpoint the regions associated with

extreme moisture conditions; for example, the northwest and the southeast regions in the March

6, Morning dataset. These areas might raise operational concerns because the extremely dry

regions may imply sprinkler defects or ineffective leaching conditions, while the regions that show

an extremely high moisture level may be subject to ponding issues. Actions can be taken by

technical staff to further investigate these regions with the help of the generated moisture maps.

This increases the efficiency and effectiveness in resolving operational issues and streamlines the

entire monitoring process. In addition, since the moisture maps provide direct visualization of the

spatial variation in material surface moisture, they can be involved in the irrigation optimization

process to quantitatively depict the performances of different solution application strategies.

Despite the benefits discussed above, the empirically derived linear model has its imperfections.

Thus, it is important to understand the underlying principles, limitations, and possible improvements

of the proposed method. In general, remote sensing (RS) approaches are essentially approximation

techniques that inevitably include errors in their estimated results. Assumptions and/or

simplifications are commonly employed up to a certain extent during the RS model development

process to balance accuracy, efficiency, and practicality. As shown in Figure 4-6, there are

inconsistencies between datasets regarding the estimated surface moisture content from one dataset

to another. This is due to the inability of the linear model to take the effects of all the influential

factors into account when relating the remotely sensed surface temperature to the surface moisture

content of the HLP material. In the remainder of this section, some of the factors that may have

contributed to the generated results are discussed, and several recommendations regarding model

improvement are also provided.

The basis of TIR remote sensing is to use a thermal sensor placed at some distance from an object

to measure the EM energy emitted from the object’s surface (Jensen, 2009). A material that has a

temperature higher than absolute zero (0 K) emits thermal infrared EM radiation, and the amount

of the emitted energy is a function of its true surface temperature (i.e., kinetic temperature) and

emissivity. The kinetic temperature of any ground feature on the earth is affected by the heat

sources, atmospheric effects, and material thermal properties; while the emissivity of the object is

the result of its composition and surface geometry (Gupta, 2017). It is important to note that some


of these factors are interdependent, for example, the composition of an object also affects its

physical and thermal properties.

In this study, the primary heat source of the HLP surface was the Sun, and the spatial and temporal

surface temperature variation was mainly due to solar heating. The energy from the incoming sun

rays led to changes in the kinetic temperature of the leach pad surface, and the maximum surface

temperature was expected to occur in the early afternoon (around 2 p.m.) (Gupta, 2017). In general,

the amount of incident solar energy over the HLP was not spatially uniform. The amount of solar

radiation received by an area on the HLP depended on several parameters such as topographical

relief, slope aspect, as well as solar zenith and azimuth angles. The solar angles were functions of

the latitude of the site and time of day and month (Kalogirou, 2013). Also, slopes with different

orientations underwent differential heating, and the magnitude of heat energy received was

affected by sun position and surface orientation. The differential heating that occurs at different

times of the day can be easily observed when comparing the morning datasets to the afternoon

datasets in Figure 4-5 and Figure 4-6. Because the site was located in the northern hemisphere, the

sunrise occurred in the southeast, and thus more solar radiation was incident toward the southeast

corner of the HLP in the morning hours. This resulted in a higher remotely sensed surface

temperature in the area. In contrast, ground materials generally reach their maximum temperature

in the early afternoon (Gupta, 2017; Jensen, 2009; Lillesand et al., 2015), and the detected surface

temperature was more uniform as shown in the afternoon datasets. To graphically depict the

different positions of the Sun relative to the HLP, an illustration of the Sun’s daily path from sunrise

to sunset is shown in Figure 4-7 (Kalogirou, 2013). According to Lillesand et al. (2015), south-

facing slopes in the northern hemisphere generally receive more solar heating than the north-facing

ones. This phenomenon can be intuitively recognized from Figure 4-7, and the influence of the

differential heating on the HLP surface can also be observed in Figure 4-5 and Figure 4-6. One

may develop a moisture estimation approach that incorporates the site latitude and solar angles to

enhance estimation accuracy and model generalizability (Liu and Zhao, 2006).
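As a concrete illustration of this dependence, the solar zenith angle \(\Phi\) can be written in the standard form found in solar engineering references such as Kalogirou (2013); the symbols are generic rather than reproduced from this thesis:

\[ \cos\Phi = \sin\phi \, \sin\delta + \cos\phi \, \cos\delta \, \cos h \]

where \(\phi\) is the site latitude, \(\delta\) is the solar declination (a function of the day of the year), and \(h\) is the hour angle (a function of local solar time). A refined model could use these terms to normalize the remotely sensed temperatures for time- and latitude-dependent solar input.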

Another factor that can influence the kinetic temperature of the ground surface is the presence of active thermal

sources. For HLPs that involve extensive exothermic reactions (e.g., rapid sulfide-to-sulfur and

sulfur-to-sulfate oxidation of sulfide minerals), the chemical reactions can result in self-heating of

the leach pad material, and the heat generated in the HLP can be transferred to the surface via

fluid convection and/or conduction through solids. Remote sensing models that are developed


based on surface data may be incapable of capturing the complex patterns of heat generation and transport inside the HLP. Although the heat energy introduced by chemical reactions was

considered insignificant in this study, future improvement of the proposed method may incorporate

the results provided by numerical programs that can model the internal behaviours of the HLP to

complement the estimation of surface moisture content.

In addition to the heat sources, the atmosphere plays a critical role in both downwelling and

upwelling energy transfer. It affects not only the magnitude and spectral composition of the solar

radiation received by the ground surface but also the intensity and components of the energy

recorded by a thermal remote sensing system (Lillesand et al., 2015). Gases and suspended

particles in the atmosphere can absorb, scatter, and emit radiation during the energy transfer, which

may attenuate, strengthen, or transform the radiation emitted from ground objects before reaching

the thermal camera (Lillesand et al., 2015). Water vapour, in particular, may absorb radiation

emitted from the material surface, leading to a decrease in the energy detected by the sensor

(Lillesand et al., 2015). Several studies in agriculture and mine tailing impoundment monitoring

have shown that the effect of atmospheric humidity during the data collection should be considered

when the remotely sensed thermal data are used for estimation of material surface moisture (Liu

and Zhao, 2006; Sugiura et al., 2007; Zwissler et al., 2017).

Figure 4-7: Illustration of the Sun’s positions relative to the HLP (not to scale). The mine site is located in the northern hemisphere, and thus the south-facing slopes receive more solar heating than the north-facing ones. The solar zenith angle, Φ, and solar azimuth angle, α, change over time from sunrise to sunset. Figure modified based on Kalogirou (2013).

In our case, the measured relative humidity was approximately 25% (±2%) throughout the entire field experiment, which was

considered consistent. Nevertheless, the data acquisition was performed over three successive days with similar weather conditions, and thus the collected data may not be representative enough to explain

the effect of humidity. Future improvement of the predictive model may be achieved by

incorporating a humidity term in the regression after collecting more representative data in

different seasons and weather conditions.
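As a sketch of what such an extension might look like, the MATLAB snippet below fits a hypothetical multiple linear regression with a relative-humidity term added alongside the temperature predictor; the variable names and numeric values are purely illustrative and do not reproduce the fitted model from this thesis:

```matlab
% Hypothetical bivariate extension of the empirical moisture model:
% moisture regressed on surface temperature plus relative humidity.
T  = [28.1; 31.4; 25.6; 29.8; 27.0];   % remotely sensed temperature (deg C), illustrative
RH = [24;   26;   23;   25;   27  ];   % relative humidity (%), illustrative
w  = [6.2;  4.1;  7.9;  5.0;  6.8 ];   % measured moisture content (%), illustrative
tbl = table(T, RH, w, 'VariableNames', {'Temp', 'RH', 'Moisture'});
mdl = fitlm(tbl, 'Moisture ~ Temp + RH');   % w = b0 + b1*T + b2*RH
disp(mdl.Coefficients)                      % inspect the fitted coefficients
```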

Besides the gases and suspended particles contained in the atmosphere, meteorological conditions

ought to be considered for remote sensing studies, especially for surveys conducted using UAV

based remote sensing platforms. Wind, for instance, can dissipate surface heat and accelerate

moisture evaporation, which results in cooling of the ground material. This cooling effect often

varies spatially and temporally, and thus introduces complexity and difficulty when using the

remotely sensed thermal data to estimate surface moisture. Moreover, wind speed and direction

are often highly variable at the local scale, which may cause stability issues for the UAV platform

such as inaccurate flightpath (i.e., deviating from the pre-set flight routes) and tilted observation

angles of the onboard sensors (Jensen, 2009). During our field experiment, it was observed that

the wind speed was generally higher in the afternoon than in the morning. Also, the wind directions

were often inconsistent during the data acquisition, which had contributed to variations in the

flightpath accuracy and drone battery consumption. Some studies in the literature have proposed

parameterized models that include wind velocity to take the influence of wind into account when

relating material surface moisture to thermal data (Liu and Zhao, 2006; Scheidt et al., 2009; Zhao

and Li, 2013). In addition to wind velocity, another important meteorological component that may

have affected our results was cloud cover. The amount of cloud cover varied considerably between data collection campaigns, and the cloud cover likely contributed to differential heating and shadowing over the HLP, which led to patchy appearances in some of the

collected thermal images. It is important to bear in mind the potential influence of the

meteorological conditions when interpreting the generated results. To improve our method, a more

sophisticated model that involves meteorological variables may be developed in future studies to

account for the influences of meteorological conditions on the moisture estimation.

The amount of radiation emitted by an object is a function of its emissivity (Lillesand et al., 2015).

The greater the emissivity, the more radiance is emitted by the radiating body at a given kinetic

temperature (Jensen, 2009). It is important to note that a cooler ground feature can emit the same amount of radiation as a warmer body due to the discrepancy between their emissivities (Gupta,

2017). There are a number of factors that can influence the emissivity of an object, such as

chemical composition, surface roughness, moisture content, colour, and viewing angle (Jensen,

2009; Weng et al., 2004). In general, rocks with high silica content have a low emissivity, and

coarse particle surfaces relate to a high emissivity (Gupta, 2017). In this study, it was assumed that

the chemical composition and material roughness were uniform for the top 5 cm of the HLP

surface. However, if the material composition and roughness varied significantly across the

surface, then the estimated results would be biased toward the data on which the linear model was

derived. In our case, the mineralization at El Gallo mine has occurred in a volcanic series,

dominated by rocks of andesitic composition. Moreover, the ores were subjected to crushing before

they were dumped within the HLP. Thus, assuming a consistent rock composition and surface roughness for the HLP is conceivably appropriate. Besides the effects of material composition

and roughness, darker-coloured particles are better emitters than the lighter-coloured ones; and the

more moisture a rock contains, the higher its ability to emit radiation (Gupta, 2017). These

relationships imply that the correlation between the material moisture content and the remotely

sensed data may not be a univariate linear function, and an improved model may be developed in

future studies to account for the abovementioned variables.

Observing the same surface from different viewing angles yields different thermal

measurements. This is because the emissivity of an object varies with the sensor viewing angle

(Jensen, 2009). Moreover, the distance between the sensor and the observed surface affects the

accuracy of the recorded data. The further the distance from the sensor to the target surface, the

more noise would be introduced into the remotely sensed thermal images (Sugiura et al., 2007). In

this study, the thermal images were acquired by pointing the thermal camera vertically downward

so that the central axis of the camera’s instantaneous field of view (IFOV) aligns with the normal

of the top surface. Such a strategy would result in a uniform observation distance between the

sensor and the horizontal surfaces at the expense of observing the slopes at oblique angles. The

reason for adopting such configuration was because ponding issues were more likely to occur on

the flat terrains rather than on the slopes of the HLP. Hence, more emphasis was put on the flat

regions over the HLP. Furthermore, the flight altitudes of the data collection campaigns were

selected based on the considerations of image resolution, flight duration and drone battery

consumption so that a balance between accuracy, efficiency and practicality was achieved.


In conclusion, this chapter elaborated on the methodology and implementation details of using the

acquired thermal images and in-situ moisture measurements to generate HLP surface moisture

maps. An empirical linear relationship between the remotely sensed surface temperature and the

HLP surface moisture content was first derived, and the moisture maps were generated using the

linear model. The empirical model showed a good agreement with the ground-truth moisture

measurements, and the generated moisture maps possessed a temporal and spatial resolution hardly

achievable by conventional point-measurement methods. The benefits and limitations of the

proposed method were discussed, and possible improvement of the moisture estimation step was

also outlined. Overall, the results have demonstrated the feasibility and practicality of the proposed

approach, and the products created from the data analysis process can be useful for HLP

monitoring applications.


Chapter 5

Mapping Heap Leach Pad Surface Moisture Distribution Using Convolutional Neural Networks

Chapter 4 described a framework for producing HLP surface moisture maps based on the obtained

thermal images. This chapter introduces how the acquired colour and thermal images can be

utilized simultaneously during the data analysis to generate surface moisture maps using

convolutional neural networks (CNNs). The proposed approaches create moisture maps in an end-

to-end fashion after the necessary data preparation procedures, and the methods can be further

developed towards a fully automated data analysis process for HLP surface moisture monitoring.

5.1 Overview and Methodology

Convolutional neural networks (CNNs) are a particular type of neural network (NN) that has

shown remarkable performance in processing data with a grid-like topology, such as images and

time-series data (Rawat and Wang, 2017). CNN models typically consist of multiple layers (e.g.,

a few tens or hundreds of layers), which endow them with the ability to extract hierarchical features

from the model input (Bengio, 2009; LeCun et al., 2015). Layers that are close to the input can

extract low- and mid-level features, while later layers can learn high-level (i.e., more abstract and

semantically meaningful) representations, which are the combinations of lower-level abstractions

(Alom et al., 2019; Zhu et al., 2017). Such a feature extraction ability allows predictive CNN

models to exploit spatial and/or temporal correlations in the data when making predictions, which

contributes to the tremendous success of CNNs in computer vision tasks, including image

classification and semantic segmentation (Khan et al., 2020; Lateef and Ruichek, 2019). Image

classification, known as scene classification in remote sensing, refers to categorizing a model input

(e.g., an image) into one of several predefined classes (Ma et al., 2019; Rawat and Wang, 2017);

and semantic segmentation, known as image classification in remote sensing, refers to assigning a

semantic class to every pixel of the input (Kemker et al., 2018; Zhu et al., 2017). To avoid

confusion, we adopt computer vision terms throughout this chapter unless otherwise specified.

In this study, we propose two approaches for generating heap leach pad (HLP) surface moisture

maps using CNNs, where the first method embeds a moisture classification model, and the second


utilizes a semantic segmentation network. The general workflow of the two methods is illustrated

in Figure 5-1. Since the input of CNN models must be raster tiles with a fixed height and width,

we designed the models to accept a small input size (32 × 32 for classification, and 64 × 64 for segmentation). In this way, if the input from which a moisture map should be generated has a large height and width, we first subdivide the input into multiple tiles of the same size, then use the predictive CNN models to produce the corresponding prediction for each tile, and finally combine all of the model predictions to generate the moisture map output (Figure 5-1).
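A minimal sketch of this tile-and-merge idea, for the classification case, is given below; raster and net are our own placeholder names for a four-channel input and a trained classifier, and the raster dimensions are assumed to be multiples of the tile size:

```matlab
% Subdivide a large four-channel raster into 32 x 32 tiles, classify each
% tile, and assemble the per-tile predictions into a coarse moisture map.
tileSize = 32;
[H, W, ~] = size(raster);                        % raster: H x W x 4 array
nRows = floor(H / tileSize);
nCols = floor(W / tileSize);
moistureMap = strings(nRows, nCols);
for r = 1:nRows
    for c = 1:nCols
        block = raster((r-1)*tileSize+1 : r*tileSize, ...
                       (c-1)*tileSize+1 : c*tileSize, :);
        moistureMap(r, c) = string(classify(net, block));  % one class per tile
    end
end
```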

All the employed models in this work were trained using supervised learning, and the data used

for model development were derived from the colour (RGB) and thermal images obtained during

the field experiment described in Chapter 3. Since the two types of networks require different kinds

of training, validation, and testing data, we prepared the classification and segmentation datasets

separately based on the same set of remote sensing imagery (i.e., the raw data). The details of data

preparation are elaborated in Section 5.2. The model development and moisture map generation

for the classification and segmentation CNNs are presented in Section 5.3 and Section 5.4,

respectively. Methodology and implementation details are explained throughout the sections,

while discussions and visualization examples are provided to clarify the important concepts and

experimental results. Finally, Section 5.5 concludes the chapter and outlines the future direction

of this research work.

Figure 5-1: Schematic illustration of the moisture map generation workflow by using a classification

model (upper) and a segmentation model (lower). The input of the workflow should be a four-channel

raster with a height and width no less than the designated tile size of the corresponding model (i.e., 32

× 32 for classification, and 64 × 64 for segmentation). The classification model returns a moisture class

for each tile, while the segmentation model provides pixel-wise prediction.


5.2 Data Preparation

This section provides a detailed description of the data preparation process. The overall workflow

consisted of four parts, namely, data preprocessing, orthomosaic generation, orthomosaic co-

registration, and datasets construction. The resultant outputs were two sets of training, validation,

and testing data, which were later used for the development of convolutional neural networks.

During the field experiment, there were 24 sets of data collected, which contained approximately

4,750 thermal images and 2,115 visible-light (colour) images. Among the 24 sets of data, half of

the datasets were for the whole studied area, and the other half were collected only for the top two

lifts of the HLP. Table 5-1 summarizes the number of data collected at each flight mission, where

the thermal and visible-light data are reported separately. The data preparation process started with

preprocessing the colour and thermal images (Section 5.2.1). Then, the preprocessed data were

used to create 24 orthomosaics, followed by dividing the generated outputs into 12 groups

according to their studied area and time of data acquisition (Section 5.2.2). In this way, each group

consisted of one thermal and one colour orthomosaic that covered the same studied area (i.e., either

the whole facility or the top two lifts of the HLP). Afterwards, the colour orthomosaic within each

group was registered with its thermal counterpart such that 12 four-channel rasters (called overview

rasters) were generated (Section 5.2.3). Lastly, the datasets construction process, which involved

three steps, was performed to generate appropriate training and evaluation data for the deep

learning models (Section 5.2.4).

There were various types of data used and generated during the data preparation process; thus, it

is necessary to clarify the naming convention employed in the remainder of this section. In this

study, images were taken using digital and thermal cameras. Therefore, we use the term colour

image (or sometimes visible-light image) to indicate the data collected by the digital camera and

thermal image to denote the information recorded by the thermal sensor.

Table 5-1: The number of remote sensing data collected during the field experiment*

                   March 6, 2019          March 7, 2019          March 8, 2019
                   Morning    Afternoon   Morning    Afternoon   Morning    Afternoon
Whole HLP**        T: 620     T: 618      T: 618     T: 621      T: 618     T: 619
                   C: 273     C: 281      C: 270     C: 289      C: 290     C: 275
Top two lifts**    T: 178     T: 170      T: 174     T: 169      T: 170     T: 173
                   C: 58      C: 74       C: 74      C: 81       C: 73      C: 76

* Overall, there were 24 sets of data collected, where 12 sets were colour images and the other 12 sets were thermal images.
** T: number of thermal images; C: number of colour images.

In general, a colour image

has three channels, which means every pixel of the image has a red, green, and blue colour intensity

value associated with it. In contrast, the thermal images are single-channel images because there

is only one digital number at each pixel location. For those data involving more than three channels

of values (e.g., four channels in our case), we call them raster data, or simply rasters. We adopted

the abovementioned naming convention to avoid confusion, although a raster image, by definition,

can represent any data that are stored in a 2D array-like pattern (Marschner and Shirley, 2015).

5.2.1 Data Preprocessing

The preprocessing of data was performed independently for the colour and thermal datasets. It is

important to note that the colour images were not georeferenced by default because the GPS device

was installed at the front gimbal on which the thermal camera was attached. Therefore, only the

raw thermal images were initially georeferenced. For the 12 colour image sets, the corrupted and

inappropriately exposed images were manually removed from the dataset. There were four (out of

2,114) colour images removed in total. The colour images that covered one or multiple ground control

points (GCPs) within the field of view were recorded manually so that the GPS coordinates of

these GCPs could be used in the orthomosaics generation step to georeference the images.

The method used to preprocess the 12 thermal image sets was described in Section 4.2. In short,

the preprocessing of thermal images consisted of two procedures. The first procedure was to

remove the poor-quality data by manual inspection, which resulted in the removal of 47 (out of 4,748)

images from the dataset. The second procedure was to run an intensity transformation and mapping

script written in MATLAB to ensure that the images within the same dataset had a consistent

intensity scale. By the end of the preprocessing step, every thermal dataset would have one pair of

highest and lowest remotely sensed surface temperature (denoted as Tmax and Tmin, respectively)

associated with it, where these Tmax and Tmin values would be later used to map the pixel

intensities of thermal orthomosaics into surface temperature values (see Section 4.2 for details).

5.2.2 Orthomosaics Generation

After completing the data preprocessing step, orthomosaics were generated in the Agisoft

Metashape software by using the preprocessed images. For the visible-light image sets, the

generated colour orthomosaics had a ground sampling distance (GSD) of 2 cm/pixel, and the

outputs were in 24-bit image format (8-bit for each red, green, and blue channel). Overall, twelve


colour orthomosaics were generated, where half of these orthomosaics were related to the whole

HLP facility, and the other half were associated with the top two lifts of the HLP. It is worth noting

that the highest possible GSD for the top two lifts datasets could be finer than 2 cm/pixel due to

the high image resolution resulting from the low flight altitude. However, a coarser spatial

resolution was selected to match with the datasets for the whole HLP so that the inputs to the deep

learning models are consistent in spatial scale.

The colour orthomosaic generation process started with importing the visible-light images into the

Agisoft Metashape software. The visible-light images were not georeferenced, and thus, the

images would not be automatically registered after importation. A preliminary image alignment

was then performed by using the “Align Photos” function, where the software would compute a

sparse point cloud and a set of preliminary positions of the images. However, the generated sparse

point cloud and preliminary positions were neither accurate nor adequately oriented due to the

lack of elevation and global positioning information. An example of the preliminary camera

positions and sparse point cloud computed by the software is illustrated in Figure 5-2(a). In order

to refine the image alignment, the locations of the GCPs were manually pinpointed within the

software interface, and the corresponding GPS coordinates of the GCPs were entered into the

software. In this way, the appropriately positioned image alignment and sparse point cloud could

be generated by using the “Optimize Camera” function. An example of the adequately positioned

sparse point cloud created by incorporating the GCP coordinates is shown in Figure 5-2(b). The

refined sparse point cloud was then used to create a dense point cloud, followed by the final

orthomosaic generation. This process was repeated using the same software settings for the 12

colour image sets. Figure 5-3 shows the six colour orthomosaics for the top two lifts of the HLP,

while Figure 5-4 depicts the orthomosaics with respect to the whole HLP facility.

The thermal orthomosaics for the 12 sets of thermal images were generated following the

procedures described in Section 4.4. The overall procedures for thermal and colour orthomosaic

generation was similar, except that the thermal images were georeferenced by default. Hence, the

thermal images would be automatically registered based on their georeference information after

importation. The generated thermal orthomosaics had a GSD of approximately 10 cm/pixel for all

12 datasets, and the outputs were in 8-bit (i.e., single-channel grayscale) image format. The

generated thermal orthomosaics for the whole facility are shown in Figure 4-5, and the

orthomosaics with respect to the top two lifts of the HLP are provided in Figure 5-5.


Figure 5-2: (a) The generated point cloud without GPS information was not adequately oriented. (b) The generated point cloud with GPS information was appropriately positioned. The x- and y-axes denote the east-west and north-south directions, respectively. The blue rectangles represent the estimated image plane positions of the input images.

By following the procedures described above, there were 24 orthomosaics created based on the acquired visible-light and thermal image sets. The orthomosaics were then divided into 12 groups

according to their studied area and time of data acquisition, such that each group involved one

thermal and one colour orthomosaic. For instance, the first group would involve the thermal and

colour orthomosaics generated based on the data collected on the morning of March 6, and both

orthomosaics were for the whole HLP. In this way, we can superimpose the colour orthomosaics

onto the thermal ones to produce 12 four-channel rasters, where the first three channels contain

the intensity values for the red, green, and blue colours, followed by the fourth channel containing

the thermal information. Such a process of overlaying two images of the same scene with geometric precision is called registration, or more specifically co-registration (Gupta, 2017).

Figure 5-3: Generated colour orthomosaics for the top two lifts of the HLP by using the acquired visible-light image datasets.

Figure 5-4: Generated colour orthomosaics for the whole HLP by using the visible-light image datasets.


5.2.3 Orthomosaics Registration & Multichannel Rasters Generation

Overlaying the colour and thermal orthomosaics with geometric precision can allow the resultant

raster data to simultaneously contain information acquired from both visible-light and thermal

cameras. However, the generated orthomosaics were inevitably subject to image distortion, which

resulted in geometric misalignments when directly superimposed over each other. In other words,

if a colour orthomosaic was directly superimposed onto a thermal one, the same pixel location of

the two orthomosaics would not refer to the same location over the HLP due to image distortion

and variations (Gupta, 2017).

Figure 5-5: Generated thermal orthomosaics for the top two lifts of the HLP by using the acquired thermal image datasets.

To adequately align the orthomosaics within the same group, the “Georeferencer” function in the QGIS software was used to perform the orthomosaic alignment. It is

important to clarify that we were not truly georeferencing the orthomosaics to a geodetic reference

system (e.g., WGS-84); instead, we were selecting one of the two orthomosaics within a group to

be a reference and registering the other one onto the reference image. In the literature, the reference

image is also called the base or master image, while the images to be registered are referred to as

sensed or slave images (Gupta, 2017; Zitova and Flusser, 2003).

In this study, the thermal orthomosaic within each group was selected to be the reference, and the

goal was to align the colour orthomosaic with the thermal one such that the geometric

misalignment between the two would fall below a tolerance limit. This tolerance limit should be

set based on the application’s objective, and we adopted a threshold of three pixels (equivalently

0.3 m over the HLP surface), which is an order of magnitude smaller than the three-meter sprinkler

spacing of the HLP’s irrigation system. We consider that this precision is sufficient for the

application of HLP surface moisture map generation.

For each group of orthomosaics, the alignment process started with importing both images into the

QGIS software. After the importation, the colour orthomosaic was used as the image to be

registered (i.e., the slave image), and the thermal orthomosaic was used as the master image. A set

of feature correspondences was then manually identified between the two images, where these

correspondences were used to compute an image transformation that mapped the pixel locations

in the slave image onto the coordinate system of the master image. In this process, the user-defined

software parameters were set as follows: the “Polynomial 3” option was selected to be the

“Transformation type”, and the “Cubic” interpolation technique was picked as the “Resampling

method”. The reason for selecting the third-order polynomial transformation was because it has

the ability to correct complex and nonlinear image distortions, while it is one of the most

commonly used transformation types in practice (Kurt et al., 2016). Similarly, the “Cubic”

resampling method was selected because the cubic interpolation technique is commonly used when

processing aerial images, and it has the ability to preserve edges and produce sharp image outputs

(Kurt et al., 2016; Lehmann et al., 1999) .

In the above procedure, the number of identified feature correspondences depended on the studied

area. If the orthomosaics covered the whole HLP, then 150 pairs of features were manually

determined on the images. An example of the identified feature correspondences on one group of


the orthomosaics is shown in Figure 5-6. As depicted in the figure, the identified features were

spread over the images, and every feature (i.e., red dot) on the left image had one corresponding

feature on the right. These identified features are called control points (CPs) in the literature, where

one may refer to them as postmarked CPs because they were determined after the data collection

(Hackeloeer et al., 2014; Zitova and Flusser, 2003). In contrast, if the studied area was the top two

lifts of the HLP, then 60 pairs of CPs were used for the alignment.

As mentioned previously, the goal of the above process was to align the orthomosaics such that

the misalignment between the thermal and colour data fell below a tolerance limit. To evaluate the

alignment accuracy, an additional set of feature correspondences was manually identified, where

the CPs contained in this additional set were mutually exclusive from the ones that were used to

compute the image transformation (i.e., there was no repeating entry between the two sets of

features). The numbers of additional feature correspondences used for the whole area and the top

two lifts of the HLP were 25 and 15, respectively. In this way, the coordinates of these additional

CPs on the master image and the transformed slave image could be recorded, and the overall

alignment error was calculated by averaging the root mean square errors between the

corresponding coordinate pairs (Zitova and Flusser, 2003). As mentioned above, we set the

tolerance limit for the alignment error to be 0.3 m, or equivalently the ground distance represented

by three pixels of the thermal orthomosaic.

Figure 5-6: Illustration of the 150 selected feature correspondences over the colour and thermal orthomosaics. The two orthomosaics were generated based on the March 6 morning datasets. Every red dot on the colour orthomosaic (left) has a unique corresponding feature on the thermal orthomosaic (right). There were 150 correspondences identified for every pair of orthomosaics covering the whole HLP.

If the alignment error was greater than the threshold, then another round of image transformation would be performed until the averaged error fell below the tolerance.
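The error check described above can be sketched as follows; masterCP and slaveCP are our own placeholder names for the N × 2 coordinate arrays (in metres) of the additional control points on the master and transformed slave images:

```matlab
% Overall alignment error: RMS of the offsets between corresponding
% control-point coordinates, compared against the 0.3 m tolerance.
offsets = masterCP - slaveCP;               % N x 2 per-point offsets (m)
rmse = sqrt(mean(sum(offsets.^2, 2)));      % root mean square error (m)
if rmse > 0.3
    warning('Alignment error %.2f m above tolerance; redo transformation.', rmse);
end
```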

After performing the image alignment described above, the thermal and transformed colour

orthomosaics were positioned adequately, and the next step was to regularize the spatial resolution

(i.e., GSD) of the data so that every pixel of the colour and thermal orthomosaics could represent

the same real-world dimension over the HLP. As mentioned in Section 5.2.2, the GSDs of the

thermal and colour orthomosaics were 10 cm/pixel and 2 cm/pixel, respectively. Since we selected

the thermal data as the reference, each colour orthomosaic was downsampled to the same resolution as its thermal counterpart. The downsampling was carried out by using the “Align

Raster” function in the QGIS software. The downsampling operation started with importing a

transformed colour orthomosaic and its thermal counterpart into the software. The thermal

orthomosaic was then selected to be the reference layer, and the output size was set to be the same

as the thermal data. The “Average” algorithm was selected to be the resampling method, and the

generated output would be saved in TIFF file format. By using these settings, the resultant output

would be a downsampled colour orthomosaic with the same GSD as the thermal input. This

downsampling operation was repeated for all 12 groups such that the spatial resolutions of all

orthomosaics were appropriately regularized.
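Although the “Align Raster” tool was used in practice, the resampling step alone can be sketched in MATLAB for readers without QGIS; colourOrtho and thermalOrtho are our own placeholder names:

```matlab
% Downsample the 2 cm/pixel colour orthomosaic to the 10 cm/pixel thermal
% GSD using box (area-average) resampling, mirroring the "Average" method.
targetSize = [size(thermalOrtho, 1), size(thermalOrtho, 2)];
colourDown = imresize(colourOrtho, targetSize, 'box');   % 'box' averages pixels
```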

The final step of the registration process was to superimpose the transformed and regularized

visible-light orthomosaics over the thermal data to generate four-channel rasters containing both

colour and temperature information. Figure 5-7 graphically depicts this step, and the same

procedure was repeated for every group of the data. To generate a four-channel raster, a thermal

orthomosaic was first imported into the QGIS software. After the importation, the “Raster

Calculator” function was used to map the pixel intensity values to surface temperature values by

using equation (4.3). The output of this operation would be a surface temperature map, where each

pixel would have an associated remotely sensed surface temperature value in degrees Celsius. Once

the temperature map was created, the final product was generated using the “Merge” function in

QGIS to overlay the colour orthomosaic with the temperature map. The resultant output would be

a four-channel raster with all data values being single-precision floating-point numbers. In this

way, the 12 groups of orthomosaics were converted into 12 four-channel rasters (called the

overview rasters in the remainder of this chapter), which were used to prepare the datasets for the

development of convolutional neural networks.
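The assembly of one overview raster can be sketched as follows; a linear mapping from the 8-bit intensities to the [Tmin, Tmax] temperature range is assumed here as a stand-in for equation (4.3), whose exact form is given in Chapter 4, and the file names are illustrative:

```matlab
% Map thermal pixel intensities to surface temperature (deg C) and stack
% the channels into a four-channel, single-precision overview raster.
dn   = single(imread('thermal_ortho.tif'));          % 8-bit digital numbers
temp = Tmin + (dn / 255) * (Tmax - Tmin);            % assumed linear mapping
rgb  = single(imread('colour_ortho_aligned.tif'));   % downsampled colour data
overviewRaster = cat(3, rgb, temp);                  % channels: R, G, B, temperature
```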


5.2.4 Datasets Construction

The datasets construction process consists of three steps, which are graphically summarized in

Figure 5-8. In this study, we designed CNNs for two different tasks, namely image classification

and semantic segmentation, that required different types of model inputs. Therefore, the datasets

used to develop the classification and segmentation models were separately prepared as shown in

Figure 5-8. Overall, the first step of the datasets construction was to subdivide the 12 overview

rasters into a large number of small tiles such that each tile covered a small area of the HLP surface.

Secondly, the small raster tiles were partitioned into training, validation, and test sets, which would

be used in the CNN training and evaluation processes. Lastly, a label creation step was performed

to label all the examples (i.e., the small raster tiles) contained in the datasets so that the labelled

examples could be used to train the NNs in a supervised learning paradigm. The remainder of this

subsection provides the implementation details of each datasets construction step, followed by a summary of the dataset statistics.

Figure 5-7: Generation of a four-channel raster by overlaying a colour orthomosaic over a remotely

sensed surface temperature map of the heap leach pad. The output is a four-channel raster, where the

first three channels contain intensity values of the red, green, and blue colours, and the fourth channel contains the remotely sensed surface temperature in degrees Celsius. All data values in the output raster

are single-precision floating-point numbers.


Figure 5-8: The three steps of the deep learning datasets construction process.


At this point, the 24 sets of images were converted into 12 overview rasters containing both visible-

light and temperature information. The rasters that covered the top two lifts of the HLP had

approximately 4,100 × 2,050 (width × height) pixels, while the raster size for the whole HLP

facility was approximately 7,000 × 5,500 pixels. To subdivide a large raster into small tiles, the

raster to be subdivided was first imported into the QGIS software, followed by running the “Save

Raster Layer as” function. The “Create VRT” option was selected, and the output files were

designated to be in the TIFF file format. The raster height and width of the resultant outputs were

both set to 64 pixels, while the coordinate reference system (CRS) was set to be the same as the

input raster. The same settings were used for all 12 overview rasters, and the generated outputs

were a set of raster tiles, each with a dimension of 64 × 64 × 4 (height × width × channels). Since the region of interest (ROI) in this study was the HLP, the raster tiles that covered areas outside the ROI were manually removed. In this way, there were 31,313 raster tiles created, and these

raster tiles would be used for the segmentation task. Table 5-2 summarizes the number of raster

tiles generated by each of the overview rasters.

As shown in Figure 5-8, the classification data were obtained by splitting each of the raster tiles in

the segmentation dataset into four equal-area portions (i.e., top-left, top-right, bottom-left and

bottom-right portions) through running a MATLAB script. By doing so, there were 125,252 rasters

contained in the classification dataset (i.e., four times the segmentation dataset), and each

classification example had a dimension of 32 × 32 × 4 (height × width × channels). It is

important to note that the image resolution of 32 × 32 pixels is one of the commonly used image

sizes in computer vision and image processing for the image classification task (Krizhevsky and

Hinton, 2009). Moreover, the generated rasters had a GSD of 10 cm/pixel, and thus, every

classification example represented approximately a 3.2 m by 3.2 m area on the HLP surface. This

matches the 3 m sprinkler spacing of the HLP’s irrigation system (see Section 3.1).
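The quadrant split can be sketched as follows for one 64 × 64 × 4 segmentation tile; the variable names are ours:

```matlab
% Split one 64 x 64 x 4 segmentation tile into its four 32 x 32 x 4
% classification quadrants (top-left, top-right, bottom-left, bottom-right).
q = 32;
quadrants = {segTile(1:q,     1:q,     :), segTile(1:q,     q+1:end, :), ...
             segTile(q+1:end, 1:q,     :), segTile(q+1:end, q+1:end, :)};
```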

Table 5-2: The number of tiles generated from each overview raster*

                  March 6, 2019          March 7, 2019          March 8, 2019
                  Morning    Afternoon   Morning    Afternoon   Morning    Afternoon
Whole HLP         4,794      4,854       4,881      4,905       4,656      4,770
Top two lifts     401        415         433        434         361        409

* In total, there were 31,313 raster tiles generated, where every raster tile had a dimension of 64 × 64 × 4 (height × width × channels). These raster tiles were used as the inputs for the semantic segmentation task.


After obtaining the classification and segmentation rasters, the next step was to group the data into

training, validation, and test sets. In practice, there is no universal standard regarding the

percentages by which the data should be partitioned. Machine learning practitioners employ

different ratios between training, validation and test sets, which is true even for benchmark

databases (Deng et al., 2009; Geiger et al., 2012; Lin et al., 2014). In this study, we decided to

adopt an 80/10/10 split (i.e., the training, validation and test sets contain approximately 80%, 10%

and 10% of the total number of data, respectively). The data partition was performed stochastically,

and the detailed procedures were as follows:

1) Assigning a unique file index to each raster in the dataset. For instance, the first raster in

the classification dataset had a file index of “000001”, while the last file had an index of

“125252”.

2) Creating two lists of random numbers (one for segmentation, and the other for

classification) by using the random number generator in MATLAB to sort the file indices

stochastically. It is worthwhile to note that a random seed of “44” was used to anchor the

permutation of file indices.

3) Grouping the first 80% of the indices to be the training set, while the last 10% to be the test

set, and the remaining 10% to be the validation set.

In this way, the classification dataset was partitioned into a training set including 101,252 rasters,

a validation set containing 12,000 rasters, and a test set of 12,000 rasters. Similarly, the

segmentation dataset was also divided into a training set (25,313 rasters), a validation set (3,000

rasters), and a test set (3,000 rasters) through the same procedures.
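A minimal sketch of this partition for the classification dataset is shown below; the seed of 44 follows the procedure above, while the exact 12,000-example validation and test set sizes match the counts just reported:

```matlab
% Stochastic partition of file indices into training/validation/test sets
% (approximately 80/10/10).
N = 125252;                          % classification dataset size
nVal  = 12000;
nTest = 12000;
rng(44);                             % anchor the random permutation
idx = randperm(N);                   % shuffled file indices
trainIdx = idx(1 : N-nVal-nTest);            % first 80%  -> 101,252 training examples
valIdx   = idx(N-nVal-nTest+1 : N-nTest);    % middle 10% -> 12,000 validation examples
testIdx  = idx(N-nTest+1 : N);               % last 10%   -> 12,000 test examples
```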

After partitioning the datasets, the final step was to create labels for all the classification and

segmentation examples. It is important to note that the example labels are the expected outputs

that the deep learning models should learn to produce. In this study, we wanted the classification

models to estimate the moisture status of the ground area covered by an input raster. The estimated

moisture status should be given in one of the three classes, namely “Wet”, “Moderate”, and “Dry”.

In other words, the models should learn the correlation between pixel values and moisture levels

and return a class estimate that best describes the moisture status of a given area over the HLP.

Therefore, we annotated each classification example with one of the three moisture classes, and


the created labels were stored in a text file, including the file indices and their corresponding

moisture class. To annotate a classification example, a moisture map was first created by applying

equation (4.2) to every pixel over the temperature channel of the raster (see Figure 5-8). Secondly,

the mean moisture content of the moisture map for each raster tile was calculated, followed by a

thresholding operation. In this study, we defined the “Wet” class to be greater than 8% of moisture

content, the “Dry” class to be smaller than 4% of moisture content, and the “Moderate” class to be

any moisture value falling within the range of 4% to 8%. In this way, every classification example

was annotated by a text string, which denoted its moisture status. The annotation process above

was performed by running a script written in MATLAB.
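The thresholding rule can be sketched as follows; meanMC is our placeholder for the mean moisture content (%) of one tile’s estimated moisture map:

```matlab
% Assign a moisture class to one classification example based on the mean
% moisture content of its estimated moisture map.
if meanMC > 8
    label = "Wet";
elseif meanMC < 4
    label = "Dry";
else
    label = "Moderate";            % 4% to 8% moisture content
end
```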

Similarly, for semantic segmentation, we wanted the models to classify a given moisture value

into one of the three moisture classes; however, the classification should be carried out at a pixel

level, and one may treat the segmentation task as a pixel-wise classification problem (see Figure

5-8). To annotate a segmentation example, a MATLAB script was run to first apply equation (4.2)

to every pixel over the temperature channel of the raster to create an estimated moisture map. A

pixel-wise thresholding operation was then conducted to categorize the moisture values into one

of the three moisture classes (i.e., “Wet”, “Moderate”, and “Dry”), where the same definitions of

the moisture classes were used as in the classification case. The resultant output of the annotation

process was a 2D array with the same height (64 pixels) and width (64 pixels) as the input raster.

Every element (pixel) of the 2D array contained a text string denoting the moisture class at the

pixel location (i.e., pixel-wise labelling). In order to make it easier for data handling, we further

converted the text strings into numbers, such that “0” denoted the “Dry” class, “1” represented the

“Moderate” class, and “2” indicated the “Wet” class. In this way, we could save the created label

as a single-channel image and name the file using the file index of the corresponding segmentation

example. By following the above procedures, every segmentation example (in the training,

validation, and test sets) had its corresponding label image, and the prepared data were then used

for the development of convolutional neural networks.
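The pixel-wise labelling and numeric encoding can be sketched as follows; MC is our placeholder for the 64 × 64 estimated moisture map (%) of one segmentation example, and the output file name is illustrative:

```matlab
% Encode the pixel-wise moisture classes as 0 ("Dry"), 1 ("Moderate"),
% and 2 ("Wet"), and save the result as a single-channel label image.
labelImg = ones(size(MC), 'uint8');    % default: 1 = "Moderate"
labelImg(MC < 4) = 0;                  % 0 = "Dry"
labelImg(MC > 8) = 2;                  % 2 = "Wet"
imwrite(labelImg, '000001.png');       % named by the example's file index
```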

At this point, all the data were prepared appropriately, and it is important to obtain an

understanding of the dataset statistics. Table 5-3 summarizes the frequencies and percentages of

each moisture class present in the classification dataset. Overall, we can conclude that the majority

of the data (60.7%) were labelled with a “Moderate” moisture class, and the “Wet” moisture class


is a minority class (5.0%) in the classification dataset. This phenomenon of having skewed data

distributions is called class imbalance, which naturally arises in many real-world datasets (Johnson

and Khoshgoftaar, 2019). Many reasons can lead to a class imbalance in datasets, and we consider

our data to have an intrinsic imbalance, that is, an imbalance created by naturally occurring

frequencies of events rather than external factors like errors in data collection or handling. In this

study, the majority of the areas over the HLP surface were expected to have a moderate moisture

content, while the dry and wet areas should be the minorities because the extreme moisture

conditions may relate to operational issues. Although one may be concerned that the class imbalance

would potentially influence the performance of the trained models, several studies have shown that

deep learning models can be successfully trained, regardless of class disproportion, as long as the

data are representative (Johnson and Khoshgoftaar, 2019; Krawczyk, 2016). Also, it is worth

noting that some benchmark databases on which thousands of deep learning models have been

trained encounter severe class imbalance issues (Dong et al., 2018; Lin et al., 2014; Van Horn et

al., 2017). Moreover, the percentage of the minority becomes less influential if the minority class

contains a sufficient number of examples (Johnson and Khoshgoftaar, 2019). In our case, the

minority class (i.e., the “Wet” moisture class) has more than 5,000 training examples, which is

already of the same size as the majority class of some benchmark datasets, and thus should be

considered sufficient for training the models (Johnson and Khoshgoftaar, 2019; Krizhevsky and

Hinton, 2009; LeCun et al., 1998). Despite the class imbalance issue, the data were partitioned relatively uniformly across the training, validation, and test sets, where the “Wet”, “Moderate”, and

“Dry” classes consistently occupy 5%, 61% and 34%, respectively, in the three sets of data.

Table 5-3: Summary of dataset statistics for the classification task

Moisture Classes   Whole Dataset*        Training Set*        Validation Set*     Test Set*
                   (125,252 examples)    (101,252 examples)   (12,000 examples)   (12,000 examples)
“Wet”              6,258 (5.00%)         5,021 (4.96%)        665 (5.54%)         572 (4.77%)
“Moderate”         75,995 (60.7%)        61,525 (60.8%)       7,159 (59.7%)       7,311 (60.9%)
“Dry”              42,999 (34.3%)        34,706 (34.3%)       4,176 (34.8%)       4,117 (34.3%)

* The percentage in parentheses is calculated by dividing the number of instances in the moisture class by the total number of examples included in the set. For instance, there are 5,021 “Wet” examples in the training set, which is approximately 4.96% of the total number of examples (i.e., 101,252) in the training set.

For the segmentation task, Table 5-4 summarizes the (pixel-level) frequencies and percentages of each moisture class present in the dataset. Similar to the classification case, the segmentation

examples also encounter the class imbalance issue, where the “Wet”, “Moderate”, and “Dry”

classes occupy approximately 6%, 58% and 36%, respectively, in the training, validation and test

sets. Additional attention should be paid when tracking the model performance because a model

that labels every pixel as the “Moderate” class can still achieve a pixel-wise accuracy

of approximately 58%. As shown in Table 5-5, approximately 59% of the rasters contained two

moisture classes, while 36% of the segmentation examples involved only one class, and 5% of the data included all three classes simultaneously. These statistics imply that the moisture distribution across the HLP surface is not uniform overall. In most areas, the moisture content varies from one class to another even within the small surface area represented by one raster. Nevertheless, the variations in moisture classes within individual rasters are actually

beneficial for training segmentation models since they provide more contextual information than

those data involving only one class per example (Lin et al., 2014).

In summary, the data preparation process converted the 24 image sets (i.e., the raw data) into one

classification and one segmentation dataset, which were used for the development of convolutional

neural networks. The entire process took approximately 300 hours for someone familiar with the software and programming language used. The data preparation was arguably the most time-consuming and labour-intensive part of this study, and future work will be devoted to automating the process to minimize human intervention and increase workflow efficiency.

Table 5-4: Summary of dataset statistics for the segmentation task

Moisture Classes   Whole Dataset      Training Set       Validation Set    Test Set
                   (128.3 M pixels)   (103.7 M pixels)   (12.3 M pixels)   (12.3 M pixels)
“Wet”              6.04%              6.00%              6.61%             5.81%
“Moderate”         58.5%              58.6%              57.7%             58.8%
“Dry”              35.4%              35.4%              35.7%             35.4%

Table 5-5: Frequency and percentage of the number of classes contained per segmentation example*

Number of classes present   Whole Dataset       Training Set        Validation Set     Test Set
                            (31,313 examples)   (25,313 examples)   (3,000 examples)   (3,000 examples)
One class                   11,289 (36.1%)      9,159 (36.2%)       1,057 (35.2%)      1,073 (35.8%)
Two classes                 18,448 (58.9%)      14,889 (58.8%)      1,792 (59.7%)      1,767 (58.9%)
Three classes               1,576 (5.03%)       1,265 (5.00%)       151 (5.03%)        160 (5.33%)

* On average, the segmentation dataset contains 1.7 classes per example.


5.3 Classification-Based Heap Leach Pad Surface Moisture Mapping

In this section, we introduce our methodology for using CNN-based moisture classifiers to

generate moisture maps for the studied HLP. We first define the three CNN architectures that we

employed in the experiment, followed by providing the implementation details of the model

training and model evaluation process. We end the section with a description of the workflow that

we adopted to generate the moisture maps, where a brief conclusion about the performance of our

method is included. To the best of our knowledge, little attention in the literature has been paid to leveraging the power of CNNs to perform HLP surface moisture mapping. Therefore, the focus

of this section is to showcase how a CNN classifier can be incorporated into the moisture map

generation workflow and to study the behaviours and performances of the proposed method.

5.3.1 Network Architectures

Convolutional neural networks have become the most prominent solution for the image

classification task in recent years, demonstrating superhuman performance in real-world

applications (He et al., 2015; Rawat and Wang, 2017). While lots of breakthroughs have been

made in the past decade (Ciresan et al., 2011; He et al., 2016a; Howard et al., 2017; Krizhevsky

et al., 2012; Sermanet et al., 2013), exploring powerful and efficient CNN architectures remains an active area of research (Howard et al., 2019; Sandler et al., 2018; Tan and Le,

2019; Zhang et al., 2020). Among the vast number of proposed CNN models in the literature, we

employed three well-studied architectures, AlexNet (Krizhevsky et al., 2012), ResNet (He et al.,

2016a), and MobileNetV2 (Sandler et al., 2018), to study their behaviours when using the prepared

data as inputs to perform a moisture classification task. The trained models were further used to

produce moisture maps for the HLP, and their performances were compared against each other.

AlexNet, first proposed by Krizhevsky, Sutskever, and Hinton in 2012, revolutionized the field of computer vision and is one of the key contributors to the recent renaissance of neural networks (Krizhevsky et al., 2012; Rawat and Wang, 2017). AlexNet and its

variants have been extensively used as base models in many research studies (Rawat and Wang,

2017; Wang et al., 2018). In this work, we employ a modified version of AlexNet to learn the

correlation between the raster input and the moisture class output. The network architecture

contains eight layers with weights, where the first five layers are convolutional, and the remaining three are fully-connected (FC) layers. There are also two max pooling layers without weights

involved in the architecture to downsample the intermediate feature maps (a.k.a. activation maps).

The modified version of AlexNet is graphically depicted in Figure 5-9. The inputs to the models

are the 32 × 32 × 4 (height × width × channels) raster data in the classification dataset (see

Section 5.2.4). All of the convolutional layers in the network adopt 3 × 3 kernels with a stride of

one. The first and the second layers do not use zero padding to adjust the layer’s input dimension,

whereas the third to the fifth convolutional layers adopt a zero padding to ensure that the layer’s

input and output have the same height and width. Max pooling layers with 2 × 2 non-overlapping

windows and a stride of two are used to downsample the feature maps such that the height and

width become one half after each pooling operation. The last three layers are FC layers, where the

first two have 1024 neurons each, and the final layer is a three-way FC layer with softmax. The

final output of the network is a 3 × 1 vector containing the predicted probabilities for each of the

three moisture classes (i.e., “Wet”, “Moderate”, and “Dry”). The ReLU activation function is

applied in all convolutional and fully-connected layers except for the output in which a softmax is

used (Krizhevsky et al., 2012). The dimensionalities of the intermediate feature maps are clearly labelled in Figure 5-9, and the whole network includes 17.2 million parameters, all of which are trainable.

Figure 5-9: The modified AlexNet architecture employed in this study. The input to the network has a dimension of 32 × 32 × 4 (height × width × channels), and the output is a 3 × 1 vector containing the corresponding probabilities for each of the three moisture classes. The numbers in the figure indicate the corresponding dimensions of the input, feature maps, and output; “Conv 3×3” represents a convolutional layer with a kernel size of 3 × 3; “Max pool 2×2” indicates a max pooling layer operating with 2 × 2 non-overlapping windows; “Fully connected” means a fully-connected layer; the letter “s” stands for stride; and the letter “p” denotes the number of zeros padded to each side of the feature maps.
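A sketch of this architecture using MATLAB’s Deep Learning Toolbox is given below; the per-layer filter counts (64, 192, 384, 256, 256) and the placement of the two pooling layers are our own illustrative assumptions, since the exact configuration is specified in Figure 5-9:

```matlab
% Modified AlexNet sketch: five 3x3 convolutional layers, two 2x2 max
% pooling layers, and three fully-connected layers ending in a softmax.
layers = [
    imageInputLayer([32 32 4], 'Normalization', 'none')
    convolution2dLayer(3, 64)                     % no padding (s = 1, p = 0)
    reluLayer
    convolution2dLayer(3, 192)                    % no padding
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 384, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 256, 'Padding', 'same')
    reluLayer
    convolution2dLayer(3, 256, 'Padding', 'same')
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(1024)
    reluLayer
    fullyConnectedLayer(1024)
    reluLayer
    fullyConnectedLayer(3)                        % "Wet", "Moderate", "Dry"
    softmaxLayer
    classificationLayer];
```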

ResNet (or Residual Network) is another groundbreaking architecture proposed by He et al.

(2016a) that has significantly improved the performance of CNNs in image recognition tasks (i.e.,

classification, detection, segmentation). Many advanced models are developed and fostered based

on the ideas and concepts introduced by ResNet (Alom et al., 2019; Khan et al., 2020). One

important reason for the network’s popularity is that ResNet simultaneously addresses two critical

challenges during the training of CNNs: vanishing/exploding gradients and degradation. He et al.

(2016a) addressed the vanishing/exploding gradients problem by 1) employing a normalized

weight initialization technique, and 2) using batch normalization extensively in the network (He

et al., 2015; Ioffe and Szegedy, 2015). By doing so, information can effectively flow through the

network in both forward and backward directions, and the model can converge by using stochastic

gradient descent (SGD) with backpropagation (He et al., 2016a). However, the degradation

problem makes the training of deep CNNs difficult: the network performance starts saturating and

then degrades as more layers are directly stacked over each other in the architecture. Such

degradation is undesirable because deeper models (i.e., more layers stacked together) should result

in greater capacity and are expected to obtain better performance than the shallower ones (He et

al., 2016a; Simonyan and Zisserman, 2014; Szegedy et al., 2015).

To address the degradation problem, He et al. (2016a) introduced a deep residual learning

framework. A basic building block of residual learning with convolutional layers is shown in

Figure 5-10(b). In this framework, the residual block includes a shortcut connection that skips two

convolutional layers and performs an identity mapping. One copy of the input first passes through

the two stacked convolutional layers, followed by adding another copy of the input from the

shortcut to generate the final output. The ReLU activation function is used for introducing

nonlinearity to the process. As a comparison, Figure 5-10(a) shows a block in a plain network

without a shortcut connection. The notion of residual learning has been found quite useful for

training deep neural networks, where He et al. (2016a, 2016b) demonstrated that a network with

more than one thousand layers could be successfully trained with residual learning (Alom et al.,

2019; Khan et al., 2020).


In practice, deploying deep networks requires a significant amount of computational power and

computer memory. Training a network made of a stack of multiple residual blocks, as shown in Figure 5-10(b), can require a long training time. Therefore, He et al. (2016a) proposed a bottleneck design

of the residual block to make the training of ResNet less computationally expensive. Figure 5-11

provides a comparison example between an original residual block and the bottleneck design (He

et al., 2016a). As shown in Figure 5-11, the original residual unit consists of two convolutional

layers with 3 × 3 kernels. In contrast, the bottleneck residual unit replaces the two convolutional

layers with three layers: a 1 × 1 layer for dimension reduction, a 3 × 3 layer of regular

convolution, and a 1 × 1 layer for dimension restoration (He et al., 2016b). Such a bottleneck

design can result in a more efficient training process, and thus it is used in our experiment.
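To make the bottleneck design concrete, the following minimal TensorFlow/Keras sketch assembles one bottleneck residual block. It assumes the input already has the block's output channel count, so that the shortcut can remain an identity mapping; details such as use_bias=False alongside batch normalization are common conventions adopted here for illustration, not taken from the thesis code.

```python
# A minimal sketch of a bottleneck residual block (cf. Figure 5-11(b)),
# assuming the input already has `out_channels` channels (identity shortcut).
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, mid_channels, out_channels):
    shortcut = x
    y = layers.Conv2D(mid_channels, 1, use_bias=False)(x)   # 1x1, dimension reduction
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(mid_channels, 3, padding='same', use_bias=False)(y)  # 3x3 regular conv
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)   # 1x1, dimension restoration
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])                         # element-wise shortcut addition
    return layers.ReLU()(y)                                 # nonlinearity after the addition
```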

In this study, we employ a 50-layer ResNet (ResNet50) architecture, which is one of the most

widely used versions of the network in practical applications (Alom et al., 2019). Overall,

ResNet50 consists of 50 layers with weights, including 49 convolutional layers and one FC layer.

There are also one max pooling layer and one global average pooling layer, neither of which has trainable

entries. Batch normalization is used after every convolutional layer, and the ReLU nonlinearity is

used as the activation function except for the output layer. The architecture of ResNet50 is

summarized in Table 5-6, and the final output of the network is a 3 × 1 vector containing the

predicted probabilities for each of the three moisture classes. The network includes 23.6 million

parameters consisting of 23.5 million trainable and 53 thousand non-trainable parameters.

Figure 5-10: (a) A plain convolutional (Conv) block with two Conv layers. (b) A basic building block

of residual learning. The ⊕ symbol denotes an element-wise addition operation. In this case, the

shortcut connection performs identity mapping. Modified based on He et al. (2016a).


The last network that we have adopted is MobileNetV2 proposed by Sandler et al. (2018). As

mentioned previously, modern state-of-the-art CNNs have demonstrated their abilities to surpass

human-level performance in visual recognition tasks. Nevertheless, many of these networks

require high computational resources, and they are not compatible with portable devices, such as

on-board computers or smartphones, especially when real-time performance is desired (Howard et al., 2017).

Figure 5-11: (a) An original residual block. (b) A bottleneck residual block. The ⊕ symbol denotes an

element-wise addition operation. In this case, the shortcut connection performs identity mapping. BN

stands for batch normalization. Modified based on He et al. (2016a).

Table 5-6: Architecture of ResNet50 (He et al., 2016a)

Layer/block name | Layer/block output dimension | ResNet50*
Input   | 32 × 32 × 4  | -
Conv1   | 16 × 16 × 64 | 7 × 7, 64, stride 2
Conv2_x | 8 × 8 × 256  | 3 × 3 max pool, stride 2; [1 × 1, 64; 3 × 3, 64; 1 × 1, 256] × 3
Conv3_x | 4 × 4 × 512  | [1 × 1, 128; 3 × 3, 128; 1 × 1, 512] × 4
Conv4_x | 2 × 2 × 1024 | [1 × 1, 256; 3 × 3, 256; 1 × 1, 1024] × 6
Conv5_x | 1 × 1 × 2048 | [1 × 1, 512; 3 × 3, 512; 1 × 1, 2048] × 3
FC      | 1 × 1 × 3    | global average pool, three-way FC, softmax

* Each square bracket represents a bottleneck residual block, with each entry inside a bracket denoting one layer of operation and the multiplier after a bracket giving the number of stacked blocks. For instance, “7 × 7, 64, stride 2” means a convolution operated with 64 kernels, where each kernel has a size of 7 × 7 and the convolution is performed with a stride of two. Batch normalization is applied after every convolutional operation. Downsampling is performed in Conv3_1, Conv4_1, and Conv5_1 with a stride of two.


Many recent studies have aimed to develop efficient architectures for resource-

constrained environments while retaining prediction accuracy (Howard et al., 2019; Sandler et al.,

2018; Zhang et al., 2018). MobileNetV2 is one of these networks, which is tailored for devices

with relatively low computational resources (i.e., memory and computing power).

Two factors combine to make MobileNetV2 efficient: 1) the architecture introduces a module called the inverted residual block to increase memory efficiency; and 2) the network replaces regular convolution with depthwise convolution to reduce the number of multiply-add calculations and model parameters, which further lessens the memory footprint and computational cost (Sandler et al., 2018). Overall, an inverted residual module appears similar to the building block of ResNet, except that the shortcut connection occurs directly between the bottleneck layers rather than between the feature maps with a high number of channels. Figure 5-12 provides a schematic visualization of the

differences between the two types of residual units (Sandler et al., 2018). It is worth noting that

the output of an inverted residual block is a direct addition of the bottlenecks without using

nonlinearity (Figure 5-12). Sandler et al. (2018) argued that the use of linear bottlenecks could

help prevent nonlinearities from destroying information and thus improve the performance of the trained models.

Figure 5-12: Illustration of the differences between (a) a classical bottleneck residual block and (b) an

inverted residual block with linear bottleneck. Diagonally hatched layers are the linear bottlenecks that

do not use nonlinearities. The ⊕ symbol denotes an element-wise addition operation, where the

classical residual block has a ReLU nonlinearity following the addition. The “ReLU6” nonlinearity is

essentially a ReLU function capped with a maximum value of six, that is: 𝑔(x) = min(max(0, x), 6).

The thickness of a block represents the relative number of channels involved in that layer. Note how the

inverted residual connects the bottlenecks, while the classical residual connects the feature maps with a

large number of channels. The last (lightly coloured) layer is the input for the next block. Best viewed

in colour. Modified based on Sandler et al. (2018).

102

It has been shown that the use of inverted residual blocks can result in memory-

efficient models while maintaining the prediction accuracy. Furthermore, MobileNetV2 uses

depthwise convolutions and 1 × 1 convolutions (a.k.a. pointwise convolution) extensively in the

network. This significantly reduces the number of mathematical operations and the number of

trainable weights as compared to the use of regular convolution (Howard et al., 2017; Ye et al.,

2019). Sandler et al. (2018) and Bai et al. (2018) provided detailed calculations quantifying the

amount of computational resources required by the three types of convolutional layers. Figure 5-13

provides a visual comparison between the regular convolution, depthwise convolution, and

pointwise convolution (Bai et al., 2018).
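As an illustration, the sketch below assembles one inverted residual block with a linear bottleneck from standard Keras layers, following the structure in Figure 5-12(b) and Figure 5-14. The expansion factor of six follows Sandler et al. (2018); the remaining layer settings are illustrative assumptions rather than the exact thesis implementation.

```python
# A minimal sketch of an inverted residual block with a linear bottleneck.
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    y = layers.Conv2D(expansion * in_channels, 1, use_bias=False)(x)  # 1x1 expansion
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)                                 # ReLU6
    y = layers.DepthwiseConv2D(3, strides=stride, padding='same',
                               use_bias=False)(y)                     # 3x3 depthwise conv
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)             # linear 1x1 projection
    y = layers.BatchNormalization()(y)
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])  # shortcut between bottlenecks; no nonlinearity after
    return y
```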

In this study, a modified version of MobileNetV2 was employed to make the model compatible

with our input rasters. We modified the network to have a shallower depth (i.e., fewer layers) than

the original architecture. The modification was made because the prepared classification examples

have a height and width of 32 pixels, which is smaller than the input of the original model (224

pixels). Directly using the original architecture would shrink the size of intermediate feature maps

to have a width and height of one pixel before reaching the output layer. This could be suboptimal

because we have observed that the network performance deteriorates if the size of the feature maps becomes smaller than 4 × 4 (see Section 5.3.3). The deterioration may be due to the kernel size (3

× 3) used by the network, which means padding would be needed in order to perform the

convolution when a feature map has a small height and width. For instance, if we perform a 3 × 3

convolution over a 2 × 2 feature map, then, at every kernel position, five of the nine values under the kernel would be zeros (or other artificial values) padded at the perimeter of the feature map. Note that the feature map has only four pixels in total, which

means the convolution would be performed over mostly artificial numbers. To avoid the

intermediate feature maps becoming too small, we reduced the number of inverted residual blocks

used by the original architecture and stopped stacking residual modules once the feature maps reached a height and width of four pixels.

The architecture of the modified MobileNetV2 is summarized in Table 5-7, and the final output of

the network is a 3 × 1 vector containing the predicted probabilities for each of the three moisture

classes. Overall, the modified MobileNetV2 architecture starts with a regular 3 × 3 convolution

with 32 filters, followed by six inverted residual blocks, and ends with a global average pooling, a

three-way fully-connected layer, and a softmax. Note that the inverted residual blocks with

different stride values have different inner structures. For “Block1” to “Block5” (excluding


“Block0”) in Table 5-7, the stride of each block is provided in the last column of the table, and the

corresponding inner structures are depicted in Figure 5-14. The network has 109 thousand

parameters in total, which includes 103 thousand trainable and six thousand non-trainable

parameters. It is worth noting that this model is considered small (in terms of the number of

parameters involved) in modern practice, but we have observed that the network’s performance is

comparable to ResNet50 and consistently better than AlexNet on the prepared dataset.

Figure 5-13: Comparison between (a) regular convolution, (b) depthwise convolution and (c) pointwise

convolution. In the figure, symbol ⊛ represents a convolutional operator; K denotes the kernel size; H,

W, and D are the height, width and depth of the input, respectively; and Dout is the number of channels

of the output feature maps, which is equal to the depth of the input, D, in this case. Best viewed in

colour. Modified based on Bai et al. (2018).

104

Figure 5-14: Inner structure of the inverted residual blocks in the modified MobileNetV2 architecture.

(a) Inverted residual block with stride of one. (b) Block with stride of two. The shortcut connection

performs identity mapping. BN stands for batch normalization. Dwise means depthwise convolution.

Table 5-7: The modified MobileNetV2 architecture employed in this study

Layer/block name | Layer/block output dimension | Operator* | Stride for 3 × 3 convolution
Input  | 32 × 32 × 4   | -                                        | -
Conv1  | 16 × 16 × 32  | 3 × 3, 32                                | 2
Block0 | 16 × 16 × 16  | [1 × 1, 32; 3 × 3 Dwise, 32; 1 × 1, 16]  | 1
Block1 | 8 × 8 × 24    | [1 × 1, 96; 3 × 3 Dwise, 96; 1 × 1, 24]  | 2
Block2 | 8 × 8 × 24    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 24] | 1
Block3 | 4 × 4 × 32    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 32] | 2
Block4 | 4 × 4 × 32    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 32] | 1
Block5 | 4 × 4 × 32    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 32] | 1
Conv2  | 4 × 4 × 1280  | 1 × 1, 1280                              | 1
FC     | 1 × 1 × 3     | avg. pool, three-way FC, softmax         | -

* Each square bracket represents an inverted residual block, with each entry inside a bracket denoting one layer of operation. For instance, “3 × 3 Dwise, 32” means a 3 × 3 depthwise convolution operated with 32 kernels. The “avg. pool, three-way FC, softmax” refers to a global average pooling, followed by a three-way fully-connected layer with softmax. Batch normalization is applied after every convolutional operation, including regular, depthwise, and pointwise convolution.


5.3.2 Training Setup

We implemented all three networks in TensorFlow 2 (TF) and deployed model training using

TensorFlow’s Keras API (Abadi et al., 2016; Chollet et al., 2015). All models were trained from

scratch using RMSProp optimizer with a learning rate of 0.001, a momentum of zero, and a decay

(named rho in TF) of 0.9 (Hinton et al., 2012). There was no extra learning rate decay used during

the training, and the cross-entropy loss (named sparse categorical crossentropy in TF) was used

as the loss function. We normalized the training data to have zero mean and unit standard deviation

for every channel. The normalization started with calculating the per-channel means of the whole

training set (i.e., four mean values for the entire training set, one for each channel), followed by

subtracting the per-channel means from every training example. After obtaining the zero-mean data, we calculated the per-channel standard deviations (i.e., four values, one for each channel) and completed the normalization by dividing every training example by the per-channel standard deviations. We did not perform any data augmentation during the training process

because we considered the number of rasters in our training set (101,252 examples) sufficient for

training classifiers for only three classes. The training set statistics (i.e., the means and standard

deviations) were also used to normalize the validation and testing data following the

abovementioned procedure (He et al., 2016a; Simonyan and Zisserman, 2014).
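For clarity, a minimal NumPy sketch of this per-channel normalization is given below; the rasters are assumed to be stacked into float arrays of shape (N, 32, 32, 4), and the array names in the usage comment are illustrative placeholders.

```python
# A minimal sketch of the per-channel normalization described above.
import numpy as np

def channel_stats(train):
    """Per-channel mean and std over a (N, H, W, C) training array."""
    return train.mean(axis=(0, 1, 2)), train.std(axis=(0, 1, 2))

def normalize(data, mean, std):
    """Zero-mean, unit-std per channel; broadcasts over the channel axis."""
    return (data - mean) / std

# Usage (x_train, x_val, x_test are hypothetical arrays): the statistics come
# from the training set only and are reused for the validation and test sets.
# mean, std = channel_stats(x_train)
# x_train, x_val, x_test = (normalize(d, mean, std) for d in (x_train, x_val, x_test))
```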

We included dropout with a dropout probability of 0.5 in the first two FC layers of the modified

AlexNet architecture, whereas there was no batch normalization (BN) used in the network

(Krizhevsky et al., 2012). In contrast, no dropout was used in the ResNet50 architecture, but we

adopted BN with a momentum of 0.99 right after every convolution and before activation (He et

al., 2016a). Similarly, we did not adopt dropout for the modified MobileNetV2, while BN with a

momentum of 0.999 was applied after every convolutional operation including regular, depthwise

and pointwise convolution (Sandler et al., 2018).

To train a model, we started with initializing the model using the Kaiming initialization proposed

by He et al. (2015). We then trained the model for 20 epochs with a minibatch size of 68 examples.

It is worth noting that the training set contained 101,252 examples which can be divided by 68

with no remainder. In this way, each epoch consisted of 1489 steps (or iterations), where each step

was for one minibatch. After completing the 20 epochs of training, the trained model was examined


against the validation set, and the validation accuracy was used to represent the prediction accuracy

of that model.
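A minimal sketch of this training configuration in the TF/Keras API is shown below; build_model is a hypothetical constructor standing in for any of the three architectures, and the data arrays are the normalized sets from the sketch above.

```python
# A minimal sketch of the training setup; `build_model`, `x_train`, `y_train`,
# `x_val`, and `y_val` are hypothetical placeholders.
import tensorflow as tf

model = build_model()  # one of the three employed architectures
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9, momentum=0.0),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=68,                  # 101,252 / 68 = 1,489 steps per epoch
          epochs=20,
          validation_data=(x_val, y_val))
```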

In order to conduct a fair comparison regarding the performance of the three employed

architectures, we trained 30 copies of each network (i.e., 30 models for each architecture, 90

models in total), each with a different initialization of the network parameters. Since different initializations of the weights of a network can lead to variations in the model performance, we chose the one

(out of 30) that resulted in the best prediction accuracy on the validation set to be the final model.

In this way, there were three final models determined (one for each architecture) based on their

performances on the validation set. We then evaluated the three final models on the test set to

compare the three architectures’ classification accuracy. It is important to note that we did not use

any ensemble technique to combine multiple models together in this study.

Since the Kaiming initialization randomly samples model weights from a zero-mean (truncated)

Gaussian distribution (He et al., 2015), the model will have a different initial state after each

initialization process. To improve the reproducibility of our training results, we explicitly defined

30 seeds1 for the random number generator in TF to initialize the 30 models for each architecture

(i.e., one seed for one model). The same list of seeds was used for all three architectures, such that

the initialization of weights can be reproduced.
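The seeded selection procedure can be sketched as follows; train_and_validate is a hypothetical helper wrapping the compile-and-fit code above and returning the validation accuracy, and the seed list is the one given in the footnote.

```python
# A minimal sketch of the seeded, repeated-training procedure; `build_model`
# and `train_and_validate` are hypothetical helpers.
import tensorflow as tf

SEEDS = [0, 4, 5, 9, 40, 44, 45, 49, 50, 54, 55, 59, 90, 94, 95, 99,
         400, 444, 459, 499, 500, 540, 549, 550, 599, 900, 949, 959, 995, 999]

best_model, best_acc = None, 0.0
for seed in SEEDS:
    tf.random.set_seed(seed)          # makes the Kaiming (He) initialization reproducible
    model = build_model()             # weights are drawn with the seeded generator
    val_acc = train_and_validate(model)
    if val_acc > best_acc:            # keep the best of the 30 models
        best_model, best_acc = model, val_acc
```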

The model training was carried out on a “workstation” computer, while the development and

coding were performed on a “laptop” computer. Table 5-8 provides a comparison between the

specifications of the laptop and workstation computers. It is worth noting that the ResNet50

architecture could not be efficiently trained on the laptop computer due to its large memory

footprint, while both the modified AlexNet and MobileNetV2 could be trained on the laptop with

the setup described above. Training of the modified AlexNet, ResNet50, and modified

MobileNetV2 on the workstation computer required approximately 7 hours, 20 hours, and 8 hours,

respectively, for 600 epochs (i.e., 20 epochs per model × 30 models per architecture).

1 The list of random seeds for model initialization: {0, 4, 5, 9, 40, 44, 45, 49, 50, 54, 55, 59, 90, 94, 95, 99, 400, 444, 459, 499, 500, 540, 549, 550,

599, 900, 949, 959, 995, 999}.


5.3.3 Model Evaluation

The training curves of the networks are shown in Figure 5-15. The classification accuracy of the

models on the training and validation sets was recorded after each epoch of training. Each solid

line in Figure 5-15(a) is a curve of mean training accuracy, determined by averaging the prediction

accuracy of the 30 models with the same network architecture. Similarly, every curve in Figure

5-15(b) represents the mean classification accuracy on the validation examples for the models with

the same architecture. The shaded regions in Figure 5-15(a) indicate the ranges of training accuracy

for the three networks, where the upper and lower bounds of the shaded areas are the maximum

and minimum accuracy values achieved by the models. The ranges of validation

accuracy for the three network architectures significantly overlapped with each other and thus are

not shown in Figure 5-15(b).

Overall, we have five major observations from Figure 5-15. First, the modified MobileNetV2 and

ResNet50 models that we have trained possess an ability to fit the training data better than the

modified AlexNet models. The former two architectures consistently resulted in higher training

accuracy than the modified AlexNet. This result is expected because several studies have shown

that ResNet and MobileNetV2 generally have better performance than AlexNet (Canziani et al.,

2016; Howard et al., 2017; Khan et al., 2020). Second, we observe that although performances on

the training set vary among the three architectures, all models were able to converge, and more

importantly, they could fit the training set at a high classification accuracy (higher than 97% as

shown in Figure 5-15(a)). Third, the range of accuracy for the modified AlexNet in Figure 5-15(a)

appears wider than for the other two architectures. One possible reason for this phenomenon is that we included dropout in the modified AlexNet but not in the modified MobileNetV2 and

ResNet50 models. The randomness introduced by dropout during the training process could result in some misclassification by the modified AlexNet models on the training examples.

Table 5-8: Comparison of computer specifications

Component | Workstation | Laptop
Central processing unit (CPU) | Intel Core i9 12-core/24-thread processor, 2.90 GHz base/4.3 GHz max turbo | Intel(R) Core(TM) i7-7700HQ CPU @ 2.80 GHz
Graphics processing unit (GPU) | NVIDIA RTX 2080 Ti 11 GB | NVIDIA GeForce GTX 1050 Ti 4 GB
DDR RAM | 64 GB | 16 GB


Fourth, we

can conclude that, on average, the trained models are generalizable to the validation data. As shown

in Figure 5-15(b), the models from all three architectures were able to achieve a validation

accuracy close to the training accuracy. The high generalization ability of the trained models may

imply that the data in the training and validation sets have a similar distribution, and thus the

features learned from the training examples can be well generalized to the validation rasters.

Last, we notice that the modified MobileNetV2 performed marginally better than the ResNet50

models on the dataset. However, it is important to note that the training curves are not supportive

evidence for arguing that the modified MobileNetV2 is a better network architecture than

ResNet50. The results only imply that our training data can be better fitted by the modified

MobileNetV2 models given the specific training setup and data preparation process described in

previous sections. Moreover, the small height and width of the intermediate feature maps could

have impacted the ResNet50 models’ performance on our dataset. As shown in Table 5-6, the

feature maps are shrunk to 1 × 1 (height × width) before reaching the output layer, and the loss of

spatial information may have led to a deterioration of network performance.

Figure 5-15: Training curves of the modified AlexNet (blue), ResNet50 (green), and the modified

MobileNetV2 (red). (a) Training accuracy of the three networks. Each solid line represents the mean

training accuracy of the 30 models with the same architecture. The upper and lower bounds of the shaded

areas are the maximum and minimum training accuracy among the 30 models, respectively. (b) Mean

prediction accuracy on the validation set of the three networks. Each line was determined by averaging

the validation accuracy of the 30 models with the same architecture. The validation accuracy of the first

and second epoch for the modified MobileNetV2 were 60% and 79%, respectively.


Since ResNet was initially proposed for images with higher resolutions (e.g., 224 × 224) than our data (32 × 32),

future studies may incorporate rasters with higher resolution or adopt some padding techniques to

avoid the intermediate feature maps in the network becoming too small in size.

To further explore the influence of height and width of intermediate feature maps on the model

performance, we compared our modified MobileNetV2 architecture with two other versions of the

network: (A) a version that shrinks the feature maps to 1 × 1 (height × width) before reaching the

output layer; and (B) a version that has a minimum size of intermediate feature map of 2 × 2. Table

5-9 and Table 5-10 summarize the two network architectures, and we denote A and B as

MobileNetV2 A and MobileNetV2 B, respectively, in the remainder of this section. As shown in

Table 5-9 and Table 5-10, MobileNetV2 A and MobileNetV2 B are both deeper and have more

parameters than the network that we have employed (Table 5-7), while all three versions are

constructed based on the same inverted residual blocks (Figure 5-14). We trained 30 models for

each of the MobileNetV2 A and B following the same setup described in Section 5.3.2. In general,

the two deeper networks should result in higher training accuracy (i.e., less misclassification on

the training examples) than our shallower version due to their stronger representational abilities

(He et al., 2016a).

However, as depicted in Figure 5-16, MobileNetV2 A and B consistently performed worse than

our version on both the training and validation sets. Also, the ranges of prediction accuracy for A

and B were wider than our version, which indicates that they were more sensitive to initialization

and less stable in performance. In addition, we observed that MobileNetV2 A, which is the deepest

but shrinks the feature maps to 1 × 1 (height × width), had the worst classification accuracy. As

already mentioned in Section 5.3.1, many studies have shown that batch normalization and residual

learning can effectively address the vanishing/exploding gradient and degradation problems (He

et al., 2016a; Ioffe and Szegedy, 2015; Santurkar et al., 2018). The three architectures involved in

the comparison all used BN and inverted residual blocks extensively. Hence, the deterioration of

the two deeper networks’ performance should not be caused by the vanishing gradient or

degradation problem. We consider that the small size of intermediate feature maps for A and B

may have led to a significant loss of local and global spatial information during the forward

information flow, which is one of the crucial contributors to the reduced performance from

MobileNetV2 A and B. Therefore, we employed the modified MobileNetV2 described in Section

5.3.1 in our analysis presented below.


Table 5-9: Network architecture of MobileNetV2 A*

Layer/block name | Layer/block output dimension | Operator** | Stride for 3 × 3 convolution
Input     | 32 × 32 × 4   | -                                              | -
A_Conv1   | 16 × 16 × 32  | 3 × 3, 32                                      | 2
A_Block0  | 16 × 16 × 16  | [1 × 1, 32; 3 × 3 Dwise, 32; 1 × 1, 16]        | 1
A_Block1  | 8 × 8 × 24    | [1 × 1, 96; 3 × 3 Dwise, 96; 1 × 1, 24]        | 2
A_Block2  | 8 × 8 × 24    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 24]      | 1
A_Block3  | 4 × 4 × 32    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 32]      | 2
A_Block4  | 4 × 4 × 32    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 32] × 2  | 1
A_Block5  | 2 × 2 × 64    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 64]      | 2
A_Block6  | 2 × 2 × 64    | [1 × 1, 384; 3 × 3 Dwise, 384; 1 × 1, 64] × 3  | 1
A_Block7  | 2 × 2 × 96    | [1 × 1, 384; 3 × 3 Dwise, 384; 1 × 1, 96]      | 1
A_Block8  | 2 × 2 × 96    | [1 × 1, 576; 3 × 3 Dwise, 576; 1 × 1, 96] × 2  | 1
A_Block9  | 1 × 1 × 160   | [1 × 1, 576; 3 × 3 Dwise, 576; 1 × 1, 160]     | 2
A_Block10 | 1 × 1 × 160   | [1 × 1, 960; 3 × 3 Dwise, 960; 1 × 1, 160] × 2 | 1
A_Block11 | 1 × 1 × 320   | [1 × 1, 960; 3 × 3 Dwise, 960; 1 × 1, 320]     | 1
A_Conv2   | 1 × 1 × 1280  | 1 × 1, 1280                                    | 1
A_FC      | 1 × 1 × 3     | avg. pool, three-way FC, softmax               | -

* This network has 2.26 million parameters, where 2.23 million are trainable and 34 thousand are non-trainable.
** Each square bracket represents an inverted residual block, with each entry inside a bracket denoting one layer of operation. For instance, “3 × 3 Dwise, 32” means a 3 × 3 depthwise convolution operated with 32 kernels. The “avg. pool, three-way FC, softmax” refers to a global average pooling, followed by a three-way fully-connected layer with softmax. Batch normalization is applied after every convolutional operation, including regular, depthwise, and pointwise convolution.


For each of the modified AlexNet, ResNet50 and modified MobileNetV2, we compared the

validation accuracy of the 30 trained models and chose the one with the highest accuracy to be the

final model (i.e., three models, one from each architecture). The box plots in Figure 5-17

summarize the model performance on the validation set after 20 epochs of training. Each box plot

represents the accuracy of the 30 models with the same architecture, except that one model (with

a validation accuracy of 92.1%) for ResNet50 was considered an outlier and excluded from the

box plot. The best classifiers for the modified AlexNet, ResNet50, and modified MobileNetV2

Table 5-10: Network architecture of MobileNetV2 B*

Layer/block name | Layer/block output dimension | Operator** | Stride for 3 × 3 convolution
Input    | 32 × 32 × 4   | -                                              | -
B_Conv1  | 16 × 16 × 32  | 3 × 3, 32                                      | 2
B_Block0 | 16 × 16 × 16  | [1 × 1, 32; 3 × 3 Dwise, 32; 1 × 1, 16]        | 1
B_Block1 | 8 × 8 × 24    | [1 × 1, 96; 3 × 3 Dwise, 96; 1 × 1, 24]        | 2
B_Block2 | 8 × 8 × 24    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 24]      | 1
B_Block3 | 4 × 4 × 32    | [1 × 1, 144; 3 × 3 Dwise, 144; 1 × 1, 32]      | 2
B_Block4 | 4 × 4 × 32    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 32]      | 1
B_Block5 | 4 × 4 × 32    | [1 × 1, 192; 3 × 3 Dwise, 192; 1 × 1, 32]      | 1
B_Block6 | 2 × 2 × 64    | [1 × 1, 384; 3 × 3 Dwise, 384; 1 × 1, 64] × 3  | 1
B_Block7 | 2 × 2 × 96    | [1 × 1, 384; 3 × 3 Dwise, 384; 1 × 1, 96]      | 1
B_Block8 | 2 × 2 × 96    | [1 × 1, 576; 3 × 3 Dwise, 576; 1 × 1, 96] × 2  | 1
B_Conv2  | 2 × 2 × 1280  | 1 × 1, 1280                                    | 1
B_FC     | 1 × 1 × 3     | avg. pool, three-way FC, softmax               | -

* This network has 691 thousand parameters, where 672 thousand are trainable and 19 thousand are non-trainable.
** Each square bracket represents an inverted residual block, with each entry inside a bracket denoting one layer of operation. For instance, “3 × 3 Dwise, 32” means a 3 × 3 depthwise convolution operated with 32 kernels. The “avg. pool, three-way FC, softmax” refers to a global average pooling, followed by a three-way fully-connected layer with softmax. Batch normalization is applied after every convolutional operation, including regular, depthwise, and pointwise convolution.


had a validation accuracy of 98.3%, 99.1%, and 99.0%, respectively. These three best classifiers

were chosen to be the final models for moisture map generation, and they were also evaluated

against the test set. It is worth noticing that although the modified MobileNetV2 models had the

highest mean and median accuracy, the best performance was achieved by a ResNet50 model,

which correctly classified 99.1% of the validation rasters (i.e., 11,891 out of 12,000 examples were

classified correctly). This variation in model performance reveals the inherent randomness and

uncertainty involved in the model training process.

The performances of the three final models on the test set (12,000 testing examples) are reported

in Table 5-11. The highest classification accuracy was 99.2% from the ResNet50 model.

Nonetheless, the computational time (with a batch size of 1) for the model is approximately three times that of the modified AlexNet and four times that of the modified MobileNetV2 on the “laptop” computer (Table 5-8).

Figure 5-16: Comparison of learning performance of the modified MobileNetV2 (red), MobileNetV2 A

(magenta), and MobileNetV2 B (cyan) on the training and validation sets. The upper and lower bounds

of the shaded areas are the maximum and minimum values among the 30 models, respectively. Notice

the different scale in y-axes of the two plots. (a) Training accuracy curves of the three architectures.

Each solid line represents the mean training accuracy of the 30 models with the same architecture. (b)

Validation accuracy curves of the three architectures. Best viewed in colour.

Table 5-11: Evaluation results of the final classification models on the test set (12,000 testing examples)

Metric | Modified AlexNet | ResNet50 | Modified MobileNetV2
Classification accuracy | 98.1% | 99.2% | 99.0%
Runtime (batch size = 1) on NVIDIA GeForce GTX 1050 Ti 4 GB | 6 ms/example | 17 ms/example | 4 ms/example


The high accuracy came at the cost of a longer execution time and a larger memory footprint. In contrast, the modified MobileNetV2 model achieved a high prediction accuracy (99.0%) and the shortest runtime (4 ms/example), and thus provided the best balance between classification accuracy and computational efficiency.

Figure 5-17: Validation accuracy of the three employed architectures. Each boxplot represents the

distribution of validation accuracy for the 30 models with the same architecture. An outlier (with a

validation accuracy of 92.1%) for the ResNet50 is not shown.


5.3.4 Moisture Map Generation

The moisture map generation process consists of three steps, which are graphically summarized in

Figure 5-18. The entire workflow was implemented in Python 3, and no human intervention is required after specifying the input raster from which a moisture map should be

generated. In this study, the inputs to the workflow were the overview rasters of the HLP that had

been produced during the data preparation (see Section 5.2.3). However, the process can be applied

to any input data that fulfill the following requirements. The input of the process should be a four-

channel raster with a height and width of at least 32 pixels. The first three channels of the raster

should be the red, green, and blue colour channels, respectively, while the fourth should be a

temperature channel. All pixel values contained in the input raster should be floating-point

numbers, where every intensity value in the colour channels should be within the range of [0, 255].

The temperature values in the temperature channel should be expressed in degrees Celsius, and

thus the digital numbers generally fell within the range of 10 to 70 in our case.

Since every overview raster covered a large ground area and had several thousand pixels for

both height and width, the first step was to subdivide an input raster into non-overlapping tiles

such that each tile corresponds to a small area over the HLP. As mentioned in Section 5.2, the

moisture classifiers were designed for data with an input dimension of 32 × 32 × 4. Therefore,

each raster tile after subdivision had a height and width of 32 pixels. If the size of an input is not

divisible by 32 for either its height, width or both, we omitted the right-most columns and/or

bottom-most rows to avoid resizing and transforming the input during the process such that the

generated moisture map (i.e., the output) had the same GSD as the input raster. In this way, the

number of created raster tiles could be calculated as 𝑁tiles = ⌊𝐻in / 32⌋ ∙ ⌊𝑊in / 32⌋, where ⌊∙⌋ is the floor operator; 𝐻in and 𝑊in are the height and width of the input raster, respectively; and 𝑁tiles is the

number of tiles obtained after the subdivision. Afterwards, the second step of the process was to

use a moisture classification model that had been developed to identify and assign a moisture class

to each tile (Figure 5-18). The CNN classifier used in this procedure could be any one of the three

final models described in Section 5.3.3.

The final step of the workflow was to colour code and combine all the tiles to produce the moisture

map. The colour coding was performed based on the moisture class of a tile, and every pixel within

the same tile was assigned the same colour. We denoted the “wet” (> 8% moisture content),


“moderate” (4% to 8%), and “dry” (< 4%) class by using a blue, greenish, and red colour,

respectively, as shown in Figure 5-18. It is worth noting that the GSD of each raster was

approximately 10 cm/pixel, which means each tile represented an area of 3.2 m × 3.2 m on the

HLP surface. In this way, the colour coding assumed that the moisture status was uniform within the area covered by each raster tile.

Figure 5-18: Moisture map generation using a convolutional neural network (CNN) classifier.


Lastly, we arranged all the colour-coded tiles according to

their initial positions in the input raster and combined them to generate the final output. A

comparison example of the generated moisture maps using the modified AlexNet, ResNet50, and

modified MobileNetV2 classifiers is provided in Figure 5-19. The ground-truth moisture maps in

Figure 5-19 were generated by following the same process used to prepare the training, validation,

and testing data (Section 5.2.4). In short, a ground-truth moisture map for an overview raster was

created by first applying equation (4.2) to every digital number in the temperature channel,

followed by performing a threshold operation to categorize the pixels that had an estimated

moisture content greater than 8% to be the “wet” class, those smaller than 4% as the “dry” class,

and the remaining as the “moderate” moisture class.
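For concreteness, the three-step workflow can be sketched as follows; the class index order (0 = dry, 1 = moderate, 2 = wet), the RGB colour triples, and the normalization statistics are illustrative assumptions, and model stands for any of the three trained classifiers.

```python
# A minimal sketch of the workflow in Figure 5-18. Assumed class order:
# 0 = dry (red), 1 = moderate (greenish), 2 = wet (blue).
import numpy as np

TILE = 32
COLOURS = np.array([[214, 39, 40],    # dry: red
                    [44, 160, 44],    # moderate: greenish
                    [31, 119, 180]],  # wet: blue
                   dtype=np.uint8)

def classification_moisture_map(raster, model, mean, std):
    # Step 1: crop to a multiple of 32 and cut into non-overlapping tiles.
    h = raster.shape[0] // TILE * TILE
    w = raster.shape[1] // TILE * TILE
    tiles = raster[:h, :w].reshape(h // TILE, TILE, w // TILE, TILE, 4)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, TILE, TILE, 4)
    # Step 2: one moisture class per tile from the trained CNN classifier.
    probs = model.predict((tiles - mean) / std)
    classes = probs.argmax(axis=1).reshape(h // TILE, w // TILE)
    # Step 3: colour-code each tile uniformly and expand back to the input GSD.
    return COLOURS[classes].repeat(TILE, axis=0).repeat(TILE, axis=1)
```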

Overall, we can conclude that HLP surface moisture maps were efficiently and accurately

generated using the CNN-based moisture classifiers. Correlations between the input rasters and

their corresponding moisture classes were effectively learned by the trained networks, where the

error rates were below 2% for the final models. Meanwhile, the three employed CNN architectures

produced comparable results against each other, and only minor differences were observed in the

resultant output. The ResNet50 model achieved the highest classification accuracy, but its execution time was relatively long. Both the modified AlexNet and MobileNetV2 models had

short inference time, where the modified MobileNetV2 provided the best balance between

classification accuracy and computational efficiency. However, despite the effectiveness of using

CNN classifiers to identify the average moisture status of a given area, the prediction output from

the classification models was in coarse resolution. It did not preserve sufficient details for depicting

a fine-grained moisture distribution over a local region. As mentioned previously, the workflow

described in Figure 5-18 assumed that the moisture content was constant across the 3.2 m × 3.2 m

area represented by each raster tile. This assumption is acceptable if an overview of a large study

area is required (e.g., the moisture maps for the whole HLP shown in Figure 5-19). Nevertheless,

some monitoring tasks in practice, such as pinpointing malfunctioning sprinklers, may require the

moisture maps to provide information with a sub-meter level resolution. Given the dataset we have,

CNN classifiers are incompetent for such tasks because, by definition, classification performs

coarse inference that makes a prediction for a whole input. To achieve fine-grained inference, we

developed a semantic segmentation model in the next section (Section 5.4) to perform per-pixel

moisture prediction, so that moisture maps with fine details can be generated.


Figure 5-19: A comparison of the generated moisture maps using the modified AlexNet, ResNet50, and modified MobileNetV2 moisture classifiers. The moisture maps for the top two lifts were generated based on the March 8, Afternoon dataset, and the moisture maps for the whole HLP were created based on the March 7, Morning dataset.


5.4 Segmentation-Based Heap Leach Pad Surface Moisture Mapping

This section presents our methodology for using a semantic segmentation CNN to generate fine-

grained moisture maps based on our prepared rasters for the HLP. We first describe the semantic segmentation network adopted in the experiment, followed by the model training and evaluation process. The section ends by depicting the moisture map generation workflow and providing several examples of the moisture maps generated by our method.

5.4.1 Network Architecture

Semantic segmentation is a natural step towards fine-grained inference after image classification,

object detection and boundary localization (Garcia-Garcia et al., 2017; Lateef and Ruichek, 2019).

The goal of semantic segmentation is to accurately provide pixel-wise labelling for an input raster such that the pixels corresponding to different objects or regions within the raster can be correctly classified and localized (Ulku and Akagunduz, 2020). The semantic segmentation task has long been an active and challenging topic in computer vision due to its broad range of applications, such as

autonomous robotics (Zhang et al., 2018), medical image analysis (Ronneberger et al., 2015),

ground moisture estimation (Zhang et al., 2020), and agriculture and industrial inspection (Kemker

et al., 2018; Sharifzadeh et al., 2020). Despite the accomplishments that have been achieved by

various traditional techniques (Zaitoun and Aqel, 2015; Ulku and Akagunduz, 2020), CNNs have

demonstrated their ability to surpass the traditional methods by a large margin in prediction

accuracy and sometimes computational efficiency (Garcia-Garcia et al., 2017). A large number of semantic segmentation architectures have been proposed in the literature; Lateef and Ruichek (2019) and Ulku and Akagunduz (2020) provided thorough reviews of the categorization of different networks as well as the evolution of semantic segmentation CNNs.

U-Net is a popular semantic segmentation CNN proposed by Ronneberger et al. (2015) that has a typical encoder-decoder structure. The network architecture comprises two parts: the encoder

part (also called contracting path or compression stage in the literature) that convolves an input

raster and gradually reduces the spatial size of the feature maps; and the decoder part (also called

expansive path or decompression stage) which gradually recovers the spatial dimension of the

intermediate features and produces an output raster with the same or a similar height and width as

the input (Clement and Kelly, 2018; Ronneberger et al., 2015). The encoder and decoder parts are


more or less symmetric, and the network appears as a U shape, which gives the architecture its name (Ronneberger et al., 2015). Many studies have shown that U-Net-based models can

be used to perform segmentation for different applications, including radio signal processing

(Akeret et al., 2017), image appearance transformation (Clement and Kelly, 2018), vegetation

detection (Ulku et al., 2019), and biomedical image processing (Ronneberger et al., 2015).

In this study, we employ a modified version of the original U-Net to perform pixel-wise moisture

classification over the prepared raster data. Overall, our modified U-Net involves four levels of

depth as shown in Figure 5-20. Every level in the encoding part consists of two 3 × 3 convolutions

(with zero padding and a stride of one), each followed by a ReLU nonlinearity. The downsampling

from one level to the next is performed using a 2 × 2 max pooling with a stride of two such that

the height and width of the feature map are halved after the pooling operation. Within each level

of the encoder part, we double the number of feature channels, and the feature maps at the fourth

level (i.e., the bottom level) have 512 channels (see Figure 5-20). In the decoder part of the

network, every preceding level is connected to the succeeding one through a transposed

convolution (also called up-convolution or sometimes deconvolution in the literature) that halves

the number of channels and doubles the height and width of the feature map. Meanwhile, a skip

connection is used to copy and concatenate the feature map at the same level of the encoder part

to the corresponding upsampled decoder feature so that the model can learn to create well-localized

hierarchical features and assemble precise output (Ronneberger et al., 2015; Ulku and Akagunduz,

2020). Every level of the decoder part consists of two 3 × 3 convolutions (with zero padding and

a stride of one), each followed by a ReLU nonlinearity. To generate the output, we use a 1 × 1

convolution with softmax to map the 64-channel features to the final three-channel feature map,

where each channel contains the predicted probabilities for the corresponding moisture class at

each pixel (i.e., the first channel at a pixel contains the probability for the “dry” class, and the

second and third channels are for the “moderate” and “wet” classes, respectively). The single-

channel segmentation map can then be created by applying a pixel-wise argmax operation over

the three-channel feature such that the moisture class with the highest probability is used to

represent the moisture status at each pixel location. In total, the network has 7.7 million parameters,

which are all trainable entries.
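To make the encoder-decoder pattern concrete, the sketch below builds a two-level toy version of this architecture in Keras; the thesis network repeats the same pattern to four levels of depth, so the depth and channel counts here are illustrative only.

```python
# A minimal two-level sketch of the U-Net encoder/decoder pattern in
# Figure 5-20; the thesis network extends this pattern to four levels.
import tensorflow as tf
from tensorflow.keras import layers

def double_conv(x, channels):
    # Two 3x3 convolutions (zero padding, stride 1), each followed by ReLU.
    for _ in range(2):
        x = layers.Conv2D(channels, 3, padding='same', activation='relu')(x)
    return x

inputs = layers.Input(shape=(64, 64, 4))
e1 = double_conv(inputs, 64)                             # encoder, level 1
e2 = double_conv(layers.MaxPooling2D(2)(e1), 128)        # 2x2 max pool, level 2
up = layers.Conv2DTranspose(64, 2, strides=2)(e2)        # up-conv: halve channels, double H and W
d1 = double_conv(layers.Concatenate()([e1, up]), 64)     # skip connection, then two convs
outputs = layers.Conv2D(3, 1, activation='softmax')(d1)  # per-pixel class probabilities
model = tf.keras.Model(inputs, outputs)
```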


Figure 5-20: The modified U-Net architecture employed in this study. There are four levels of depth in

the architecture, where the bottom level is the fourth level, and the input and output are at level one.

Each box represents a multi-channel feature map, and the number of channels is labelled at either the top or

bottom of the box. The height × width of the feature maps are denoted at the lower-left corners of the

boxes. The green boxes correspond to the feature maps generated in the encoder part of the network,

where the red boxes belong to the decoder part. The arrows with different colours represent different

operations. For the convolutional operations (Conv), letter “s” and “p” stand for stride and zero padding,

respectively. Best viewed in colour. Modified based on Ronneberger et al. (2015).


5.4.2 Training Setup

We implemented our modified U-Net2 in TensorFlow 2 (Abadi et al., 2016), and the models were

trained, evaluated, and tested using the raster data in the segmentation dataset described in Section

5.2. All models were trained from scratch using RMSProp optimizer with a learning rate of 0.001,

a momentum of zero, and a decay of 0.9 (Hinton et al., 2012). We did not use any extra learning

rate decay during the training, and neither dropout nor batch normalization was used in the

network. We normalized the training data to have zero mean and unit standard deviation for every

channel, and the data normalization procedure is described in Section 5.3.2. We did not perform

any data augmentation during the training process because we considered our training set (25,313

training examples) sufficiently large to train a semantic segmentation CNN.

Similar to the training setup for the classification models (Section 5.3.2), we trained 30 models of

our modified U-Net in which each model has a different initialization of the network parameters.

We initialized the models using the Kaiming initialization (He et al., 2015), where each

initialization adopted a designated seed for the random number generator in TF (i.e., one seed for

one model). The 30 seeds that were used for the model initialization are provided in Section 5.3.2.

Since models with different initializations resulted in different prediction accuracy, we selected the

one that returned the best performance on the validation set to be the final model, and the final

model was evaluated against the test set. We trained each model for 20 epochs with a minibatch

size of 32 and used the cross-entropy loss (sparse categorical crossentropy in TF) as the loss

function. The model training was carried out on the “workstation” computer with an NVIDIA RTX

2080 Ti 11 GB GPU (Table 5-8), and the training for the 30 models took approximately 12.5 hours

in total (i.e., approximately 25 minutes per model for 20 epochs).

5.4.3 Model Evaluation

In this study, we adopted two commonly used accuracy metrics for semantic segmentation, Pixel

Accuracy (a.k.a. global accuracy) and Mean Intersection over Union (MIoU), to assess the trained

models. Pixel accuracy is a simple metric that calculates the ratio between the number of correctly

2 Our implementation was modified mainly based on https://github.com/zhixuhao/unet and https://github.com

/jakeret/unet, whilst the original implementation by Ronneberger et al. (2015) was in Caffe and is available at

https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/.


classified pixels and the total number of pixels (Garcia-Garcia et al., 2017; Ulku and Akagunduz,

2020). The formula of pixel accuracy is defined as

PA = \frac{\sum_{i=1}^{N_{\mathrm{cls}}} n_{ii}}{N_{\mathrm{pixel}}} , (5.1)

where PA is the pixel accuracy expressed as a percentage; 𝑁cls is the total number of classes, which is equal to three in our case for the three moisture classes; 𝑛𝑖𝑖 denotes the number of pixels that are both predicted and labelled as the ith class (i.e., true positives, TP); and 𝑁pixel is the total

number of pixels involved. The pixel accuracy metric is commonly used due to its simplicity, but

it may not be a good measure of the model performance when the dataset has a class imbalance

issue. A model that consistently biases toward the majority class can still result in a relatively high

pixel accuracy because of the class imbalance. Therefore, we also assessed the trained model using

MIoU, which is the most widely used and considered the standard metric for evaluating

segmentation techniques (Garcia-Garcia et al., 2017; Lateef and Ruichek, 2019). The expression

of MIoU is given as

MIoU = \frac{1}{N_{\mathrm{cls}}} \sum_{i=1}^{N_{\mathrm{cls}}} \frac{n_{ii}}{n_{ij} + n_{ji} + n_{ii}} , \quad i \neq j , (5.2)

where 𝑁cls is the total number of classes; subscripts i and j are indices denoting different classes; 𝑛𝑖𝑖 indicates the number of pixels that are both predicted and labelled as the ith class; 𝑛𝑖𝑗 is the number of pixels that are predicted as the ith class but whose true label is the jth class (i.e., false positives, FP); and 𝑛𝑗𝑖 is the number of pixels that are predicted as the jth class but whose true label is the ith class (i.e., false negatives, FN). In short, the MIoU metric is the ratio between the true positives (i.e., the intersection) and the sum of TP, FP, and FN (i.e., the union), averaged over the number of classes

involved (Lateef and Ruichek, 2019; Ulku and Akagunduz, 2020).
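Equations (5.1) and (5.2) translate directly into the following NumPy sketch, which assumes integer class maps pred and label of the same shape with values in {0, 1, 2}; the per-class IoU is undefined when a class is absent from both maps, and handling that edge case is omitted here.

```python
# A minimal NumPy sketch of equations (5.1) and (5.2) for integer class maps.
import numpy as np

def pixel_accuracy(pred, label):
    return (pred == label).mean()                # sum of n_ii divided by N_pixel

def mean_iou(pred, label, n_cls=3):
    ious = []
    for i in range(n_cls):
        tp = np.sum((pred == i) & (label == i))  # n_ii
        fp = np.sum((pred == i) & (label != i))  # pixels predicted i but labelled otherwise
        fn = np.sum((pred != i) & (label == i))  # pixels labelled i but predicted otherwise
        ious.append(tp / (tp + fp + fn))         # intersection over union for class i
    return np.mean(ious)                         # averaged over the classes
```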

The performances of the 30 models on the training and validation sets after 20 epochs of training

are summarized in Table 5-12. Most of the trained models resulted in high pixel accuracy and

MIoU scores, and the best performer on the validation set was selected as the final model to

examine against the test set. It is worth noting that the highest pixel accuracy (99.7%) and MIoU

(98.9%) on the validation data were achieved by the same model in our experiment. However, in

practice, the highest scores on different metrics may be accomplished by different models, so a


weighted average of the scores may be used as a synthetic metric for evaluating the trained models.

Despite these successes, five of the models failed to converge and led to unsatisfactory performance, as shown in Table 5-12. Future studies may adopt a learning rate warmup strategy to

improve the success of convergence by avoiding optimization difficulties in the early stage of the

training process (Goyal et al., 2017; He et al., 2016a).

The performance of the final model on the test set is presented in Table 5-13. Overall, the final

model achieved similar performance on the test set compared to the validation set, and the model

demonstrated a good competency and generalizability in creating accurate segmentation maps

based on our prepared data. Although the inference speed of the model is not fast compared to

many recent networks (Poudel et al., 2019; Zhuang et al., 2019), the model runtime is considered

acceptable because the key focus in this study is on prediction accuracy rather than running speed

of the model. Moreover, the average runtime on each input raster can be shortened by using a

larger batch size as shown in Table 5-13. This final model was then used in the moisture map

generation process to perform per-pixel moisture prediction.

Table 5-12: Performance of the modified U-Net models on the segmentation dataset (30 models)

Data | Training set (25,313 rasters) | Validation set (3,000 rasters)
Metric | Pixel accuracy | MIoU | Pixel accuracy | MIoU
Highest | 99.3% | 96.4% | 99.7%* | 98.9%*
Upper quartile | 99.1% | 95.6% | 98.9% | 93.8%
Median | 98.5% | 93.2% | 98.3% | 92.3%
Lower quartile | 93.2% | 61.0% | 92.5% | 60.3%
Lowest | 35.4% | 11.8% | 35.7% | 11.9%

* The highest pixel accuracy and MIoU were achieved by the same model in our experiment, and thus the model was used as the final model for moisture map generation. In practice, the two metrics can be combined to form a synthetic metric to determine the best-performing model.

Table 5-13: Evaluation results of the final segmentation model on the test set (3,000 rasters, 64 × 64 × 4)

Metric | Value
Pixel accuracy | 99.7%
MIoU | 98.6%
Runtime* (batch size = 1) | 67 ms/example
Runtime* (batch size = 32) | 5 ms/example

* The model runtime was measured on an NVIDIA GeForce GTX 1050 Ti 4 GB GPU (Table 5-8).


5.4.4 Moisture Map Generation

The semantic segmentation-based moisture map generation process consisted of three steps

(Figure 5-21), and the whole workflow was automated in a script written in Python 3. The input to

the process should be a four-channel raster with at least 64 pixels in both height and width. The

four channels should be sequentially red, green, blue, and temperature channels, while all the

digital numbers should be floating-point values. In this study, the inputs to the workflow were the

overview rasters of the HLP, and the outputs were the generated moisture maps that had the same

GSD as the inputs.

The first step of the moisture map generation process was to subdivide an input raster into non-

overlapping small tiles such that each tile had a dimension of 64 × 64 × 4 (height × width ×

channel). If the input size was not divisible by 64 for its height, width, or both, we omitted the

right-most columns and/or bottom-most rows to avoid geometrically transforming the input raster.

In this way, the number of created raster tiles could be calculated as 𝑁tiles = ⌊𝐻in / 64⌋ ∙ ⌊𝑊in / 64⌋, where ⌊∙⌋ is the floor operator; 𝐻in and 𝑊in are the height and width of the input raster, respectively; and 𝑁tiles is the

number of tiles after the subdivision. The second step of the process was to utilize the modified U-

Net model to create a segmentation map for each raster tile, such that every pixel of the raster tiles

was assigned a moisture class. The “dry”, “moderate”, and “wet” moisture classes were denoted by red, greenish, and blue colours, respectively, as shown in Figure 5-21. Finally,

the individual segmentation maps were combined based on their corresponding positions in the

input raster to generate the output moisture map. An example of the generated moisture maps for

the HLP is provided in Figure 5-22.
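This workflow can be sketched as follows, reusing the assumed COLOURS array and class order (0 = dry, 1 = moderate, 2 = wet) from the classification sketch in Section 5.3.4; model stands for the trained modified U-Net, and the normalization statistics are those of the training set.

```python
# A minimal sketch of the workflow in Figure 5-21; COLOURS and the class
# order follow the earlier classification sketch.
import numpy as np

TILE = 64

def segmentation_moisture_map(raster, model, mean, std):
    # Step 1: crop to a multiple of 64 and cut into non-overlapping tiles.
    h = raster.shape[0] // TILE * TILE
    w = raster.shape[1] // TILE * TILE
    tiles = raster[:h, :w].reshape(h // TILE, TILE, w // TILE, TILE, 4)
    tiles = tiles.transpose(0, 2, 1, 3, 4).reshape(-1, TILE, TILE, 4)
    # Step 2: per-pixel class probabilities, then a pixel-wise argmax.
    probs = model.predict((tiles - mean) / std)   # shape (N, 64, 64, 3)
    classes = probs.argmax(axis=-1)               # shape (N, 64, 64)
    # Step 3: reassemble the tiles into one class map and colour-code it.
    classes = classes.reshape(h // TILE, w // TILE, TILE, TILE)
    classes = classes.transpose(0, 2, 1, 3).reshape(h, w)
    return COLOURS[classes]
```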

Overall, the moisture maps generated by our method, shown in Figure 5-22, were approximately the same as the ground truth thanks to the high prediction accuracy of the modified U-Net model.

The ground-truth moisture maps were generated by following the same process used to prepare

the training, validation, and testing segmentation data (Section 5.2.4). In short, a ground-truth

moisture map for an overview raster was created by first applying equation (4.2) to every digital

number in the temperature channel, followed by performing a threshold operation to categorize

the pixels that had an estimated moisture content greater than 8% to be the “wet” class, those

smaller than 4% as the “dry” class, and the remaining as the “moderate” moisture class. The results


depicted in Figure 5-22 demonstrate that the pixel-wise mapping between the input rasters and the

output segmentation maps were effectively learned by the trained network, and the generated

moisture maps had the same GSD (10 cm/pixel) as the model input. In this way, a fine-grained

visualization of the HLP surface moisture distribution could be produced by following the

workflow illustrated in Figure 5-21.

Figure 5-21: Moisture map generation using CNN-based semantic segmentation.


Figure 5-22: A comparison example between our generated moisture maps (right) and the ground-truth

(left). The ground-truth moisture maps are generated following the same method in which the

segmentation dataset is prepared (Section 5.2.4). (a) Generated moisture maps for the whole HLP

(March 7, Morning dataset). (b) Generated moisture map for the top two lifts of the HLP (March 8,

Afternoon dataset). (c) Created segmentation (moisture) map based on a test set example (ID_00017)

with a height and width of 64 pixels.


5.5 Discussion and Conclusion

The results shown in Figure 5-19 and Figure 5-22 demonstrate the feasibility of using

convolutional neural networks to generate HLP surface moisture maps based on the acquired

remote sensing data. All the classification and segmentation CNNs employed in our experiments

were capable of learning correlations between the four-channel input rasters and the corresponding

output moisture classes. Most of the trained models manifested high prediction accuracy on our

prepared dataset, and the training scheme that we adopted was effective for training the model to

converge successfully. One significant advantage of CNNs is their ability to extract hierarchical

knowledge from the input while accommodating data from various sources with different

modalities (e.g., data acquired from visible-light and thermal cameras) (Gómez-Chova et al.,

2015). Such capacity allows CNN models to adapt to practical applications that include multiple

sensors, and many studies have demonstrated that CNNs can be deployed in automated systems to

perform real-time prediction (Burnett et al., 2020; Wei et al., 2018). In our case, since the data

processing and moisture map generation were conducted offline, the execution speeds of the

predictive models were not considered a crucial metric when evaluating the model performance.

However, the future progress of this project will emphasize onboard data analysis, and the ultimate

goal is to develop a system that is capable of real-time and on-demand HLP surface moisture

monitoring. Therefore, future studies will be devoted to integrating efficient and accurate CNN models into the existing UAV platform for real-time performance.

In this study, the data preparation process described in Section 5.2 was used to create adequate

datasets for training, validating, and testing CNN models. The workflow produced multichannel

raster data with geometric precision, and the resultant rasters contained colour and temperature

information derived from the visible and thermal images. Although we used only visible-light and

thermal infrared imagery in our experiment, the proposed method can be utilized to combine raw

image data captured in other spectral regions (e.g., near-infrared). Despite the effectiveness of our

approach, the time-consuming data preprocessing and association process remains a critical

challenge, especially if efficient data analysis is desired. The image registration, for instance, was

performed manually to co-register the colour and thermal orthomosaics. Such a process was

labour-intensive, and the resultant product was subject to reproducibility issues due to the

extensive human intervention. A number of studies in the literature have proposed various area-

based approaches (e.g., Fourier methods, mutual information methods) and feature-based


approaches (e.g., using image feature extractors combined with the random sample consensus

algorithm to compute image transformations) to automate the registration of multimodal data (Aganj and Fischl, 2017; Kemker et al., 2018; Liu et al., 2018; Raza et al., 2015).

Our future studies will incorporate and implement several of these methods to streamline and automate the data processing for a more efficient data analysis workflow.

The classification and segmentation models generate moisture maps with different resolutions of spatial detail, where the distinction is less significant when the input raster covers a large study area but more prominent when the input size (height and width) is relatively small. Figure 5-23

provides comparison examples of the generated moisture maps by the two types of models using

input rasters with different sizes. As shown in Figure 5-23(a), the two maps display the same

moisture distribution pattern across the whole HLP, and thus both versions are suitable to provide

an overview of the moisture status of the HLP surface. However, the distinction becomes apparent

when the ground area covered by the input is relatively small. The generated result by the

classification model in Figure 5-23(b) appears pixelated, and the wetted area (or wetted radius) of

each sprinkler is not noticeable in the classification moisture map. In contrast, the segmentation

moisture map preserves fine-grained details, and the sprinklers that were not working at their full

capacity can be easily pinpointed (e.g., those at the bottom-right corner and several at the center

of the studied region). Moreover, if an input raster has a size of 64 × 64 (i.e., representing

approximately 6.4 m × 6.4 m over the HLP), the results generated by the classification model are at a coarse resolution (Figure 5-23(c)) and are not particularly useful for studying the moisture distribution within the area. Conversely, the semantic segmentation model performs pixel-wise

prediction, and thus the boundaries of different moisture zones are clearly outlined in the generated

moisture maps.
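The patch-versus-pixel distinction can be made concrete with a toy sketch. Assuming, as above, three moisture classes and a 64 × 64 raster (the patch size and random predictions below are purely illustrative stand-ins for model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)

# Classification route: one moisture class (0, 1, or 2) per 8x8 patch,
# tiled back over the raster -- hence the blocky, pixelated appearance.
patch = 8
cls_map = np.zeros((64, 64), dtype=int)
for i in range(0, 64, patch):
    for j in range(0, 64, patch):
        cls_map[i:i + patch, j:j + patch] = rng.integers(0, 3)  # stand-in prediction

# Segmentation route: per-pixel class scores (e.g., a 64x64x3 softmax
# output) reduced by argmax, so moisture-zone boundaries are preserved.
seg_scores = rng.random((64, 64, 3))
seg_map = seg_scores.argmax(axis=-1)

print(cls_map.shape, seg_map.shape)  # both (64, 64); cls_map varies only per patch
```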

Although moisture maps generated by the segmentation model preserve fine-grained details, the

model has a larger number of parameters and requires more multiply-add operations than its classification counterpart. This implies that the segmentation model requires more computational

resources (i.e., memory footprint and computational operations), and the computational burden

may become a crucial challenge if an efficient data analysis process is required. Therefore, the

decision on which CNN model to use should consider not only the amount of detail required from

the output but also the amount of time and computational resources available for the inference.
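One quick way to quantify this trade-off before committing to a model is to compare parameter counts. The sketch below uses the stock Keras MobileNetV2 rather than the modified architectures described earlier, so the numbers are indicative only:

```python
import tensorflow as tf

# Stock MobileNetV2 classifier accepting the four-channel 64x64 rasters;
# weights=None permits a non-RGB channel count.
classifier = tf.keras.applications.MobileNetV2(
    input_shape=(64, 64, 4), weights=None, classes=3)
print(f"MobileNetV2 parameters: {classifier.count_params():,}")
# Calling count_params() on the segmentation network (e.g., a U-Net built
# for the same input) makes the memory-footprint gap explicit.
```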


Figure 5-23: Comparison examples between the HLP moisture maps generated by using classification

and segmentation CNN models. The “dry”, “moderate”, and “wet” moisture classes are denoted by red, green, and blue, respectively. The classification- and segmentation-based moisture maps were

generated using the modified MobileNetV2 and modified U-Net models, respectively. (a) Generated

moisture maps for the whole HLP using classification (left) and segmentation (right) models. The input

raster had a height × width of 4800 × 6848. (b) Generated moisture maps for the top two lifts of the HLP

using classification (left) and segmentation (right) models. (c) Generated moisture maps based on

examples from the segmentation dataset (one from each training, validation, and test set). The upper

rows are the inputs to the models, and the bottom rows are the output moisture maps. The top-left images are the RGB channels of the input rasters, and the top-right images are the temperature channel (shown in grayscale) of the input. The bottom-left and bottom-right are the moisture maps generated using

classification and segmentation models, respectively. Each input had a size of 64 × 64.


In conclusion, this chapter presented our methodology for generating HLP surface moisture maps

using classification and segmentation CNNs based on visible-light and thermal infrared data

acquired by an unmanned aerial vehicle. The full process consisted of multiple stages, starting

with image preprocessing and data preparation, followed by CNN model development, training

and evaluation, and ending with moisture map generation. Each stage involved a sequence of steps,

and the implementation details of each step were elaborated throughout the chapter. Overall, the

most time-consuming and labour-intensive stage was the data preparation, which future studies will seek to automate in order to create a more efficient data processing workflow. In addition, the workflows for creating moisture maps with classification and segmentation models were provided separately, and the proposed method can be deployed for HLP monitoring as well as other applications. Future work will focus on incorporating more efficient CNN models

into the workflow and designing capable systems for conducting real-time data analysis.


Chapter 6

Conclusion, Recommendation, and Future Work

This thesis presented a thorough case study of implementing the general workflow for HLP surface

moisture monitoring, starting from UAV-based data collection, followed by off-line data

processing, and ending with surface moisture map generation. Methodology and implementation

details were explained throughout the thesis, and the benefits and limitations of the proposed

methods were discussed. The results have demonstrated the feasibility and practicality of the

proposed data acquisition and data analysis approaches. Overall, the practicality of the proposed

HLP surface moisture monitoring workflow resides in two factors: the improved data acquisition

process by using a UAV system; and the direct visualization of surface moisture variation through

the informative and intuitive moisture maps.

The main advantages of data acquisition using UAV-based remote sensing techniques are the

reduced time required for data collection and the increased safety of personnel. By employing a UAV

system, a large survey area can be mapped without disrupting ongoing production operations, and

the regions inaccessible to human operators can also be covered. This enables on-demand and nearly real-time acquisition of high-resolution data. In the field experiment

conducted at El Gallo gold mine’s HLP (Chapter 3), the time spent by the technical staff to collect

five ground samples at the sampling locations was approximately the same as the total flight times

of the two flight missions. The flight missions not only covered the entire HLP with high image

resolution, but also the top two lifts of the HLP with even finer resolution. In this way, flight

altitude was adjusted to target different regions over the HLP. This can become useful when

investigating regions inaccessible to humans. In addition, UAV-based data acquisition avoids

directly exposing technical staff to hazardous material (i.e., dilute cyanide solution), which

increases workplace safety. The collected data become permanent records of the HLP, which can

be used not only for generating surface moisture maps but also for change detection and

monitoring, HLP volume estimation, material particle size analysis, and HLP slope stability

analysis (Bamford et al., 2017; Medinac and Esmaeili, 2020; Zhang and Liu, 2017).

In general, several recommendations can be made for the deployment of UAV-based data

acquisition over HLPs. Flight altitude and viewing angle should be selected based


on the surveying objective. If an overview of a large area is required, a flight mission with a high

flight altitude can be conducted to cover the entire area within the battery constraints. In contrast,

if a local region over the HLP (e.g., a slope) is to be investigated, a low flight altitude with an

oblique camera angle can be adopted to collect high-resolution and representative thermal

measurements. In general, the predawn time (around 4 a.m.) is the best for conducting thermal

infrared data collection because the differential heating effect is minimized. However, UAV

navigation over a large study area is difficult during periods of darkness, especially when both the

drone and the ground features are not physically visible to the pilot (Lillesand et al., 2015). Other

logistical issues, such as accessibility of the mine site during the midnight shift, can further complicate data acquisition in the predawn hours. Therefore, a

good alternative is to conduct field data collection in the early afternoon (2–3 p.m.), when most of the ground features are at their maximum temperature (Gupta, 2017; Jensen, 2009; Sugiura et

al., 2007). After rainfall, a common practice is to delay the flight by up to one day so that the influence of rain on the ground surface moisture content becomes less significant (Gupta, 2017). Lastly, appropriate cleaning and maintenance of the UAV system

after each data collection campaign is always recommended to improve equipment durability.
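For mission planning, the standard ground sample distance (GSD) relation provides a quick check of what a candidate flight altitude delivers. The camera parameters below are illustrative assumptions, not those of the system used in this work:

```python
# GSD from sensor pixel pitch, lens focal length, and flight altitude.
pixel_pitch_m = 3.4e-6   # sensor pixel size (assumed), metres
focal_length_m = 0.015   # lens focal length (assumed), metres
altitude_m = 80.0        # flight altitude above ground, metres

gsd_m = pixel_pitch_m * altitude_m / focal_length_m
print(f"GSD = {gsd_m * 100:.1f} cm/pixel")  # 1.8 cm/pixel at 80 m
```

Halving the altitude halves the GSD, which is why low-altitude oblique missions are preferred for detailed slope inspection.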

In Chapter 4, a framework for generating HLP surface moisture maps by using the acquired

thermal images and in-situ moisture measurements was proposed. The obtained data were first

used to derive an empirical relationship between the surface moisture content and the remotely

sensed surface temperature using linear regression. The thermal images were then used to generate

thermal orthomosaics representing the surface temperature across the HLP. Finally, the moisture maps were created by applying the linear relationship over the orthomosaics to estimate the HLP’s surface moisture content. This framework is practical because of its efficient

product generation process and the adequate accuracy of the generated results. As soon as the

empirical model is developed, a moisture map can be generated within an hour of the data

acquisition, and the spatial distribution of the material moisture over the HLP can be intuitively

visualized. The produced moisture maps have a GSD of approximately 10 cm/pixel, and this

spatial resolution is hardly achievable by conventional point-measurement manual methods. The

limitations and possible improvements of the proposed method were carefully discussed in Section

4.6. Overall, future improvement of the moisture estimation step should take the various influential

factors into consideration, including meteorological and environmental conditions, material


properties, solar angles, geographical locations, and active heat sources. These factors often have

profound effects on the moisture estimates, and the ability for a model to account for these

components can increase the estimate accuracy. Despite what moisture estimation model is used,

one important recommendation is to regularly validate and calibrate the model with newly

collected data and always consider the site-specific operational and meteorological conditions

when interpreting the generated results.
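For illustration, a minimal sketch of this map-generation step follows. The calibration values and file name are hypothetical, and the fitted coefficients stand in for the site-specific model of equation (4.2):

```python
import numpy as np

# Hypothetical in-situ calibration pairs: remotely sensed surface
# temperature (deg C) versus gravimetric moisture content (%).
surface_temp = np.array([22.0, 25.5, 29.0, 32.5, 36.0])
moisture = np.array([14.2, 11.8, 9.1, 6.4, 4.0])

# Least-squares fit of the linear moisture-temperature relation.
slope, intercept = np.polyfit(surface_temp, moisture, 1)

# Apply the relation pixel-wise over the thermal orthomosaic (placeholder
# file) to obtain the surface moisture map.
thermal_ortho = np.load("thermal_orthomosaic.npy")  # (H, W) array, deg C
moisture_map = slope * thermal_ortho + intercept
```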

Chapter 5 elaborated on the methodology for generating HLP surface moisture maps using two

kinds of convolutional neural networks based on the acquired colour and thermal images. The

explanation of the two approaches started with data preparation, followed by network construction,

model training, and ended with model evaluation and moisture map generation. Two custom

datasets were prepared based on the collected aerial remote sensing images, and the dataset

statistics were summarized and presented. Implementation details were provided throughout the

chapter, and a discussion comparing the two approaches was also outlined. Overall, the results

generated by both methods demonstrated the feasibility of using CNNs to produce surface moisture

maps, and these advanced computer vision techniques can bring value to the HLP monitoring

process. One significant advantage of CNNs is their ability to accommodate data taken from

different sensors simultaneously while learning complex functions automatically without the need

for feature engineering and variable selection. However, one remaining challenge was the time-

consuming and labour-intensive data preparation stage, which future studies will seek to automate in order to create a more efficient data processing workflow. Regarding the

proposed approaches, the two methods generate moisture maps with different levels of spatial detail; the distinction is more prominent when the input size is relatively small but

less significant when the input covers a large studied area. The selection between the two should

consider the resources available for computation and the amount of detail required from the

generated moisture maps.

To sum up, this thesis demonstrated that UAV-based data collection increases workplace safety

and the quality and quantity of acquired data. The proposed data analysis methods generate

informative surface moisture maps, which bring significant value to the HLP monitoring process.


6.1 Major Contributions

The main contributions of this work were:

• Emphasizing the importance of heap leach pad surface moisture monitoring and promoting

a complementary framework to the conventional monitoring techniques.

• Presenting a general workflow for HLP surface moisture mapping, starting from UAV-

based data collection, followed by off-line data processing, and ending with surface

moisture map generation.

• Providing a thorough case study for the implementation of the proposed monitoring

approach at an operating gold heap leach pad. The methodology and implementation steps

were described and explained in great detail, and recommendations for the deployment of the proposed method were outlined.

• Deriving a regression model that correlates remotely sensed surface temperature to material

surface moisture content.

• Incorporating advanced deep learning-based computer vision techniques into the moisture

map generation process. The results demonstrated the feasibility of using CNN models for

moisture map generation.

• Constructing two custom datasets, which can be used for the development of image

classification and semantic segmentation CNN models. Each dataset consisted of four-channel rasters, where the first three channels were the red, green, and blue colours and the fourth was a temperature channel (a minimal assembly sketch follows this list).

• Discussing the various factors that can potentially influence the aerial remote sensing

measurements and the generated moisture maps in the context of HLP monitoring.
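As a minimal sketch of how such a four-channel raster can be assembled (the file names and normalization are illustrative assumptions, not the exact preparation code used in this work):

```python
import numpy as np
import cv2

# Co-registered inputs (placeholder files): a colour tile and a per-pixel
# temperature array with the same height and width.
rgb = cv2.imread("tile_rgb.png")[:, :, ::-1]     # BGR -> RGB, (H, W, 3)
temperature = np.load("tile_temperature.npy")    # deg C, (H, W)

# Stack into a single (H, W, 4) raster: normalized R, G, B + temperature.
raster = np.dstack([rgb.astype(np.float32) / 255.0,
                    temperature.astype(np.float32)])
assert raster.shape[-1] == 4
```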


6.2 Future Work

Overall, the application of UAV technology in HLP monitoring has not yet been fully explored.

There exist many areas for future research and improvement:

• The future progress of this project should emphasize onboard data analysis, and the

ultimate goal is to develop a system that is capable of real-time and on-demand HLP

surface moisture monitoring. The entire data processing pipeline should move towards full automation to minimize human intervention. Such improvement can increase the reproducibility of the generated results as well as reduce the time required for data processing.

• The empirical univariate model (equation (4.2)) developed in this work cannot account for the effects of all the influential factors when relating the remotely sensed surface temperature to the surface moisture content of the HLP material. To improve the proposed

method, a more sophisticated model that involves more variables may be developed in

future studies to account for the influences of meteorological, environmental, and

geographical factors on the moisture estimation.

• In this work, the field experiment and data acquisition were conducted at only one mine

site over a three-day period. Future studies should conduct more field experiments and

collect more representative data to further improve and refine the proposed workflow for

HLP monitoring. Particular attention should be put on investigating HLPs that involve

extensive exothermic reactions (e.g., leaching of sulfide minerals).

• In this study, thermal and colour images were acquired by equipping a UAV platform

with one thermal and one RGB camera. Future studies may explore the feasibility of using

UAV systems to collect multispectral or hyperspectral data for HLP monitoring. The

collected information will be beneficial for various monitoring tasks because the remote

sensing data captured in different wavelengths can reveal different properties of the HLP

material. This direction will result in a new set of data analytic methods, which will

complement the proposed approaches in this work.


Bibliography

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., … Zheng, X. (2016).

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. ArXiv

Preprint ArXiv:1603.04467.

Aganj, I., & Fischl, B. (2017). Multimodal image registration through simultaneous

segmentation. IEEE Signal Processing Letters, 24(11), 1661–1665.

Aggarwal, C. (2018). Neural Networks and Deep Learning. Springer.

Akeret, J., Chang, C., Lucchi, A., & Refregier, A. (2017). Radio frequency interference

mitigation using deep convolutional neural networks. Astronomy and Computing, 18, 35–

39. https://doi.org/10.1016/j.ascom.2017.01.002

Alom, Z., Taha, T. M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M. S., … Asari, V. K.

(2019). A State-of-the-Art Survey on Deep Learning Theory and Architectures. Electronics,

8, 292. https://doi.org/10.3390/electronics8030292

Alvarado, M., Gonzalez, F., Fletcher, A., & Doshi, A. (2015). Towards the Development of a

Low Cost Airborne Sensing System to Monitor Dust Particles after Blasting at Open-Pit

Mine Sites. Sensors, 15(8), 19667–19687. https://doi.org/10.3390/s150819667

Amara, J., Bouaziz, B., & Algergawy, A. (2017). A Deep Learning-based Approach for Banana

Leaf Diseases Classification. Datenbanksysteme Für Business, Technologie Und Web, 79–

88.

ASTM. (2017). D2487-17. https://doi.org/10.1520/D2487-17.

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer Normalization. ArXiv Preprint

ArXiv:1607.06450.

Bai, L., Zhao, Y., & Huang, X. (2018). A CNN Accelerator on FPGA Using Depthwise

Separable Convolution. IEEE Transactions on Circuits and Systems II: Express Briefs,

65(10), 1415–1419.

Baldi, P., & Sadowski, P. (2014). The dropout learning algorithm. Artificial Intelligence, 210,

78–122. https://doi.org/10.1016/j.artint.2014.02.004

Ball, J. E., Anderson, D. T., & Chan, C. S. (2017). Comprehensive survey of deep learning in

remote sensing: theories, tools, and challenges for the community. Journal of Applied

Remote Sensing, 11(4), 042609. https://doi.org/10.1117/1.JRS.11.042609

Bamford, T., Esmaeili, K., & Schoellig, A. P. (2017a). A real-time analysis of post-blast rock

fragmentation using UAV technology. International Journal of Mining, Reclamation and

Environment, 31(6), 1–18. https://doi.org/10.1080/17480930.2017.1339170

Bamford, T., Esmaeili, K., & Schoellig, A. P. (2017b). Aerial Rock Fragmentation Analysis in


Low-Light Condition Using UAV Technology. In Application of Computers and

Operations Research in Mining Industry.

Bamford, T., Medinac, F., & Esmaeili, K. (2020). Continuous Monitoring and Improvement of

the Blasting Process in Open Pit Mines Using Unmanned Aerial Vehicle Techniques.

Remote Sensing, 12(17), 2801. https://doi.org/10.3390/rs12172801

Bengio, Y. (2009). Learning deep architectures for AI. Now Publishers Inc.

Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning Long-Term Dependencies with Gradient

Descent is Difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.

Bhappu, R. B., Johnson, P., Brierley, J., & Reynolds, D. (1969). Theoretical and practical studies

on dump leaching. AIME Trans, 244(September), 307–320.

Bouffard, S. C., & Dixon, D. G. (2000). Investigative study into the hydrodynamics of heap

leaching processes. Metallurgical and Materials Transactions B, 32(5), 763–776.

Brierley, J. A., & Brierley, C. L. (2001). Present and future commercial applications of

biohydrometallurgy. Hydrometallurgy, 59, 233–239.

Burkart, A., Cogliati, S., Schickling, A., & Rascher, U. (2014). A Novel UAV-Based Ultra-Light

Weight Spectrometer for Field Spectroscopy. IEEE Sensors Journal, 14(1), 62–67.

Burnett, K., Qian, J., Du, X., Liu, L., Yoon, D. J., Shen, T., … Barfoot, T. D. (2020). Zeus: A system description of the two-time winner of the collegiate SAE AutoDrive competition. Journal of Field Robotics, (April). https://doi.org/10.1002/rob.21958

Calderón, R., Montes-Borrego, M., Landa, B. B., Navas-Cortés, J. A., & Zarco-Tejada, P. J.

(2014). Detection of downy mildew of opium poppy using high-resolution multi-spectral

and thermal imagery acquired with an unmanned aerial vehicle. Precision Agriculture,

15(6), 639–661. https://doi.org/10.1007/s11119-014-9360-y

Campbell, J. B., & Wynne, R. H. (2011). Introduction to remote sensing. Guilford Press.

Candiago, S., Remondino, F., Giglio, M. De, Dubbini, M., & Gattelli, M. (2015). Evaluating

Multispectral Images and Vegetation Indices for Precision Farming Applications from UAV

Images. Remote Sensing, 7(Vi), 4026–4047. https://doi.org/10.3390/rs70404026

Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models

for practical applications. ArXiv Preprint ArXiv:1605.07678.

Chai, S., Walker, J. P., Makarynskyy, O., Kuhn, M., Veenendaal, B., & West, G. (2010). Use of

Soil Moisture Variability in Artificial Neural Network. Remote Sensing, 166–190.

https://doi.org/10.3390/rs2010166

Chang, K. T., & Hsu, W. L. (2018). Estimating soil moisture content using unmanned aerial

vehicles equipped with thermal infrared sensors. Proceedings of 4th IEEE International

Conference on Applied System Innovation 2018, ICASI 2018, 168–171.


https://doi.org/10.1109/ICASI.2018.8394559

Chen, X., Chen, S., Zhong, R., Su, Y., Liao, J., Li, D., … Li, X. (2012). A semi-empirical

inversion model for assessing surface soil moisture using AMSR-E brightness temperatures.

Journal of Hydrology, 456–457, 1–11. https://doi.org/10.1016/j.jhydrol.2012.05.022

Chollet, F., et al. (2015). Keras. Retrieved from https://keras.io

Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., & Schmidhuber, J. (2011). Flexible,

high performance convolutional neural networks for image classification. In Twenty-second

international joint conference on artificial intelligence (pp. 1237–1242).

Clement, L. (2020). On Learning Models of Appearance for Robust Long-term Visual Navigation.

University of Toronto.

Clement, L., & Kelly, J. (2018). How to Train a CAT: Learning Canonical Appearance Transformations for Direct Visual Localization Under Illumination Change. IEEE Robotics and Automation Letters, 3(3), 2447–2454.

Clevert, D.-A., Unterthiner, T., & Hochreiter, S. (2016). Fast and Accurate Deep Network

Learning by Exponential Linear Units (ELUs). In ICLR (pp. 1–14).

Colomina, I., & Molina, P. (2014). Unmanned aerial systems for photogrammetry and remote

sensing: A review. ISPRS Journal of Photogrammetry and Remote Sensing, 92, 79–97.

https://doi.org/10.1016/j.isprsjprs.2014.02.013

Dash, J. P., Watt, M. S., Pearse, G. D., Heaphy, M., & Dungey, H. S. (2017). Assessing very

high resolution UAV imagery for monitoring forest health during a simulated disease

outbreak. ISPRS Journal of Photogrammetry and Remote Sensing, 131, 1–14.

https://doi.org/10.1016/j.isprsjprs.2017.07.007

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale

hierarchical image database. In 2009 IEEE conference on computer vision and pattern

recognition (pp. 248–255). IEEE. https://doi.org/10.1109/CVPR.2009.5206848

DJI. (2019a). ZENMUSE X5 Aerial imaging evolved. Retrieved from

https://www.dji.com/ca/zenmuse-x5

DJI. (2019b). ZENMUSE XT Unlock The Possibilities of Sight. Retrieved from

https://www.dji.com/ca/zenmuse-xt

Dong, Q., Gong, S., & Zhu, X. (2018). Imbalanced deep learning by minority class incremental

rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(6),

1367–1381.

Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. ArXiv

Preprint ArXiv:1603.07285, 1–31.

Ercoli, M., Di Matteo, L., Pauselli, C., Mancinelli, P., Frapiccini, S., Talegalli, L., & Cannata, A.


(2018). Integrated GPR and laboratory water content measures of sandy soils: From

laboratory to field scale. Construction and Building Materials, 159, 734–744.

https://doi.org/10.1016/j.conbuildmat.2017.11.082

Everingham, M., Gool, L. Van, Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The

PASCAL Visual Object Classes (VOC) Challenge. Int J Comput Vis, 303–338.

https://doi.org/10.1007/s11263-009-0275-4

Francioni, M., Salvini, R., Stead, D., Giovannini, R., Riccucci, S., Vanneschi, C., & Gullì, D.

(2015). An integrated remote sensing-GIS approach for the analysis of an open pit in the

Carrara marble district , Italy: Slope stability assessment through kinematic and numerical

methods. Computers and Geotechnics, 67, 46–63.

https://doi.org/10.1016/j.compgeo.2015.02.009

Franson, J. C. (2017). Cyanide poisoning of a Cooper’s hawk (Accipiter cooperii). Journal of

Veterinary Diagnostic Investigation, 29(2), 258–260.

https://doi.org/10.1177/1040638716687604

Fu, T., Ma, L., Li, M., & Johnson, B. A. (2018). Using convolutional neural network to identify

irregular segmentation objects from very high-resolution remote sensing imagery. Journal

of Applied Remote Sensing, 12(2), 025010. https://doi.org/10.1117/1.JRS.12.025010

Gao, B. (1996). NDWI-A Normalized Difference Water Index for Remote Sensing of Vegetation

Liquid Water From Space. Remote Sensing of Environment, 266(April), 257–266.

Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., & Garcia-Rodriguez, J.

(2017). A Review on Deep Learning Techniques Applied to Semantic Segmentation. ArXiv

Preprint ArXiv:1704.06857.

Ge, L., Hang, R., Liu, Y., & Liu, Q. (2018). Comparing the Performance of Neural Network and

Deep Convolutional Neural Network in Estimating Soil Moisture from Satellite

Observations. Remote Sensing, 10(9), 1327.

Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for Autonomous Driving? The KITTI

Vision Benchmark Suite. In 2012 IEEE Conference on Computer Vision and Pattern

Recognition (pp. 3354–3361).

Ghorbani, Y., Franzidis, J.-P., & Petersen, J. (2016). Heap Leaching Technology – Current State, Innovations, and Future Directions: A Review. Mineral Processing and Extractive

Metallurgy Review, 37(2), 73–119. https://doi.org/10.1080/08827508.2015.1115990

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward

neural networks. In Proceedings of the thirteenth international conference on artificial

intelligence and statistics (Vol. 9, pp. 249–256).

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural Networks. In

International Conference on Artificial Intelligence and Statistics (Vol. 15, pp. 315–323).

Gómez-Chova, L., Tuia, D., Moser, G., & Camps-Valls, G. (2015). Multimodal Classification of


Remote Sensing Images: A Review and Future Directions. Proceedings of the IEEE, 103(September), 1560–1584. https://doi.org/10.1109/JPROC.2015.2449668

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT press.

Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., … He, K. (2017).

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. ArXiv Preprint

ArXiv:1706.02677.

Graham, B. (2014). Fractional Max-Pooling. CoRR, abs/1412.6, 1–10.

Grosse, A. C., Dicinoski, G. W., Shaw, M. J., & Haddad, P. R. (2003). Leaching and recovery of

gold using ammoniacal thiosulfate leach liquors (a review). Hydrometallurgy, 69, 1–21.

https://doi.org/10.1016/S0304-386X(02)00169-X

Gupta, R. P. (2017). Remote Sensing Geology. Springer.

Hackeloeer, A., Klasing, K., Krisp, J. M., & Meng, L. (2014). Georeferencing: a review of

methods and applications. Annals of GIS, 20(1), 61–69.

https://doi.org/10.1080/19475683.2013.868826

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the

IEEE international conference on computer vision (pp. 2961--2969).

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In International Conference on Computer Vision (pp. 1026–1034).

He, K., Zhang, X., Ren, S., & Sun, J. (2016a). Deep residual learning for image recognition. In

Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778).

He, K., Zhang, X., Ren, S., & Sun, J. (2016b). Identity mappings in deep residual networks. In

European conference on computer vision (pp. 630–645). Springer.

Heim, R. R. (2002). A Review of Twentieth-Century Drought Indices Used in the United States.

Bulletin of the American Meteorological Society, 83(August), 1149–1166. Retrieved from

https://doi.org/10.1175/1520-0477-83.8.1149

Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural

networks. Science, 313(5786), 504–507.

Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012).

Improving neural networks by preventing co-adaptation of feature detectors. CoRR,

abs/1207.0, 1–18.

Hinton, G., Srivastava, N., & Swersky, K. (2012). Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent.


Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … Adam, H.

(2017). Mobilenets: Efficient convolutional neural networks for mobile vision applications.

ArXiv Preprint ArXiv:1704.04861.

Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., … Vasudevan, V. (2019). Searching for MobileNetV3. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1314–1324).

Hu, Z., Xu, L., & Yu, B. (2018). Soil moisture retrieval using convolutional neural networks: Application to passive microwave remote sensing. International Archives of the Photogrammetry, Remote Sensing & Spatial Information Sciences, 42(3).

Huang, Z., Pan, Z., & Lei, B. (2017). Transfer Learning with Deep Convolutional Neural

Network for SAR Target Classification with Limited Labeled Data. Remote Sensing, 9.

https://doi.org/10.3390/rs9090907

Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by

Reducing Internal Covariate Shift. CoRR, abs/1502.0.

Isikdogan, F., Bovik, A. C., & Passalacqua, P. (2017). Surface Water Mapping by Deep

Learning. IEEE Journal of Selected Topics in Applied Earth Observations and Remote

Sensing, 10(11), 4909–4918.

Ivushkin, K., Bartholomeus, H., Bregt, A. K., Pulatov, A., Franceschini, M. H., Kramer, H., …

Finkers, R. (2019). UAV based soil salinity assessment of cropland. Geoderma, 338, 502–

512. https://doi.org/10.1016/j.geoderma.2018.09.046

Jarrett, K., Kavukcuoglu, K., Ranzato, M. A., & Lecun, Y. (2009). What is the Best Multi-Stage

Architecture for Object Recognition? In 2009 IEEE 12th International Conference on

Computer Vision (pp. 2146–2153).

Jensen, J. R. (2009). Remote Sensing of the Environment: An Earth Resource Perspective (2/e).

Pearson Education India.

John, L. W. (2011). The art of heap leaching-The fundamentals. Percolation Leaching: The

Status Globally and in Southern Africa. Misty Hills: The Southern African Institute of

Mining and Metallurgy (SAIMM), 17–42.

Johnson, J. M., & Khoshgoftaar, T. (2019). Survey on deep learning with class imbalance.

Journal of Big Data, 6, 27. https://doi.org/10.1186/s40537-019-0192-5

Kalogirou, S. A. (2013). Solar energy engineering: processes and systems (2nd ed.). Academic

Press. https://doi.org/10.1016/B978-0-12-397270-5.01001-3

Kamilaris, A., & Prenafeta-boldú, F. X. (2018a). A review of the use of convolutional neural

networks in agriculture. The Journal of Agricultural Science, 156(3), 312–322.

Kamilaris, A., & Prenafeta-boldú, F. X. (2018b). Deep learning in agriculture: A survey.


Computers and Electronics in Agriculture, 147(July 2017), 70–90.

https://doi.org/10.1016/j.compag.2018.02.016

Kappes, D. W. (2002). Precious Metal Heap Leach Design and Practice. In Proceedings of the

Mineral Processing Plant Design, Practice, and Control 1 (pp. 1606–1630). Retrieved from

http://www.kcareno.com/pdfs/mpd_heap_leach_desn_and_practice_07apr02.pdf

Kemker, R., Salvaggio, C., & Kanan, C. (2018). Algorithms for semantic segmentation of

multispectral remote sensing imagery using deep learning. ISPRS Journal of

Photogrammetry and Remote Sensing, 145(April), 60–77.

https://doi.org/10.1016/j.isprsjprs.2018.04.014

Khan, A., Sohail, A., Zahoora, U., & Qureshi, A. S. (2020). A survey of the recent architectures

of deep convolutional neural networks. Artificial Intelligence Review, 1–70.

Khorram, S., Koch, F. H., van der Wiele, C. F., & Nelson, S. A. (2012). Remote sensing.

Springer Science & Business Media.

Kingma, D. P., & Ba, J. L. (2014). Adam: A method for stochastic optimization. ArXiv Preprint

ArXiv:1412.6980, 1–15.

Kislik, C., Dronova, I., & Kelly, M. (2018). UAVs in Support of Algal Bloom Research: A

Review of Current Applications and Future Opportunities. Drones, 2(4), 1–14.

https://doi.org/10.3390/drones2040035

Korchenko, A. G., & Illyash, O. S. (2013). The Generalized Classification of Unmanned Air

Vehicles. In 2013 IEEE 2nd International Conference Actual Problems of Unmanned Air

Vehicles Developments Proceedings (APUAVD) (pp. 28–34). IEEE.

Krawczyk, B. (2016). Learning from imbalanced data: open challenges and future directions.

Progress in Artificial Intelligence, 5(4), 221–232. https://doi.org/10.1007/s13748-016-0094-

0

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep

Convolutional Neural Networks. In Advances in neural information processing systems (pp.

1097–1105).

Kuenzer, C., & Dech, S. (2013). Thermal Infrared Remote Sensing: Sensors, Methods, Applications. Springer.

Kurt, M., Richard, S. J., Luigi, P., & Van, H. J. (2016). Mastering QGIS. Packt Publishing Ltd.

Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep Learning Classification of

Land Cover and Crop Types Using Remote Sensing Data. IEEE Geoscience and Remote

Sensing Letters, 14(5), 778–782. https://doi.org/10.1109/LGRS.2017.2681128

Langford, M., Fox, A., & Smith, R. S. (2010). Chapter 5 - Using different focal length lenses,

camera kits. In Langford’s Basic Photography (Ninth Edition) (pp. 92–113).


https://doi.org/10.1016/B978-0-240-52168-8.10005-7

Laptev, D., Savinov, N., Buhmann, J. M., & Pollefeys, M. (2016). TI-POOLING: Transformation-invariant pooling for feature learning in Convolutional Neural Networks. In

The IEEE Conference on Computer Vision and Pattern Recognition (pp. 289–297).

Lateef, F., & Ruichek, Y. (2019). Survey on semantic segmentation using deep learning

techniques. Neurocomputing, 338, 321–348. https://doi.org/10.1016/j.neucom.2019.02.003

LeCun, Y. (1989). Generalization and Network Design Strategies. Connectionism in Perspective,

19, 143–155.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.

https://doi.org/10.1038/nature14539

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to

document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Lee, E. J., Shin, S. Y., Ko, B. C., & Chang, C. (2016). Early sinkhole detection using a drone-

based thermal camera and image processing. Infrared Physics and Technology, 78(August),

223–232. https://doi.org/10.1016/j.infrared.2016.08.009

Lehmann, T. M., Gonner, C., & Spitzer, K. (1999). Survey: Interpolation methods in medical

image processing. IEEE Transactions on Medical Imaging, 18(11), 1049–1075.

Lewandowski, K. A., & Kawatra, S. K. (2009). Binders for heap leaching agglomeration.

Mining, Metallurgy & Exploration, 26(1), 1–24.

Li, Y., Zhang, H., Xue, X., Jiang, Y., & Shen, Q. (2018). Deep learning for remote sensing

image classification: A survey. Wiley Interdisciplinary Reviews: Data Mining and

Knowledge Discovery, 8(April), 1–17.

Liang, S., Li, X., & Wang, J. (2012). Advanced remote sensing: terrestrial information

extraction and applications. Academic Press. https://doi.org/10.1016/B978-0-12-385954-

9.01001-7

Lillesand, T. M., Kiefer, R. W., & Chipman, J. W. (2015). Remote Sensing and Image

Interpretation (Seventh Ed). WILEY.

Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., … Dollár, P. (2014).

Microsoft COCO : Common Objects in Context. CoRR, abs/1405.0, 1–15.

Linder, W. (2013). Digital photogrammetry: theory and applications. Springer Science and

Business Media.

Liu, G., Liu, Z., Liu, S., Ma, J., & Wang, F. (2018). Registration of infrared and visible light

image based on visual saliency and scale invariant feature transform. EURASIP Journal on

Image and Video Processing.


Liu, W., Baret, F., Xingfa, G., Qingxi, T., Lanfen, Z., & Bing, Z. (2002). Relating soil surface

moisture to reflectance. Remote Sensing of Environment, 81, 238–246.

Liu, Z., & Zhao, Y. (2006). Research on the method for retrieving soil moisture using thermal

inertia model. Science in China, Series D: Earth Sciences, 49(5), 539–545.

https://doi.org/10.1007/s11430-006-0539-6

Lunt, I. A., Hubbard, S. S., & Rubin, Y. (2005). Soil moisture content estimation using ground-

penetrating radar reflection data. Journal of Hydrology, 307, 254–269.

https://doi.org/10.1016/j.jhydrol.2004.10.014

Lupo, J. F. (2010). Geotextiles and Geomembranes Liner system design for heap leach pads.

Geotextiles and Geomembranes, 28(2), 163–173.

https://doi.org/10.1016/j.geotexmem.2009.10.006

Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G., & Johnson, B. A. (2019). Deep learning in remote

sensing applications: A meta-analysis and review. ISPRS Journal of Photogrammetry and

Remote Sensing, 152(March), 166–177. https://doi.org/10.1016/j.isprsjprs.2019.04.015

Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier Nonlinearities Improve Neural

Network Acoustic Models. In ICML (Vol. 30, p. 3).

Maltese, A., Capodici, F., Ciraolo, G., & La Loggia, G. (2013). Mapping soil water content

under sparse vegetation and changeable sky conditions: comparison of two thermal inertia

approaches. Journal of Applied Remote Sensing, 7(1), 079997.

https://doi.org/10.1117/1.jrs.7.079997

Marschner, S., & Shirley, P. (2015). Fundamentals of computer graphics. CRC Press.

Marsden, J. O. (2019). Gold and Silver. In R. C. Dunne, S. K. Kawatra, & C. A. Young (Eds.),

SME Mineral Processing and Extractive Metallurgy Handbook (pp. 1689–1728).

Medinac, F. (2019). Advances in Pit Wall Mapping and Slope Assessment using Unmanned

Aerial Vehicle Technology. University of Toronto.

Medinac, F., Bamford, T., Hart, M., Kowalczyk, M., & Esmaeili, K. (2020). Haul road

monitoring in open pit mines using unmanned aerial vehicles. Mining, Metallurgy &

Exploration, (1), 20–27.

Medinac, F., & Esmaeili, K. (2020). Integrating unmanned aerial vehicle photogrammetry in

design compliance audits and structural modelling of pit walls. In Proceedings of the 2020

International Symposium on Slope Stability in Open Pit Mining and Civil Engineering (pp.

1439–1454). https://doi.org/10.36487/ACG

Minacapilli, M., Cammalleri, C., Ciraolo, G., D’Asaro, F., Iovino, M., & Maltese, A. (2012).

Thermal inertia modeling for soil surface water content estimation: A laboratory

experiment. Soil Science Society of America Journal, 76, 92–100. https://doi.org/10.2136/sssaj


Moradi, R., Berangi, R., & Minaei, B. (2019). A survey of regularization strategies for deep

models. Artificial Intelligence Review, 1–40. https://doi.org/10.1007/s10462-019-09784-7

Mwase, J. M., Petersen, J., & Eksteen, J. J. (2012). A conceptual flowsheet for heap leaching of platinum group metals (PGMs) from a low-grade ore concentrate. Hydrometallurgy, 111–

112, 129–135. https://doi.org/10.1016/j.hydromet.2011.11.012

Nair, V., & Hinton, G. E. (2010). Rectified Linear Units Improve Restricted Boltzmann

Machines. In ICML (pp. 807–814).

Nex, F., & Remondino, F. (2014). UAV for 3D mapping applications: A review. Applied

Geomatics, 6(1), 1–15. https://doi.org/10.1007/s12518-013-0120-x

Nwankpa, C. E., Ijomah, W., Gachagan, A., & Marshall, S. (2018). Activation Functions:

Comparison of Trends in Practice and Research for Deep Learning. CoRR, abs/1811.0.

Oh, Y. (2004). Quantitative Retrieval of Soil Moisture Content and Surface Roughness From

Multipolarized Radar Observations of Bare Soil Surfaces. IEEE Transactions on

Geoscience and Remote Sensing, 42(3), 596–601.

Padilla, G. A., Cisternas, L. A., & Cueto, J. Y. (2008). On the optimization of heap leaching.

Minerals Engineering, 21(9), 673–678. https://doi.org/10.1016/j.mineng.2008.01.002

Paisitkriangkrai, S., Sherrah, J., Janney, P., & Hengel, A. Van Den. (2016). Semantic Labeling of

Aerial and Satellite Imagery. IEEE Journal of Selected Topics in Applied Earth

Observations and Remote Sensing, 9(7), 2868–2881.

Pajares, G. (2015). Overview and Current Status of Remote Sensing Applications Based on

Unmanned Aerial Vehicles (UAVs). Photogrammetric Engineering and Remote Sensing,

81(4), 281–330. https://doi.org/10.14358/PERS.81.4.281

Pan, S. J., & Yang, Q. (2010). A Survey on Transfer Learning. IEEE Transactions on Knowledge

and Data Engineering, 22(10), 1345–1359.

Peretroukhin, V. (2020). Learned Improvement to the Visual Egomotion Pipeline. University of

Toronto.

Petropoulos, G. P., Ireland, G., & Barrett, B. (2015). Surface soil moisture retrievals from remote

sensing: Current status, products & future trends. Physics and Chemistry of the Earth, 83–

84, 36–56. https://doi.org/10.1016/j.pce.2015.02.009

Pierson, H. A., & Gashler, M. S. (2017). Deep learning in robotics: A review of recent research.

Advanced Robotics, 31(16), 821–835. https://doi.org/10.1080/01691864.2017.1365009

Poudel, R. P., Liwicki, S., & Cipolla, R. (2019). Fast-SCNN: Fast Semantic Segmentation

Network. ArXiv Preprint ArXiv:1902.04502.

Prakash, A. (2000). Thermal remote sensing: concepts, issues and applications. International

Archives of Photogrammetry and Remote Sensing, 33, 239–243.


Pyper, R., Seal, T., Uhrie, J. L., & Miller, G. C. (2019). Dump and Heap Leaching. In R. C.

Dunne, S. K. Kawatra, & C. A. Young (Eds.), SME Mineral Processing and Extractive

Metallurgy Handbook (pp. 1207–1224).

Rawat, W., & Wang, Z. (2017). Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Computation, 29(9), 2352–2449. https://doi.org/10.1162/NECO

Raza, S., Sanchez, V., Prince, G., Clarkson, J. P., & Rajpoot, N. M. (2015). Registration of

thermal and visible light images of diseased plants using silhouette extraction in the wavelet

domain. Pattern Recognition, 48(7), 2119–2128.

https://doi.org/10.1016/j.patcog.2015.01.027

Rippel, O., Snoek, J., & Adams, R. P. (2015). Spectral Representations for Convolutional Neural

Networks. Advances in Neural Information Processing Systems, 2449–2457.

Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical

Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention --

MICCAI 2015 (pp. 234–241).

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

Sabins, F. F. (1987). Remote sensing: Principles and interpretation. WH Freeman and Company.

Salisbury, J. W., & D’Aria, D. M. (1992). Emissivity of terrestrial materials in the 8–14 μm

atmospheric window. Remote Sensing of Environment, 42(2), 83–106.

Salvini, R., Mastrorocco, G., Seddaiu, M., & Rossi, D. (2017). The use of an unmanned aerial vehicle for fracture mapping within a marble quarry (Carrara, Italy): photogrammetry and discrete fracture network modelling. Geomatics, Natural Hazards and Risk, 8(1), 34–52. https://doi.org/10.1080/19475705.2016.1199053

Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510–4520).

Santurkar, S., Tsipras, D., & Ilyas, A. (2018). How Does Batch Normalization Help

Optimization? In Advances in Neural Information Processing Systems (pp. 2483–2493).

Scheidt, S., Ramsey, M., & Lancaster, N. (2009). Determining soil moisture and sediment

availability at White Sands Dune Field, NM from apparent thermal inertia data, (412).

Scherer, D., Müller, A., & Behnke, S. (2010). Evaluation of Pooling Operations in Convolutional

Architectures for Object Recognition. In 20th International Conference on Artificial Neural

Networks (pp. 92–101).

Schmugge, T., French, A., Ritchie, J. C., Rango, A., & Pelgrum, H. (2002). Temperature and


emissivity separation from multispectral thermal infrared observations. Remote Sensing of

Environment, 79(2–3), 189–198.

Sermanet, P., Chintala, S., & Lecun, Y. (2012). Convolutional neural networks applied to house

numbers digit classification. In Proceedings of the 21st International Conference on Pattern

Recognition (pp. 3288–3291).

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). OverFeat:

Integrated Recognition, Localization and Detection using Convolutional Networks. ArXiv.

Sharifzadeh, S., Tata, J., Sharifzadeh, H., & Tan, B. (2020). Farm Area Segmentation in Satellite

Images Using DeepLabv3 + Neural Networks. In International Conference on Data

Management Technologies and Applications (Vol. 1, pp. 115–135).

https://doi.org/10.1007/978-3-030-54595-6

Shen, D., Wu, G., & Suk, H. (2017). Deep Learning in Medical Image Analysis. Annual Review

of Biomedical Engineering, 19, 221–248.

Shi, J., Wang, J., Hsu, A. Y., Neill, P. E. O., & Engman, E. T. (1997). Estimation of Bare

Surface Soil Moisture and Surface Roughness Parameter Using L-band SAR Image Data.

IEEE Transactions on Geoscience and Remote Sensing, 35(5), 1254–1266.

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., … Hassabis, D.

(2017). Article Mastering the game of Go without human knowledge. Nature, 550(7676),

354–359. https://doi.org/10.1038/nature24270

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image

recognition. ArXiv Preprint ArXiv:1409.1556, 1–14.

Slater, P. N. (1980). Remote sensing: optics and optical systems. Addison-Wesley Pub. Co.

Sobayo, R., Wu, H., Ray, R. L., & Qian, L. (2018). Integration of Convolutional Neural Network

and Thermal Images into Soil Moisture Estimation. 2018 1st International Conference on

Data Intelligence and Security (ICDIS), 207–210.

https://doi.org/10.1109/ICDIS.2018.00041

Srithammavut, W. (2008). Modeling of gold cyanidation. Lappeenranta University of Technology.

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15,

1929–1958.

Sugiura, R., Noguchi, N., & Ishii, K. (2007). Correction of Low-altitude Thermal Images applied

to estimating Soil Water Status. Biosystems Engineering, 96(3), 301–313.

https://doi.org/10.1016/j.biosystemseng.2006.11.006

Sun, S., Cao, Z., Zhu, H., & Zhao, J. (2019). A Survey of Optimization Methods from a Machine

Learning Perspective. CoRR, abs/1906.0, 1–30.


Suomalainen, J., Anders, N., Iqbal, S., Roerink, G., Franke, J., Wenting, P., … Kooistra, L.

(2014). A lightweight hyperspectral mapping system and photogrammetric processing chain

for unmanned aerial vehicles. Remote Sensing, 6(11), 11013–11030.

https://doi.org/10.3390/rs61111013

Swain, P. H., & Davis, S. M. (1978). Remote Sensing: The Quantitative Approach. New York:

McGraw-Hill.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … Rabinovich, A. (2015).

Going Deeper with Convolutions. In Proceedings of the IEEE conference on computer

vision and pattern recognition.

Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., & Liu, C. (2018). A Survey on Deep Transfer

Learning. In International Conference on Artificial Neural Networks (pp. 1–10).

Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural

Networks. ArXiv Preprint ArXiv:1905.11946.

Tang, M., & Esmaeili, K. (2020). Mapping Surface Moisture Distribution of Heap Leach Pad

using Unmanned Aerial Vehicle. In MineXchange 2020 SME Annual Conference.

Thiel, R., & Smith, M. E. (2004). State of the practice review of heap leach pad design issues.

Geotextiles and Geomembranes, 22(6), 555–568.

https://doi.org/10.1016/j.geotexmem.2004.05.002

Tziavou, O., Pytharouli, S., & Souter, J. (2018). Unmanned Aerial Vehicle (UAV) based mapping in engineering geological surveys: Considerations for optimum results.

Engineering Geology, 232, 12–21. https://doi.org/10.1016/j.enggeo.2017.11.004

Ulku, I., & Akagunduz, E. (2020). A Survey on Deep Learning-based Architectures for Semantic

Segmentation on 2D images. ArXiv Preprint ArXiv:1912.10230.

Ulku, I., Barmpoutis, P., Stathaki, T., & Akagunduz, E. (2019). Comparison of single channel

indices for U-Net based segmentation of vegetation in satellite images. In Twelfth

International Conference on Machine Vision (ICMV 2019) (Vol. 11433).

https://doi.org/10.1117/12.2556374

Ulyanov, D., Vedaldi, A., & Lempitsky, V. S. (2016). Instance Normalization: The Missing

Ingredient for Fast Stylization. ArXiv Preprint ArXiv:1607.08022. Retrieved from

http://arxiv.org/abs/1607.08022

Valavanis, K. P., & Vachtsevanos, G. J. (2015). Handbook of Unmanned Aerial Vehicles.

Valencia, J., Battulwar, R., Naghadehi, M. Z., & Sattarvand, J. (2019). Enhancement of

explosive energy distribution using UAVs and machine learning. In Mining goes Digital

(pp. 671–677).

Van Horn, G., Mac Aodha, O., Song, Y., Cui, Y., Sun, C., Shepard, A., … Belongie, S. (2017).

The iNaturalist Species Classification and Detection Dataset. In Proceedings of the IEEE


conference on computer vision and pattern recognition (pp. 8769–8778).

Velarde, G. (2007). Agglomeration control for heap leaching processes. Mineral Processing and

Extractive Metallurgy Review, 219–231. https://doi.org/10.1080/08827500590943974

Veroustraete, F., Li, Q., Verstraeten, W. W., Chen, X., Bao, A., Dong, Q., … Willems, P. (2012).

Soil moisture content retrieval based on apparent thermal inertia for Xinjiang province in

China. International Journal of Remote Sensing, 33(12), 3870–3885.

https://doi.org/10.1080/01431161.2011.636080

Verstraeten, W. W., Veroustraete, F., Van Der Sande, C. J., Grootaers, I., & Feyen, J. (2006).

Soil moisture retrieval using thermal inertia, determined with visible and thermal

spaceborne data, validated for European forests. Remote Sensing of Environment, 101(3),

299–314. https://doi.org/10.1016/j.rse.2005.12.016

Wallace, L., Lucieer, A., Malenovský, Z., Turner, D., & Vopěnka, P. (2016). Assessment of Forest Structure Using Two UAV Techniques: A Comparison of Airborne Laser Scanning and Structure from Motion (SfM) Point Clouds. Forests, 7(3).

https://doi.org/10.3390/f7030062

Wang, H., Li, X., Long, H., Xu, X., & Bao, Y. (2010). Monitoring the effects of land use and

cover type changes on soil moisture using remote-sensing data: A case study in China’s

Yongding River basin. Catena, 82, 135–145. https://doi.org/10.1016/j.catena.2010.05.008

Wang, T., Liang, J., & Liu, X. (2018). Soil Moisture Retrieval Algorithm Based on TFA and

CNN. IEEE Access, 7, 597–604. https://doi.org/10.1109/ACCESS.2018.2885565

Watling, H. (2006). The bioleaching of sulphide minerals with emphasis on copper sulphides —

A review. Hydrometallurgy, 84, 81–108. https://doi.org/10.1016/j.hydromet.2006.05.001

Wei, P., Cagle, L., Reza, T., Ball, J., & Gafford, J. (2018). LiDAR and Camera Detection Fusion

in a Real-Time Industrial Multi-Sensor Collision Avoidance System. Electronics, 7(6).

https://doi.org/10.3390/electronics7060084

Weiss, K., Khoshgoftaar, T. M., & Wang, D. (2016). A survey of transfer learning. Journal of

Big Data, 3(1), 9.

Weng, Q., Lu, D., & Schubring, J. (2004). Estimation of land surface temperature – vegetation

abundance relationship for urban heat island studies. Remote Sensing of Environment, 89(4),

467–483. https://doi.org/10.1016/j.rse.2003.11.005

Wu, Y., & He, K. (2018). Group Normalization. In The European Conference on Computer

Vision (ECCV) (pp. 3–19).

Xia, G.-S., Hu, J., Hu, F., Shi, B., Bai, X., Zhong, Y., … Lu, X. (2017). AID: A Benchmark

Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Transactions on

Geoscience and Remote Sensing, 55(7), 3965–3981.

https://doi.org/10.1109/TGRS.2017.2685945


Yang, Y., & Newsam, S. (2010). Bag-Of-Visual-Words and Spatial Extensions for Land-Use

Classification. In Proceedings of the 18th SIGSPATIAL international conference on

advances in geographic information systems (pp. 270–279).

https://doi.org/10.1145/1869790.1869829

Yao, H., Qin, R., & Chen, X. (2019). Unmanned Aerial Vehicle for Remote Sensing

Applications — A Review. Remote Sensing, 11(12), 1–22.

Ye, R., Liu, F., & Zhang, L. (2019). 3D depthwise convolution: Reducing model parameters in

3D vision tasks. In Canadian Conference on Artificial Intelligence (pp. 186–199).

Yu, D., Wang, H., Chen, P., & Wei, Z. (2014). Mixed Pooling for Convolutional Neural

Networks. In International Conference on Rough Sets and Knowledge Technology (pp.

364–375). https://doi.org/10.1007/978-3-319-11740-9

Zaitoun, N. M., & Aqel, M. J. (2015). Survey on Image Segmentation Techniques. Procedia

Computer Science, 65, 797–806. https://doi.org/10.1016/j.procs.2015.09.027

Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. CoRR, abs/1212.5.

Zeiler, M. D., & Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional

neural networks. ArXiv Preprint ArXiv:1301.3557, 1–9.

Zeiler, M. D., & Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. In

European Conference on Computer Vision (pp. 818–833). Springer.

Zhang, C., Sargent, I., Pan, X., Li, H., Gardiner, A., Hare, J., & Atkinson, P. M. (2018). An

object-based convolutional neural network (OCNN) for urban land use classification.

Remote Sensing of Environment, 216(July), 57–70.

https://doi.org/10.1016/j.rse.2018.06.034

Zhang, D., & Zhou, G. (2016). Estimation of soil moisture from optical and thermal remote

sensing: A review. Sensors (Switzerland), 16(8). https://doi.org/10.3390/s16081308

Zhang, H., Wu, C., Zhang, Z., Zhu, Y., Zhang, Z., Lin, H., … Manmatha, R. (2020). ResNeSt: Split-Attention Networks. ArXiv Preprint ArXiv:2004.08955.

Zhang, J., Yang, X., Li, W., Zhang, S., & Jia, Y. (2020). Automatic detection of moisture

damages in asphalt pavements from GPR data with deep CNN and IRS method. Automation

in Construction, 113(September 2019), 103119.

https://doi.org/10.1016/j.autcon.2020.103119

Zhang, S., & Liu, W. (2017). Application of aerial image analysis for assessing particle size

segregation in dump leaching. Hydrometallurgy, 171(February), 99–105.

https://doi.org/10.1016/j.hydromet.2017.05.001

Zhang, X., Zhou, X., Lin, M., & Sun, J. (2018). ShuffleNet: An Extremely Efficient

Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE conference

on computer vision and pattern recognition (pp. 6848–6856).


Zhang, Y., Chen, H., He, Y., Ye, M., Cai, X., & Zhang, D. (2018). Road segmentation for all-

day outdoor robot navigation. Neurocomputing, 314, 316–325.

https://doi.org/10.1016/j.neucom.2018.06.059

Zhao, S., Zhang, D. M., & Huang, H. W. (2020). Deep learning-based image instance

segmentation for moisture marks of shield tunnel lining. Tunnelling and Underground

Space Technology, 95(October 2019), 103156. https://doi.org/10.1016/j.tust.2019.103156

Zhao, W., & Li, Z. (2013). Sensitivity study of soil moisture on the temporal evolution of surface

temperature over bare surfaces. International Journal of Remote Sensing, 1161, 3314–3331.

https://doi.org/10.1080/01431161.2012.716532

Zhou, B., Zhao, H., Puig, X., Xiao, T., Fidler, S., Barriuso, A., & Torralba, A. (2019). Semantic

Understanding of Scenes Through the ADE20K Dataset. International Journal of Computer

Vision, 127(3), 302–321. https://doi.org/10.1007/s11263-018-1140-0

Zhu, X. X., Tuia, D., Mou, L., Xia, G., Zhang, L., Xu, F., & Fraundorfer, F. (2017). Deep

learning in remote sensing: A comprehensive review and list of resources. IEEE Geoscience

and Remote Sensing Magazine, 5(4), 8–36.

Zhuang, J., Yang, J., Gu, L., & Dvornek, N. (2019). ShelfNet for Fast Semantic Segmentation. In

Proceedings of the IEEE International Conference on Computer Vision Workshops.

Zitova, B., & Flusser, J. (2003). Image registration methods: a survey. Image and Vision

Computing, 21, 977–1000. https://doi.org/10.1016/S0262-8856(03)00137-9

Zou, Q., Ni, L., Zhang, T., & Wang, Q. (2015). Deep learning based feature selection for remote

sensing scene classification. IEEE Geoscience and Remote Sensing Letters, 12(11), 2321–

2325.

Zribi, M., & Dechambre, M. (2002). A new empirical model to retrieve soil moisture and

roughness from C-band radar data. Remote Sensing of Environment, 84, 42–52.

Zwissler, B. (2016). Dust Susceptibility at Mine Tailings Impoundments: Thermal Remote

Sensing for Dust Susceptibility Characterization and Biological Soil Crusts for Dust

Susceptibility Reduction. Michigan Technological University.

Zwissler, B., Oommen, T., Vitton, S., & Seagren, E. A. (2017). Thermal remote sensing for

moisture content monitoring of mine tailings: laboratory study. Environmental &

Engineering Geoscience, XXIII(4), 299–312. https://doi.org/10.2113/eeg-1953