A Comparative Study of Deep Learning and Traditional Methods for Environmental Remote Sensing

Because of the accessibility of massive volumes of remote sensing data and developments in machine learning (ML), ML techniques have been extensively applied in environmental remote sensing research. Modern ML frameworks such as deep learning (DL) have significantly outperformed older models. This study surveys applications of traditional neural networks (NN) and DL approaches in environmental remote sensing, covering land cover mapping, retrieval of environmental parameters, data fusion and downscaling, image compression, and information reconstruction and prediction. It also explains how DL may be used to monitor other aspects of the environment, including water management, land surface and air temperature, evapotranspiration, ultraviolet (UV) radiation, and ocean color. Finally, the paper explores the challenges and prospective applications of DL in environmental remote sensing.


Introduction
The ecosystem of the world has significantly deteriorated as a consequence of human activities, posing a major barrier to sustained global economic development. Resource shortages and ecosystem damage are now global rather than merely regional issues. Over the previous 50 years, space-based technology such as satellite remote sensing has made it possible to analyze the earth's resources, track regional and local environmental changes, and investigate global change. These methods offer several advantages, notably large-area observation, extensive data collection, rapid data processing, continuous monitoring, and high accuracy [1], [2]. As Liang [3] notes, remote sensing data are mostly converted into quantitative measurements using physical models. Physical models can accurately depict the link between environmental characteristics and remote sensing measurements, but their accuracy relies heavily on prior parametric information. Owing to the complexity of physical processes, this information frequently involves large errors that vary across time periods and geographic regions. As a result, the accuracy of environmental remote sensing based on physical models may be constrained. Data-driven machine learning approaches have therefore become increasingly significant in environmental remote sensing. The expanding accessibility of enormous volumes of environmental data and the rapid development of machine learning have opened new possibilities for creative approaches to earth environmental monitoring. Deep learning, a method built on large, sophisticated artificial neural networks, has recently attracted considerable attention. According to LeCun et al. [4] and Bengio et al. [5], deep learning models can precisely capture complicated non-linear interactions between environmental elements through multi-layer learning.
This capability makes it possible to identify probable relationships between environmental variables for a variety of applications, including downscaling, fusion, and remote sensing retrieval. Moreover, DL has proven highly successful at extracting multiscale and multilevel characteristics from remote sensing images and combining them at all levels, from low to high. The research of Zhang et al. [6] emphasizes the significance of these abilities in producing remarkable outcomes in image analysis and classification problems. Consequently, deep learning models can monitor changes to the Earth's environment using satellite data, yielding substantial improvements. Numerous review papers on the use of deep learning in remote sensing have been published. The majority of these publications, however, focus strongly on the preprocessing and classification components of image analysis [7]. Little research has addressed the use of deep learning for quantitative remote sensing analysis in certain disciplines, such as hydrology and atmospheric aerosols. Thorough studies of deep learning for quantitative remote sensing research are scarce, even though both classic neural network and deep learning models have been used in environmental monitoring. A great deal remains to be investigated in this field despite the publication of numerous studies over the past few decades. As a result, this study focuses on DL techniques that enhance environmental remote sensing. The primary outline of this paper is shown in Fig. 1. We thoroughly analyze the capability of DL for environmental remote sensing, comprising land cover mapping, retrieval of environmental parameters, data fusion and downscaling, and reconstruction and prediction of missing information. The use of several well-known DL network topologies for remote sensing applications is also covered.
We then review certain crucial domains in environmental remote sensing where conventional neural networks (NN) and deep learning (DL) have been applied. These fields include land cover mapping, estimation of atmospheric parameters, quantitative retrieval of land surface parameters, and parameter sensing for hydrological processes. This article also explores how to use DL effectively in settings with small sample sizes. The article is organized as follows: Section 2 addresses the many uses of DL in the field of environmental remote sensing. Section 3 examines prominent network architectures and their relative functions in various data processing tasks. Section 4 provides a detailed study of classical neural networks (NN) and deep learning (DL) in environmental remote sensing, with a particular emphasis on quantitative parameter retrieval and land cover mapping. Section 5 discusses prospective research directions and prospects, while Section 6 concludes the review.

How can deep learning benefit remote sensing in environmental applications?
As deep learning (DL) has a great capacity to represent features, it has been widely applied in environmental remote sensing. Applications of deep learning to remotely sensed images, however, differ from those to natural images. Remote sensing images provide richer spatio-temporal-spectral information and more complex and diversified patterns, and therefore require more sophisticated processing techniques. Data fusion, downscaling, information reconstruction, prediction, and retrieval of environmental parameters are a few examples of the diverse uses of DL. Fig. 1 shows the main outline of this study.

Land cover mapping
Remote sensing imagery is used to map land cover through image classification, which involves categorizing pixels, objects, or scenes. However, traditional classification algorithms have limitations in recognizing sophisticated land formations or patterns because they rely on a limited set of rules based on low-level features in the temporal and spatial dimensions [10]. Using classification algorithms that rely on a large number of high-level characteristics is one way to overcome the constraints of traditional approaches to land cover mapping. Deep learning (DL) has emerged as a viable approach for this purpose because of its ability to extract multiscale and multilevel characteristics, resulting in highly accurate land cover maps. DL-based approaches offer significant benefits over rule-based and classical machine learning (ML) methods, particularly in complicated metropolitan regions. DL-based approaches are especially useful for high-resolution and very-high-resolution satellite images, and they have shown promise in a variety of applications [11].

Extraction of environmental parameters
Remote sensing is a common means of retrieving environmental parameters, and it typically relies on physical models based on complex processes and physical concepts. However, these physical models require numerous model parameters, and for some environmental phenomena a reliable physical model has yet to be created. In this regard, DL presents a promising opportunity for environmental parameter retrieval. To make the retrieval procedure easier to handle, DL can imitate or replace physical models. Since DL has considerable simulation capability, it can be used to simulate part or all of a physical model. Furthermore, because it can approximate complicated relationships, DL is useful in determining the statistical link between remote sensing measurements and environmental factors [12]. This approach avoids the use of complicated physical models and can yield comparable performance. DL therefore provides a viable alternative for environmental parameter retrieval in cases where no reliable physical model is available.
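As a minimal illustration of this emulation idea (a sketch only: the "physical" model here is a hypothetical saturating reflectance curve, and a simple polynomial fit stands in for a trained network), one can learn a cheap surrogate from simulated pairs and then invert the surrogate for retrieval instead of re-running the expensive model:

```python
import numpy as np

def forward_model(lai):
    """Toy 'physical' model: canopy reflectance as a saturating function
    of leaf area index (LAI). Stands in for an expensive simulation."""
    return 0.05 + 0.45 * (1.0 - np.exp(-0.6 * lai))

# Step 1: train a cheap emulator on simulated (parameter, observation)
# pairs; here a polynomial fit plays the role of a neural network.
lai_train = np.linspace(0.0, 6.0, 50)
refl_train = forward_model(lai_train)
emulator = np.polynomial.Polynomial.fit(lai_train, refl_train, deg=5)

# Step 2: retrieve LAI from an observed reflectance by searching the
# emulator instead of the physical model.
observed = forward_model(2.5)          # pretend this is a measurement
grid = np.linspace(0.0, 6.0, 6001)
lai_hat = grid[np.argmin(np.abs(emulator(grid) - observed))]
```

In practice the emulator would be a neural network trained on radiative-transfer simulations, and the retrieval step would use optimization rather than an exhaustive grid search; the structure of the workflow is the same.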

Data fusion and downscaling
It is difficult to gather high-quality data with remote sensing satellite sensors owing to their limits in spatial, temporal, and spectral resolution. Data fusion, however, is a potent method that can combine complementary data to produce high spatio-temporal-spectral resolution data [13]. As deep learning (DL) can extract abstract characteristics from remote sensing data and discover probable correlations between various observations through multilayer learning, it is a useful method for resolving this problem. DL can fully capture the complicated links necessary for data fusion and downscaling [14]. Moreover, DL derives abstract properties from data samples that are less affected by sensor type and geographic scale, resulting in DL models that transfer well across sensors and regions.

Information reconstruction and prediction
It is quite common for remote sensing data to have missing information due to factors such as dead pixels, gaps, and cloud cover [15]. Various strategies have been devised over the years to address this issue for applications such as gap filling, cloud removal, and NDVI and LST reconstruction [16,17,18]. These methods have generally achieved satisfactory results, but they often face challenges when dealing with complex surfaces and large gaps. Recently, deep-learning-based models such as Convolutional Neural Networks (CNNs) have been applied because of their tremendous nonlinear representational capabilities, attaining state-of-the-art outcomes. These advances demonstrate the potential of DL for reconstructing and predicting missing information in remote sensing data. In short, DL has the potential to be a game-changer in addressing missing data in remote sensing.

Fundamental network architectures
Despite the NN's remarkable generality, not every problem can be solved by a single network architecture. The fact that many distinct NN structures have been created to date to address a variety of problems shows how crucial network architectures are. The radial basis function neural network (RBFNN) and the self-organizing map neural network (SOMNN) are two typical instances of standard neural network design. The convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN) are among the most common DL architectures. Each of these designs is discussed in further depth in the sections that follow.

RBFNN:
RBFNN is one of the basic NNs. Radial basis function neural networks (RBFNNs) are feed-forward neural networks in which radial basis functions are used as activation functions. The RBFNN (Fig. 2) has three layers: an input layer, a hidden layer with radial basis functions, and an output layer. The radial basis functions map the input data into a higher-dimensional space, allowing for simpler separation. RBFNN has several applications, including function approximation, classification, and time-series prediction, and it has been explored for remote sensing image classification. Its training can be solved analytically with a closed-form equation, and no parameters must be manually tweaked. It has a substantially lower computational cost than the popular support vector machine (SVM) [20]. RBFNN, a well-known ML method, has been employed in almost all domains of remote sensing research. Its design shortens processing time, and it produces acceptable results even with small training samples.

Fig. 2. RBFNN structure
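The closed-form training mentioned above can be sketched as follows (a toy 1-D regression with hypothetical Gaussian centers and width; the output-layer weights are obtained by least squares rather than iterative tuning):

```python
import numpy as np

def rbf_features(X, centers, gamma):
    """Gaussian radial basis activations: one column per hidden unit."""
    # squared Euclidean distance between each sample and each center
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def train_rbfnn(X, y, centers, gamma=1.0):
    """Closed-form least-squares fit of the output-layer weights."""
    Phi = rbf_features(X, centers, gamma)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict_rbfnn(X, centers, w, gamma=1.0):
    return rbf_features(X, centers, gamma) @ w

# Toy regression: approximate y = sin(x) on [0, 2*pi] with 10 centers
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(200, 1))
y = np.sin(X[:, 0])
centers = np.linspace(0.0, 2.0 * np.pi, 10)[:, None]
w = train_rbfnn(X, y, centers, gamma=2.0)
pred = predict_rbfnn(X, centers, w, gamma=2.0)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

The one-shot least-squares solve is what gives the RBFNN its low training cost relative to iteratively trained networks.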
SOMNN:
Self-Organizing Maps (SOM) are a type of unsupervised neural network used to reduce the dimensionality of high-dimensional data and to visualize it. A SOM consists of a grid of neurons arranged so that they react similarly to comparable stimuli. During training, input data are fed into the SOM, and each neuron is given a weight vector that represents a location in the input space. The structural relationships between the inputs are preserved by simultaneously modifying the weights of adjacent cells. Once trained, the SOM can be used to interpret the input data in a lower-dimensional space, with comparable inputs mapping to neighboring neurons. SOM is a popular unsupervised classification approach in remote sensing that has been extensively employed in a variety of applications [21], including land cover categorization [22], vegetation mapping [23], and image segmentation [24].
Traditional neural networks have been successful in some areas, but their shallow structure can limit their fitting ability. To address this limitation, deeper networks have been developed; these are referred to as deep learning algorithms, which distinguishes them from traditional machine learning algorithms. Deep networks have more layers than traditional neural networks, allowing them to better fit complex data.
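Returning to the SOM, its competitive weight update (find the best-matching unit, then pull that unit and its grid neighbours toward the input) can be sketched in a few lines of NumPy. Grid size, schedules, and the two-cluster toy data below are illustrative assumptions:

```python
import numpy as np

def train_som(data, grid_shape=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal SOM: each grid cell holds a weight vector pulled toward
    inputs for which it (or a neighbour) is the best-matching unit."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows, cols, data.shape[1]))
    # grid coordinates, used for neighbourhood distances on the map
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.permutation(data):
            lr = lr0 * (1.0 - step / n_steps)              # decaying rate
            sigma = max(sigma0 * (1.0 - step / n_steps), 0.5)
            # best-matching unit: grid cell whose weights are closest to x
            d2 = ((weights - x) ** 2).sum(axis=2)
            bmu = np.unravel_index(np.argmin(d2), d2.shape)
            # Gaussian neighbourhood around the BMU on the grid
            g2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            h = np.exp(-g2 / (2.0 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights

# Two well-separated synthetic clusters should map to different regions
rng = np.random.default_rng(1)
a = rng.normal(0.2, 0.02, (50, 2))
b = rng.normal(0.8, 0.02, (50, 2))
som = train_som(np.vstack([a, b]))
bmu_a = np.unravel_index(np.argmin(((som - a.mean(0)) ** 2).sum(2)), (5, 5))
bmu_b = np.unravel_index(np.argmin(((som - b.mean(0)) ** 2).sum(2)), (5, 5))
```

After training, distinct input clusters activate distinct grid neurons, which is the topology-preserving behaviour exploited in unsupervised land cover classification.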

DBN:
Deep Belief Networks (DBNs), which fall under the category of generative models, are an essential part of deep learning. As seen in Fig. 5, a typical DBN consists of several Restricted Boltzmann Machines (RBMs) stacked beneath a backpropagation (BP) layer. Each RBM is made up of two layers, visible and hidden, each with a certain number of neurons; connections exist only between the units of the two layers, not within a layer. In a DBN, the output of one RBM's hidden layer is used as the input to the visible layer of the next RBM. This procedure is repeated for each successive layer, allowing the network to learn progressively more complicated aspects of the incoming data. The DBN is trained greedily, layer by layer, with each layer's weights adjusted to improve the network's efficiency; the BP algorithm then fine-tunes the weights of all layers to complete the model. In essence, the DBN is trained hierarchically, with each layer building on the one before it to increase the network's reliability. DBN has advantages over conventional neural network models such as RBFNN, including the ability to avoid the local optima and long training times caused by random weight initialization, since layer-wise pretraining requires only a limited search of the parameter space. DBN has consequently become a useful tool for resolving problems in remote sensing. Diao et al. applied DBN to their object recognition work and demonstrated its precision as well as effectiveness.
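The layer-wise pretraining described above rests on training individual RBMs. A minimal sketch of one RBM trained with one-step contrastive divergence (CD-1) on toy binary patterns follows; the hyperparameters and data are illustrative assumptions, and a DBN would stack several such layers, feeding each layer's hidden activations to the next:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden=8, epochs=500, lr=0.1, seed=0):
    """One RBM layer trained with CD-1 (one Gibbs step per update)."""
    rng = np.random.default_rng(seed)
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)          # visible biases
    b_h = np.zeros(n_hidden)           # hidden biases
    for _ in range(epochs):
        # positive phase: hidden probabilities given the data
        h_prob = sigmoid(V @ W + b_h)
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        # negative phase: one Gibbs step back to a reconstruction
        v_recon = sigmoid(h_samp @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # contrastive-divergence weight and bias updates
        W += lr * (V.T @ h_prob - v_recon.T @ h_recon) / len(V)
        b_v += lr * (V - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_v, b_h

# Two complementary binary patterns the RBM should learn to reconstruct
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 20, dtype=float)
W, b_v, b_h = train_rbm(V)
recon = sigmoid(sigmoid(V @ W + b_h) @ W.T + b_v)
err = float(np.mean((V - recon) ** 2))
```

A low reconstruction error indicates the hidden layer has captured the structure of the visible data, which is exactly what each pretrained layer contributes to the stacked network.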
CNN:
The CNN [25] extracts features using stacked convolutional, pooling, and activation layers. The convolutional layer scans the input data with local filters, producing feature maps. To reduce the complexity of the data, the pooling layer employs techniques such as average pooling and max pooling. The activation layer applies nonlinear functions to improve the model's fitting capacity. Through a series of operations that gradually reduce the data, the CNN aims to reduce the disparity between the output and the label data, adjusting its weights via the backpropagation algorithm. In short, a CNN is a deep learning model that learns to classify incoming data by extracting characteristics from it with convolutional, pooling, and activation layers. In contrast to traditional neural networks, the CNN forgoes full connectivity in favor of local connectivity. This approach requires fewer computations, since it exploits the relative position information of samples, and weight sharing in the convolutional units works well with high-dimensional data. Nonetheless, CNNs have certain limitations, such as the need for enormous amounts of training data and substantial processing costs.
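The convolution, activation, and pooling operations described above can be sketched directly. The example below applies a single hand-set edge filter to a toy image (not a trained network), so the effect of each stage is visible:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response per window."""
    H, W = x.shape
    H2, W2 = H // size, W // size
    return x[:H2 * size, :W2 * size].reshape(H2, size, W2, size).max(axis=(1, 3))

def relu(x):
    return np.maximum(x, 0.0)

# A vertical-edge filter responds along the boundary of a bright square
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])
feat = max_pool(relu(conv2d(img, edge_kernel)))   # conv -> relu -> pool
```

The 8x8 input is reduced to a 3x3 feature map whose strongest activations mark the right edge of the square; in a real CNN many such filters are learned by backpropagation rather than set by hand.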

RNN, LSTM:
It is challenging for the typical neural network model to capture associations between correlated samples, especially sequential data, because it assumes that samples are independent of one another. To exploit the correlation between samples, and even to generate sequences, Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models have been built to handle sequence data. Each unit of the input sequence is fed sequentially into the RNN, whose hidden layers produce the corresponding output unit along with state information for the following step. RNNs have demonstrated good performance in the analysis of time-series changes in remote sensing. Researchers have applied RNN and LSTM in a range of disciplines, including crop classification, air pollution forecasting, and forest disturbance detection. When processing long data sequences, LSTM outperforms the RNN, which is best at processing short sequences. By merging spatial and temporal data, LSTM models have been used to forecast agricultural fields and to increase prediction accuracy for sea surface temperature. According to Fang et al. [26], long-term soil moisture (SM) estimates obtained by LSTM exhibit greater generalization capacity than classic linear regression or autoregressive models. In addition, the GRU, an LSTM variant, can handle long-term sequences with less training time and produces even higher-quality outcomes. Ndikumana et al. [27] introduced the GRU for agricultural classification tasks with multi-temporal SAR data, demonstrating its advantage over LSTM. In a similar work [28], Zhao et al. tested RNN, LSTM, and GRU for early crop classification using Sentinel-1A imagery, with GRU outperforming the other two techniques.
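The core recurrence of a vanilla RNN is h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The sketch below (random, untrained weights, purely illustrative) shows the property that motivates RNNs for time series: unlike a memoryless model, the final hidden state depends on the order of the sequence steps:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b):
    """Vanilla (Elman) RNN: the hidden state carries information from
    earlier sequence steps into later ones."""
    h = np.zeros(W_hh.shape[0])
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b)
    return h

rng = np.random.default_rng(0)
W_xh = rng.standard_normal((4, 2))        # input-to-hidden weights
W_hh = rng.standard_normal((4, 4)) * 0.5  # hidden-to-hidden (recurrent)
b = np.zeros(4)

seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
h_fwd = rnn_forward(seq, W_xh, W_hh, b)
h_rev = rnn_forward(seq[::-1], W_xh, W_hh, b)
# A bag-of-inputs model would give identical results for both orders
order_sensitive = not np.allclose(h_fwd, h_rev)
```

LSTM and GRU cells replace the plain tanh update with gated updates, which is what lets them retain information over much longer sequences.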

Applications
In land management, planning, and environmental analysis, land cover is an important consideration. Land cover maps may be created from remote sensing imagery, including optical and radar data. Numerous land cover classification methods that combine spectral and spatial data have been thoroughly investigated [30]. These techniques examine the spectral properties of individual pixels as well as the spatial properties of nearby areas or segmented objects. They frequently struggle, however, to distinguish complicated terrain formations or patterns from low-level spectral and spatial properties, and they are typically susceptible to within-class variation and noise [31]. To address these difficulties, it is crucial to consider classification approaches that make use of high-level characteristics. With the development of Deep Learning (DL), the area of remote sensing has recently seen substantial advancements, as emphasized in several review publications [32]. As shown by the outstanding outcomes obtained in [33], DL has proven extremely successful in image classification, particularly land cover classification. While traditional approaches for classifying land cover rely on low-level spectral and spatial data, DL has the unique capacity to adaptively learn discriminative characteristics through supervised learning, considerably improving classification results [34]. DL-based land cover classification of remotely sensed images parallels semantic segmentation of natural images, but the former can additionally exploit rich spectral and temporal information to provide superior classification outcomes. Additionally, single-source or multisource data, such as optical, radar, and DEM data, may be used as DL inputs for land cover categorization.
The results of DL-based land cover classification are pixel-level class and scene-level category label mappings of the input image(s), where each scene or pixel is given a class label based on the maximal class probability. Scenes and pixels are thus the two spatial representation levels by which studies on DL-based land cover categorization may be broadly classified. Several researchers [35] have studied scene-level approaches, which are widely employed to tackle the problem of identifying land-use situations. To extract multi-scale and multi-level features using convolutional layers, these algorithms often entail downsampling the input image; fully connected layers are then used to make the final prediction of class probabilities. Previous research has examined several intricate deep land cover mapping models. For example, Huang et al. [36] developed a two-branch CNN architecture to process panchromatic and multispectral remote sensing images independently. In addition, some recent studies [37] have investigated fusing various CNN classifiers into a combined classifier for land cover classification, which enhances the capability of single classifiers through classifier ensembles. Recently, Interdonato et al. [38] combined CNN and RNN into one end-to-end architecture for accurate land cover classification with diverse spectral-spatial-temporal feature representation, taking advantage of the cooperation of convolutional and recurrent neural networks, which capture different elements of the data. Overall, DL methods have been successfully applied to land cover classification and have produced cutting-edge outcomes. Classification accuracy can be increased by integrating CNNs with other DL or ML classifiers or with traditional image analysis methods such as OBIA and conditional random fields (CRF). Land cover mapping is also more accurate when multitemporal and multisource data are merged.
The manually labeled land cover datasets available, however, are still insufficient for training deep models for the majority of tasks. This limitation constrains the extensive practical use of DL for land cover categorization.
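The classifier-ensemble idea mentioned above can be illustrated with a simple soft-voting rule: average the per-pixel class-probability maps of several classifiers, then take the arg-max class. The probability values below are hypothetical, standing in for the outputs of trained CNNs:

```python
import numpy as np

def soft_vote(prob_maps):
    """Combine per-classifier class-probability maps by averaging,
    then assign each pixel the class with the highest mean probability."""
    return np.mean(prob_maps, axis=0).argmax(axis=-1)

# Three hypothetical classifiers over 4 pixels and 3 land cover classes
p1 = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3],
               [0.1, 0.2, 0.7], [0.4, 0.4, 0.2]])
p2 = np.array([[0.5, 0.4, 0.1], [0.1, 0.7, 0.2],
               [0.3, 0.3, 0.4], [0.2, 0.6, 0.2]])
p3 = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3],
               [0.2, 0.2, 0.6], [0.3, 0.5, 0.2]])
labels = soft_vote(np.stack([p1, p2, p3]))
```

Averaging probabilities (rather than hard majority voting) lets a confident classifier outweigh uncertain ones, which is one reason ensembles often beat their individual members.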

Aerosol
Obtaining the aerosol optical depth (AOD), also referred to as the aerosol optical thickness, allows for the monitoring of aerosol, a significant meteorological component. AOD may be retrieved using a variety of techniques, including the Dark Target and Deep Blue algorithms, which rely largely on data from the red and blue bands. Statistical models built from observations of predictors and in-situ AOD are also utilized to estimate AOD; however, the assumptions these models make about aerosols and surface characteristics have limits. Traditional NNs have typically been utilized for this task, but deep learning (DL) has also shown promise in learning statistical relationships for AOD remote sensing. Satellite radiance from various bands is used as the primary input for NN-based remote sensing, although auxiliary variables including angle data, topographic details, and weather conditions are also sometimes used. AOD retrieval and bias correction of AOD products are the main focuses of NN-based remote sensing applications. Bias correction has been a common use for neural networks (NN) in the AOD context. The primary distinction between NN-based AOD correction and retrieval is that the former uses the NN to remove biases in an existing AOD product, while the latter calculates AOD by developing a model that associates satellite radiance data with AOD. Lary et al. [39] employed land cover data, satellite reflectance data, and angle data as inputs to an NN-based AOD correction model to increase the correlation with AERONET. Similarly, Ristovski et al. [40] successfully trained an NN-based estimator of AOD retrieval uncertainty. Because the bias between the MODIS and AERONET AOD can be affected by several variables, more sophisticated inputs have become necessary.
Lanzaco et al. [41] derived corrected AOD values from the MODIS AOD product. Qin et al. [42] employed a similar approach, using a BPNN optimized with a genetic algorithm and including meteorological factors and cloud fraction to correct the MERRA-2 AOD; as a result, they achieved significantly improved product quality. While deep learning (DL) has not yet been frequently used for this problem, neural networks (NN) have been widely employed in aerosol retrieval and correction with promising outcomes. Compared with a shallow NN, DL has a greater chance of properly capturing the correlations between multiband radiance data and AOD, along with a stronger feature-extraction capability. Given the success of NN in earlier AOD experiments, exploring the possibilities of DL thus represents an exciting direction.
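A minimal sketch of the NN-based bias-correction idea follows. The data are synthetic (an assumed angle-dependent bias added to a "true" AOD), and the one-hidden-layer MLP trained by plain gradient descent is an illustrative stand-in, not any of the cited models:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic example: the satellite AOD carries an angle-dependent bias
n = 500
true_aod = rng.uniform(0.05, 0.8, n)
angle = rng.uniform(0.0, 1.0, n)             # normalised viewing angle
sat_aod = true_aod + 0.15 * angle - 0.05     # biased retrieval
X = np.column_stack([sat_aod, angle])
y = true_aod                                 # "ground truth" (e.g. AERONET)

# One-hidden-layer MLP trained with full-batch gradient descent (MSE loss)
W1 = rng.standard_normal((2, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.standard_normal(8) * 0.5
b2 = 0.0
lr = 0.1
for _ in range(5000):
    H = np.tanh(X @ W1 + b1)                 # hidden activations
    pred = H @ W2 + b2
    err = pred - y
    # backpropagation through the two layers
    gW2 = H.T @ err / n
    gb2 = err.mean()
    dH = np.outer(err, W2) * (1.0 - H ** 2)
    gW1 = X.T @ dH / n
    gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

pred = np.tanh(X @ W1 + b1) @ W2 + b2
rmse_raw = float(np.sqrt(np.mean((sat_aod - y) ** 2)))
rmse_nn = float(np.sqrt(np.mean((pred - y) ** 2)))
```

The corrected product should match the ground truth more closely than the raw satellite AOD; in real studies the inputs would include reflectance, land cover, and meteorological variables, as in [39], [42].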

Precipitation
The estimation of precipitation is critical in meteorology, ecology, and hydrology, since precipitation is an important component of the water cycle. However, the coverage and validity of ground-based precipitation gauges are constrained. To address this problem, remote sensing technology has been used to create satellite-based precipitation products that provide high-resolution worldwide data. To increase accuracy, various geostationary orbit (GEO) satellite channels, including longwave infrared (IR) and water vapor (WV) channels, as well as precipitation radar data, have been incorporated into artificial neural networks (NNs) for satellite rainfall estimation (SRE). NN-based SRE methods may be categorized in two ways: by the approach used to resolve the inverse problem, whether physical or statistical, and by the satellite inputs used for precipitation calculation, such as infrared imagery, passive microwave data, or combined satellite data [43]. In comparison to previous approaches, neural networks (NNs) can preserve satellite measurement accuracy while boosting post-training computing efficiency. There is no need to examine the underlying correlations between geophysical factors and precipitation when employing NNs for rainfall estimation, making it an empirical technique. The University of Arizona created PERSIANN (Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks), a system that estimates rainfall rates using a configurable three-layer feedforward NN model through the analysis of both infrared satellite imagery and ground surface data [44].
Recently, DL approaches have been applied effectively to increase SRE accuracy while reducing bias and false alarms. Tao et al. [45] employed Stacked Denoising Auto-Encoders (SDAE) in the PERSIANN-SDAE framework to predict large-scale precipitation from GOES imagery. In 2019, Chen and colleagues developed a two-step deep neural network (DNN) approach for predicting rainfall using data from ground radar reflectivity, TRMM precipitation radar, and gauge stations [46]. Their DNN method demonstrated excellent performance in both regional and global rainfall mapping. These results suggest that deep learning techniques can achieve cutting-edge accuracy in precipitation estimation. Furthermore, combining data from multiple sources, such as satellite-based rainfall products, microwave data, and gauge data, to build continuous spatiotemporal precipitation maps on a global scale remains a vital but difficult field of research.

Snow cover
Snow cover is a significant indicator of global warming, as it impacts surface energy balance, water circulation, and the functioning of ecosystems. Accurately calculating snow parameters such as snow depth (SD), snow water equivalent (SWE), and fractional snow cover is critical for climatological and hydrological research. Artificial neural networks (ANNs) have been widely employed to increase the reliability of these estimates. Estimating SD or SWE from passive microwave brightness temperature involves a nonlinear relationship that typical linear methods cannot adequately represent. Because of their capacity to handle nonlinear mappings well, ANNs have gained appeal in this field. To increase the accuracy of SD and SWE estimates, ANN models take as inputs the vertically and horizontally polarized brightness temperatures at 19 and 37 GHz, as well as supplementary data. These models have been trained to retrieve SD or SWE, and their performance has been demonstrated against linear approaches such as the Chang algorithm. Additionally, ANNs have been applied to SWE product fusion and SD reconstruction. Fractional snow cover maps can be produced by relating satellite reflectance to snow cover fraction while incorporating additional data such as the Normalized Difference Snow Index (NDSI), obtained from MODIS bands 4 and 6 (or 7), as well as NDVI. Several studies have also used ANNs to accurately model the link between fractional snow cover and satellite reflectance.
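The NDSI mentioned above is a simple normalized band ratio that exploits snow being bright in the green band but dark in the shortwave infrared. With hypothetical reflectance values for MODIS band 4 (green) and band 6 (SWIR):

```python
import numpy as np

def ndsi(green, swir):
    """Normalized Difference Snow Index from green (MODIS band 4)
    and SWIR (MODIS band 6) reflectances."""
    return (green - swir) / (green + swir)

# Hypothetical per-pixel reflectances: snow, bare soil, water-like surface
green = np.array([0.80, 0.30, 0.60])
swir = np.array([0.10, 0.25, 0.55])
idx = ndsi(green, swir)
snow_mask = idx > 0.4        # a commonly used NDSI threshold for snow
```

The binary threshold gives a snow mask; ANN- or DL-based approaches instead learn a continuous mapping from NDSI, NDVI, and reflectance to fractional snow cover.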
Deep learning models are well-known for their automatic feature extraction and great nonlinear expressive capacity. Despite this, they have received little attention in snow-related studies. Nijhawan et al. [49] presented a multilayer deep learning framework for classifying snow and non-snow locations using multiple satellite images for snow cover mapping. Their technique outperformed artificial neural networks in classification precision. ANNs have found widespread use in estimating numerous snow parameters such as SWE, SD, and fractional snow cover, with enhanced precision. Beyond snow cover mapping, DL models may also increase the precision of quantitative snow parameter estimation. Using ANN or DL models to combine low-resolution passive microwave data (or SWE and SD products) with high-resolution optical remote sensing data could aid in downscaling SD and SWE, and such a strategy could help improve estimation reliability.

Discussions and recommendations for future work
Environmental remote sensing is based on a variety of physical frameworks grounded in fundamental scientific theories. There has been increasing criticism, however, that DL functions as a black box, with no clear explanation of why and how its models work.

Emphasizing the need for including physical concepts in DL architectures:
We believe that DL cannot entirely substitute for physical models, and that a combination of physical models and DL holds considerable potential for environmental remote sensing. An integrated strategy combining physical frameworks and DL is therefore advised. This may be accomplished in four ways.
1. Employing deep learning to replicate the forward simulation of physical models: When applied to environmental modeling tasks, forward simulation of physical models often requires significant computing capacity. Instead of depending entirely on traditional approaches, deep learning may be used to perform partial or complete forward simulations of physical models, lowering computational cost.
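A minimal sketch of this idea follows, with a toy attenuation-style function standing in for an expensive physical forward model; all coefficients are synthetic and chosen purely for illustration. A small network is fitted to samples of the simulator and can then be queried in its place:

```python
import numpy as np

rng = np.random.default_rng(1)

def physical_model(depth_cm):
    """Toy 'expensive' forward model: brightness temperature vs snow depth."""
    atten = np.exp(-0.08 * depth_cm)          # illustrative attenuation
    return 270.0 * atten + 250.0 * (1.0 - atten)

# Sample the physical model once to build a training set for the emulator
d = rng.uniform(0.0, 60.0, 400)
tb = physical_model(d)

x = ((d - d.mean()) / d.std())[:, None]
y = ((tb - tb.mean()) / tb.std())[:, None]

# Tiny one-hidden-layer emulator trained by full-batch gradient descent
w1 = rng.normal(0.0, 0.5, (1, 8)); b1 = np.zeros(8)
w2 = rng.normal(0.0, 0.5, (8, 1)); b2 = np.zeros(1)

def emulator(x):
    h = np.tanh(x @ w1 + b1)
    return h, h @ w2 + b2

lr = 0.1
for _ in range(3000):
    h, p = emulator(x)
    g = 2.0 * (p - y) / len(x)
    gh = (g @ w2.T) * (1.0 - h**2)
    w2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
    w1 -= lr * (x.T @ gh); b1 -= lr * gh.sum(0)

_, p = emulator(x)
rmse = float(np.sqrt(np.mean((p - y) ** 2)))  # in standardized units
```

Once trained, evaluating the emulator is a pair of matrix products, which is the source of the computational savings when the true forward model involves, say, iterative radiative transfer.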
2. Using deep learning to calibrate the outputs of a physical framework: The outputs of physical frameworks may be inaccurate because of parameter uncertainty. Deep learning approaches can be used to improve these outputs by combining them with real-world observations and additional data. Shen et al. [50] demonstrated that feeding Global Land Data Assimilation System model outputs, such as soil moisture and albedo, into a DL framework can improve maximum air temperature estimation.
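The calibration step can be sketched as follows. For brevity a least-squares linear correction stands in for the DL calibrator described above, and the biased "physical model output", the auxiliary variable, and all coefficients are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "physical model" output with a state-dependent bias, plus an
# auxiliary variable (think soil moisture) that explains part of the error.
n = 300
truth = rng.uniform(10.0, 35.0, n)                       # e.g. max air temp (deg C)
aux = rng.uniform(0.0, 1.0, n)                           # auxiliary predictor
phys = truth + 2.0 - 3.0 * aux + rng.normal(0, 0.3, n)   # biased model output

# Learn a correction truth ~ a*phys + b*aux + c from observations.
A = np.column_stack([phys, aux, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, truth, rcond=None)
corrected = A @ coef

rmse_raw = float(np.sqrt(np.mean((phys - truth) ** 2)))
rmse_cor = float(np.sqrt(np.mean((corrected - truth) ** 2)))
```

A DL calibrator plays the same role but can absorb nonlinear, state-dependent biases that a linear correction cannot.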
3. Developing DL models with physical concepts in mind: DL architectures have been designed specifically for environmental remote sensing applications that require integrating physics into the DL process. One inherent benefit of such architectures is their ability to encode physical relationships while remaining interpretable. Schütt et al. [51], for example, proposed a deep tensor neural network that governs molecular properties using the rules of quantum mechanics, demonstrating the power of DL in modeling complicated physical systems.
4. Physics-constrained DL modeling: Another method is to incorporate physical constraints directly into deep learning models. This entails designing a loss function that combines the data fit with physical mechanisms, so that the resulting model is physically consistent. By optimizing the physically constrained cost function, the model can achieve both high performance and physical consistency. Karpatne et al. [52], for example, created a physics-constrained neural network for lake temperature modeling that accounted for the physical relationships among temperature, volume, and water level. Combining physical simulations with deep learning has the potential to increase model accuracy as well as our understanding of the underlying physical processes in environmental remote sensing.
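The shape of such a physically constrained cost function can be illustrated with a toy example. The constraint used here, that a predicted profile ordered by depth must not decrease, is an illustrative stand-in for the physical relationships used in [52], and all numbers are synthetic:

```python
import numpy as np

def physics_constrained_loss(pred, obs, lam=1.0):
    """MSE data term plus a penalty on physically inconsistent predictions.

    `pred` is ordered by depth; the illustrative constraint is that the
    predicted quantity (e.g. water density) must not decrease with depth.
    """
    data_term = np.mean((pred - obs) ** 2)
    violation = np.maximum(0.0, pred[:-1] - pred[1:])  # decreases with depth
    return data_term + lam * np.mean(violation ** 2)

obs = np.array([1.000, 1.002, 1.005, 1.007])           # observed profile

consistent = np.array([1.001, 1.002, 1.004, 1.008])    # monotone in depth
inconsistent = np.array([1.001, 1.006, 1.002, 1.008])  # dips at one level

loss_ok = physics_constrained_loss(consistent, obs, lam=100.0)
loss_bad = physics_constrained_loss(inconsistent, obs, lam=100.0)
```

Minimizing such a loss steers the model away from parameter settings that fit the data but violate the physics, which is exactly the trade-off the physically constrained cost function encodes.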

Incorporating geographic principles into DL:
Because environmental processes happen in time and space, they must follow geographical principles that describe the spatial correlation and heterogeneity of environmental data. Spatiotemporal autocorrelation means that environmental variables are correlated with themselves across time and space, whereas heterogeneity means that the relationships between environmental factors vary over space and time. DL models, however, are often employed to build global numerical correlations between variables without taking spatial considerations into account. Environmental remote sensing research can therefore be enhanced by applying geographical principles to DL models. There are two basic approaches for incorporating the spatiotemporal correlation and heterogeneity of environmental variables into DL models: (1) supplying spatial (and temporal) information as input variables for DL models, and (2) imposing spatial and temporal constraints in DL modeling.
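The first approach can be sketched with a synthetic example of spatial heterogeneity; a linear model with location-aware features stands in here for a DL model that receives coordinates as inputs, and the data-generating relation is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic heterogeneity: the sensitivity of y to x changes with latitude,
# so a single global regression that ignores location misses part of the signal.
n = 500
lat = rng.uniform(0.0, 1.0, n)          # normalized "latitude"
x = rng.normal(0.0, 1.0, n)
y = (1.0 + 2.0 * lat) * x + rng.normal(0.0, 0.1, n)

def fit_rmse(features):
    A = np.column_stack([features, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sqrt(np.mean((A @ coef - y) ** 2)))

rmse_global = fit_rmse(np.column_stack([x]))                 # ignores location
rmse_spatial = fit_rmse(np.column_stack([x, lat, lat * x]))  # location-aware
```

In a DL model the same effect is obtained simply by appending coordinates (or coordinate embeddings) to the input vector, letting the network learn how relationships shift across space.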

Transfer learning and small sample sizes:
Deep learning models are frequently complicated and need a large quantity of training data, yet owing to factors such as cloud cover and sparse ground stations, environmental remote-sensing datasets often contain only a limited number of samples. A small sample size is a typical difficulty in machine learning: when the available dataset is insufficient to train a strong model, the model may not generalize to fresh data, resulting in poor performance. Transfer learning can alleviate this issue. It entails taking a model pre-trained on large datasets and fine-tuning it on the smaller dataset of the current task. By exploiting prior knowledge through its pretrained parameters, the model can adapt to the small sample and improve its predictive accuracy. In short, transfer learning is a powerful strategy for addressing limited sample sizes and improving performance.
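The pretrain-then-fine-tune workflow can be reduced to its smallest possible form. Linear models stand in for the pretrained and fine-tuned networks, and the tasks and sample values are synthetic; the point is only the mechanics of freezing pretrained parameters and updating the rest on scarce target data:

```python
import numpy as np

rng = np.random.default_rng(4)

# Pretraining ("source") task with abundant data: y ~ 3x + 1
xs = rng.uniform(-5.0, 5.0, 1000)
ys = 3.0 * xs + 1.0 + rng.normal(0.0, 0.1, 1000)
A = np.column_stack([xs, np.ones_like(xs)])
slope_pre, intercept_pre = np.linalg.lstsq(A, ys, rcond=None)[0]

# Target task with only three noisy samples; same underlying sensitivity but
# a shifted offset (y ~ 3x + 2), as when moving to a new region or sensor.
xt = np.array([0.0, 1.0, 2.0])
yt = 3.0 * xt + 2.0 + rng.normal(0.0, 1.0, 3)

# Fine-tuning: freeze the pretrained slope and re-fit only the offset,
# so the three samples update one parameter instead of the whole model.
intercept_ft = float(np.mean(yt - slope_pre * xt))
slope_ft = float(slope_pre)
```

Freezing most parameters is what lets the small target sample be spent on the few quantities that actually differ between tasks, which is the heart of the transfer-learning argument above.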

Conclusion
The goal of this study is to offer a thorough analysis of the applications of standard neural networks (NNs) and deep learning (DL) approaches in environmental remote sensing. DL approaches, which arose from the discipline of machine learning (ML), are increasingly employed in remote sensing for image processing, classification, and parameter retrieval. The findings demonstrate remarkable progress in applying DL approaches to environmental remote sensing. The report also recommends prospective research avenues for enhancing DL tools in this field, such as combining physical and DL models and incorporating geographical principles into DL architectures. Furthermore, because typical DL algorithms require a substantial amount of training data, the article suggests combining transfer learning with DL to boost performance.