Inversion Model for Salinization in Kashgar Oasis Area using Deep Learning

Cuicui Wang, Yinfeng He, Pengwei Zhang, Qihan Feng, Xinlei Lin, Qiang Wang, Wenwen Shi, Haibao Wen, Liming Liu and Rajesh Govindan
* Corresponding author; e-mail address: 2211331@tongji.edu.cn
ORCID ID: https://orcid.org/0009-0005-9977-9257
Volume: Vol.53 No.1 (January 2026)
Research Article
DOI: https://doi.org/10.12982/CMJS.2026.001
Received: 25 March 2025, Revised: 24 July 2025, Accepted: 3 September 2025, Published: 29 December 2025

Citation: Wang C.C., He Y.F., Zhang P.W., Feng Q.H., Lin X.L., Wang Q., et al., Inversion model for salinization in Kashgar oasis area using deep learning. Chiang Mai Journal of Science, 2026; 53(1): e20260001. DOI 10.12982/CMJS.2026.001.

Abstract

Soil salinization is a severe soil degradation process and a critical ecological challenge that threatens the sustainable development of agriculture in the Kashgar oasis region of Xinjiang. Timely, efficient monitoring and accurate estimation of soil salinity are therefore essential for preventing and managing soil salinization. This paper develops a new soil salinity inversion model based on the TabNet deep learning algorithm, using remote sensing data and environmental variables. The novelty of the proposed model lies in its use of deep learning to construct an inversion model for salinization. The model outperforms common decision-tree-based ensemble learning algorithms, an improvement achieved through TabNet's attention mechanism and deep learning architecture. The feature dataset was first constructed from land surface parameters derived from Landsat 8 imagery and other environmental variables influencing soil salinity, and the preprocessed features were then screened using XGBoost-based feature selection. Separate soil salinity inversion models were developed using the XGBoost, LightGBM, CatBoost, CNN, and TabNet algorithms, and their performance was compared. The results indicate that TabNet achieved the best predictive performance among the five models, with $R^2 = 0.57$, MAE = 8.10 g/kg, and RMSE = 11.53 g/kg on the test dataset. The predictions of TabNet and the importance of individual features were then analyzed using SHAP, which clearly reveals the effects of important factors such as groundwater table depth and altitude on salinization. Furthermore, the threshold of groundwater table depth

Keywords: soil salinization, deep learning, ensemble algorithm, inversion model, remote sensing

1. INTRODUCTION

The process of soil salinization is a gradual accumulation of water-soluble salts in the soil to a level that negatively affects the structure of the soil, leading to its degradation [1]. As a global ecological issue, soil salinization poses a significant threat to the sustainability of irrigated agriculture [2,3]. This threat arises because it has the potential to alter soil microbial communities, reduce soil fertility, and lead to the loss of biodiversity. Most crops are highly sensitive to excess salt in the soil, which inhibits their ability to absorb and retain water [4]. Currently, 20% of arable land and 33% of irrigated land in the world are significantly affected by excessive soil salinity [5]. Around 10 million hectares of agricultural land are lost annually due to salinization. This phenomenon is exacerbated by factors such as climate change, poor drainage, excessive use of saline groundwater, and intensive farming practices. As the global population continues to rise and the demand for arable land increases, the risks associated with soil salinization to both food security and environmental safety have become increasingly critical. To enable better management, reclamation and restoration of saline soils in a timely manner, it is necessary to track the changes in soil salinity.
Saline soils are usually characterized by an electrical conductivity of the saturated soil extract greater than $ 4~\text{dS m}^{-1} $ at $ 25^\circ\text{C} $ [6]. Assessing soil salinization therefore requires precise measurement of the concentrations of salt ions in the soil, such as sodium $ (Na^+) $, chloride $ (Cl^-) $, calcium $ (Ca^{2+}) $, magnesium $ (Mg^{2+}) $, sulfate $ (SO_4^{2-}) $, carbonate $ (CO_3^{2-}) $, and nitrate $ (NO_3^-) $ [7].
Over the past twenty years, many studies have shown that remote sensing has considerable potential for identifying soil salinization over large areas because it is fast, effective, and non-destructive [8]. Ground observations and radiometric measurements indicate that the reflectance of saline soils is governed mainly by salt content, mineral content, soil moisture, surface roughness, and color [9], which give saline soils distinctive features. Compared with non-saline soils, saline soils generally show higher spectral reflectance, particularly in the visible and near-infrared wavelengths [10]. Farifteh et al. used laboratory data, field measurements, and optical remote sensing data to map soil salinity and found that the near-infrared (NIR) and shortwave infrared (SWIR) regions are the most favorable spectral bands for salinity estimation [11]. Gorji et al. analyzed the spatial and temporal variability of soil salinity in the Tuz Lake area of Turkey using Landsat-5 TM and Landsat-8 imagery with regression algorithms [12]. Jiang et al. assessed the degree of soil salinization in the Yanqi Basin based on Landsat, Sentinel, and MODIS data [13]. Zhao et al. developed and optimized an inversion model for monitoring soil salt content based on multispectral imagery acquired by an unmanned aerial vehicle (UAV) [14]. Based on experience and practical experiments, spectral indices such as the Soil-Adjusted Vegetation Index (SAVI), the Normalized Difference Vegetation Index (NDVI), salinity indices (SI), other vegetation indices (VIs), and surface albedo have been widely used as predictors of soil salinity [15]. These indices support the digital mapping of soil salinization by exploiting spectral information and surface vegetation data. In addition, previous studies have verified that other factors, such as groundwater table depth and surface temperature, also influence soil salinization [16].
Machine learning has found a wide range of applications across various fields [17,18,19,20,21,22,23,24,25,26], due to its powerful ability to process complex data and capture intricate relationships. Integration of salinity inversion models with machine learning is a promising approach. Methods such as multi-layer perceptrons (MLP), support vector machines (SVM), random forests (RF), and decision trees (DT) can be used for evaluating soil salinity. Compared to traditional methods, the advantage of these methods lies in their ability to better handle high-dimensional data and complex nonlinear relationships, thereby improving the predictive accuracy and generalization ability of the models.
The regions most affected by salinization are typically arid and semi-arid, including China, Australia, India, North and South America, the Commonwealth of Independent States (CIS), the Mediterranean and Middle East, and Southeast Asia. In China, saline soils are mainly distributed in the northern inland areas and along the coastal belt north of the Yangtze River; they are widespread, cover extensive areas, and continue to expand. Salinized areas in Xinjiang account for 36.8% of the total saline-alkaline land in China, most of them in southern Xinjiang [27]. Several studies have investigated soil salinization in other regions of Xinjiang [28], but a large-scale salinization inversion model for the Kashgar oasis area is still lacking. This study draws on 1,451 field samples, providing enough data to build a more reliable model.
In this work, we aim to: (1) construct an inversion model for salinization in the Kashgar oasis area using deep learning; (2) conduct a quantitative analysis of soil salinity by integrating Landsat 8 remote sensing data (30 m resolution) with ground survey results; (3) compare the effectiveness of the deep learning approach TabNet with that of other machine learning methods, namely XGBoost, LightGBM, CatBoost, and CNN, for soil salinity inversion; and (4) address the limited interpretability of deep learning and ensemble algorithms by using the SHAP (Shapley Additive Explanations) method [29] to interpret the model's results. This approach identifies the most significant factors influencing soil salinization and analyzes their specific effects on the salinization process. The study supports the development of digital soil salinity maps for the Kashgar region of Xinjiang, China, and provides an assessment of soil conditions in this area.
The remainder of this paper is organized as follows. Section 2 describes the materials, the inversion algorithms, and the machine learning interpretability tool SHAP. Section 3 presents the results of the soil salinization inversion models and quantifies the effects of key factors on salinization. Section 4 discusses the significance of the findings, and Section 5 summarizes the conclusions.

2. MATERIALS AND METHODS

The Kashgar region is located in northwestern China within the Xinjiang Uyghur Autonomous Region, as shown in Figure 1. It lies between approximately $ 35^\circ 20' $ and $ 40^\circ 18' $ N latitude and $ 73^\circ 20' $ and $ 79^\circ 57' $ E longitude, at the western edge of the Tarim Basin and within the Kashgar River Basin and the middle to upper reaches of the Yarkant River Basin. The total area of the Kashgar region is approximately 162,000 square kilometers. The climate is dry, with minimal rainfall, and falls within the warm temperate continental arid to semi-arid zone: Kashgar receives about 65 mm of precipitation annually, while annual evaporation exceeds 2,100 mm.
Agriculture in the Kashgar region relies primarily on oasis farming supported by glacial meltwater and river irrigation. However, soil salinization is a prominent problem in the region, affecting nearly one-third of the cultivated land. Kashgar has the largest area of salinized arable land in Xinjiang, which poses a significant challenge to sustainable agricultural development. This study focuses on samples collected in the Kashgar oasis area.

Figure 1. Map of the study area and sample distribution: Geographical location of the study area (a) and specific sampling sites (b).

2.2 Data Source
2.2.1 Remote sensing data
The remote sensing data were acquired from May to June 2022 by the two sensors onboard Landsat 8: the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). The OLI acquires multispectral data spanning the visible, NIR, and SWIR regions, while the TIRS captures thermal infrared (TIR) data. Table 1 lists the spectral bands of the two sensors used in the experiments. To minimize errors caused by environmental and climatic factors, only images with less than 10% cloud cover were used.

Table 1. Spectral bands and sensor information of Landsat 8.

2.2.2 Environmental variables
Several environmental covariates are involved in soil salinization. Some factors are directly related to the origin of salts, including groundwater mineralization, surface soil salinity, and the composition of the soil parent material. Other factors govern salt movement within the soil, such as soil texture and the relative proportions of clay, silt, and sand. Irrigation methods affect the distribution of salts, while groundwater table depth controls the upward movement and accumulation of salts. Topography, land use type, pollution degree, altitude, and precipitation also affect soil salinization. All of these factors are included in the model, together with Digital Elevation Model (DEM) data. The data were obtained from https://www.resdc.cn.

2.3 Soil Sampling and Laboratory Measurement
Soil sampling in the Kashgar oasis area took place from May to June 2022, coinciding with the acquisition period of the Landsat 8 imagery. A total of 1,451 representative sampling sites, shown in Figure 1, were selected based on prior research findings, available digital soil maps, and an evaluation of land use types and the degree of soil salinization. This approach ensured comprehensive and representative coverage of the study area. Figure 2 shows examples of the soil types found in the study area.

Figure 2. Different types of soil in the study area: Bare grassland (a), saline-alkali land (b), farmland (c) and abandoned farmland (d).

The total salt content of the samples was determined by a gravimetric method, which measures the total mass of soluble salt ions in the soil leachate. First, a measured volume of soil leachate, into which the salts had been extracted from the soil sample, was filtered through a $ 0.45~\mu\text{m} $ membrane. Next, $ 100~\text{mL} $ of the filtrate was treated with hydrogen peroxide $ (H_2O_2) $ solution to remove residual substances and organic matter. Anhydrous sodium carbonate $ (Na_2CO_3) $ and $ H_2O_2 $ were then added sequentially, and the mixture was dried at $ 103\text{--}107^\circ\text{C} $ to constant weight. The dried residue, which contains the salts, was weighed to calculate the total salt content of the soil $ (\text{g/kg}) $.
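The weighing step above yields a residue mass for a 100 mL aliquot, which must be converted to a soil-mass basis. The sketch below shows one way to do this arithmetic. It assumes, hypothetically, that the known mass of $ Na_2CO_3 $ added is subtracted from the residue and that the total extract volume and soil mass are known; the paper does not report its water-to-soil ratio, so the 5:1 ratio and all numbers in the example are purely illustrative.

```python
def total_salt_g_per_kg(residue_g, na2co3_g, aliquot_ml, extract_ml, soil_g):
    """Total soluble salt in soil (g/kg) from a dried-residue weighing.

    Assumptions (hypothetical, not stated in the paper): the Na2CO3 mass added
    is subtracted from the residue, and the aliquot is scaled to the full extract.
    """
    salt_in_aliquot_g = residue_g - na2co3_g                 # salt mass in the aliquot
    salt_in_extract_g = salt_in_aliquot_g * extract_ml / aliquot_ml
    return salt_in_extract_g / (soil_g / 1000.0)             # grams per kilogram of soil


# Illustrative values: 50 g soil extracted with 250 mL water (5:1), 100 mL aliquot.
salt = total_salt_g_per_kg(residue_g=0.150, na2co3_g=0.050,
                           aliquot_ml=100.0, extract_ml=250.0, soil_g=50.0)
print(round(salt, 3))  # 5.0
```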

2.4 Salinity Indices and Vegetation Indices
Band arithmetic is a common approach for assessing soil salinity, and many studies have used multispectral remote sensing data to develop a variety of soil salinity indices [30]. Four soil salinity indices are selected in this paper; their formulas are given in the Salinity indices part of Table 2. These indices are calculated from the red band at $ 0.63\text{--}0.69~\mu\text{m} $ and the NIR band at $ 0.76\text{--}0.90~\mu\text{m} $, which are particularly effective for identifying surface features and detecting salinity levels in the soil.
Vegetation indices are also commonly used to estimate soil salinity. Five vegetation indices calculated from the spectral data are selected, as shown in the Vegetation indices part of Table 2.

Table 2. Salinity indices and vegetation indices based on remote sensing. B: blue band; G: green band; R: red band.
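The band arithmetic described above can be sketched for a few widely used indices. Because the exact formulas of Table 2 are not reproduced in this text, the forms below (NDSI, SI, NDVI, and SAVI with soil factor L = 0.5) are assumptions taken from their common textbook definitions and may differ from the ones used in the study. Inputs are surface reflectance values; on Landsat 8 OLI, bands 2, 4, and 5 correspond to blue, red, and NIR.

```python
import numpy as np


def spectral_indices(blue, red, nir, L=0.5):
    """Common salinity and vegetation indices from surface reflectance.

    The formulas are widely used standard forms (an assumption here); the exact
    indices listed in Table 2 may differ.
    """
    blue, red, nir = (np.asarray(b, dtype=float) for b in (blue, red, nir))
    return {
        "NDSI": (red - nir) / (red + nir),                # normalized difference salinity index
        "SI": np.sqrt(blue * red),                        # salinity index
        "NDVI": (nir - red) / (nir + red),                # normalized difference vegetation index
        "SAVI": (1 + L) * (nir - red) / (nir + red + L),  # soil-adjusted vegetation index
    }


# Reflectance for two hypothetical pixels.
idx = spectral_indices(blue=[0.10, 0.20], red=[0.20, 0.30], nir=[0.40, 0.30])
print(idx["NDVI"])
```

The functions operate element-wise, so the same call works on whole raster arrays after the bands are stacked as NumPy arrays.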

2.5 Inversion Model for Salinization using Deep Learning
2.5.1 Data preprocessing
Data preprocessing is a standard step before modeling. Encoding and standardizing the input data improves the learning efficiency and accuracy of machine learning models. The two main methods used here are variable encoding and data standardization.
• (1) Variable encoding
Some categorical variables related to soil salinization must be numerically encoded before they can be used as inputs to machine learning models. In this paper, Ordinal Encoding or One-Hot Encoding is applied depending on the nature of each categorical feature. Ordinal Encoding is used for discrete categorical data with an inherent order, such as pollution level and surface soil salinity level. For example, pollution level, categorized as none, mild, or moderate, is coded as 0, 1, or 2. One-Hot Encoding is used for discrete categorical data without an ordinal relationship, including land use type, topography, soil parent material, and soil texture.
• (2) Data standardization
Data standardization rescales the data to remove the influence of differing units and magnitudes across features. This improves numerical stability during model training, accelerates convergence, and reduces training time. Standardization is applied to all continuous variables, including groundwater table depth, precipitation, and remote sensing data. The transformation is:

\[ x_{\text{new}} = \frac{x - \mu}{\sigma} \]

where $ \mu $ is the mean and $ \sigma $ is the standard deviation of the continuous variable $ x $.
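The two preprocessing steps above can be combined in a single scikit-learn `ColumnTransformer`, a minimal sketch of which follows. The column names, category labels, and values are hypothetical placeholders, since the study's variable names are not given. Fitting the transformer on training data only, then reusing it on the test data, avoids leaking test statistics into the standardization.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical column names; the study's actual variable names are not given.
ordinal_cols = ["pollution_level"]               # ordered categories
nominal_cols = ["land_use"]                      # unordered categories
continuous_cols = ["gw_depth_m", "precip_mm"]    # continuous variables

preprocess = ColumnTransformer(
    [
        # none -> 0, mild -> 1, moderate -> 2
        ("ord", OrdinalEncoder(categories=[["none", "mild", "moderate"]]), ordinal_cols),
        # one binary column per land use category
        ("ohe", OneHotEncoder(handle_unknown="ignore"), nominal_cols),
        # (x - mean) / std, using the population standard deviation
        ("std", StandardScaler(), continuous_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

df = pd.DataFrame({
    "pollution_level": ["none", "mild", "moderate", "mild"],
    "land_use": ["farmland", "saline-alkali", "farmland", "grassland"],
    "gw_depth_m": [1.2, 2.5, 0.8, 3.1],
    "precip_mm": [60.0, 70.0, 65.0, 55.0],
})
X = preprocess.fit_transform(df)  # columns: ordinal (1), one-hot (3), standardized (2)
print(X.shape)
```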

2.5.2 Feature selection
After data preprocessing, the dataset contains 177 variables, many of which are redundant or only weakly related to the target. Before model training, feature selection is therefore needed to evaluate the contribution of each feature to the soil salinization inversion model and to eliminate irrelevant ones. This study uses XGBoost (Extreme Gradient Boosting) [38], described in detail in Section 2.6.1, for feature selection because of its robust feature importance assessment. As an ensemble of decision trees, XGBoost computes feature importance scores automatically during training, which helps identify the features that contribute most to the predictions and thereby improves model performance, efficiency, and interpretability. Specifically, feature selection was performed using Recursive Feature Elimination (RFE) with an XGBoost regressor. Starting with all features, the model was retrained iteratively, features were ranked by their XGBoost importance, and the least important feature was removed in each iteration. The optimal subset size (108 features, about 60% of the initial set) was determined by monitoring the $R^2$ metric: elimination stopped when $R^2$ declined significantly relative to the previous iteration.
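The elimination loop described above can be sketched as follows. To keep the example dependency-light, it uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost; `xgboost.XGBRegressor` also exposes `feature_importances_` and could be returned from the same `make_model` factory. The threshold `tol` that defines a "significant" decline in cross-validated $R^2$ is a hypothetical choice, and the data are synthetic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score


def rfe_by_r2(X, y, make_model, tol=0.01, cv=3):
    """Drop the least important feature each round; stop when CV R^2 falls by more than `tol`."""
    kept = list(range(X.shape[1]))
    prev_r2 = cross_val_score(make_model(), X[:, kept], y, cv=cv, scoring="r2").mean()
    while len(kept) > 1:
        model = make_model().fit(X[:, kept], y)
        weakest = int(np.argmin(model.feature_importances_))  # position within `kept`
        candidate = kept[:weakest] + kept[weakest + 1:]
        r2 = cross_val_score(make_model(), X[:, candidate], y, cv=cv, scoring="r2").mean()
        if prev_r2 - r2 > tol:  # significant decline relative to the previous iteration
            break
        kept, prev_r2 = candidate, r2
    return kept, float(prev_r2)


def make_gbr():
    # Stand-in booster; an XGBRegressor could be returned here instead.
    return GradientBoostingRegressor(n_estimators=50, random_state=0)


# Synthetic data: 12 features, of which only 4 are informative.
X, y = make_regression(n_samples=150, n_features=12, n_informative=4, noise=5.0, random_state=0)
kept, r2 = rfe_by_r2(X, y, make_gbr)
print(len(kept), round(r2, 3))
```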

2.6 Modeling Method
This study compares XGBoost, LightGBM, CNN, CatBoost, and TabNet machine learning methods to select the optimal model for predicting soil salinity in the Xinjiang Kashgar oasis area. The following provides a detailed description of these methods.

2.6.1 XGBoost
XGBoost [38] is a boosting-based ensemble learning technique that combines multiple weak learners, typically decision trees, into a strong predictive model. It constructs the trees sequentially, with each new tree correcting the errors of the previous ones, and its prediction is given by an additive model of $K$ base models. Denoting the tree trained at the $t$-th iteration by $ f_t(x) $, the prediction is updated as

\[ \hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i), \]

where $ \hat{y}_i^{(t)} $ is the prediction for sample $i$ after the $t$-th iteration, $ \hat{y}_i^{(t-1)} $ is the prediction based on the first $ t-1 $ trees, and $ f_t $ is the function of the $t$-th tree. The objective function of XGBoost consists of a loss term and a regularization term:

\[ F_{\text{objective}}^{\text{XGBoost}} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \]

where $ L $ is the loss function, $ y_i $ is the true value, and $ \hat{y}_i $ is the predicted value. $ \Omega(f_k) $ is the regularization term for the k-th tree $ f_k $, representing the complexity of $ f_k $.

XGBoost minimizes this objective using a second-order Taylor approximation of the loss (gradients and Hessians), and feature importance is determined by the gain of each feature's splits across the trees, as used in Section 2.5.2. A key advantage of XGBoost is that it handles large datasets efficiently, which contributes to its robust performance across a wide range of machine learning tasks.
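The additive update $ \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) $ can be illustrated with a minimal from-scratch booster. This is plain first-order gradient boosting for squared loss built on scikit-learn's `DecisionTreeRegressor`, not XGBoost itself: it omits the regularization term $ \Omega $ and the Hessian information, and it scales each tree by a learning rate (the analogue of XGBoost's `eta`), which the equation above folds into $ f_t $.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=50, lr=0.1, max_depth=3):
    """Additive tree model for squared loss: y_hat^(t) = y_hat^(t-1) + lr * f_t(x)."""
    base = float(np.mean(y))             # y_hat^(0): constant initial prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient of 0.5 * (y - y_hat)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residual)
        pred = pred + lr * tree.predict(X)   # add the t-th tree to the ensemble
        trees.append(tree)
    return base, trees, pred


def predict_boosted(base, trees, X, lr=0.1):
    """Sum the base score and the shrunken outputs of all trees."""
    return base + lr * np.sum([t.predict(X) for t in trees], axis=0)


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
base, trees, train_pred = fit_boosted_trees(X, y)
print(np.mean((y - train_pred) ** 2), np.var(y))  # training MSE vs. variance of y
```

Each tree is fitted to the current residuals, so the training error can only decrease as trees are added, which is the sense in which each tree "corrects the errors of the previous ones".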

2.6.2 LightGBM
LightGBM (Light Gradient Boosting Machine) [39] is an ensemble learning framework based on gradient boosting, designed to improve training speed and memory efficiency. It predicts the target variable by combining multiple decision trees, each fitted to the residual errors of the previous ones. Unlike traditional GBDT methods, LightGBM uses histogram-based tree construction, which discretizes continuous feature values into bins and significantly reduces computational complexity. In addition, LightGBM grows trees leaf-wise, splitting the leaf with the maximum gain at each step, rather than level-wise as in XGBoost, which allows the loss to decrease faster. This usually shortens training time and improves accuracy, but it can also lead to overfitting on the training data, which can be alleviated by limiting the tree depth with the max_depth parameter.
The LightGBM model optimizes the following objective function:

\[ F_{\text{objective}}^{\text{LightGBM}} = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + F_{\text{reg}}, \]

where $ L $ represents the loss function and $ F_{\text{reg}} $ is the regularization term, which controls the complexity of the trees.
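The histogram-based split search described above can be sketched as follows. Continuous values are bucketed into quantile bins, gradient and Hessian sums are accumulated per bin, and candidate splits are evaluated only at bin boundaries using the second-order gain $ G_L^2/H_L + G_R^2/H_R - G^2/H $. This is a simplified illustration with no L2 regularization and a hypothetical bin count of 16, not LightGBM's actual implementation.

```python
import numpy as np


def histogram_best_split(x, grad, hess, n_bins=16):
    """Best threshold for one feature using binned gradient statistics."""
    # Interior quantile edges; bin b holds values in [edges[b-1], edges[b]).
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, x, side="right")       # bin index 0..n_bins-1
    G = np.bincount(bins, weights=grad, minlength=n_bins)  # gradient sum per bin
    H = np.bincount(bins, weights=hess, minlength=n_bins)  # Hessian sum per bin
    G_tot, H_tot = G.sum(), H.sum()
    best_gain, best_bin = -np.inf, None
    G_left = H_left = 0.0
    for b in range(n_bins - 1):          # candidate split between bin b and b + 1
        G_left += G[b]
        H_left += H[b]
        G_right, H_right = G_tot - G_left, H_tot - H_left
        if H_left == 0 or H_right == 0:
            continue
        gain = G_left**2 / H_left + G_right**2 / H_right - G_tot**2 / H_tot
        if gain > best_gain:
            best_gain, best_bin = gain, b
    threshold = edges[best_bin] if best_bin is not None else None  # left child: x < threshold
    return threshold, best_gain


rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 1000)
y = (x > 0.5).astype(float)
grad = 0.0 - y           # squared loss at current prediction 0: g = y_hat - y
hess = np.ones_like(y)   # h = 1 for squared loss
thr, gain = histogram_best_split(x, grad, hess)
print(round(thr, 3), round(gain, 1))
```

Only `n_bins - 1` thresholds are scanned instead of one per distinct value, which is where the speed-up of the histogram approach comes from.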

2.6.3 CatBoost
CatBoost (Categorical Boosting) [40] is a gradient boosting algorithm optimized for handling categorical features. It applies ordered boosting and ordered target statistics to reduce training bias and the risk of overfitting. Specifically, CatBoost uses random permutations of the training data when constructing trees and computes the target-based encoding of each sample only from the samples that precede it in the permutation, which avoids target leakage. Whereas traditional GBDT implementations require categorical variables to be encoded before training, CatBoost encodes them automatically and efficiently, which simplifies data preprocessing and improves generalization. For regression, CatBoost typically uses the Mean Squared Error (MSE) as the loss function, and for binary classification it uses cross-entropy, which can be expressed as:

\[ L(y, \hat{y}) = -\frac{1}{n} \sum_{i=1}^{n} \left( y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right), \]

where $ y_i \in \{0, 1\} $ is the true label and $ \hat{y}_i $ is the predicted probability. CatBoost trains efficiently on high-dimensional data with missing values and is highly stable, which makes it a popular choice for complex classification and regression tasks.
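The ordered target statistic described above can be sketched for a single categorical feature. Each row is encoded as a smoothed mean of the targets of rows that share its category and precede it in a random permutation, so its own target never enters its encoding. The smoothing weight `a = 1` and the prior (the global mean target) are illustrative choices, and this is a conceptual sketch rather than CatBoost's internal implementation, which uses several permutations.

```python
import numpy as np


def ordered_target_statistic(categories, y, prior, a=1.0, seed=0):
    """Ordered target statistic for one categorical feature (CatBoost-style sketch).

    Each row is encoded using only the targets of rows that precede it in a random
    permutation, so a row's own target never leaks into its encoding.
    """
    order = np.random.default_rng(seed).permutation(len(y))
    sums, counts = {}, {}
    encoded = np.empty(len(y), dtype=float)
    for i in order:
        c = categories[i]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[i] = (s + a * prior) / (n + a)   # smoothed mean of earlier rows
        sums[c], counts[c] = s + y[i], n + 1     # then reveal this row's target
    return encoded


land_use = np.array(["farmland", "saline", "farmland", "saline", "farmland"])
salinity = np.array([5.0, 40.0, 7.0, 55.0, 6.0])  # hypothetical g/kg values
enc = ordered_target_statistic(land_use, salinity, prior=salinity.mean())
print(np.round(enc, 2))
```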

2.6.4 TabNet
TabNet (Attentive Interpretable Tabular Learning) [41] is a deep learning model designed specifically for tabular data, which it processes in multiple sequential decision steps. At each step, an attention mechanism selects the relevant features and learns interactions among them; the attention mask is adjusted according to the outputs of the previous steps, which allows TabNet to capture complex feature interactions. Rather than actual decision trees, TabNet stacks these decision steps, each with its own feature transformer, to perform feature selection and representation learning; each step builds on the output of the previous one, progressively improving prediction accuracy.
One of the key advantages of TabNet is its instance-wise dynamic feature selection. At each decision step, a sparse attention mask selects the most informative features for each sample, which reduces computational overhead and enhances the model's generalization ability.
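TabNet's step-wise feature selection can be illustrated with its two core ingredients: the sparsemax normalization, which produces sparse non-negative masks that sum to one, and the prior scale $ P[i] = \prod_{j \le i} (\gamma - M[j]) $, which discourages reusing the same features across steps. In this sketch, random logits stand in (hypothetically) for the learned attentive-transformer output, and $ \gamma = 1.3 $ is an illustrative relaxation value; it shows only how masks are formed, not a trainable TabNet.

```python
import numpy as np


def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sparse softmax)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > cumsum        # True for a prefix of the sorted entries
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max      # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)


def tabnet_masks(step_logits, gamma=1.3):
    """Feature masks for successive decision steps with TabNet-style prior scaling."""
    prior = np.ones(step_logits.shape[1])
    masks = []
    for logits in step_logits:                 # one row of logits per decision step
        mask = sparsemax(prior * logits)
        prior = prior * (gamma - mask)         # frequently used features get down-weighted
        masks.append(mask)
    return np.array(masks)


rng = np.random.default_rng(0)
masks = tabnet_masks(rng.normal(size=(3, 6)))  # 3 decision steps, 6 features
print(np.round(masks, 3))
```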

2.6.5 Convolution neural network
Convolutional Neural Networks (CNNs) are a cornerstone deep learning architecture, renowned for their efficacy in capturing spatially structured patterns across domains like computer vision, autonomous systems, and natural language processing. In our regression framework, CNN’s convolutional layers extract hierarchical local features, while pooling operations enhance computational efficiency and reduce feature redundancy. Critically, we integrate residual connections to mitigate gradient degradation and accelerate convergence, addressing key optimization challenges in deep architectures. These design choices collectively enable robust spatial representations essential for our task. For comprehensive architectural details, we direct readers to [42].
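The residual connection described above can be sketched as a NumPy forward pass of a single-channel 1-D residual block, $ y = \mathrm{ReLU}(x + \mathrm{conv}(\mathrm{ReLU}(\mathrm{conv}(x)))) $. Treating the selected feature vector as a 1-D sequence and using two 3-tap kernels are assumptions for illustration; the actual layer configuration of the study's CNN is not given in this section, and no training is shown.

```python
import numpy as np


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals len(x)."""
    return np.convolve(x, kernel[::-1], mode="same")


def residual_block(x, k1, k2):
    """Two convolutions with an identity shortcut added before the final ReLU."""
    h = np.maximum(conv1d_same(x, k1), 0.0)   # conv -> ReLU
    h = conv1d_same(h, k2)                    # conv
    return np.maximum(x + h, 0.0)             # shortcut: the block learns a residual on x


x = np.array([0.5, -1.0, 2.0, 0.0, -0.3, 1.2])
identity = np.array([0.0, 1.0, 0.0])
out = residual_block(x, k1=identity, k2=np.zeros(3))
print(out)
```

With a zero second kernel the block reduces to $ \mathrm{ReLU}(x) $, which shows why residual blocks are easy to optimize: a block can fall back to passing its input through unchanged.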

2.7 K-fold Cross Validation

Table 3. Tuned hyperparameters of five algorithms: definitions and ranges in cross-validation.

Mitigating overfitting and obtaining reliable estimates of generalization performance are fundamental to credible machine learning evaluation. Cross-validation, particularly the widely adopted K-fold variant, addresses both needs. In K-fold cross-validation, the dataset is partitioned into K disjoint subsets. Over K iterations, each subset serves exactly once as the validation set, while the remaining K-1 subsets form the training set. Model performance is quantified as the average error and $R^2$ across all K validation sets.
This robustness makes K-fold CV well suited to hyperparameter tuning, where parameter choices strongly affect model performance and generalization. The hyperparameters tuned for each model and their search ranges are listed in Table 3. We used a sequential model-based optimization (SMBO) approach, specifically the Tree-structured Parzen Estimator (TPE), to optimize the hyperparameters. This Bayesian optimization method iteratively builds probabilistic surrogate models of model performance and balances exploration (sampling from uncertain regions) with exploitation (refining known high-performance regions), converging toward good hyperparameters with fewer evaluations than exhaustive search. We implemented this with the Optuna framework [43].
Because cross-validation is applied only within the training data for tuning, it provides a stable assessment of model performance across different data partitions and guards against overfitting to a single split. Evaluating the final models on a strictly held-out test set then gives an unbiased estimate of their predictive performance and a fair comparison between competing methods.
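The K-fold procedure described above can be sketched in a few lines of NumPy. The sketch shuffles the sample indices once, splits them into K folds, validates on each fold exactly once, and averages RMSE and $R^2$ across folds. The model is passed in as a `fit_predict` callable; the ordinary least squares stand-in and the noise-free synthetic data in the usage example are hypothetical.

```python
import numpy as np


def kfold_indices(n, k, seed=0):
    """Yield (train, val) index arrays; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    for v in range(k):
        train = np.concatenate([folds[u] for u in range(k) if u != v])
        yield train, folds[v]


def cross_validate(fit_predict, X, y, k=5, seed=0):
    """Average RMSE and R^2 over the K validation folds."""
    rmses, r2s = [], []
    for train, val in kfold_indices(len(y), k, seed):
        resid = y[val] - fit_predict(X[train], y[train], X[val])
        rmses.append(np.sqrt(np.mean(resid ** 2)))
        r2s.append(1.0 - np.sum(resid ** 2) / np.sum((y[val] - y[val].mean()) ** 2))
    return float(np.mean(rmses)), float(np.mean(r2s))


def ols_fit_predict(X_tr, y_tr, X_val):
    """Ordinary least squares with intercept, used here as a stand-in model."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_val)), X_val]) @ coef


X = np.random.default_rng(1).normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 1.0
rmse, r2 = cross_validate(ols_fit_predict, X, y, k=5)
print(rmse, r2)
```

In a hyperparameter search, `cross_validate` would be the objective evaluated for each candidate setting, with only the training portion of the data passed in.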

2.8 Inversion Model based on Deep Learning
This section describes the newly proposed inversion model for salinization. Consider a dataset $ \mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}, $ where $ x_i $ is the feature vector of the $i$-th sample and $ y_i \in \mathbb{R}^+ $ is its soil salinity. The goal of the inversion algorithm is to achieve optimal prediction of soil salinity through feature preprocessing, feature selection, and multi-model comparison. For continuous variables $ x_i^{(j)} $, Z-score standardization is applied:

\[ \hat{x}_i^{(j)} = \frac{x_i^{(j)} - \mu_j}{\sigma_j}, \qquad \mu_j = \frac{1}{N} \sum_{i=1}^{N} x_i^{(j)}, \qquad \sigma_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i^{(j)} - \mu_j)^2 }. \]

Here, $ j $ denotes the $ j $-th feature. For categorical variables, one-hot encoding is employed:

\[ z_i^{(k)} = \begin{cases} 1, & \text{if } x_i^{(j)} = c_k, \\ 0, & \text{otherwise}. \end{cases} \]

$ z_i^{(k)} $ represents the $ k $-th one-hot encoded value for the $ i $-th sample, and $ c_k $ denotes the $ k $-th category of feature $ j $. After preprocessing, the feature matrix $ \hat{X} $ is obtained, with the target vector $ y $. An XGBoost model is then fitted to $ (\hat{X}, y) $:

\[ \hat{y} = f_{\text{XGBoost}}(\hat{X}; \theta_{\text{XGBoost}}), \]

where $ f_{\text{XGBoost}} $ represents the XGBoost model and $ \theta_{\text{XGBoost}} $ denotes the hyperparameters of XGBoost. The importance score of each feature is calculated based on feature gain:

\[ I_j = \sum_{m=1}^{M} \sum_{s \in S_m^{(j)}} \Delta L_s, \qquad \Delta L_s = L_{\text{before}} - L_{\text{after}}. \]

Here $ M $ is the number of trees, $ S_m^{(j)} $ denotes the set of splits in the $ m $-th tree that use feature $ j $, and $ L_{\text{before}} $ and $ L_{\text{after}} $ denote the loss before and after split $ s $, respectively. Features with the lowest $ I_j $ are removed iteratively until the accuracy of the model decreases significantly. After feature selection, the feature set $ X $ is formed.

Using $ X $, multiple machine learning models, including XGBoost, LightGBM, CatBoost, CNN, and TabNet, are constructed. The prediction function is:

\[ \hat{y}^{(k)} = f_k(X; \theta_k), \qquad k \in \{\text{XGBoost, LightGBM, CatBoost, CNN, TabNet}\}. \]

$ f_k $ represents the prediction function for model $ k $. To rigorously evaluate generalization performance, the dataset is partitioned into a training set (80%) and a hold-out test set (20%). Using cross-validation on the training set, we optimize hyperparameters $ \theta_k $ for each model by minimizing the validation loss as follows:

\[ \theta_k^{*} = \arg\min_{\theta_k} \frac{1}{K} \sum_{v=1}^{K} \mathcal{L}\big(y_{\text{val}}^{(v)}, \hat{y}_{\text{val}}^{(v,k)}\big). \]

Here $ \mathcal{L} $ is the loss function, $ y_{\text{val}}^{(v)} $ denotes the observed soil salinity values in the $ v $-th validation fold, and $ \hat{y}_{\text{val}}^{(v,k)} $ denotes the corresponding predictions of model $ k $ trained on the remaining $ K-1 $ folds. The final model is then retrained on the full training set with these optimal hyperparameters, and its generalization error is assessed on the untouched test set. The optimal model is selected as:

\[ k^{*} = \arg\min_{k \in \mathcal{K}} \mathcal{L}\big(y_{\text{test}}, \hat{y}_{\text{test}}^{(k)}\big), \qquad \mathcal{K} = \{\text{XGBoost, LightGBM, CatBoost, CNN, TabNet}\}, \]

where $ y_{\text{test}} $ denotes the observed soil salinity values in the test set and $ \hat{y}_{\text{test}}^{(k)} $ the predictions of model $ k $. The flow chart in Figure 3 illustrates the step-by-step framework of the inversion method used in this study.
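The selection procedure formalized above (tune $ \theta_k $ by K-fold cross-validation on the 80% training split, refit on the full training set, then choose $ k^{*} $ by test-set loss) can be sketched with scikit-learn. Two substitutions are assumptions made for illustration: exhaustive `GridSearchCV` stands in for the Optuna TPE search used in the study, and Ridge, random forest, and gradient boosting regressors on synthetic data stand in for the five models, whose configurations are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {  # hypothetical stand-ins for the five models in the paper
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "rf": (RandomForestRegressor(random_state=0), {"max_depth": [3, None]}),
    "gbr": (GradientBoostingRegressor(random_state=0), {"n_estimators": [50, 100]}),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)
test_rmse = {}
for name, (model, grid) in candidates.items():
    # theta_k*: minimize the mean validation loss over the K folds of the training set
    search = GridSearchCV(model, grid, cv=cv, scoring="neg_root_mean_squared_error")
    search.fit(X_tr, y_tr)            # refit=True retrains on the full training set
    pred = search.predict(X_te)
    test_rmse[name] = float(np.sqrt(mean_squared_error(y_te, pred)))

best = min(test_rmse, key=test_rmse.get)  # k*: lowest loss on the held-out test set
print(best, test_rmse)
```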

Figure 3. Flow chart of the inversion method.

2.9 Model Interpretability
Although complex models can capture intricate relationships in the data, the reasons why a model produces a particular output for a given input are often unclear [42]. SHAP is a method for interpreting the results of machine learning models [29]. It reveals not only the importance of each feature but also how individual features affect the model output. The predicted value of a particular sample can be represented as:

$$\hat{y} = \mathbb{E}(y) + shap(X_1) + shap(X_2) + \dots + shap(X_p),$$

where $\mathbb{E}(y)$ denotes the average predicted value over all samples, $X_j$ denotes the $j$-th feature, and $shap(X_j)$ denotes the SHAP value of $X_j$, that is, the marginal contribution of $X_j$ to the prediction. The predicted value of the model thus equals the average predicted value plus the sum of the contributions of all features; this property is called local accuracy. The absolute SHAP value of a feature reflects its impact on the prediction and can be used as an importance score. In addition to local accuracy, the Shapley value satisfies missingness and consistency [29], and it is the unique attribution satisfying all three properties:

$$shap(X_i) = \sum_{S \subseteq X \setminus \{i\}} \frac{|S|! (|X| - |S| - 1)!}{|X|!} [f(S \cup \{i\}) - f(S)],$$

where $X$ is the set of all features, $i$ is the feature index, $S$ is a subset of features not containing $i$, $|S|$ denotes the number of features in $S$, and $f(S)$ denotes the model prediction using the feature set $S$. Different model classes can be explained with different SHAP estimators, such as Kernel SHAP, Linear SHAP, and Tree SHAP. We use PermutationExplainer, a SHAP estimator based on feature permutations, to interpret the TabNet model; for tree-based models such as XGBoost, Tree SHAP is preferred.
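For small feature sets, the Shapley formula above can be evaluated exactly by enumerating all subsets, which makes the weighting and the local accuracy property concrete. The sketch assumes one particular value function: $f(S)$ is evaluated with the features outside $S$ set to a fixed baseline vector (for example, the training means). SHAP's explainers use background data in a similar spirit but with their own estimators. The cost grows exponentially with the number of features, which is why approximations such as the permutation explainer are needed for the 108 features used here.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shap(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`."""
    p = len(x)
    phi = np.zeros(p)

    def value(S):
        z = baseline.astype(float).copy()
        idx = list(S)
        z[idx] = x[idx]            # features in S take their observed values
        return f(z)

    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


w = np.array([2.0, -1.0, 0.5])


def linear_model(z):
    return float(w @ z + 3.0)


x = np.array([1.0, 2.0, 3.0])
baseline = np.array([0.5, 0.5, 0.5])
phi = exact_shap(linear_model, x, baseline)
# For a linear model, phi_j = w_j * (x_j - baseline_j) = [1.0, -1.5, 1.25]
print(phi, linear_model(baseline) + phi.sum(), linear_model(x))
```

The last line checks local accuracy: the baseline prediction plus the sum of the SHAP values reproduces the model's prediction at `x`.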

3. RESULT

3.1 Soil Salinity Descriptive Statistics
Descriptive statistics of the measured soil salinity, namely the maximum, minimum, mean, standard deviation, and coefficient of variation, are presented in Table 4. Soil salinity in the study area ranges from 0.30 g/kg to 298 g/kg, with a standard deviation of 33.61 g/kg and a coefficient of variation of 1.77. A coefficient of variation greater than 1 (a standard deviation larger than the mean) is commonly interpreted as strong variability, indicating pronounced spatial variability of soil salinity within the study area.
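A minimal sketch of how statistics of the kind shown in Table 4 can be computed, assuming the sample (n - 1) standard deviation; the salinity values are hypothetical. The last line recovers the mean implied by the reported standard deviation and coefficient of variation (CV = SD / mean), about 19 g/kg up to rounding; Table 4 remains the authoritative source.

```python
import statistics

def describe(values):
    """Min, max, mean, standard deviation and coefficient of variation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator), assumed
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv": sd / mean,  # coefficient of variation = SD / mean
    }

# Hypothetical salinity values in g/kg (not the study's data).
sample = [0.3, 2.1, 5.4, 8.0, 12.5, 40.2, 110.0]
summary = describe(sample)
print({k: round(v, 2) for k, v in summary.items()})

# Mean implied by the reported SD (33.61 g/kg) and CV (1.77)
print(round(33.61 / 1.77, 2))
```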

Table 4. Descriptive statistics of soil salinity.

3.2 Performance of Different Methods
To evaluate the predictive performance of different soil salinization inversion models, this study employs three metrics: Coefficient of Determination ($R^2$), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Suppose that $y_i$ represents the actual values, $\hat{y}_i$ denotes the predicted values, $\bar{y}$ is the mean of the actual values and $n$ is the number of samples. The calculation formulas for these evaluation metrics are as follows:

1. Coefficient of Determination ($R^2$):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$R^2$ measures the proportion of the variance in the dependent variable that is explained by the model, and it is commonly used to assess goodness of fit. The closer $R^2$ is to 1, the greater the proportion of variance the model explains and the higher its predictive accuracy.

2. Root Mean Squared Error (RMSE):

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

RMSE estimates the prediction error by calculating the square root of the mean of the squared differences between the predicted and actual values. A lower RMSE indicates smaller prediction errors and higher model accuracy. It is important to note that RMSE is particularly sensitive to larger errors, making it suitable for applications where larger errors are more significant.

3. Mean Absolute Error (MAE):

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE is the average of the absolute differences between observed and predicted values. An MAE closer to 0 indicates a smaller average difference between predicted and actual values and therefore better model performance. Because errors are not squared, MAE is less sensitive to large errors than RMSE and gives a straightforward measure of the typical prediction error.
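The three metrics translate directly into code. A minimal pure-Python sketch, with hypothetical observed and predicted values (not from this study):

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 4.0, 5.0, 10.0]
print(r2(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```

In this toy example RMSE (about 1.22) exceeds MAE (1.0) because the single error of magnitude 2 is squared, which illustrates the greater sensitivity of RMSE to large errors.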
For all models, hyperparameters were selected by 5-fold cross-validation on the training set. Table 5 reports the performance of each model on the training set, the test set, and the validation folds of the cross-validation. On the test set, TabNet performed best, with the highest $R^2$ (0.57), the lowest MAE (8.10 g/kg), and the lowest RMSE (11.53 g/kg). This suggests an advantage of the deep learning approach in capturing complex feature relationships. The other models perform strongly on the training set, but their generalization to the test and validation sets is somewhat weaker, which points to possible overfitting.
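The search procedure used with 5-fold cross-validation is not described here, so the following is only an assumed illustration: a simple grid search that scores each candidate hyperparameter by its mean validation RMSE over five folds. A 1-D k-nearest-neighbour regressor is a hypothetical stand-in for the actual models, and the data are synthetic.

```python
import math
import random

def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def knn_predict(train_x, train_y, x, n_neighbors):
    """Hypothetical stand-in model: 1-D k-nearest-neighbour regression."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:n_neighbors]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cv_rmse(xs, ys, n_neighbors, k=5):
    """Mean validation RMSE over k folds for one hyperparameter value."""
    scores = []
    for fold in kfold_indices(len(xs), k):  # same folds for every candidate
        held_out = set(fold)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        tx = [xs[j] for j in train_idx]
        ty = [ys[j] for j in train_idx]
        errs = [(ys[i] - knn_predict(tx, ty, xs[i], n_neighbors)) ** 2 for i in fold]
        scores.append(math.sqrt(sum(errs) / len(errs)))
    return sum(scores) / len(scores)

# Hypothetical training data (noisy quadratic), not the study's samples.
rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [x ** 2 + rng.gauss(0, 0.5) for x in xs]

grid = [1, 3, 5, 9, 15]
scores = {n: cv_rmse(xs, ys, n) for n in grid}
best = min(scores, key=scores.get)
print(scores, "best n_neighbors:", best)
```

The selected value would then be used to refit the model on the full training set before evaluating it once on the test set.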
Figure 4 shows box plots of the prediction errors of the five inversion models. Although the median errors of all five models lie close to zero, TabNet has the most concentrated error distribution and comparatively few outliers.
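Box plots usually follow Tukey's convention (the default in matplotlib): the box spans the first to third quartile, and points more than 1.5 × IQR beyond the box are drawn as outliers. Assuming that convention applies to Figure 4, the summary for a set of errors can be computed as below; the two error lists are hypothetical.

```python
import statistics

def box_summary(errors, whisker=1.5):
    """Median, quartiles and Tukey outliers (beyond whisker * IQR from the box)."""
    q1, med, q3 = statistics.quantiles(errors, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [e for e in errors if e < lo or e > hi]
    return {"median": med, "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical prediction errors (predicted - observed, g/kg) for two models.
model_a = [-3.0, -1.0, 0.0, 0.5, 1.0, 2.0, 25.0]
model_b = [-12.0, -6.0, -2.0, 0.0, 3.0, 7.0, 14.0]
for name, errs in [("A", model_a), ("B", model_b)]:
    print(name, box_summary(errs))
```

Model A has the narrower box but one outlier, while model B has a wider box and none, so spread and outliers need to be read together when comparing models.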
These findings indicate that TabNet provides the best predictive performance. Consequently, the subsequent sections will utilize the TabNet algorithm for predicting soil salinity.

Table 5. Performance of different models on training set, test set and validation set.

Figure 4. Box plots comparing the prediction errors of the five models.

3.3 Variable Importance
Figure 5 shows the SHAP summary plot, in which features are ordered from top to bottom by decreasing importance. Each point represents the SHAP value of one feature for one sample. A positive SHAP value means that the feature raises the predicted salinity for that sample, and a negative value means that it lowers it; the larger the absolute SHAP value, the stronger the effect. The plot shows that the presence or absence of irrigation, groundwater table depth, soil type, and land use type strongly influence salinization.
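The ordering in a SHAP summary plot typically reflects global importance, i.e., the average absolute SHAP value of each feature across samples, consistent with using the absolute SHAP value as an importance score as described in Section 2.9. A minimal sketch with a hypothetical SHAP matrix and hypothetical feature labels:

```python
def rank_features(shap_matrix, names):
    """Rank features by mean absolute SHAP value across samples (rows)."""
    n = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n
        for j, name in enumerate(names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

names = ["irrigation", "gw_depth", "ndvi"]  # hypothetical labels
shap_matrix = [  # rows = samples, columns = features (made-up values)
    [4.0, -2.0, 0.5],
    [-3.0, 1.0, -0.5],
    [5.0, -1.5, 0.0],
]
for name, score in rank_features(shap_matrix, names):
    print(f"{name:12s} {score:.3f}")
```

Taking absolute values before averaging matters: a feature that raises salinity in some samples and lowers it in others would otherwise look unimportant.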
SHAP dependence plots colored by a second feature show how an individual feature affects the model prediction and also reveal interactions between features. The horizontal axis gives the value of the feature, the vertical axis gives its SHAP value, and the color of each point gives the value of the interacting feature.
Figure 6(a) shows that at shallow groundwater table depths the SHAP value of groundwater table depth is relatively large, meaning that shallow groundwater strongly promotes soil salinization. This promoting effect weakens rapidly as the groundwater table depth increases. Beyond a depth of 4.86 m, the relationship between groundwater table depth and soil salinity becomes more complicated, which suggests that 4.86 m may be the threshold groundwater table depth for the prevention and control of soil salinization in Kashgar. A deeper groundwater table combined with more precipitation strongly inhibits the salinization process.
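The text does not state how the 4.86 m value was obtained. One simple way to locate such a change point in a dependence plot, shown here purely as a hypothetical illustration, is a two-segment (broken-stick) least-squares fit: try each split of the sorted points, fit one line to each side, and keep the split with the smallest total squared error. The data below are synthetic, with a change built in at 4.5 m; real SHAP values would be noisy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, SSE)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def find_breakpoint(xs, ys, min_points=3):
    """Return the first x of the right-hand segment of the best two-line split."""
    pts = sorted(zip(xs, ys))
    best = None
    for k in range(min_points, len(pts) - min_points + 1):
        left, right = pts[:k], pts[k:]
        sse = fit_line(*zip(*left))[2] + fit_line(*zip(*right))[2]
        if best is None or sse < best[1]:
            best = (right[0][0], sse)
    return best[0]

# Synthetic "groundwater depth vs SHAP value" data (hypothetical).
xs = [0.5 * i for i in range(1, 17)]                  # depths 0.5 ... 8.0 m
ys = [8 - 1.5 * x if x < 4.5 else -0.2 for x in xs]   # steep decline, then flat
print(find_breakpoint(xs, ys))
```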

Figure 5. SHAP summary plot of the top 18 features of the TabNet model. A larger absolute SHAP value indicates a greater impact of the feature on predicted soil salinity. Features are ranked by importance. Each point represents the SHAP value of a specific feature for a particular sample, and its color reflects the magnitude of the feature value.

Figure 6(b) shows that as NDVI increases, its SHAP value decreases gradually. NDVI reflects vegetation health, with higher values indicating healthier vegetation. Vegetation therefore exerts a marked inhibitory effect on soil salinization, and this effect strengthens approximately linearly with NDVI. The inhibition becomes even more pronounced where the groundwater table is considerably deeper.
Figure 6(c) indicates that the SHAP value of altitude tends to decrease as altitude increases. Points with a deeper groundwater table lie mostly in the lower right of the plot, which suggests that the groundwater table is generally deeper at higher altitudes. Higher altitude and a deeper groundwater table both have some inhibitory effect on salinization.
Figure 6(d) shows that the SHAP value increases gradually as the TIRS1 value rises. TIRS1, the first thermal infrared band (band 10) of the Landsat 8 TIRS sensor, is closely related to land surface temperature. High TIRS1 values indicate high surface temperature and strong surface evaporation, which promotes soil salinization. The interaction between surface temperature and precipitation appears weak.
Figure 6(e) shows a non-linear relationship between the NIR band and soil salinization. As NIR values increase, the contribution of NIR to predicted salinity decreases progressively, but the rate of decrease slows at higher NIR values. NIR values between 14,000 and 20,000 are particularly sensitive: within this interval, changes in NIR are most strongly associated with changes in predicted salinity. This suggests that NIR data can serve as a valuable indicator for monitoring soil salinization.
Figure 6(f) shows that SHAP values increase gradually with longitude, suggesting that soil salinization tends to be more severe at higher longitudes (further east). However, the influence of geographic location on salinization remains relatively weak.

Figure 6. SHAP interactive dependence plots of the TabNet model. The plots show the effect of feature interactions on soil salinity: groundwater table depth and precipitation (a); NDVI and groundwater table depth (b); altitude and groundwater table depth (c); TIRS1 and precipitation (d); NIR and NDVI (e); and longitude and latitude (f).

4. DISCUSSION

4.1 Larger Data Sampling Range
In this study, we sampled a broader area of the Kashgar oasis in Xinjiang than other related studies. The larger sampling scope covers a variety of soil types and land use types, as well as regions with varying degrees of soil salinization, allowing the model to capture the spatial distribution and variability of soil salinization more comprehensively.
Compared with previous studies, increasing both the number and the spatial distribution of sampling points improves the representativeness and generalizability of the model. This allows the model to reflect actual soil salinization conditions more accurately and improves its adaptability across different environmental conditions.

4.2 Remote Sensing Spectral Data and Land Surface Parameters
Satellite multispectral data play a significant role in large-scale monitoring and analysis of soil salinization and provide valuable information for the future development of irrigated agriculture. In this study, the remote sensing data were acquired by the two sensors on board Landsat 8, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). Compared with laboratory measurement of soil salinity, this approach makes data easier to acquire and can cover extensive monitoring areas.
Raw spectral data are high-dimensional and often affected by noise from atmospheric conditions, vegetation cover, and other environmental factors. In this study, land surface parameters derived from the imagery are used to reduce noise interference and simplify feature extraction [43]. This improves the computational efficiency and generalization capability of the model. By focusing on these parameters, the model is also better able to capture the essential physical characteristics related to soil salinization, making the results more robust and interpretable.
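As an example of a land surface parameter, NDVI is computed from the OLI red (band 4) and near-infrared (band 5) reflectances as $(NIR - Red)/(NIR + Red)$. The sketch assumes Landsat Collection 2 Level-2 surface reflectance inputs, whose digital numbers are converted with a scale factor of 0.0000275 and an offset of -0.2; the processing level used in this study is not stated here, so this conversion is an assumption, and the pixel values are hypothetical.

```python
def scale_c2_l2(dn):
    """Assumed Landsat Collection 2 Level-2 surface reflectance scaling."""
    return dn * 0.0000275 - 0.2

def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); OLI band 4 = red, band 5 = NIR."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom  # guard against 0/0

# Hypothetical pixel values (digital numbers), not from the study area.
red_dn, nir_dn = 9000, 17000
red, nir = scale_c2_l2(red_dn), scale_c2_l2(nir_dn)
print(round(red, 4), round(nir, 4), round(ndvi(red, nir), 3))
```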

4.3 Comparison of Soil Salinity Inversion Models
This study integrates environmental variables with spectral data and uses XGBoost, LightGBM, CatBoost, CNN, and TabNet to develop soil salinity inversion models for the Kashgar region. Once trained, these models can estimate soil salinity from remote sensing data and ground information.
Among the five models tested, the deep learning model TabNet achieved the best inversion performance, outperforming the other four models. By combining satellite spectral data with deep learning, the TabNet-based inversion model extracts nonlinear relationships between features and soil salinity more effectively, which highlights its potential for modeling the complex interactions involved in soil salinization.

4.4 Evaluation of the Feature Importance for Soil Salinity Modeling
SHAP analysis of the TabNet model provides insight into the factors that influence soil salinity in the Kashgar region of Xinjiang. The SHAP summary plot shows that "the presence or absence of irrigation," "salinity of the ground surface," and "groundwater table depth" contribute most to the model's predictions, highlighting their role in understanding and modeling soil salinity. Interactive dependence plots were then used to examine how these features interact with other variables. Together, these analyses provide a post-hoc interpretation of the model from both global and local perspectives.

4.5 Future Directions for Enhanced Feature Selection
To reduce feature redundancy and improve efficiency, we progressively removed the least important features as ranked by XGBoost. Although this feature selection approach is widely adopted and only a small number of features were removed, the resulting subset may still be biased toward the features that XGBoost's tree structure favors. In future work, we will also use SHAP (SHapley Additive exPlanations) values, which quantify the marginal contribution of each feature to the prediction in a model-agnostic manner, for feature selection, enabling cross-method validation of feature importance. We also plan to design feature screening strategies tailored to the structure of each model: for tree-based models, features with high splitting gain will be prioritized for retention, whereas for linear models, screening will focus on features with low multicollinearity.
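The elimination loop described above can be sketched generically. In this hypothetical version, `evaluate(subset)` stands in for training XGBoost on the subset and returning a validation score together with feature importances; a toy evaluator replaces the real model so the example runs on its own. Swapping the importances for mean absolute SHAP values would give a SHAP-based variant of the kind planned for future work.

```python
def backward_elimination(features, evaluate, min_features=1):
    """Repeatedly drop the least important feature; keep the best-scoring subset.

    `evaluate(subset)` must return (score, importances), where a higher score is
    better and `importances` maps each feature in `subset` to a number.
    """
    current = list(features)
    best_subset, best_score = None, float("-inf")
    while len(current) >= min_features:
        score, importances = evaluate(current)
        if score > best_score:
            best_subset, best_score = list(current), score
        if len(current) == min_features:
            break
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)
    return best_subset, best_score

# Hypothetical evaluator: each feature has a fixed usefulness, and every retained
# feature costs a small penalty (mimicking redundancy and noise).
usefulness = {"gw_depth": 5.0, "irrigation": 4.0, "ndvi": 1.0, "noise_a": 0.1, "noise_b": 0.05}

def toy_evaluate(subset):
    score = sum(usefulness[f] for f in subset) - 0.5 * len(subset)
    return score, {f: usefulness[f] for f in subset}

print(backward_elimination(list(usefulness), toy_evaluate))
```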

5. CONCLUSIONS

This study uses a variety of variables, including remote sensing data, salinity indices, vegetation indices, and relevant environmental variables. After data preprocessing and feature selection, the accuracy of several algorithms is compared, and the best-performing model, TabNet, is used to estimate soil salinity.
1. Among the five models evaluated, TabNet performs best, outperforming the CNN and the three tree-based ensemble algorithms. On the test set, TabNet achieves an R² of 0.57, an RMSE of 11.53 g/kg, and an MAE of 8.10 g/kg.
2. The factors affecting soil salinity are analyzed with the SHAP algorithm. Irrigation presence, salinity of the ground surface, and groundwater table depth are among the most important features for monitoring soil salinity.
3. The effects of several important features on soil salinity are analyzed and quantified with the SHAP algorithm. A groundwater table depth of about 4.86 m is identified as a possible threshold for preventing soil salinization in the Kashgar region. An increase in groundwater table depth, elevation, and vegetation cover, together with a decrease in surface temperature, can mitigate soil salinization.
Deep learning captures the nonlinear relationships between features, and SHAP further clarifies how each feature influences the final prediction; together they provide valuable insights into the soil salinization process.

ACKNOWLEDGEMENTS

The authors acknowledge the support for this study from the Third Xinjiang Comprehensive Scientific Expedition Project (project title: investigation on ecological suitability of key areas in the Tarim River Basin; project code: 2022xjkk0300). The authors also thank the Geological Survey Projects of China Geological Survey (project codes: DD20220872, DD20220962) for their support and the National Earth System Science Data Center. Additionally, the authors acknowledge the National Aeronautics and Space Administration (NASA) for providing access to the dataset.

CONFLICT OF INTEREST STATEMENT

The authors declare that they have no conflict of interest in this paper.

REFERENCES

[1] Singh A., Soil salinization management for sustainable development: A review. Journal of Environmental Management, 2021; 277: 111383. DOI 10.1016/j.jenvman.2020.111383.

[2] Singh A., Soil salinity: A global threat to sustainable development. Soil Use and Management, 2022; 38(1): 39–67. DOI 10.1111/sum.12772.

[3] Parnian A., Chorom M., Jaafarzadeh N., Anosheh H.P., Ozturk M., Unal D., et al., Bioremediation of cadmium and nickel from a saline aquatic environment using Ceratophyllum demersum. Chiang Mai Journal of Science, 2022; 49(2): e2022020. DOI 10.12982/CMJS.2022.020.

[4] Machado R.M.A. and Serralheiro R.P., Soil salinity: Effect on vegetable crop growth. Management practices to prevent and mitigate soil salinization. Horticulturae, 2017; 3(2): 30. DOI 10.3390/horticulturae3020030.

[5] Shrivastava P. and Kumar R., Soil salinity: A serious environmental issue and plant growth promoting bacteria as one of the tools for its alleviation. Saudi Journal of Biological Sciences, 2015; 22(2): 123–131. DOI 10.1016/j.sjbs.2014.12.001.

[6] Hardie M. and Doyle R., Measuring Soil Salinity; in Shabala S. and Cuin T., eds., Plant Salt Tolerance: Methods in Molecular Biology, Volume 913, Humana Press, Totowa, NJ, 2012: 415–425. DOI 10.1007/978-1-61779-986-0_28.

[7] Qadir M., Ghafoor A., and Murtaza G., Amelioration strategies for saline soils: A review. Land Degradation & Development, 2000; 11(6): 501–521. DOI 10.1002/1099-145X(200011/12)11:6<501::AID-LDR405>3.0.CO;2-S.

[8] Metternicht G.I. and Zinck J.A., Remote sensing of soil salinity: Potentials and constraints. Remote Sensing of Environment, 2003; 85(1): 1–20. DOI 10.1016/S0034-4257(02)00188-8.

[9] Allbed A. and Kumar L., Soil salinity mapping and monitoring in arid and semi-arid regions using remote sensing technology: A review. Advances in Remote Sensing, 2013; 3: 1–8. DOI 10.4236/ars.2013.24040.

[10] Farifteh J., Farshad A. and George R.J., Assessing salt-affected soils using remote sensing, solute modelling, and geophysics. Geoderma, 2006; 130(3-4): 191–206. DOI 10.1016/j.geoderma.2005.02.003.

[11] Farifteh J., Van der Meer F., Atzberger C., and Carranza E.J.M., Quantitative analysis of salt-affected soil reflectance spectra: A comparison of two adaptive methods (PLSR and ANN). Remote Sensing of Environment, 2007; 110(1): 59–78. DOI 10.1016/j.rse.2007.02.005.

[12] Gorji T., Sertel E. and Tanik A., Monitoring soil salinity via remote sensing technology under data scarce conditions: A case study from Turkey. Ecological Indicators, 2017; 74: 384–391. DOI 10.1016/j.ecolind.2016.11.043.

[13] Jiang H., Rusuli Y., Amuti T. and He Q., Quantitative assessment of soil salinity using multi-source remote sensing data based on the support vector machine and artificial neural network. International Journal of Remote Sensing, 2019; 40(1): 284–306. DOI 10.1080/01431161.2018.1513180.

[14] Zhao W., Zhou C., Zhou C., Ma H. and Wang Z., Soil salinity inversion model of oasis in arid area based on UAV multispectral remote sensing. Remote Sensing, 2022; 14(8): 1804. DOI 10.3390/rs14081804.

[15] Elfarkh J., Johansen K., El Haji M.M., Almashharawi S.K. and McCabe M.F., Evapotranspiration, gross primary productivity and water use efficiency over a high-density olive orchard using ground and satellite based data. Agricultural Water Management, 2023; 287: 108423. DOI 10.1016/j.agwat.2023.108423.

[16] Ibrakhimov M., Khamzina A., Forkutsa I., Paluasheva G., Lamers J.P.A., Tischbein B., et al., Groundwater table and salinity: Spatial and temporal distribution and influence on soil salinization in Khorezm region (Uzbekistan, Aral Sea Basin). Irrigation and Drainage Systems, 2007; 21: 219–236. DOI 10.1007/s10795-007-9033-3.

[17] Gao Z., Lin Q., He Q., Liu C., Cai H. and Ni H., Rapid detection of spoiled apple juice using electrical impedance spectroscopy and data augmentation-based machine learning. Chiang Mai Journal of Science, 2024; 51(5): e2024071. DOI 10.12982/CMJS.2024.071.

[18] Xiang W., Pan C., Liu J. and Liu Y., A ghost and attention mechanism-based deep learning approach for SAR small target image detection. Chiang Mai Journal of Science, 2024; 51(5): e2024076. DOI 10.12982/CMJS.2024.076.

[19] Harnpadungkij T., Chaisangmongkon W. and Phunchongharm P., Risk-sensitive portfolio management by using C51 algorithm. Chiang Mai Journal of Science, 2022; 49(5): e2022094. DOI 10.12982/CMJS.2022.094.

[20] Liu Z., Fu Y., Jiang J., Huang Y., Li D., Yue Y., et al., Development and validation of a predictive model for herbaceous plant growth based on water-sediment stress. Chiang Mai Journal of Science, 2024; 51(6): e2024095. DOI 10.12982/CMJS.2024.095.

[21] Guo H., Ma Y., Xu W., Zhao Y., Yang Z., Xu Y., et al., Prediction of leakage rate and optimization of structural parameter of blade tip labyrinth seal. Chiang Mai Journal of Science, 2023; 50(1): e2023002. DOI 10.12982/CMJS.2023.002.

[22] Fu J., Xiao D., Fu R., Li C., Zhu C., Arcucci R., et al., Physics-data combined machine learning for parametric reduced-order modelling of nonlinear dynamical systems in small-data regimes. Computer Methods in Applied Mechanics and Engineering, 2023; 404: 115771. DOI 10.1016/j.cma.2022.115771.

[23] Pan X. and Xiao D., Domain decomposition for physics-data combined neural network based parametric reduced order modelling. Journal of Computational Physics, 2024; 519: 113452. DOI 10.1016/j.jcp.2024.113452.

[24] Fu R., Xiao D., Navon I.M., Fang F., Yang L., Wang C., and Cheng S., A non-linear non-intrusive reduced order model of fluid flow by auto-encoder and self-attention deep learning methods. International Journal for Numerical Methods in Engineering, 2023; 124(13): 3087–3111. DOI 10.1002/nme.7240.

[25] Fu R., Xiao D., Buchan A.G., Lin X., Feng Y. and Dong G., A parametric non-linear non-intrusive reduce-order model using deep transfer learning. Computer Methods in Applied Mechanics and Engineering, 2025; 438: 117807. DOI 10.1016/j.cma.2025.117807.

[26] Xiao D., Heaney C.E., Mottet L., Fang F., Lin W., Navon I.M., et al., A reduced order model for turbulent flows in the urban environment using machine learning. Building and Environment, 2019; 148: 323–337. DOI 10.1016/j.buildenv.2018.10.035.

[27] Muhetaer N., Nurmemet I., Abulaiti A., Xiao S. and Zhao J., A quantifying approach to soil salinity based on a radar feature space model using ALOS PALSAR-2 data. Remote Sensing, 2022; 14(2): 363. DOI 10.3390/rs14020363.

[28] Hua F., Xudong P., Yuyi L., Fu C. and Fenghua Z., Evaluation of soil environment after saline soil reclamation of Xinjiang oasis, China. Agronomy Journal, 2008; 100(3): 471–476. DOI 10.2134/agronj2007.0100.

[29] Lundberg S.M. and Lee S.I., A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 2017; 30. DOI 10.48550/arXiv.1705.07874.

[30] Nguyen K.A., Liou Y.A., Tran H.P., Hoang P.P. and Nguyen T.H., Soil salinity assessment by using near-infrared channel and vegetation soil salinity index derived from Landsat 8 OLI data: A case study in the Tra Vinh province, Mekong Delta, Vietnam. Progress in Earth and Planetary Science, 2020; 7(1): 1–16. DOI 10.1186/s40645-019-0311-0.

[31] Khan S. and Abbas A., Proceeding of the International Congress on Modelling and Simulation. (MODSIM 2007). Land, Water & Environmental Management: Integrated Systems for Sustainability, Christchurch, New Zealand, 10-13 December 2007; 2632–2638.

[32] Allbed A., Kumar L. and Aldakheel Y.Y., Assessing soil salinity using soil salinity and vegetation indices derived from IKONOS high-spatial resolution imageries: Applications in a date palm dominated region. Geoderma, 2014; 230: 1–8. DOI 10.1016/j.geoderma.2014.03.025.

[33] Carlson T.N. and Ripley D.A., On the relation between NDVI, fractional vegetation cover, and leaf area index. Remote Sensing of Environment, 1997; 62(3): 241–252. DOI 10.1016/S0034-4257(97)00104-1.

[34] Jiang Z., Huete A.R., Didan K. and Miura T., Development of a two-band enhanced vegetation index without a blue band. Remote Sensing of Environment, 2008; 112(10): 3833–3845. DOI 10.1016/j.rse.2008.06.006.

[35] Shi C., Chao G., Bin X., Yunxiang J., Jinya L., Hailong M., et al., Quantitative inversion of soil salinity and analysis of its spatial pattern in agricultural area in Shihezi of Xinjiang. Geographical Research, 2015; 33(11): 2135–2144. DOI 10.11821/dlyj201411013.

[36] Scudiero E., Skaggs T.H. and Corwin D.L., Regional-scale soil salinity assessment using Landsat ETM+ canopy reflectance. Remote Sensing of Environment, 2015; 169: 335–343. DOI 10.1016/j.rse.2015.08.026.

[37] Jordan C.F., Derivation of leaf-area index from quality of light on the forest floor. Ecology, 1969; 50(4): 663–666. DOI 10.2307/1936256.

[38] Chen T. and Guestrin C., XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, San Francisco, 13-17 August 2016; 785–794. DOI 10.1145/2939672.2939785.

[39] Ke G., Meng Q., Finley T., Wang T., Chen W., Ma W., et al., LightGBM: A highly efficient gradient boosting decision tree. Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, California, USA, 4-9 December 2017; 3149–3157.

[40] Prokhorenkova L., Gusev G., Vorobev A., Dorogush A.V. and Gulin A., CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 2018; arXiv:1706.09516. DOI 10.48550/arXiv.1706.09516.

[41] Arik S.O. and Pfister T., TabNet: Attentive interpretable tabular learning. Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence, 2-9 February 2021 (Online); 6679–6687. DOI 10.1609/aaai.v35i8.16826.

[42] Kiranyaz S., Avci O., Abdeljaber O., Ince T., Gabbouj M. and Inman D.J., 1D convolutional neural networks and applications: A survey. Mechanical Systems and Signal Processing, 2021; 151: 107398. DOI 10.1016/j.ymssp.2020.107398.

[43] Akiba T., Sano S., Yanase T., Ohta T. and Koyama M., Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage AK USA, 4-8 August 2019; 2623–2631. DOI 10.1145/3292500.3330701.

[44] Feng D.C., Wang W.J., Mangalathu S. and Taciroglu E., Interpretable XGBoost-SHAP machine-learning model for shear strength prediction of squat RC walls. Journal of Structural Engineering, 2021; 147(11): 04021173. DOI 10.1061/(ASCE)ST.1943-541X.0003115.

[45] Li Y., Wang C., Wright A., Liu H., Zhang H. and Zong Y., Combination of GF-2 high spatial resolution imagery and land surface factors for predicting soil salinity of muddy coasts. Catena, 2021; 202: 105304. DOI 10.1016/j.catena.2021.105304.
