PCA-domain fused singular spectral analysis for fast and noise-robust spectral-spatial feature mining in hyperspectral classification.

Abstract: The principal component analysis (PCA) and 2-D singular spectral analysis (2DSSA) are widely used for spectral-domain and spatial-domain feature extraction in hyperspectral images (HSI). However, PCA alone has limited efficacy when no spatial information is combined, whilst 2DSSA can extract spatial information but has a high computational complexity. To address both issues, we propose in this paper a PCA-domain 2DSSA approach for spectral-spatial feature mining in HSI. Specifically, PCA and its variant, folded PCA (FPCA), are fused with 2DSSA, as FPCA can extract both global and local spectral features. By applying 2DSSA only to a small number of PCA components, the overall computational complexity is significantly reduced whilst the discrimination ability of the features is preserved. In addition, with the effective fusion of spectral and spatial features, the proposed approach works well on uncorrected datasets without removing the noisy and water-absorption bands, even with a small number of training samples. Experiments on two publicly available datasets demonstrate the superiority of the proposed approach in comparison with several state-of-the-art HSI classification methods and deep learning models.


I. INTRODUCTION
With rich spectral and spatial information in a 3-D hypercube, HSI can characterize materials and objects well based on their physical (e.g., moisture and temperature) and chemical properties. As a result, various HSI processing tasks, including data classification [1], spectral unmixing [2] and image restoration [3], have been explored to tackle different challenges in remote sensing. An HSI is usually composed of 2-D scenes in hundreds of contiguous wavelengths, in which each pixel has a 1-D spectral signature [4]. Besides spectral and spatial information, HSI data contain redundant content and noise caused by environmental conditions, sensor limitations and atmospheric effects. As a result, even sophisticated classifiers such as the support vector machine (SVM) and deep learning (DL) models achieve limited classification accuracy. Hence, the bottleneck is how to derive the most representative features from HSI data, i.e., spectral and spatial feature mining, especially for uncorrected datasets.
Y. Yan and J. Ren are with the National Subsea Centre, Robert Gordon University, Aberdeen, U.K. (Corresponding author: Jinchang Ren).
Q. Liu and H. Sun are with Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China.
Considering the high redundancy in contiguous spectral bands, spectral feature extraction and dimensionality reduction were widely used in early studies. Although PCA is the most widely used method for unsupervised dimensionality reduction and spectral feature extraction, it often fails to extract useful local spectral information. Several variants have been explored to tackle this issue. One example is correlation-based segmented PCA (SPCA) [5], where the spectral bands are segmented into groups, PCA is applied to each group, and the resulting features are concatenated. Tsai et al. [6] proposed a spectrally segmented PCA that outperformed PCA and SPCA in mapping plant species. Like SPCA, folded PCA (FPCA) was developed to extract both local and global structures in the spectral domain [7]. The main difference is that FPCA folds the spectrum of each pixel into a matrix. A partial covariance matrix can be computed directly from this matrix and accumulated for subsequent eigenvalue decomposition and data projection. Consequently, FPCA can be more efficient and effective than PCA and SPCA. More recently, Uddin et al. [8] proposed a segmented-FPCA approach, which was superior to PCA, FPCA and SPCA. However, these methods still lack robustness and have limited discriminability, owing to noise-induced intra-class variation and high inter-class similarity.
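The segment-then-PCA idea behind SPCA can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of [5]. It assumes the segment boundaries are given, whereas SPCA derives them from the band correlation matrix. The helper names `pca_project` and `segmented_pca` are hypothetical.

```python
import numpy as np


def pca_project(Y, k):
    """Project the rows of Y (N x D) onto the top-k principal axes."""
    Yc = Y - Y.mean(axis=0)               # mean-adjust every band
    C = Yc.T @ Yc / Yc.shape[0]           # D x D covariance matrix
    evals, evecs = np.linalg.eigh(C)      # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]   # indices of the k largest
    return Yc @ evecs[:, order]           # N x k principal-component scores


def segmented_pca(X, starts, k):
    """Hypothetical SPCA sketch: PCA inside each band segment, then concatenate.

    X      : N x B matrix of pixel spectra.
    starts : first band index of each segment, e.g. [0, 10, 20] (assumed given).
    k      : number of components kept per segment.
    """
    edges = list(starts) + [X.shape[1]]
    feats = [pca_project(X[:, a:b], k) for a, b in zip(edges[:-1], edges[1:])]
    return np.hstack(feats)               # N x (k * number_of_segments)
```

Suppose the 30 bands are split into three segments and k = 2. Then `segmented_pca(X, [0, 10, 20], 2)` returns six features per pixel, two from each segment.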
Recently, a new technique, 1D-SSA [9], was developed to exploit spectral features more effectively. It decomposes the original signal into a trend, oscillations and noise components. Taking only the main trend and selected oscillations as features, whilst discarding the noisy components, can much improve the classification accuracy. Its extension, 2DSSA [4], can effectively extract spatial features and so significantly improves classification accuracy. However, 1D-SSA must be applied to every pixel and 2DSSA to every spectral band of the HSI, which is very time-consuming. Fast implementations of 1D-SSA and 2DSSA were also developed to reduce the overall computational complexity whilst maintaining classification accuracy [10], though the overall reduction in computational cost remains limited. More recently, 1.5D-SSA [11] has been proposed for near-real-time HSI analysis, but at the cost of considerably lower classification accuracy.
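The SSA steps of embedding, SVD, grouping and diagonal averaging can be illustrated on a single 1-D spectrum. The sketch below is an assumption-based illustration of basic SSA with window length L, keeping the first M components as the trend. It is not the implementation of [9], and `ssa_1d` is a hypothetical name.

```python
import numpy as np


def ssa_1d(x, L, M):
    """Basic 1D-SSA: reconstruct a 1-D signal from its first M components."""
    x = np.asarray(x, dtype=float)
    n = x.size
    K = n - L + 1
    # Embedding: L x K Hankel trajectory matrix, column j holds x[j:j+L]
    T = np.column_stack([x[j:j + L] for j in range(K)])
    # SVD: T = sum_m s_m u_m v_m^T, singular values in descending order
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Grouping: keep the first M rank-one components (M = 1 gives the trend)
    T_hat = (U[:, :M] * s[:M]) @ Vt[:M, :]
    # Diagonal averaging: entry (i, j) maps back to sample i + j
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(L):
        out[i:i + K] += T_hat[i, :]
        counts[i:i + K] += 1
    return out / counts
```

Keeping all components (M = min(L, K)) reproduces the input exactly, while M = 1 returns a smoothed trend. Pixel-wise SSA is costly because it has to run this routine once for every spectrum in the image.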
When applying DL-based approaches to HSI, several models that are prominent in computer vision have been adapted for data classification [12]. Nonconvex modeling has also been incorporated into DL models to improve interpretability in complicated real-world scenarios. As a result, many well-performing nonconvex DL models have been investigated for HSI classification [3], often via the extraction of spectral and/or spatial features, with fairly good classification results. However, they may suffer from either a very high computational cost or a lack of sufficient training data. This is why classical machine learning models such as SVM are still widely used. Combined with an effective feature extractor, they can achieve performance comparable to deep learning models in HSI classification for land-cover mapping [13].
These challenges motivate us to propose a new framework that applies 2DSSA in the PCA domain (PCA+2DSSA and FPCA+2DSSA), improving classification accuracy while significantly reducing computational complexity. By fusing FPCA and PCA with 2DSSA, we also propose Fusion+2DSSA to further improve data storage efficiency, classification accuracy and computational cost. The main contributions are summarized below:
1) We propose a new framework of PCA-domain 2DSSA for spectral-spatial feature extraction in HSI, which significantly reduces the computational cost whilst improving the classification accuracy.
2) Within the proposed framework, three schemes, i.e., PCA+2DSSA, FPCA+2DSSA and Fusion+2DSSA, are introduced to balance efficiency and efficacy for various practical needs. Their parameters are determined adaptively for ease of implementation.
3) The superiority of our approach is validated on two corrected and two uncorrected HSI datasets, benchmarked against traditional feature extraction methods and deep learning models.

II. PROPOSED APPROACH
Fig. 1 shows the workflow of the proposed method, which is composed of three main steps, i.e., spectral feature extraction and dimension reduction in HSI, 2DSSA based PCA domain spatial feature extraction, and feature fusion, followed by data classification using SVM as detailed below.

A. PCA based spectral feature mining in HSI
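Given an HSI hypercube $\mathbf{X} \in \mathbb{R}^{N_r \times N_c \times N_b}$, the spectral vector of a given pixel can be denoted as
$$\mathbf{x}_i = [x_{i1}, x_{i2}, \dots, x_{iN_b}]^T, \tag{1}$$
where $i \in [1, N]$, and $N = N_r N_c$ is the total number of pixels. The mean-adjusted vector of $\mathbf{x}_i$ is $\mathbf{y}_i = \mathbf{x}_i - \boldsymbol{\mu}$, with $\boldsymbol{\mu} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i$. It is used to calculate the covariance matrix of PCA, $\mathbf{C}_P = \frac{1}{N}\sum_{i=1}^{N}\mathbf{y}_i\mathbf{y}_i^T \in \mathbb{R}^{N_b \times N_b}$.
Let $\mathbf{A}_i \in \mathbb{R}^{H \times W}$ be the matrix obtained by folding $\mathbf{y}_i$, where $H$ is the number of band groups, $W$ is the number of bands in each group, and $N_b = HW$. The covariance matrix of FPCA can be obtained by [8]
$$\mathbf{C}_F = \frac{1}{N}\sum_{i=1}^{N}\mathbf{A}_i^T\mathbf{A}_i \in \mathbb{R}^{W \times W}. \tag{2}$$
For a covariance matrix $\mathbf{C}$ (either $\mathbf{C}_P$ or $\mathbf{C}_F$), the eigenproblem can be solved by decomposing $\mathbf{C}$ into the product of three matrices,
$$\mathbf{C} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T, \tag{3}$$
where $\boldsymbol{\Lambda}$ is the diagonal matrix of the eigenvalues of $\mathbf{C}$. $\mathbf{U} = [\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_D]$ is the orthonormal matrix of the corresponding eigenvectors, with $D = N_b$ for PCA and $D = W$ for FPCA. To reduce the dimension of the spectral features, the eigenvectors corresponding to the largest eigenvalues are selected. For PCA, we take the first $n_p$ components as the spectral features of $\mathbf{x}_i$:
$$\mathbf{F}_P(\mathbf{x}_i) = \mathbf{y}_i^T\mathbf{U}_{n_p} \in \mathbb{R}^{1 \times n_p}, \tag{4}$$
where $\mathbf{U}_{n_p} = [\mathbf{u}_1, \dots, \mathbf{u}_{n_p}]$. For FPCA, we take the first $\hat{n}$ components for each band group, and the spectral features of $\mathbf{x}_i$ can be derived as
$$\mathbf{F}_F(\mathbf{x}_i) = \mathbf{A}_i\mathbf{U}_{\hat{n}} \in \mathbb{R}^{H \times \hat{n}}, \tag{5}$$
where $\mathbf{U}_{\hat{n}}$ holds the first $\hat{n}$ eigenvectors of $\mathbf{C}_F$. The total number of FPCA components is therefore $n_f = H\hat{n}$. For convenience, the spectral features of the whole hypercube are denoted as $\mathbf{F}_P(\mathbf{X}) \in \mathbb{R}^{N_r \times N_c \times n_p}$ and $\mathbf{F}_F(\mathbf{X}) \in \mathbb{R}^{N_r \times N_c \times n_f}$.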
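The PCA and FPCA projections described above can be sketched with NumPy as below. This is a minimal illustration under three assumptions: spectra are stored as an N x N_b matrix with one row per pixel, band groups are contiguous so that folding is a row-major reshape, and the function names are hypothetical rather than taken from [7], [8].

```python
import numpy as np


def top_eigvecs(C, k):
    """Eigenvectors of symmetric C (C = U Lambda U^T) for its k largest eigenvalues."""
    evals, evecs = np.linalg.eigh(C)
    idx = np.argsort(evals)[::-1][:k]
    return evecs[:, idx]


def pca_features(X, n_p):
    """PCA: N x N_b spectra -> N x n_p features (y_i^T U_{n_p})."""
    Y = X - X.mean(axis=0)                  # mean-adjusted vectors y_i
    C = Y.T @ Y / Y.shape[0]                # N_b x N_b covariance
    return Y @ top_eigvecs(C, n_p)


def fpca_features(X, H, n_hat):
    """Folded PCA: fold each y_i into an H x W matrix A_i (row h = band group h)."""
    N, Nb = X.shape
    W = Nb // H
    if H * W != Nb:
        raise ValueError("N_b must equal H * W")
    Y = X - X.mean(axis=0)
    A = Y.reshape(N, H, W)                  # contiguous band groups (assumption)
    C = np.einsum("nhw,nhv->wv", A, A) / N  # (1/N) sum_i A_i^T A_i, W x W
    U = top_eigvecs(C, n_hat)               # W x n_hat
    F = A @ U                               # N x H x n_hat
    return F.reshape(N, H * n_hat)          # n_f = H * n_hat features per pixel
```

The Indian Pines setting used in this letter is H = 10, W = 20 with one component per group, so `fpca_features(X, 10, 1)` returns 10 features per pixel. Reshaping the result to (N_r, N_c, n) gives the feature cube. With H = 1 the folded covariance equals the PCA covariance, so FPCA reduces to PCA up to the signs of the eigenvectors.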
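
B. 2DSSA-based spatial feature extraction in the PCA domain
Each feature image $\mathbf{I} \in \mathbb{R}^{N_r \times N_c}$ in $\mathbf{F}_P(\mathbf{X})$ or $\mathbf{F}_F(\mathbf{X})$ is processed by 2DSSA [4]. In the embedding step, an $L \times L$ window slides over $\mathbf{I}$, and each vectorized patch forms one column of the trajectory matrix $\mathbf{T} \in \mathbb{R}^{L^2 \times K}$, with $K = (N_r - L + 1)(N_c - L + 1)$. Singular value decomposition (SVD) of $\mathbf{T}$ then yields the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_S$ of $\mathbf{T}\mathbf{T}^T$ and the corresponding eigenvectors $\mathbf{U} = [\mathbf{u}_1, \dots, \mathbf{u}_S] \in \mathbb{R}^{L^2 \times S}$, where $S \le L^2$ is the rank of $\mathbf{T}$. As a result, $\mathbf{T}$ is decomposed into $S$ components, $\mathbf{T} = \mathbf{T}_1 + \mathbf{T}_2 + \dots + \mathbf{T}_S$, with $\mathbf{T}_m = \sqrt{\lambda_m}\,\mathbf{u}_m\mathbf{v}_m^T$ and $\mathbf{v}_m = \mathbf{T}^T\mathbf{u}_m/\sqrt{\lambda_m}$. The grouping and diagonal averaging steps are then applied to invert the embedding and obtain the reconstructed image $\mathbf{Z}$. Accordingly, each feature image in $\mathbf{F}_P(\mathbf{X})$ and $\mathbf{F}_F(\mathbf{X})$ can be represented by
$$\mathbf{Z} = \sum_{m=1}^{M} \mathbf{Z}_m, \tag{6}$$
where $\mathbf{Z}_m$ is the image obtained from $\mathbf{T}_m$ by diagonal averaging, and $M$ is the number of selected eigenvalues in the SVD.
When $M = S$, the reconstructed image equals the original image. Herein, we denote by $\mathbf{F}_{P+2D}$ and $\mathbf{F}_{F+2D}$ the PCA-based and FPCA-based spectral-spatial features, respectively. For consistency, we adopt the same 2DSSA configuration as in [4]: a $10 \times 10$ window ($L = 10$), using only the first eigenvalue component ($M = 1$), i.e., the trend. Different parameter values may affect the final classification performance on different datasets, but the difference across configurations is estimated to be less than 1%. Therefore, L and M are set to 10 and 1, respectively, in all experiments for simplicity.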
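The 2DSSA steps can be sketched for one feature image as follows, using an L x L window and keeping the first M components (L = 10, M = 1 as adopted here). The code is written for clarity rather than speed. It is an assumption-based illustration, not the implementation of [4], and `ssa_2d` and `ssa_2d_cube` are hypothetical names. The second helper applies 2DSSA to every component of a PCA or FPCA feature cube. This is the core of the PCA-domain idea: n 2DSSA calls instead of N_b.

```python
import numpy as np


def ssa_2d(img, L=10, M=1):
    """2DSSA with an L x L window; returns the sum of the first M components."""
    img = np.asarray(img, dtype=float)
    Nr, Nc = img.shape
    Kr, Kc = Nr - L + 1, Nc - L + 1
    # Embedding: each column of T is one vectorized L x L patch (row-major)
    patches = [img[r:r + L, c:c + L].ravel() for r in range(Kr) for c in range(Kc)]
    T = np.stack(patches, axis=1)                    # L^2 x (Kr * Kc)
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    T_hat = (U[:, :M] * s[:M]) @ Vt[:M, :]           # grouping: first M components
    # Diagonal averaging in 2-D: average every pixel over all patches covering it
    out = np.zeros_like(img)
    cnt = np.zeros_like(img)
    col = 0
    for r in range(Kr):
        for c in range(Kc):
            out[r:r + L, c:c + L] += T_hat[:, col].reshape(L, L)
            cnt[r:r + L, c:c + L] += 1
            col += 1
    return out / cnt


def ssa_2d_cube(cube, L=10, M=1):
    """Apply ssa_2d to every component image of an N_r x N_c x n feature cube."""
    return np.stack([ssa_2d(cube[:, :, k], L, M) for k in range(cube.shape[2])], axis=2)
```

Keeping all L^2 components reproduces the input image exactly, which matches the M = S case above. With M = 1 the output is the spatially smoothed trend used as the feature.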

C. Feature fusion
Applying 2DSSA in the PCA/FPCA domains reduces the computational cost compared with band-wise operations. Moreover, as demonstrated in Fig. 2, PCA+2DSSA and FPCA+2DSSA enhance the discrimination ability of the features extracted from the Indian Pines dataset. For PCA, we choose $n_p = 10$. For FPCA, we set $H = 10$, $W = 20$ and $n_f = 10$, i.e., the 200 bands are divided into 10 groups and only one component ($\hat{n} = 1$) is extracted from each group.
As seen in Fig. 2, low-order principal components (PCs) contain smooth spatial structures, while high-order PCs are quite noisy. Applying 2DSSA to the PCs makes these noisy components usable again through the derived trend signal. This demonstrates the added value of 2DSSA in the PCA domain: the extracted spatial-domain trend effectively suppresses noise and enhances the discrimination ability of the spectral-spatial features. In addition, PCA captures the global spectral structure with a small number of low-order PCs, whilst FPCA preserves local spectral features. PCA and FPCA features are therefore complementary, which motivates the fusion scheme below.
As the FPCA components are extracted from locally grouped spectral bands, they appear significantly smoother than those from PCA. This indicates that FPCA is more robust to spectral noise. It therefore has the potential to achieve noise-robust feature extraction and data classification in HSI, especially on uncorrected datasets that retain the noisy and water-absorption bands. On the other hand, the FPCA features appear more redundant, possibly due to inappropriate band grouping. In addition, the spatial smoothing effect of 2DSSA on FPCA components is weaker than on PCA components. These observations indicate potential limitations of FPCA+2DSSA, hence the need for fusion with PCA+2DSSA.
For an HSI, the obtained spectral-spatial features $\mathbf{F}_{P+2D}$ and $\mathbf{F}_{F+2D}$ can be used separately for classification. They can also be fused into a combined feature set,
$$\mathbf{F}_{\mathrm{Fusion}+2D} = \{\mathbf{F}_{P+2D}, \mathbf{F}_{F+2D}\} \in \mathbb{R}^{N_r \times N_c \times (n_p + n_f)}.$$
The combined feature has a spectral dimension of $(n_p + n_f)$, which can be much smaller than $N_b$, while the spatial dimensions remain unchanged. Note that $n_p$ and $n_f$ here are determined adaptively as follows. For FPCA in Fusion+2DSSA, each spectrum is divided into 10 groups and the first component of each group is selected, giving $n_f = 10$. For PCA, $n_p$ is the smallest number of components whose accumulated variance reaches a threshold of the total variance. This threshold is empirically set to 99.98%, as it produces particularly good results on all the datasets. Accordingly, $n_p$ is adaptively determined as 90 and 20 for Indian Pines and Salinas, respectively, so the total number of fused features is 100 and 30, respectively. Detailed experimental results and an efficacy analysis of the PCA+2DSSA, FPCA+2DSSA and Fusion+2DSSA schemes are presented in Section III.
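The adaptive choice of n_p and the fusion step can be sketched as follows, assuming the 2DSSA-filtered PCA and FPCA cubes have already been computed. The function names are hypothetical. The 99.98% default is the threshold stated above.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.9998):
    """Smallest n_p whose leading eigenvalues reach `threshold` of the total variance.

    X: N x N_b matrix of pixel spectra.
    """
    Y = X - X.mean(axis=0)
    evals = np.linalg.eigvalsh(Y.T @ Y / Y.shape[0])[::-1]   # descending order
    ratio = np.cumsum(evals) / evals.sum()                   # accumulated variance
    n_p = int(np.searchsorted(ratio, threshold)) + 1         # first index reaching threshold
    return min(n_p, evals.size)                              # guard against round-off near 1.0


def fuse_features(F_p2d, F_f2d):
    """Concatenate two N_r x N_c x n feature cubes along the feature axis."""
    if F_p2d.shape[:2] != F_f2d.shape[:2]:
        raise ValueError("spatial sizes must match")
    return np.concatenate([F_p2d, F_f2d], axis=2)   # N_r x N_c x (n_p + n_f)
```

For Indian Pines, the letter reports n_p = 90 at this threshold. With n_f = 10, the fused cube therefore has 100 features per pixel.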

III. EXPERIMENTS

A. Data description
In our experiments, two publicly available HSI datasets are used for performance evaluation. The first, Indian Pines, was collected by the AVIRIS sensor in 1992 in the USA. It contains 145 × 145 pixels in 220 spectral bands, labelled in 16 land-cover classes. The second, Salinas, was also collected by AVIRIS, over the Salinas Valley, California, USA. It has 512 × 217 pixels in 224 spectral bands, labelled in 16 classes. Removing 20 noisy and water-absorption bands from each dataset yields the corrected versions (200 and 204 bands, respectively). The original versions with all bands are used as the uncorrected datasets.

B. Experimental Setup
The optimal numbers of PCs for PCA, FPCA, PCA+2DSSA and FPCA+2DSSA are determined within [10, 100] at a step of 10 by maximizing the KP (%). To validate the efficacy of the extracted features, a standard support vector machine (SVM) classifier [14] with the radial basis function (RBF) kernel is employed for classification. The cost (c) and gamma (γ) parameters are optimized through a grid search [7]. The overall accuracy (OA), average accuracy (AA) and Kappa coefficient (KP) are used for quantitative evaluation. Each experiment is repeated 10 times, with training and testing samples randomly selected without overlap. The results averaged over the 10 runs are used for comparison and statistical analysis.
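The three evaluation metrics can be computed from the confusion matrix. The sketch below assumes integer class labels 0..n_classes-1 and that every class occurs in the test labels; otherwise the per-class accuracy is undefined. `classification_scores` is a hypothetical helper. The RBF-SVM and its grid search over c and γ could be done with, for example, scikit-learn's `SVC(kernel="rbf")` and `GridSearchCV`, which are not shown here.

```python
import numpy as np


def classification_scores(y_true, y_pred, n_classes):
    """Return (OA, AA, Kappa) from integer label arrays."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)          # rows: true class, columns: predicted
    total = cm.sum()
    oa = np.trace(cm) / total                   # overall accuracy
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))  # mean of per-class accuracies
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total**2   # chance agreement
    kappa = (oa - pe) / (1 - pe)                # Cohen's kappa
    return oa, aa, kappa
```

For example, with true labels [0, 0, 1, 1] and predictions [0, 1, 1, 1], OA and AA are both 0.75 and Kappa is 0.5.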

C. Experimental results
Tables I and II compare our proposed methods quantitatively with other benchmarking techniques on the two HSI datasets. ND is the number of feature dimensions, and Time is the running time of each method. The best and second-best results are highlighted in bold and italics, respectively. The optimal PC numbers for PCA+2DSSA, FPCA+2DSSA, PCA and FPCA are selected through extensive experiments. As seen, Fusion+2DSSA consistently leads to higher accuracy, thanks to the effective fusion of PCA, FPCA and 2DSSA. This fusion makes full use of local and global spectral information as well as spatial information while suppressing data noise. PCA+2DSSA and FPCA+2DSSA consistently outperform 2DSSA, because PCA and FPCA reduce the redundant information in the spectral domain, making 2DSSA more effective. In contrast, the benchmark PCA, FPCA and 1D-SSA yield low accuracy because they lack spatial information. In addition, PCA and 1D-SSA perform worse than the raw data on Indian Pines and Salinas, respectively. These results reflect the importance of combining spatial and spectral features for HSI classification. Last but not least, applying 2DSSA in the PCA domain considerably lowers the computational cost, as reflected in the running time: all three proposed schemes produce faster and better classification results than 2DSSA. Among them, FPCA+2DSSA is the fastest, Fusion+2DSSA has the best classification performance, and PCA+2DSSA is a balanced solution. Compared with other benchmarking methods such as CCJSR [15], SuperPCA [16] and JSRC [17], our methods are more effective and efficient.

D. Comparison with deep learning methods
To further validate the efficacy of the proposed method, we also compare it against four deep learning models [18]-[21], using 200 training pixels per class (Table III). Specifically, only nine classes of the Indian Pines dataset are used, after removing classes with fewer than 200 pixels. The experimental results show that the proposed Fusion+2DSSA and PCA+2DSSA consistently yield the best and second-best OA on both datasets, further validating the effectiveness of our approach.
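The per-class sampling protocol can be sketched as below: drop classes with fewer than 200 labelled pixels, draw 200 training pixels per remaining class, and test on the rest. The function name and the use of a NumPy random generator are illustrative assumptions. The letter does not give the exact sampling code used for Table III.

```python
import numpy as np


def split_per_class(labels, n_train, rng, min_pixels=None):
    """Disjoint train/test indices with n_train samples per retained class.

    labels     : 1-D array of class ids of the labelled pixels (background removed).
    n_train    : training pixels drawn per class (200 in Table III).
    rng        : a numpy.random.Generator, e.g. np.random.default_rng(0).
    min_pixels : classes with fewer labelled pixels are dropped (defaults to n_train).
    """
    min_pixels = n_train if min_pixels is None else min_pixels
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if idx.size < min_pixels:
            continue                         # class too small: excluded entirely
        idx = rng.permutation(idx)           # random order within the class
        train_idx.append(idx[:n_train])
        test_idx.append(idx[n_train:])       # remaining pixels form the test set
    return np.concatenate(train_idx), np.concatenate(test_idx)
```

Repeating the split with different generator seeds gives the randomly selected, non-overlapping training and testing sets over repeated runs.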

E. Computational complexity
The proposed spectral-spatial fusion approach improves the efficiency of standard 2DSSA by using PCA and FPCA to reduce the dimensionality in the spectral domain. In this subsection, we briefly analyze the computational complexity and memory requirement of each implementation stage in Tables IV and V. Table IV gives the saving factor relative to 2DSSA. Applying 2DSSA in the PCA domain reduces the number of times 2DSSA is repeated (once per retained component rather than once per band), which lowers the computational burden. Fusion+2DSSA has slightly higher complexity than the other two schemes because it combines both PCA and FPCA. Since 2DSSA is applied only to the PCs, the computational cost is significantly reduced compared with conventional band-wise 2DSSA. Table V shows that the three proposed frameworks need slightly more memory than 2DSSA and PCA/FPCA alone, owing to the combination of spectral and spatial processing. In Table V, D.M, C.M and P.M denote the sizes of the input data matrix, covariance matrix and projection matrix, respectively. However, the overall memory requirement is modest and very close to the size of the hypercube. For the Indian Pines and Salinas datasets, the memory requirements are only up to about 25 MB and 102 MB, respectively. This is a very small portion of a computer's RAM of 32 GB or more. These results validate the computational efficiency of the proposed method. A detailed comparison of MACs, running time and memory requirements on the two HSI datasets is given in the supplementary material (Tables S1-S3).
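A toy operation-count model can make the saving factor more intuitive. It compares band-wise 2DSSA with 2DSSA applied to n principal components. It is a hypothetical back-of-the-envelope model, not the one used for Table IV. Per image, it counts only forming T T^T (about L^4 K multiply-accumulates) and its eigendecomposition (about L^6). For PCA it adds covariance, eigendecomposition and projection, and it ignores diagonal averaging.

```python
def ssa2d_ops(Nr, Nc, L):
    """Rough cost of one 2DSSA call: form T T^T and eigendecompose it."""
    K = (Nr - L + 1) * (Nc - L + 1)     # number of L x L patches
    return L**4 * K + L**6               # (L^2 x K)(K x L^2) product + (L^2)^3 eigen-solve


def bandwise_2dssa_ops(Nr, Nc, B, L):
    """Conventional 2DSSA: one call per spectral band."""
    return B * ssa2d_ops(Nr, Nc, L)


def pca_domain_2dssa_ops(Nr, Nc, B, n, L):
    """PCA-domain 2DSSA: PCA overhead plus one 2DSSA call per retained component."""
    N = Nr * Nc
    pca = N * B**2 + B**3 + N * B * n    # covariance + eigendecomposition + projection
    return pca + n * ssa2d_ops(Nr, Nc, L)


def saving_factor(Nr, Nc, B, n, L=10):
    """Ratio of band-wise cost to PCA-domain cost under this toy model."""
    return bandwise_2dssa_ops(Nr, Nc, B, L) / pca_domain_2dssa_ops(Nr, Nc, B, n, L)
```

Take the Indian Pines dimensions (145 × 145 pixels, 200 bands) with 20 PCs and L = 10. This model gives `saving_factor(145, 145, 200, 20)` just under 8. That is slightly below the ideal ratio N_b / n = 10 because of the PCA overhead.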

IV. CONCLUSION
In this letter, we propose a novel PCA-domain 2DSSA framework with three schemes for noise-robust spectral-spatial feature extraction. Applying 2DSSA in the PCA/FPCA domain significantly reduces the computational cost of band-wise 2DSSA. It also preserves the dominant spectral information for more effective HSI classification. Experiments on two publicly available datasets have validated both the efficiency and efficacy of the proposed framework. Among the proposed schemes, FPCA+2DSSA has the lowest computational cost. Fusion+2DSSA consistently produces the best classification accuracy on both the corrected and uncorrected datasets when benchmarked against several state-of-the-art approaches. PCA+2DSSA offers a good balance between computational cost and classification accuracy.
With the advantages of low computational cost, high classification accuracy and robustness to noise, the proposed methods have many potential applications in hyperspectral remote sensing. In future work, we will focus on superpixel segmentation and band selection for improved spatial feature extraction and dimensionality reduction.

Fig. 2. Obtained spatial scenes from PCA, FPCA, and 2DSSA. Panels: 1st to 10th components extracted by PCA; 1st to 10th components extracted by FPCA; first-dimension feature from 2DSSA on each PCA component; first-dimension feature from 2DSSA on each FPCA component.
1545-598X (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
H. Zhao is with the School of Computer Sciences, Guangdong Polytechnic Normal University, Guangzhou, China. J. Zabalza is with the Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow, U.K.

TABLE V
MEMORY REQUIREMENT OF DIFFERENT METHODS IN DIFFERENT STAGES USING PCA/FPCA AND 2DSSA (L=10, M=1, 20 PCs)