Yihan Lv, Weiran Song, Zongyu Hou, Zhe Wang. Incorporating empirical knowledge into data-driven variable selection for quantitative analysis of coal ash content by laser-induced breakdown spectroscopy[J]. Plasma Science and Technology. DOI: 10.1088/2058-6272/ad370c

Incorporating empirical knowledge into data-driven variable selection for quantitative analysis of coal ash content by laser-induced breakdown spectroscopy

Laser-induced breakdown spectroscopy (LIBS) has become a widely used atomic spectroscopic technique for rapid coal analysis. However, the vast spectral information in LIBS contains signal uncertainty, which can degrade its quantification performance. In this work, we propose a hybrid variable selection method to improve the performance of LIBS quantification. Important variables are first identified using Pearson's correlation coefficient (PCC), mutual information (MI), least absolute shrinkage and selection operator (LASSO) and random forest (RF), and are then filtered and combined with empirical variables related to fingerprint elements of coal ash content. These variables are subsequently fed into partial least squares regression (PLSR). In some models, certain variables unrelated to ash content were additionally removed by hand to study the impact of variable deselection on model performance. The proposed hybrid strategy was tested on three LIBS datasets for quantitative analysis of coal ash content and compared with the corresponding data-driven baseline method. It performs significantly better than variable selection based only on empirical knowledge and in most cases outperforms the baseline method. On the three datasets, the hybrid strategy combining empirical knowledge with data-driven algorithms achieved the lowest root mean square error of prediction (RMSEP) values of 1.605, 3.478 and 1.647, respectively, significantly lower than the values of 1.959, 3.718 and 2.181 obtained from multiple linear regression using only 12 empirical variables. The EMP-LASSO-PLSR model with 20 selected variables improved significantly after variable deselection, with RMSEP values dropping from 1.635, 3.962 and 1.647 to 1.483, 3.086 and 1.567, respectively. These results demonstrate that using empirical knowledge to support data-driven variable selection is a viable approach to improving the accuracy and reliability of LIBS quantification.
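The hybrid selection idea described in the abstract can be illustrated with a minimal sketch. It covers only the PCC branch; the MI, LASSO and RF rankings and the PLSR model are omitted. The abstract does not state the exact filtering and combination rule, so the sketch assumes a plain set union of the top-k data-driven variables with the empirical variables, followed by optional manual deselection. All function names, parameters and toy values below are hypothetical and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant channel carries no linear information
    return cov / (sx * sy)


def top_k_by_pcc(spectra, ash, k):
    """Indices of the k spectral variables with the largest |PCC| to ash content."""
    n_vars = len(spectra[0])
    scores = []
    for j in range(n_vars):
        column = [row[j] for row in spectra]
        scores.append((abs(pearson(column, ash)), j))
    # Sort by descending score; break ties by the lower variable index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:k]]


def hybrid_selection(spectra, ash, empirical_idx, k, deselect_idx=()):
    """Assumed combination rule: union of data-driven and empirical variables,
    minus any variables removed by hand (variable deselection)."""
    combined = set(top_k_by_pcc(spectra, ash, k)) | set(empirical_idx)
    combined -= set(deselect_idx)
    return sorted(combined)


def rmsep(y_true, y_pred):
    """Root mean square error of prediction on a test set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Toy data (made up): 4 spectra with 4 variables each, plus ash content values.
spectra = [
    [1.0, 5.0, 0.2, 3.0],
    [2.0, 5.1, 0.9, 1.0],
    [3.0, 4.9, 0.1, 4.0],
    [4.0, 5.0, 0.8, 2.0],
]
ash = [10.0, 20.0, 30.0, 40.0]

# Variable 3 plays the role of an empirical (fingerprint-element) variable;
# variable 2 is treated as unrelated to ash content and deselected by hand.
selected = hybrid_selection(spectra, ash, empirical_idx=[3], k=1, deselect_idx=[2])
print(selected)
```

In this toy case variable 0 is chosen by the correlation ranking and variable 3 is retained only because it was supplied as empirical knowledge, which is the situation the hybrid strategy is meant to handle. In practice the selected columns would then be passed to a regression model such as PLSR and evaluated with `rmsep` on held-out samples.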