Evaluation of Polytopic Vector Analysis (PVA) Performance for Forensic Source Identification at Multi-Source Sites
May 3, 2021
TIG contributed presentations to this year’s SETAC EU virtual conference in the session Environmental Forensics - State of the Science and Global Applications. Nicholas D. Rose, Carlo Monti, and Timothy Negley shared a presentation on "Evaluation of Polytopic Vector Analysis (PVA) Performance for Forensic Source Identification at Multi-Source Sites".
Introduction. Unmixing models, including polytopic vector analysis (PVA), are commonly used in environmental forensics to determine the source composition and the proportion of each source in a sample. To our knowledge, the sensitivity of these models has not been tested for environmental samples. The goal of this work is to utilize artificially generated datasets representative of real-world environmental conditions to determine the sensitivity of PVA to various environmental sampling conditions including the seeding method, number of samples, noise in the dataset, presence of non-detects, and number of potential sources.
Materials and Methods. We developed an R-based implementation of PVA and validated it against the previously developed MATLAB version. We also developed an algorithm that generates a synthetic population of samples, with sources of polychlorinated biphenyls (PCBs) taken from real-world environmental contaminant sources (such as Aroclor standards), and then samples this population. This algorithm was used to generate many different potential sampling events and to introduce variability commonly encountered in environmental datasets (including small to large sample sizes and noise due to variation in sampling and analytical methods). These sampling events were then evaluated using PVA with four different seeding methods (EXRAWC, Fuzzy C-means clustering [FuzzyQ], nonnegative double singular value decomposition [NNDSVD], and externally provided sources) to estimate the composition of each source and the proportion of each source contributing to each sample. The PVA results were compared to the actual source compositions and sample proportions used to generate the synthetic dataset to assess sensitivity.
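The synthetic sampling step described above can be illustrated with a minimal Python sketch. It is not the authors' R algorithm: the function names, the multiplicative Gaussian noise model, the uniform draw of mixing proportions, and the five-analyte source fingerprints are all assumptions made for illustration, and the fingerprints are not real Aroclor compositions.

```python
import random

def normalize(values):
    """Scale a list of nonnegative numbers so it sums to 1."""
    total = sum(values)
    return [v / total for v in values]

def make_synthetic_samples(sources, n_samples, noise_sd, seed=0):
    """Mix source fingerprints with random proportions and add noise.

    sources   : list of source compositions (each sums to 1)
    n_samples : number of synthetic samples to draw
    noise_sd  : relative standard deviation of the multiplicative noise
    Returns (proportions, samples), both lists of rows that sum to 1.
    """
    rng = random.Random(seed)
    n_analytes = len(sources[0])
    proportions, samples = [], []
    for _ in range(n_samples):
        # Normalized exponential draws give mixing proportions that are
        # uniformly distributed over the simplex (assumed, for illustration).
        p = normalize([rng.expovariate(1.0) for _ in sources])
        # Linear mixing: each analyte is the proportion-weighted sum of sources.
        mixed = [sum(p[k] * sources[k][j] for k in range(len(sources)))
                 for j in range(n_analytes)]
        # Multiplicative noise, clipped at zero so values stay nonnegative.
        noisy = [max(0.0, m * (1.0 + rng.gauss(0.0, noise_sd))) for m in mixed]
        proportions.append(p)
        samples.append(normalize(noisy))
    return proportions, samples

# Hypothetical 5-analyte fingerprints; NOT real Aroclor compositions.
sources = [
    normalize([8, 4, 2, 1, 1]),
    normalize([1, 2, 6, 4, 1]),
    normalize([1, 1, 2, 4, 8]),
]
proportions, samples = make_synthetic_samples(sources, n_samples=50, noise_sd=0.05)
```

Because the true proportions are returned alongside the noisy samples, any unmixing result obtained from `samples` can later be checked against the values that generated them.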
Results and Discussion. Our analysis found that the choice of seeding method can influence whether the PVA algorithm converges, with EXRAWC and NNDSVD converging more often than FuzzyQ or externally provided sources. In addition, the FuzzyQ method had higher sample and source mean squared errors (MSEs) than the other methods in certain circumstances. The analysis also showed that increasing noise increased both Sample MSE and Source MSE, and that the lowest Sample MSE and Source MSE occurred when 50 to 100 samples were collected. Based on this analysis, we recommend running PVA using EXRAWC or NNDSVD on between 50 and 100 samples with relatively low noise.
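As a rough illustration of how the two error metrics could be computed, the Python sketch below scores an unmixing result against the known synthetic truth. It assumes that Source MSE is the mean squared error between true and estimated source compositions and that Sample MSE is the mean squared error between true and estimated sample proportions; the abstract does not give the exact definitions, and the function names are hypothetical. Because unmixing models return sources in arbitrary order, the sketch tries every ordering of the estimated sources and keeps the best match.

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equally shaped lists of rows."""
    diffs = [(x - y) ** 2
             for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def score_unmixing(true_sources, est_sources, true_props, est_props):
    """Return (source_mse, sample_mse) under the best source ordering."""
    k = len(true_sources)
    best = None
    for order in permutations(range(k)):
        # Pair true source i with estimated source order[i].
        reordered = [est_sources[i] for i in order]
        err = mse(true_sources, reordered)
        if best is None or err < best[0]:
            best = (err, order)
    source_mse, order = best
    # Reorder the columns of the estimated proportions the same way.
    reordered_props = [[row[i] for i in order] for row in est_props]
    sample_mse = mse(true_props, reordered_props)
    return source_mse, sample_mse
```

Exhaustive matching is only practical for a small number of sources, since the number of orderings grows factorially; for larger problems an assignment algorithm would be the usual substitute.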