Back to Resources
ArticleJuly 4, 20261 min read

Evaluating Deep Learning Spectral Unmixing From Pure Reference Spectra

A deep learning model trained only on synthetic mixtures — generated from pure reference spectra — outperforms classical solvers on four- and five-material mixtures across nine sensors. The benchmark: 325 real clay powder mixtures measured by lab spectrometers, pushbroom cameras, snapshot cameras, MWIR, and RGB.

spectral unmixingdeep learninghyperspectral imagingclay mineralsabundance estimationsynthetic training data
Ahmed Sigiuk
Author

Ahmed Sigiuk

Senior Deep Learning Engineer

Clay endmember spectral unmixing - header

Spectral unmixing answers a more useful question than simple classification: not only what material is present, but how much of each material is present.

That matters because real samples are rarely clean. A pixel can contain several minerals, coatings, contaminants, soils, plastics, or plant materials at once. In those cases, assigning one label to the whole pixel loses the information the sensor was meant to capture.

The practical challenge is training. Real mixtures with known proportions are difficult to collect at scale. Pure reference spectra are much easier to measure and maintain. In this study, we evaluate whether a deep learning (DL) model can start from those pure spectra, generate synthetic training mixtures, and then perform well on real measured mixtures.

The clearest result appears as the number of materials increases (Table 5, Figure 2). Classical methods are strong on simple two-material mixtures. By four- and five-material mixtures, the DL model beats the best classical solver on every sensor we tested.

Benchmark Dataset

We used the public multisensor intimate-mixture benchmark (Table 1). It is a useful test case because the same physical mixtures were measured by several instruments, while the ground-truth material proportions remain the same. That lets us compare methods across sensors without changing the underlying samples.

The mixtures are made from five clay-related endmembers: Kaolin, Roof clay, Red clay, Mixed clay, and Calcium hydroxide. Each real mixture has known mass-fraction abundances, so the evaluation measures how close each method gets to the true proportions. The clay spectra intentionally overlap, especially across the clay materials, but they also contain useful separation in overall spectral shape and in absorption regions around 1400 nm, 1900 nm, and 2100-2500 nm (Figure 1).

Endmember spectra — VNIR–SWIR from ASD spectroradiometer and MWIR from Specim FX50

Figure 1. Measured pure endmember spectra: VNIR–SWIR from the ASD spectroradiometer (350–2500 nm) and MWIR from the Specim FX50 (2708–5279 nm).

ItemDescription
DatasetMultisensor intimate-mixture benchmark
Real mixture samples325 measured mixtures, plus pure reference spectra
EndmembersKaolin, Roof clay, Red clay, Mixed clay, Calcium hydroxide
Evaluation targetPredict the abundance of each endmember in each real mixture

Table 1. Summary of the benchmark dataset used for evaluation.

The 325 real mixtures are split by the number of materials present in each sample (Table 2):

Mixture orderMeaningSamples
Binary (k=2)Two materials present60
Ternary (k=3)Three materials present150
Quaternary (k=4)Four materials present100
Quinary (k=5)Five materials present15

Table 2. Breakdown of real mixture samples by mixture order.

The full benchmark includes 13 sensors across visible, near-infrared, short-wave infrared, mid-wave infrared, and long-wave infrared ranges. Our experiments use the nine sensors in Table 3, spanning lab spectrometers, pushbroom cameras, snapshot cameras, MWIR, VNIR, SWIR, and RGB:

SensorBandsRange (nm)Type / region
ASD spectroradiometer2151350–2500Point spectrometer, VNIR-SWIR
PSR-35001024345–2504Point spectrometer, VNIR-SWIR
Specim AisaFenix450378–2504Pushbroom camera, VNIR-SWIR
Specim FX503082708–5279MWIR camera
Specim sCMOS238398–1001Pushbroom camera, VNIR
Cubert Ultris X20P164350–1002Snapshot camera, VNIR
IMEC1001120–1675Snapshot camera, SWIR
Senops HSC250500–900Snapshot camera, VNIR
JAI RGB3440–630RGB camera

Table 3. The nine sensors used in our experiments.

Modeling Approach

Both the DL model and the classical baselines start from the same reference information: the five pure endmember spectra. What differs is how each one uses mixing methods. The DL model uses a mixing function only during training, both to generate synthetic data and inside the loss objective that reconstructs the mixed spectrum from predicted abundances, while the classical methods apply their mixing model directly to each real mixture at evaluation.

DL model: synthetic training

For each sensor, the DL model starts from five pure clay spectra, one measured reference per material. We turn those into synthetic training mixtures by sampling abundance combinations and mixing the pure spectra with a nonlinear PPNM recipe. The network learns from these pairs: the sampled abundances are the labels, and a reconstruction loss rebuilds the mixed spectrum from the predicted abundances using the same mixing function. The recipe is a training tool only; it is never applied to real mixtures.

We chose a nonlinear recipe based on experiment. Testing one linear and several nonlinear recipes for generating the synthetic mixtures, each with a matched reconstruction loss, the PPNM recipe was the strongest and most consistent across sensors. On the binary and ternary mixtures, for example, it lowered RMSE from about 0.133 (linear) to about 0.106.

This matches the physics of the dataset. The samples are intimate powder mixtures, not spatially separated patches, and the powders were sieved below 200 µm. At that grain size, light scatters through multiple particles before reaching the sensor, so the measured reflectance is not a simple linear sum of pure spectra. A nonlinear recipe approximates that behavior more closely, while the real measured mixtures are held out for evaluation.

Classical methods: inverse solvers

Classical methods work in the opposite direction. Each is an inverse solver: given a real mixed spectrum and the pure endmember spectra, it estimates the abundances that best reproduce the observation under its mixing assumption. There is no training step, the solver runs directly on each real mixture. We use FCLSU, PPNM, and MLM (Table 4) and report the strongest result in each setting.

Classical solverShort description
FCLSU (Fully Constrained Least Squares Unmixing)Linear unmixing with non-negative abundances that sum to one; equivalent to MCR-ALS (Multivariate Curve Resolution - Alternating Least Squares) with equality, non-negativity, and closure constraints.
PPNM (Polynomial Post-Nonlinear Mixing)Adds a nonlinear correction after linear mixing.
MLM (Multilinear Mixing)Accounts for multiple-scattering interactions between materials.

Table 4. Classical inverse solvers used as evaluation baselines.

Results

The main question is whether a model trained only on synthetic mixtures can handle real samples as they become more complex. A two-material mixture is the simplest case; four- and five-material mixtures are closer to real operating conditions, where several materials contribute to the same spectrum.

The trend in Table 5 is consistent across the whole sensor panel. On two-material mixtures the classical solvers are ahead almost everywhere, and the DL model is better on only one sensor. As materials are added the DL model pulls ahead: it leads on most sensors at three materials, and by four and five materials it has the lower RMSE on every sensor. Pooled across all mixture orders, the DL model is better on all nine.

SensorBands k=2 DLk=2 classical k=3 DLk=3 classical k=4 DLk=4 classical k=5 DLk=5 classical
ASD21510.1070.086 (PPNM)0.1120.116 (PPNM)0.0990.130 (PPNM)0.0730.120 (PPNM)
PSR-350010240.1090.104 (PPNM)0.1050.109 (MLM)0.0860.108 (MLM)0.0480.085 (MLM)
AisaFenix4500.2200.184 (MLM)0.1760.189 (MLM)0.1370.188 (MLM)0.1150.219 (MLM)
FX503080.1620.113 (MLM)0.1410.132 (MLM)0.1460.157 (PPNM)0.1760.190 (PPNM)
Specim sCMOS2380.1800.138 (FCLSU)0.1490.143 (FCLSU)0.1220.161 (FCLSU)0.0840.182 (FCLSU)
Cubert1640.2930.334 (FCLSU)0.1960.288 (MLM)0.1640.284 (MLM)0.0920.212 (PPNM)
IMEC1000.2240.160 (MLM)0.1790.184 (FCLSU)0.1350.217 (FCLSU)0.0900.246 (PPNM)
Senops HSC2500.2400.231 (PPNM)0.1960.240 (PPNM)0.1560.238 (PPNM)0.1210.224 (PPNM)
JAI RGB30.2690.196 (MLM)0.1990.209 (FCLSU)0.1400.196 (PPNM)0.0790.169 (PPNM)
Average (9 sensors)0.2000.1720.1610.1790.1320.1870.0980.183

Table 5. Abundance RMSE per sensor and mixture order (DL and best classical in separate columns). Bold is the better (lower) value; the winning classical method is in parentheses. The final row averages the nine sensors.

Figure 2 shows the same pattern as a heatmap of the gap (DL minus best-classical): orange where classical is ahead, blue where the DL model is ahead. It shifts from mostly orange at k=2 to consistently blue at k=5.

DL model gap by sensor and mixture order — heatmap

Figure 2. DL model advantage by sensor and mixture order (DL RMSE minus best-classical RMSE). Negative (blue) means the DL model is better.

The same per-sensor checkpoint is evaluated across k=2 through k=5; there is no separate model per material count. One caveat: the k=5 set is small (15 samples), so treat the quinary column as indicative, though the trend is consistent and already clear at k=3 and k=4.

Discussion

The main takeaway is not simply that the DL model wins more often at higher mixture orders. The more useful point is why that matters operationally: the model starts from pure spectra, which many teams can realistically measure and curate, rather than requiring a large labeled library of real mixtures. In many applications, collecting clean reference spectra is much easier than creating hundreds of physical mixtures with known proportions.

The mixture-order result in Table 5 and Figure 2 is also consistent with how the problem changes as samples become more complex. A two-material mixture is relatively constrained: only a few components are active, and a classical solver has fewer degrees of freedom to resolve. As more materials contribute to the same measured spectrum, the signatures overlap more, several abundance values become nonzero, and small modeling errors can spread across more components. That is where a method trained across many possible abundance combinations has more room to help.

This is the advantage of turning the pure spectral library into synthetic training data. A classical solver applies one chosen mathematical assumption directly to the measured spectrum. The DL approach can generate spectra across many material proportions, and it can be tested under different physical mixing assumptions when needed. The model is not just using the pure spectra as fixed endmembers; it is learning from a wider mixture space before being evaluated on real samples.

Classical methods still play an important role in that evaluation. FCLSU, PPNM, and MLM make different assumptions, and the strongest one differs by sensor and mixture order; in our runs no single classical solver was best everywhere, and all three won on at least one sensor. Treating "classical" as one fixed baseline would hide that variation. A useful workflow should compare the DL model against multiple classical methods, then let the data show where each approach is reliable.

This is where the Clarity platform fits naturally. Teams can import sensor data, inspect spectral signatures, manage reference libraries, generate synthetic spectra and training mixtures, train abundance models, and compare model iterations against classical methods in one workflow. As new sensors, materials, or modeling choices are added, the workflow can be repeated and modified instead of rebuilt from scratch.

Clarity

Talk to engineering about your hyperspectral workflow.

Share your data, target classes, and deployment constraints so we can scope the right workflow path.

Keep Reading

Related Resources

More product thinking and reference material from the Metaspectral team.