Internal Build Pipeline
This document is an internal reference for the retained SIAC build pipeline in
this repository. It is not part of the public spectral-library package
contract.
Public package installation and mapping usage are documented in
mapping_quickstart.md.
Purpose
The internal pipeline:
- fetches or reuses raw source datasets,
- normalizes them onto the shared 400-2500 nm / 1 nm grid,
- applies source-aware filtering and repair stages,
- exports the canonical SIAC package used as mapping input.
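To make the normalization step concrete, here is a minimal sketch of resampling one spectrum onto the shared 400-2500 nm grid at 1 nm spacing. It assumes linear interpolation, with NaN outside the source's native range. The function name `resample_to_shared_grid` is hypothetical and is not taken from the repository's normalization code, which may use a different method.

```python
import numpy as np

# Shared target grid: 400..2500 nm inclusive at 1 nm spacing (2101 samples).
SHARED_GRID_NM = np.arange(400.0, 2501.0, 1.0)


def resample_to_shared_grid(wavelengths_nm, reflectance):
    """Linearly interpolate a spectrum onto SHARED_GRID_NM (illustrative only).

    Samples outside the source's native range are set to NaN instead of being
    extrapolated. This is an assumption, not the pipeline's confirmed behavior.
    """
    wl = np.asarray(wavelengths_nm, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    order = np.argsort(wl)  # np.interp requires increasing x values
    return np.interp(SHARED_GRID_NM, wl[order], refl[order],
                     left=np.nan, right=np.nan)


# Example: a made-up source sampled every 10 nm between 450 and 2400 nm.
native_wl = np.arange(450.0, 2401.0, 10.0)
native_refl = np.linspace(0.1, 0.5, native_wl.size)
resampled = resample_to_shared_grid(native_wl, native_refl)
```

In this example the resampled spectrum has 2101 samples, and the samples below 450 nm and above 2400 nm are NaN. Under this assumption, those gaps are what a later coverage filter would measure.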
Canonical Build Roots
Current retained build roots:
- pipeline root: build/real_siac_pipeline_full_raw
- final SIAC package: build/siac_spectral_library_real_full_raw_no_ghisacasia_no_understory_no_santa37
Repository Paths
Important internal paths:
- src/spectral_library/sources/: source manifests, fetchers, fetch batching, and catalog assembly
- src/spectral_library/normalization/: normalization pipeline, coverage filtering, quality plots, and SIAC package export
- src/spectral_library/distribution/: runtime download helpers
- src/spectral_library/mapping/: public mapping runtime, prepared-runtime build, and retrieval engine
- manifests/sources.csv: curated source inventory, fetch adapters, source status, and notes
- manifests/siac_excluded_spectra.csv: export-time spectrum exclusions
- scripts/build_real_siac_library_from_scratch.py: canonical raw-to-SIAC build driver
- scripts/build_real_siac_library.py: cache-first wrapper around the canonical driver
- scripts/: source-specific repair, filtering, plotting, and review utilities
- build/: cached source trees, pipeline artifacts, QA outputs, and SIAC exports
Canonical End-To-End Command
MPLCONFIGDIR=build/.mplconfig PYTHONPATH=src python3 scripts/build_real_siac_library_from_scratch.py \
--fallback-raw-roots build/local_sources_full_raw,build/local_sources_vegetation_all,build/local_sources \
--raw-sources-root build/local_sources_full_raw \
--pipeline-root build/real_siac_pipeline_full_raw \
--output-root build/siac_spectral_library_real_full_raw_no_ghisacasia_no_understory_no_santa37
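If the build needs to be launched from Python (for example from a scheduled job), the same invocation can be assembled with `subprocess`. This is only a sketch that mirrors the command above. The helper names `build_invocation` and `run_build` are hypothetical and do not exist in the repository.

```python
import os
import subprocess

OUTPUT_ROOT = (
    "build/siac_spectral_library_real_full_raw"
    "_no_ghisacasia_no_understory_no_santa37"
)


def build_invocation():
    """Return (argv, env) mirroring the canonical shell command."""
    argv = [
        "python3",
        "scripts/build_real_siac_library_from_scratch.py",
        "--fallback-raw-roots",
        ",".join([
            "build/local_sources_full_raw",
            "build/local_sources_vegetation_all",
            "build/local_sources",
        ]),
        "--raw-sources-root", "build/local_sources_full_raw",
        "--pipeline-root", "build/real_siac_pipeline_full_raw",
        "--output-root", OUTPUT_ROOT,
    ]
    # Same two variables the shell command sets inline; like the shell form,
    # this replaces any PYTHONPATH inherited from the parent environment.
    env = dict(os.environ, MPLCONFIGDIR="build/.mplconfig", PYTHONPATH="src")
    return argv, env


def run_build():
    """Run the build from the repository root; raises on a non-zero exit."""
    argv, env = build_invocation()
    subprocess.run(argv, env=env, check=True)
```

Keeping argument assembly separate from execution lets the argument list be inspected or logged without starting a full build.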
Pipeline Stages
The current retained pipeline is organized as:
- source acquisition and cache reuse
- raw normalization
- coverage filter
- EMIT and Santa artifact repair
- GHISACASIA deep-band and tail repair
- global visible-band outlier repair
- source-specific deep-absorption smoothing
- native range filter
- vegetation outlier repair
- vegetation water-band spike repair
- curated subset filter
- remaining source-artifact repair
- landcover analysis and QA
- library package export
- suspicious-spectrum review
Representative internal commands used by those stages:
- spectral-library-internal normalize-sources
- spectral-library-internal filter-coverage --min-coverage 0.8
- spectral-library-internal plot-quality
- spectral-library-internal build-library-package
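The `--min-coverage 0.8` threshold is easier to read with a small example. The sketch below assumes that coverage means the fraction of finite samples a spectrum has on the shared 2101-point grid. That definition and the helper names are assumptions for illustration, not the CLI's confirmed implementation.

```python
import numpy as np


def coverage_fraction(spectrum):
    """Fraction of grid samples that are finite (assumed coverage definition)."""
    values = np.asarray(spectrum, dtype=float)
    return float(np.isfinite(values).mean())


def passes_coverage(spectrum, min_coverage=0.8):
    """Return True when the spectrum meets the minimum coverage threshold."""
    return coverage_fraction(spectrum) >= min_coverage


# A made-up spectrum on the 400-2500 nm grid with no data beyond 2000 nm.
grid_nm = np.arange(400, 2501)
spectrum = np.where(grid_nm <= 2000, 0.3, np.nan)
```

Here the spectrum covers 1601 of 2101 samples (about 0.76), so under this assumed definition it would be dropped at a 0.8 threshold and kept at 0.75.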
Representative repair and review scripts:
- scripts/repair_emit_santa_artifacts.py
- scripts/repair_ghisacasia_artifacts.py
- scripts/robust_visible_outlier_fix.py
- scripts/curate_source_absorption_rules.py
- scripts/filter_by_native_range.py
- scripts/repair_vegetation_outliers.py
- scripts/repair_vegetation_water_band_spikes.py
- scripts/filter_curated_subset_rules.py
- scripts/repair_remaining_source_artifacts.py
- scripts/landcover_analysis.py
QA Outputs
Current top-level QA locations in the retained build:
- normalized QA: build/real_siac_pipeline_full_raw/11_source_artifacts_fixed/plots/quality
- landcover QA: build/real_siac_pipeline_full_raw/11_source_artifacts_fixed/landcover_analysis/plots
- SIAC package plots: build/siac_spectral_library_real_full_raw_no_ghisacasia_no_understory_no_santa37/plots
- suspicious-spectrum review: build/siac_spectral_library_real_full_raw_no_ghisacasia_no_understory_no_santa37/full_review
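After a build, a quick check that these QA locations exist can catch a run that stopped early. The following is a small sketch using only the paths listed above. The helper `missing_qa_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

PIPELINE_ROOT = Path("build/real_siac_pipeline_full_raw")
PACKAGE_ROOT = Path(
    "build/siac_spectral_library_real_full_raw"
    "_no_ghisacasia_no_understory_no_santa37"
)

# Labels and relative paths copied from the QA list in this document.
QA_DIRS = {
    "normalized QA": PIPELINE_ROOT / "11_source_artifacts_fixed" / "plots" / "quality",
    "landcover QA": PIPELINE_ROOT / "11_source_artifacts_fixed" / "landcover_analysis" / "plots",
    "SIAC package plots": PACKAGE_ROOT / "plots",
    "suspicious-spectrum review": PACKAGE_ROOT / "full_review",
}


def missing_qa_dirs(repo_root="."):
    """Return labels of QA locations that are not directories under repo_root."""
    root = Path(repo_root)
    return [label for label, rel in QA_DIRS.items() if not (root / rel).is_dir()]


if __name__ == "__main__":
    for label in missing_qa_dirs():
        print(f"missing: {label}")
```

The check only verifies that the directories exist. It does not verify that they contain current plots.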
Notes
- The internal build pipeline remains repository-specific and may change without the compatibility guarantees applied to the public mapping package.
- The public prepared-runtime contract begins after SIAC export, at
build-mapping-library.