API Reference: `arc_scope.pipeline`¶

The pipeline module orchestrates the end-to-end workflow from field definition to SCOPE simulation.

For the core showcase experiment, the most relevant step functions are bridge_arc_to_scope(), build_observation_dataset(), and fetch_weather(). These run entirely in-repo and come before the optional prepare_scope_dataset() and run_scope_simulation() integration steps, which require scope-rtm.

`PipelineConfig`¶

@dataclass
class PipelineConfig:
    # Field definition (required)
    geojson_path: PathLike
    start_date: str
    end_date: str
    crop_type: str
    start_of_season: int
    year: int

    # ARC options
    num_samples: int = 100000
    growth_season_length: int = 45
    s2_data_folder: PathLike | None = None
    data_source: str = "aws"

    # Weather options
    weather_provider: str = "era5"
    weather_config: dict[str, Any] = field(default_factory=dict)

    # SCOPE options
    scope_workflow: str = "reflectance"
    scope_root_path: PathLike | None = None
    scope_options: dict[str, Any] = field(default_factory=dict)
    device: str = "cpu"
    dtype: str = "float64"

    # Output options
    output_dir: PathLike = Path("./output")
    save_arc_npz: bool = True
    save_scope_netcdf: bool = True

    # Optimization
    optimize: bool = False
    optim_config: dict[str, Any] | None = None

Master configuration dataclass for the ARC-SCOPE pipeline.

Field Definition (required)¶

Field	Type	Description
`geojson_path`	`PathLike`	Path to GeoJSON file defining the field boundary.
`start_date`	`str`	Start date for satellite data, e.g., `"2021-05-15"`.
`end_date`	`str`	End date for satellite data.
`crop_type`	`str`	Crop type identifier, e.g., `"wheat"`, `"maize"`.
`start_of_season`	`int`	Day of year when the growth season begins.
`year`	`int`	Calendar year for the simulation period.
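
start_of_season is a day-of-year integer, whereas start_date and end_date are date strings. The sketch below derives a day of year from an ISO date using only the standard library. day_of_year is a hypothetical helper, not part of arc_scope, and whether the season start should coincide with start_date is a modelling choice this reference does not settle.

```python
from datetime import date


def day_of_year(iso_date: str) -> int:
    """Return the 1-based day of year for an ISO date string (YYYY-MM-DD)."""
    return date.fromisoformat(iso_date).timetuple().tm_yday


# Illustrative date only; it reuses the start_date example from the table above.
start_of_season = day_of_year("2021-05-15")
print(start_of_season)  # 135
```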

ARC Options¶

Field	Type	Default	Description
`num_samples`	`int`	`100000`	Number of archetype samples to generate.
`growth_season_length`	`int`	`45`	Length of growth season in days.
`s2_data_folder`	`PathLike` or `None`	`None`	Directory for caching Sentinel-2 downloads.
`data_source`	`str`	`"aws"`	S2 data source: `"aws"`, `"planetary"`, `"cdse"`, `"gee"`.

Weather Options¶

Field	Type	Default	Description
`weather_provider`	`str`	`"era5"`	Provider name: `"era5"` or `"local"`.
`weather_config`	`dict`	`{}`	Provider-specific configuration passed as kwargs to the provider constructor.

SCOPE Options¶

Field	Type	Default	Description
`scope_workflow`	`str`	`"reflectance"`	One of: `"reflectance"`, `"fluorescence"`, `"thermal"`, `"energy-balance"`.
`scope_root_path`	`PathLike` or `None`	`None`	Path to SCOPE upstream assets directory.
`scope_options`	`dict`	`{}`	Additional SCOPE options to override defaults.
`device`	`str`	`"cpu"`	PyTorch device (`"cpu"` or `"cuda"`).
`dtype`	`str`	`"float64"`	PyTorch dtype string.

Output Options¶

Field	Type	Default	Description
`output_dir`	`PathLike`	`"./output"`	Directory for saving results.
`save_arc_npz`	`bool`	`True`	Whether to save ARC outputs to NPZ.
`save_scope_netcdf`	`bool`	`True`	Whether to save SCOPE outputs to NetCDF.

Optimization Options¶

For practical fitting examples across fluorescence, thermal, and coupled energy-balance workflows, see the Optimization Guide.

Field	Type	Default	Description
`optimize`	`bool`	`False`	Run a SCOPE parameter optimisation loop before returning the final output.
`optim_config`	`dict` or `None`	`None`	Optimisation configuration. `optim_config["enabled"] = True` also enables optimisation for runner payloads that carry the flag inside the optimisation block.

When optimisation is enabled, optim_config must provide observed target data through observations (an xarray.Dataset, xarray.DataArray, or mapping) or observations_path (NetCDF). target_variables defaults to all variables in the observations dataset when omitted.

Example:

PipelineConfig(
    ...,
    scope_workflow="fluorescence",
    optim_config={
        "enabled": True,
        "observations_path": "observed_sif.nc",
        "target_variables": ["F740"],
        "parameters": [
            {
                "name": "fqe",
                "initial": 0.01,
                "lower": 0.001,
                "upper": 0.1,
                "transform": "log",
            }
        ],
        "optimizer": "scipy",
        "max_iter": 50,
    },
)

If parameters is omitted, ARC-SCOPE chooses a workflow default: fqe for fluorescence, rss/rbs for thermal, and the energy-balance preset for energy-balance.
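
The rules above can be checked before a run starts. The following is a minimal sketch, assuming only the keys shown in the example. check_optim_config is a hypothetical helper rather than an arc_scope function, and the bounds check is an assumed sanity rule, not documented behaviour. It looks only at the "enabled" flag inside the dict, not at the separate optimize field.

```python
from typing import Any, Mapping


def check_optim_config(optim_config: Mapping[str, Any]) -> None:
    """Raise ValueError for likely misconfigurations (illustrative only)."""
    if not optim_config.get("enabled", False):
        return  # nothing to check when the flag inside the dict is off
    # Observed targets must come from `observations` or `observations_path`.
    if "observations" not in optim_config and "observations_path" not in optim_config:
        raise ValueError("optim_config needs 'observations' or 'observations_path'")
    # Assumed sanity rule: each initial value lies within its bounds.
    for spec in optim_config.get("parameters", []):
        if not spec["lower"] <= spec["initial"] <= spec["upper"]:
            raise ValueError(f"initial value of {spec['name']!r} is outside its bounds")


# Passes silently: observations are given and the initial value is in range.
check_optim_config(
    {
        "enabled": True,
        "observations_path": "observed_sif.nc",
        "parameters": [{"name": "fqe", "initial": 0.01, "lower": 0.001, "upper": 0.1}],
    }
)
```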

Properties¶

resolved_scope_options -- Merges workflow defaults with user overrides:

config = PipelineConfig(..., scope_workflow="fluorescence")
config.resolved_scope_options
# {'calc_fluor': 1, 'calc_planck': 0}
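
The merge can be pictured as a plain dictionary update. This is a sketch under the assumption that user overrides take precedence over workflow defaults. WORKFLOW_DEFAULTS and resolve_scope_options are illustrative names, and only the fluorescence defaults shown above are included.

```python
from typing import Any, Dict, Mapping

# Only the fluorescence defaults appear in this reference; other workflows
# are left out rather than guessed.
WORKFLOW_DEFAULTS: Dict[str, Dict[str, Any]] = {
    "fluorescence": {"calc_fluor": 1, "calc_planck": 0},
}


def resolve_scope_options(workflow: str, overrides: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge workflow defaults with user overrides; overrides win on conflicts."""
    return {**WORKFLOW_DEFAULTS.get(workflow, {}), **overrides}


print(resolve_scope_options("fluorescence", {"calc_planck": 1}))
# {'calc_fluor': 1, 'calc_planck': 1}
```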

scope_workflow="energy-balance" is a special case: ARC-SCOPE still uses the common prepared input dataset, but run_scope_simulation() routes this workflow to the explicit coupled scope-rtm energy-balance runners rather than through the generic run_scope_dataset() flag path.

`ArcScopePipeline`¶

class ArcScopePipeline:
    def __init__(self, config: PipelineConfig): ...

    def run(self) -> PipelineResult: ...
    def run_arc(self) -> ArcResult: ...
    def run_bridge(self, arc_result: ArcResult) -> tuple[xr.DataArray, xr.DataArray]: ...
    def run_weather(self) -> xr.Dataset: ...
    def run_observation(self, arc_result: ArcResult) -> xr.Dataset: ...
    def run_optimization(
        self,
        scope_input_ds: xr.Dataset,
    ) -> tuple[xr.Dataset, xr.Dataset, OptimizationResult]: ...
    def run_scope(
        self,
        post_bio_da: xr.DataArray,
        post_bio_scale_da: xr.DataArray,
        weather_ds: xr.Dataset,
        observation_ds: xr.Dataset,
    ) -> xr.Dataset: ...

End-to-end pipeline from field definition to SCOPE simulation.

`run()`¶

Execute the full pipeline: ARC -> Bridge -> Weather -> SCOPE.

Returns a PipelineResult containing all intermediate and final outputs. When optimize=True or optim_config["enabled"] = True, the final scope_input_ds and scope_output_ds are the optimised products, and optimization_result records the initial/final losses, parameter values, optimiser, convergence flag, and target variables. The output dataset also carries arc_scope_optimization_* attrs for downstream manifests.

`run_arc()`¶

Run ARC retrieval step only. Returns an ArcResult.

`run_bridge(arc_result)`¶

Convert ARC outputs to SCOPE format. Returns (post_bio_da, post_bio_scale_da).

`run_weather()`¶

Fetch weather data for the configured field and time range. Returns an xr.Dataset.

`run_observation(arc_result)`¶

Build the observation geometry dataset. Returns an xr.Dataset.

`run_optimization(scope_input_ds)`¶

Run the configured optimisation loop against observed target variables, inject the optimised parameter values into the prepared SCOPE input dataset, and run the final SCOPE simulation. Raises ValueError if optimisation is enabled without observed target data or if requested targets are absent.

`run_scope(post_bio_da, post_bio_scale_da, weather_ds, observation_ds)`¶

Prepare and run SCOPE from the bridge outputs, weather data, and observation geometry. Returns an xr.Dataset.
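
The step methods can also be chained by hand when intermediate products need inspection. The sketch below uses only the method names and signatures listed above. run_stepwise is a hypothetical helper, and the ordering of the weather and observation steps is an assumption. It skips the optimisation step.

```python
from typing import Any


def run_stepwise(pipeline: Any) -> Any:
    """Chain the documented ArcScopePipeline methods by hand.

    `pipeline` is expected to be an ArcScopePipeline; it is typed as Any so
    the sketch does not depend on arc_scope being importable.
    """
    arc_result = pipeline.run_arc()
    # run_bridge returns the (post_bio_da, post_bio_scale_da) pair.
    post_bio_da, post_bio_scale_da = pipeline.run_bridge(arc_result)
    weather_ds = pipeline.run_weather()
    observation_ds = pipeline.run_observation(arc_result)
    # Intermediate products can be inspected or cached here before SCOPE runs.
    return pipeline.run_scope(post_bio_da, post_bio_scale_da, weather_ds, observation_ds)
```

With a real pipeline this would be called as run_stepwise(ArcScopePipeline(config)).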

`PipelineResult`¶

@dataclass
class PipelineResult:
    arc_result: ArcResult | None = None
    post_bio_da: xr.DataArray | None = None
    post_bio_scale_da: xr.DataArray | None = None
    weather_ds: xr.Dataset | None = None
    observation_ds: xr.Dataset | None = None
    scope_input_ds: xr.Dataset | None = None
    scope_output_ds: xr.Dataset | None = None
    optimization_result: OptimizationResult | None = None

Container for full pipeline results. All fields are None until their corresponding step completes.
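
Because unfinished steps leave their fields as None, a result can be inspected generically. A minimal sketch follows. completed_fields is a hypothetical helper that works on any dataclass instance, PipelineResult included.

```python
from dataclasses import fields, is_dataclass
from typing import Any, List


def completed_fields(result: Any) -> List[str]:
    """List the names of dataclass fields that are no longer None."""
    if not is_dataclass(result):
        raise TypeError("expected a dataclass instance such as PipelineResult")
    return [f.name for f in fields(result) if getattr(result, f.name) is not None]
```

After a partial run that stopped before SCOPE, the returned list would omit scope_input_ds and scope_output_ds.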

`OptimizationResult`¶

@dataclass
class OptimizationResult:
    status: str
    target_variables: list[str]
    initial_loss: float
    optimized_loss: float
    parameters_initial: dict[str, float]
    parameters_optimized: dict[str, float]
    optimizer: str
    converged: bool
    metadata: dict[str, Any] = field(default_factory=dict)

Summary of a completed SCOPE parameter optimisation.
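
The loss fields make it easy to report how much a fit improved. The sketch below reads only attributes listed in the dataclass above. relative_improvement and summarize are hypothetical helpers, not part of arc_scope.

```python
from typing import Any


def relative_improvement(result: Any) -> float:
    """Fractional loss reduction: (initial - optimized) / initial."""
    if result.initial_loss == 0:
        return 0.0  # avoid dividing by a zero initial loss
    return (result.initial_loss - result.optimized_loss) / result.initial_loss


def summarize(result: Any) -> str:
    """One-line report built from documented OptimizationResult fields."""
    # Pair each optimised value with its starting value (None if absent).
    changed = {
        name: (result.parameters_initial.get(name), value)
        for name, value in result.parameters_optimized.items()
    }
    return (
        f"{result.optimizer}: converged={result.converged}, "
        f"loss reduced by {relative_improvement(result):.1%}, parameters={changed}"
    )
```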

Step Functions¶

Composable building blocks for users who want partial pipelines. Defined in arc_scope.pipeline.steps:

`retrieve_arc(config)`¶

Run ARC biophysical parameter retrieval. Requires ARC to be installed separately: pip install git+https://github.com/MarcYin/ARC.

`bridge_arc_to_scope(arc_result, year)`¶

Convert ARC retrieval outputs to SCOPE-compatible DataArrays.

`build_observation_dataset(doys, year, geojson_path, ...)`¶

Build an observation geometry dataset with solar zenith/azimuth angles computed from field location and overpass time.

Optional keyword arguments: viewing_zenith (default 0.0), viewing_azimuth (default 0.0), overpass_hour (default 10.5 for Sentinel-2).
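
Solar angles depend on the moment of observation, which is fixed by the day of year and overpass_hour. As an illustration, a decimal hour such as 10.5 maps to a timestamp as follows. overpass_datetime is a hypothetical helper, and since this reference does not state whether overpass_hour is local solar time or UTC, the result carries no timezone.

```python
from datetime import datetime, timedelta


def overpass_datetime(year: int, doy: int, overpass_hour: float = 10.5) -> datetime:
    """Convert a 1-based day of year and a decimal hour into a naive datetime."""
    return datetime(year, 1, 1) + timedelta(days=doy - 1, hours=overpass_hour)


print(overpass_datetime(2021, 135))  # 2021-05-15 10:30:00
```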

`fetch_weather(config, time_range=None)`¶

Fetch weather data using the configured provider ("era5" or "local").

`prepare_scope_dataset(post_bio_da, post_bio_scale_da, weather_ds, observation_ds, config)`¶

Merge all inputs into a runner-ready SCOPE dataset. Requires scope-rtm.

`run_scope_simulation(scope_dataset, config)`¶

Execute the SCOPE simulation. Requires scope-rtm and torch.

- reflectance, fluorescence, and thermal use the standard run_scope_dataset() dispatch.
- thermal remains the prescribed-temperature thermal radiance workflow.
- energy-balance runs the coupled fluorescence and coupled thermal energy-balance branches explicitly and merges their outputs into one dataset.
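
The routing described above can be sketched as a small dispatcher. This is a conceptual illustration only: the runner callables are placeholders passed in by the caller, plain dicts stand in for datasets, and none of the names below are scope-rtm or arc_scope APIs.

```python
from typing import Any, Callable, Dict


def dispatch_workflow(
    workflow: str,
    run_generic: Callable[[str], Dict[str, Any]],
    run_coupled_fluorescence: Callable[[], Dict[str, Any]],
    run_coupled_thermal: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Route a workflow name to the matching runner(s); all runners are placeholders."""
    if workflow in ("reflectance", "fluorescence", "thermal"):
        return run_generic(workflow)
    if workflow == "energy-balance":
        # Run both coupled branches and merge their outputs into one mapping.
        merged = dict(run_coupled_fluorescence())
        merged.update(run_coupled_thermal())
        return merged
    raise ValueError(f"unknown scope_workflow: {workflow!r}")
```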

`ArcResult`¶

@dataclass
class ArcResult:
    scale_data: np.ndarray
    post_bio_tensor: np.ndarray
    post_bio_unc_tensor: np.ndarray
    mask: np.ndarray
    doys: np.ndarray
    geotransform: np.ndarray | None = None
    crs: Any = None

Container for ARC retrieval outputs.
