Core Concepts

CCRP is built on several fundamental concepts that guide its design and implementation. Understanding these concepts is essential for both implementing and using CCRP effectively.

RESTful Design and Discoverability

CCRP follows REST principles with an emphasis on discoverability. Every response includes links to related resources, allowing clients to navigate the API without hard-coded URLs.

Self-Describing Resources

Starting from the root endpoint, clients can discover:

Available datasets
Supported capabilities (conformance classes)
Related endpoints and their purposes

{
  "conformsTo": [
    "https://ccrp.io/spec/v1/conformance/core",
    "https://ccrp.io/spec/v1/conformance/range-requests"
  ],
  "links": [
    {
      "href": "/dataset",
      "rel": "https://ccrp.io/spec/v1/rel/datasets",
      "type": "application/json",
      "title": "Available datasets"
    }
  ]
}

HATEOAS (Hypermedia as the Engine of Application State)

Clients navigate through the API by following links, not by constructing URLs. Each dataset includes a link to its data endpoint:

{
  "id": "temperature-global",
  "links": [
    {
      "href": "/dataset/temperature-global/data",
      "rel": "https://ccrp.io/spec/v1/rel/data",
      "type": "multipart/mixed"
    }
  ]
}
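
As a rough illustration of link-following on the client side, the sketch below looks up a link by its `rel` value and resolves its `href` against the URL the document was fetched from. The helper names (`find_link`, `resolve_link`) and the `example.org` host are hypothetical and not part of CCRP; only the `links`, `href`, and `rel` fields come from the examples above.

```python
from typing import Optional
from urllib.parse import urljoin

# Link relation taken from the dataset example above.
DATA_REL = "https://ccrp.io/spec/v1/rel/data"


def find_link(resource: dict, rel: str) -> Optional[dict]:
    """Return the first link object whose "rel" matches, or None."""
    for link in resource.get("links", []):
        if link.get("rel") == rel:
            return link
    return None


def resolve_link(base_url: str, resource: dict, rel: str) -> str:
    """Resolve a link's href against the URL the resource came from."""
    link = find_link(resource, rel)
    if link is None:
        raise LookupError(f"no link with rel {rel!r}")
    return urljoin(base_url, link["href"])


dataset = {
    "id": "temperature-global",
    "links": [
        {
            "href": "/dataset/temperature-global/data",
            "rel": DATA_REL,
            "type": "multipart/mixed",
        }
    ],
}

# Hypothetical server address, used only for illustration.
data_url = resolve_link(
    "https://example.org/dataset/temperature-global", dataset, DATA_REL
)
```

Because the client matches on `rel` rather than on the path, a server could move the data endpoint without breaking a client written this way.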

Conformance-Based Architecture

CCRP uses conformance classes to advertise capabilities. This allows implementations to start simple and add features incrementally.

Conformance Classes

Each conformance class is identified by a URI and defines a set of requirements:

Core: Basic query and retrieval capabilities
Standard: Recommended extensions for enhanced functionality
Advanced: Optional extensions for complex use cases

Capability Discovery

Clients can determine server capabilities programmatically:

GET /conformance

{
  "conformsTo": [
    "https://ccrp.io/spec/v1/conformance/core",
    "https://ccrp.io/spec/v1/conformance/range-requests"
  ]
}

This allows clients to adapt their behavior based on available features.
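
A client might use the conformance document along these lines. This is a minimal sketch: the `supports` and `choose_strategy` helpers and the strategy labels are made up for illustration, and only the `conformsTo` array and the class URIs come from the response above.

```python
# Conformance class URI taken from the example response above.
RANGE_REQUESTS = "https://ccrp.io/spec/v1/conformance/range-requests"


def supports(conformance: dict, class_uri: str) -> bool:
    """True if the server advertises the given conformance class."""
    return class_uri in conformance.get("conformsTo", [])


def choose_strategy(conformance: dict) -> str:
    """Pick a download strategy (labels are illustrative, not from the spec)."""
    if supports(conformance, RANGE_REQUESTS):
        return "parallel-ranges"
    return "single-get"


conformance = {
    "conformsTo": [
        "https://ccrp.io/spec/v1/conformance/core",
        "https://ccrp.io/spec/v1/conformance/range-requests",
    ]
}
strategy = choose_strategy(conformance)
```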

Coordinate-Based Queries

CCRP operates on logical coordinates, not physical storage details. This abstraction is key to its flexibility.

Logical vs Physical

When you query:

GET /dataset/ocean-temp/data?time[gte]=2024-01&time[lt]=2024-02&depth[gte]=0&depth[lt]=100

You're using:

Logical coordinates: January 2024, depths 0-100m
Not physical indices: chunks [15,0,0] through [17,10,5]

CCRP handles the translation.
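
To make the query shape concrete, here is a small sketch that turns half-open coordinate ranges into the `name[gte]=...&name[lt]=...` form shown above. The `build_query` helper is hypothetical; it uses only the `[gte]` and `[lt]` operators that appear in this document and leaves the square brackets unescaped so the output matches the examples.

```python
from urllib.parse import urlencode


def build_query(ranges: dict) -> str:
    """Encode {name: (lower, upper)} as name[gte]=lower&name[lt]=upper pairs."""
    params = []
    for name, (lower, upper) in ranges.items():
        params.append((f"{name}[gte]", lower))  # inclusive lower bound
        params.append((f"{name}[lt]", upper))   # exclusive upper bound
    # safe="[]" keeps the brackets literal instead of percent-encoding them.
    return urlencode(params, safe="[]")


query = build_query({"time": ("2024-01", "2024-02"), "depth": (0, 100)})
path = "/dataset/ocean-temp/data?" + query
```

Note that the client never mentions chunk indices; it states only the coordinate ranges it is interested in.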

Chunk Boundary Expansion

CCRP returns complete chunks that overlap your query region. If chunks contain depths 0-50m and 50-100m, and you request depth[gte]=25&depth[lt]=75, you'll receive both complete chunks (0-100m of data).

This behavior is predictable and simple:

The server expands to chunk boundaries
The client extracts the exact subset needed
No partial chunk complexity
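
The depth example above can be sketched as follows, under the assumption that chunks lie on a regular grid of fixed size along the dimension (real datasets may chunk differently). Both function names are illustrative: `expand_to_chunks` mimics what the server does conceptually, and `extract_subset` is the client-side trim.

```python
import math


def expand_to_chunks(lower, upper, chunk_size, origin=0):
    """Expand the half-open range [lower, upper) to enclosing chunk boundaries.

    Assumes a regular grid: chunk k covers
    [origin + k * chunk_size, origin + (k + 1) * chunk_size).
    """
    first = math.floor((lower - origin) / chunk_size)
    last = math.ceil((upper - origin) / chunk_size)
    return origin + first * chunk_size, origin + last * chunk_size


def extract_subset(points, lower, upper):
    """Client side: keep only (coordinate, value) pairs inside [lower, upper)."""
    return [(coord, value) for coord, value in points if lower <= coord < upper]


# Request depth[gte]=25&depth[lt]=75 against 50 m chunks.
returned_range = expand_to_chunks(25, 75, chunk_size=50)

# Made-up sample values at 10 m spacing across the returned 0-100 m span.
returned_points = [(depth, f"v{depth}") for depth in range(0, 100, 10)]
wanted = extract_subset(returned_points, 25, 75)
```

Here `returned_range` covers 0 to 100, matching the two complete chunks described above, while `wanted` holds only the samples at depths 30 through 70.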

Version Pinning and Reproducibility

For datasets that use versioning systems like Iceberg or Icechunk, CCRP provides mechanisms for reproducible access.

Optional Version Specification

For datasets with version control, queries can specify a version:

GET /dataset/weather/data?time=2024-01&version=v20240315

Or use the latest:

GET /dataset/weather/data?time=2024-01

Note: Version support is only available for datasets using transactional storage layers like Apache Iceberg or Icechunk for Zarr.

Version Resolution

When no version is specified for a versioned dataset, CCRP:

Resolves to the latest version
Returns the resolved version in a header
Allows subsequent requests to use this version for consistency

GET /dataset/weather/data?time=2024-01
→ CCRP-Resolved-Version: v20240320

This enables both convenience and reproducibility.
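
One way a client could pin follow-up requests is sketched below. The `pin_version` helper and the `example.org` host are hypothetical; the `CCRP-Resolved-Version` header and the `version` query parameter are the ones shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

RESOLVED_VERSION_HEADER = "CCRP-Resolved-Version"


def pin_version(url: str, response_headers: dict) -> str:
    """Return url with version=<resolved version> set, if the server sent one."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {name.lower(): value for name, value in response_headers.items()}
    version = headers.get(RESOLVED_VERSION_HEADER.lower())
    if version is None:
        return url  # unversioned dataset: nothing to pin

    parts = urlsplit(url)
    # Drop any existing version parameter, then append the resolved one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "version"
    ]
    query.append(("version", version))
    return urlunsplit(parts._replace(query=urlencode(query, safe="[]")))


first_url = "https://example.org/dataset/weather/data?time=2024-01"
pinned_url = pin_version(first_url, {"CCRP-Resolved-Version": "v20240320"})
```

Recording `pinned_url` alongside any results derived from the data gives a later reader the exact version that was queried.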

Dataset Abstraction

CCRP provides a uniform interface across different storage formats while preserving format-specific metadata.

Format Transparency

The same query pattern works for:

Zarr arrays with dimensions
Iceberg tables with columns
Future formats that organize data in chunks

Native Metadata Preservation

Each dataset returns its complete native metadata:

Zarr: Full zarr.json with array shapes, dtypes, codecs
Iceberg: Complete metadata.json with schema and partitioning

This allows format-aware clients to interpret the data correctly while keeping CCRP format-agnostic.

Two-Phase Protocol

CCRP supports both simple and sophisticated access patterns through its two-phase protocol.

Simple Access

For basic use cases, a single GET request suffices:

GET /dataset/temperature/data?lat[gte]=30&lat[lt]=40&lon[gte]=-120&lon[lt]=-110
→ 200 OK [data bytes]

Advanced Access with Planning

For large queries or parallel downloads:

First, a HEAD request returns the total size and an ETag:

HEAD /dataset/temperature/data?lat[gte]=30&lat[lt]=40&lon[gte]=-120&lon[lt]=-110
→ Content-Length: 5368709120
→ ETag: "abc123"

Then, multiple GET requests each fetch one byte range, pinned to that ETag with If-Match:

GET /dataset/temperature/data?lat[gte]=30&lat[lt]=40&lon[gte]=-120&lon[lt]=-110
Range: bytes=0-1073741823
If-Match: "abc123"

This enables:

Parallel downloads for maximum throughput
Progress tracking
Resumable transfers
Consistent multi-request sessions

The ETag ensures all requests reference the same query plan, preventing inconsistencies.
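
The planning step can be sketched as simple arithmetic on the `Content-Length` value. The helpers below are illustrative rather than part of CCRP, and the 1 GiB part size is an arbitrary choice; they only produce the `Range` and `If-Match` headers shown above, leaving the actual HTTP calls to whatever client library is in use.

```python
def plan_ranges(total_size: int, part_size: int) -> list:
    """Split total_size bytes into inclusive HTTP byte-range header values."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [
        # HTTP byte ranges are inclusive, hence the "- 1" on the end offset.
        f"bytes={start}-{min(start + part_size, total_size) - 1}"
        for start in range(0, total_size, part_size)
    ]


def range_request_headers(total_size: int, part_size: int, etag: str) -> list:
    """Build one header set per part, all pinned to the same ETag."""
    return [
        {"Range": byte_range, "If-Match": etag}
        for byte_range in plan_ranges(total_size, part_size)
    ]


# Values from the HEAD response above; 1 GiB parts are an assumed choice.
headers = range_request_headers(5368709120, 1024**3, '"abc123"')
```

With these inputs the plan has five parts, and the first one carries `Range: bytes=0-1073741823`, matching the GET example above. Each header set can be handed to a separate worker, and a failed part can be retried on its own.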

These concepts work together to create a simple yet powerful protocol for chunked data access. By hiding complexity while preserving flexibility, CCRP makes cloud data as accessible as local files.
