Data access overview

Data structures can be instantiated programmatically in memory, but more often data are read from files or other kinds of data stores. There are different ways to access those data; an easy one is the DataStores.open(Object) convenience method. The method argument can be a path to a data file (File, Path, URL, URI), a stream (Channel, DataInput, InputStream, Reader), a connection to a database (DataSource, Connection), or another kind of object specific to the data source. The DataStores.open(Object) method detects the data format and returns a DataStore instance for that format.
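
This section does not describe how the format detection is performed. As a rough illustration of the general idea only, the sketch below recognizes a few formats from the leading bytes of a file. The class FormatProbeSketch and its methods are hypothetical and are not part of the Apache SIS API; only the file signatures (TIFF, NetCDF classic, PNG) come from the respective format specifications.

```java
/** Toy format detection from leading bytes. Hypothetical code, not the Apache SIS implementation. */
final class FormatProbeSketch {
    /** Returns a format name guessed from the first bytes of a file, or "unknown". */
    static String detect(byte[] header) {
        if (startsWith(header, 'I', 'I', 42, 0) || startsWith(header, 'M', 'M', 0, 42)) {
            return "TIFF";      // Little-endian or big-endian TIFF signature.
        }
        if (startsWith(header, 'C', 'D', 'F')) {
            return "NetCDF";    // Classic-format NetCDF files start with "CDF".
        }
        if (startsWith(header, 0x89, 'P', 'N', 'G')) {
            return "PNG";
        }
        return "unknown";
    }

    /** Compares the first bytes of data (as unsigned values) with the given signature. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] start = {'M', 'M', 0, 42, 0, 0, 0, 8};
        System.out.println(detect(start));   // prints "TIFF"
    }
}
```

A real library would typically delegate this check to one component per supported format, so that new formats can be added without changing the entry point.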

DataStore functionalities depend on the kind of data (coverage, feature set, time series, etc.), but in all cases some metadata can be obtained. Metadata make it possible to identify the phenomenon or features described by the data (temperature, land occupation, etc.) and the geographic area or temporal period covered by the data, together with their resolution. Some rich data sources also provide a data quality estimation, contact information for the responsible person or organization, legal or technical constraints on data usage, the history of processing applied to the data, the expected update schedule, etc.

Data formats each have their own metadata model, but Apache SIS translates all of them into a single metadata model in order to hide this heterogeneity. This pivot model approach is used by various libraries, with Dublin Core as a popular choice. For Apache SIS, the chosen pivot model is the ISO 19115 international standard. This model organizes metadata in a tree structure where each piece of information is accessible by a well-defined path, regardless of the origin of that information. For example, if a data format can provide a geographic bounding box encompassing all data, then that information will always be accessible (regardless of the data format) from the root Metadata object under the identificationInfo node, extent sub-node, geographicElement sub-node.
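
The pivot model idea can be sketched without any library. In the toy code below, two invented input formats store a bounding box differently, and both are translated into the same nested structure so that one path gives access to the value in either case. The class PivotModelSketch, its methods, and the two input formats are hypothetical; only the node names are borrowed from ISO 19115. This is not how Apache SIS represents metadata internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy pivot model. The class, its methods and the two input formats are hypothetical. */
final class PivotModelSketch {
    /** Builds the common tree: identificationInfo, then extent, then geographicElement. */
    static Map<String, Object> toPivot(double west, double east, double south, double north) {
        Map<String, Object> box = new LinkedHashMap<>();
        box.put("westBoundLongitude", west);
        box.put("eastBoundLongitude", east);
        box.put("southBoundLatitude", south);
        box.put("northBoundLatitude", north);
        return node("identificationInfo", node("extent", node("geographicElement", box)));
    }

    /** Wraps a child under a single named node. */
    private static Map<String, Object> node(String name, Object child) {
        Map<String, Object> parent = new LinkedHashMap<>();
        parent.put(name, child);
        return parent;
    }

    /** Hypothetical format A: one header entry per bound. */
    static Map<String, Object> fromFormatA(Map<String, String> header) {
        return toPivot(Double.parseDouble(header.get("WEST")), Double.parseDouble(header.get("EAST")),
                       Double.parseDouble(header.get("SOUTH")), Double.parseDouble(header.get("NORTH")));
    }

    /** Hypothetical format B: a single "west,south,east,north" string. */
    static Map<String, Object> fromFormatB(String bbox) {
        String[] p = bbox.split(",");
        return toPivot(Double.parseDouble(p[0]), Double.parseDouble(p[2]),
                       Double.parseDouble(p[1]), Double.parseDouble(p[3]));
    }

    /** Follows the given node names from the root; returns null if a node is missing. */
    static Object get(Map<String, Object> root, String... path) {
        Object current = root;
        for (String name : path) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(name);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = fromFormatB("108.3,10.5,110.4,12.6");
        // Same path whatever the original format was. Prints 108.3
        System.out.println(get(metadata, "identificationInfo", "extent",
                               "geographicElement", "westBoundLongitude"));
    }
}
```

The caller of get(…) never needs to know which format produced the tree, which is the point of a pivot model.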

Example: the following code reads a metadata file from a Landsat-8 image and prints the declared geographic bounding box:

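import java.io.File;
import org.apache.sis.metadata.iso.extent.Extents;
import org.apache.sis.storage.DataStore;
import org.apache.sis.storage.DataStores;
import org.opengis.metadata.Metadata;
import org.opengis.metadata.extent.GeographicBoundingBox;

// The enclosing method must handle or declare DataStoreException.
try (DataStore store = DataStores.open(new File("LC81230522014071LGN00_MTL.txt"))) {
    Metadata overview = store.getMetadata();

    // Convenience method for fetching the geographic bounding box at the right location in the metadata tree.
    GeographicBoundingBox bbox = Extents.getGeographicBoundingBox(overview);

    System.out.println("The geographic bounding box is:");
    System.out.println(bbox);
}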

This example produces the following output (this area is located in Vietnam):

The geographic bounding box is:
Geographic Bounding Box
  ├─West bound longitude…………………………… 108°20′10.464″E
  ├─East bound longitude…………………………… 110°26′39.66″E
  ├─South bound latitude…………………………… 10°29′59.604″N
  └─North bound latitude…………………………… 12°37′25.716″N

The ISO 19115 standard defines hundreds of elements. Some of them will be introduced progressively in the following chapters. To give an idea of what is available, the following table lists a few metadata elements. Most nodes accept an arbitrary number of values. For example, the extent node may contain many geographic areas.

Extract of a few metadata elements from ISO 19115
Element                                   Description
Metadata                                  Metadata about a dataset, service or other resources.
  ├─Reference system info                 Description of the spatial and temporal reference systems used in the dataset.
  ├─Identification info                   Basic information about the resource(s) to which the metadata applies.
  │   ├─Citation                          Name by which the cited resource is known, reference dates, presentation form, etc.
  │   │   └─Cited responsible party       Role, name, contact and position information for individuals or organizations that are responsible for the resource.
  │   ├─Topic category                    Main theme(s) of the resource (e.g. farming, climatology, environment, economy, health, transportation, etc.).
  │   ├─Descriptive keywords              Category keywords, their type, and reference source.
  │   ├─Spatial resolution                Factor which provides a general understanding of the density of spatial data in the resource.
  │   ├─Temporal resolution               Smallest resolvable temporal period in a resource.
  │   ├─Extent                            Spatial and temporal extent of the resource.
  │   ├─Resource format                   Description of the format of the resource(s).
  │   ├─Resource maintenance              Information about the frequency of resource updates, and the scope of those updates.
  │   └─Resource constraints              Information about constraints (legal or security) which apply to the resource(s).
  ├─Content info                          Information about the feature catalogue and a description of the coverage and image data characteristics.
  │   ├─Imaging condition                 Conditions which affected the image (e.g. blurred image, fog, semi darkness, etc.).
  │   ├─Cloud cover percentage            Area of the dataset obscured by clouds, expressed as a percentage of the spatial extent.
  │   └─Attribute group                   Information on attribute groups of the resource.
  │       ├─Content type                  Types of information represented by the values (e.g. thematic classification, physical measurement, etc.).
  │       └─Attribute                     Information on an attribute of the resource.
  │           ├─Sequence identifier       Unique name or number that identifies attributes included in the coverage.
  │           ├─Peak response             Wavelength at which the response is the highest.
  │           ├─Min/max value             Minimum/maximum value of data values in each sample dimension included in the resource.
  │           ├─Units                     Units of data in each dimension included in the resource.
  │           └─Transfer function type    Type of transfer function to be used when scaling a physical value for a given element.
  ├─Distribution info                     Information about the distributor of and options for obtaining the resource(s).
  │   ├─Distribution format               Description of the format of the data to be distributed.
  │   └─Transfer options                  Technical means and media by which a resource is obtained from the distributor.
  ├─Data quality info                     Overall assessment of the quality of the resource(s).
  ├─Acquisition information               Information about the acquisition of the data.
  │   ├─Environmental conditions          Record of the environmental circumstances during the data acquisition.
  │   └─Platform                          General information about the platform from which the data were taken.
  │       └─Instrument                    Instrument(s) mounted on a platform.
  └─Resource lineage                      Information about the provenance, sources and/or the production processes applied to the resource.
      ├─Source                            Information about the source data used in creating the data specified by the scope.
      └─Process step                      Information about events in the life of a resource specified by the scope.
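
As noted before the table, most nodes accept an arbitrary number of values, so a metadata tree may declare several geographic areas. Code that needs a single bounding box then has to combine them. The sketch below shows one simple possibility, the union of all boxes. The class BoundingBoxUnionSketch and the {west, east, south, north} array layout are assumptions made for this illustration and are not Apache SIS API; the code also ignores boxes that cross the anti-meridian.

```java
import java.util.Arrays;
import java.util.List;

/** Toy union of bounding boxes. Hypothetical code, each box is {west, east, south, north} in degrees. */
final class BoundingBoxUnionSketch {
    /** Returns the smallest box enclosing all given boxes, or null if the list is empty. */
    static double[] union(List<double[]> boxes) {
        if (boxes.isEmpty()) {
            return null;
        }
        double west  = Double.POSITIVE_INFINITY, east  = Double.NEGATIVE_INFINITY;
        double south = Double.POSITIVE_INFINITY, north = Double.NEGATIVE_INFINITY;
        for (double[] box : boxes) {
            west  = Math.min(west,  box[0]);
            east  = Math.max(east,  box[1]);
            south = Math.min(south, box[2]);
            north = Math.max(north, box[3]);
        }
        return new double[] {west, east, south, north};
    }

    public static void main(String[] args) {
        List<double[]> areas = Arrays.asList(
                new double[] {108, 110, 10, 12},
                new double[] {109, 111, 11, 13});
        System.out.println(Arrays.toString(union(areas)));   // [108.0, 111.0, 10.0, 13.0]
    }
}
```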

Among the metadata elements introduced in this chapter, one will be the topic of a dedicated chapter: referenceSystemInfo. Its content is essential for accurate data positioning; without this element, even positions given by latitudes and longitudes are ambiguous. Reference systems have many characteristics that set them apart from other metadata: they are immutable, cannot be handled by MetadataStandard.ISO_19115.asValueMap(…), have a particular text representation, and are associated with an engine that performs coordinate transformations from one reference system to another.