MatrixLibrary

Specificities of a matrix library for GIS

GIS make an extensive usage of matrices for displaying maps or for transforming coordinates. There is many excellent open source or commercial matrix libraries available. However, GIS have some specific needs that differ a little bit from the goals of many existent libraries. Matrix operations like those described in the affine transform chapter appear in almost all coordinate operations applied by Apache SIS. But the analysis of those operations reveal some patterns:

Those matrices are almost always of small size, rarely more than 5 rows and 5 columns.
“Heavy” matrix operations (matrix inversions or multiplications) do not happen in performance-critical code. In almost every cases, those heavy operations are executed only once, for example when a data file is read or in preparation steps before to transform coordinates. Those heavy operations rarely happen in the loop that apply the coordinate operation after we finished to prepare it.
In a sequence of matrix multiplications or inversions, rounding errors accumulate and grow fast. If the sequence of matrix operations is long enough, rounding errors can become indistinguishable from real operations like datum shifts. This ambiguity can happen because the matrix representation of some datum shifts has small values, with scale factors of a few parts per million and rotation terms of a few arc-seconds.
It is quite common that two affine transforms cancel each other when they are concatenated, i.e. that matrix multiplications result in some or all scale factors equal to 1 and some or all translation terms equal to 0. However because of rounding errors the results are rarely exact, but are rather some values like 0,9999…97 instead of 1. The usual workaround is to compare floating point numbers with some tolerance threshold. Unfortunately it is difficult to choose a threshold that catch a wide range of rounding errors without wrongly considering legitimate datum shifts as rounding errors (see previous point).

As a consequence of above points, accuracy of a matrix library is more important than performance for a GIS like Apache SIS. Paradoxically, a good way to improve performance is to invest more CPU time for more accurate matrix operations when preparing (not executing) the coordinate operation, because it increases the chances to correctly detect which operations cancel each other. This investment can save execution time at the place where it matters most: in the code looping over millions of coordinates to transform.

However matrix libraries are often designed for high performances with large matrices, sometimes containing thousands of rows and columns. Those libraries can efficiently resolve systems of linear equations with hundreds of unknown variables. Those libraries resolve difficult problems, but not of the same kind than the problems that Apache SIS needs to solve. For that reason, and also for another reason described in next paragraphs, Apache SIS uses its own matrix implementation. This implementation addresses the accuracy issue by using “double-double” arithmetic (a technic for simulating the accuracy of approximately 120 bits wide floating point numbers) at the cost of performance in a phase (transform preparation) where performance is not considered critical.

What to do with non-square matrices (and why)

Apache SIS often needs to inverse matrices, in order to perform a coordinate operation in reverse direction. Matrix inversions are typically performed on square matrices, but SIS also needs to inverse non-square matrices. Depending on whether we have more lines than columns:

From Apache SIS library perspective, a non-square matrix is a coordinate operation that adds or removes a dimension.
From linear algebra libraries perspective, a non-square matrix is an under-determined or over-determined system of equations.

To illustrate the issues caused by direct use of libraries designed for linear algebra, consider a (φ₁, λ₁, h) → (φ₂, λ₂) conversion from three-dimensional points to two-dimensional points on a surface. The φ terms are latitudes, the λ terms are longitudes and (φ₂, λ₂) may be different than (φ₁, λ₁) if h axis is not perpendicular to the surface.

For linear algebra libraries, the above non-square matrix represents an under-determined system of equations and may be considered unresolvable. Indeed the above coordinate operation cannot be inverted as a (φ₂, λ₂) → (φ₁, λ₁, h) operation because we do not know which value to assign to h. Ignoring h implies that we cannot assign values to (φ₁, λ₁) neither since those values may depend on h. However in GIS case, the ellipsoidal h axis is perpendicular to the ellipsoid surface on which the geodetic latitudes and longitudes (φ, λ) are represented (note that this statement is not true for geocentric latitudes and longitudes). This perpendicularity makes φ₁ and λ₁ independent of h. In such cases, we can can still do some processing.

Apache SIS proceeds by checking if h values are independent of φ and λ values. We identify such cases by checking which matrix coefficients are zero. If SIS can identify independent dimensions, it can temporarily exclude those dimensions and invert the matrix using only the remaining dimensions. If SIS does not found a sufficient amount of independent dimensions, an exception is thrown. But if a matrix inversion has been possible, then we need to decide which value to assign to the dimensions that SIS temporarily excluded. By default, SIS assigns the NaN (Not-a-Number) value to those dimensions. However in the particular case of ellipsoidal height h in a (φ₂, λ₂) → (φ₁, λ₁, h) operation, the zero value may also be appropriate on the assumption that the coordinates are usually close to the ellipsoid surface. In any case, the coefficients that Apache SIS sets to zero or NaN is based on the assumption that the matrix represents a coordinate operation; this is not something that can be done with arbitrary matrices.

The above-described approach allows Apache SIS to resolve some under-determined equation systems commonly found in GIS. In our example using NaN as the default value, the h ordinate stay unknown – we do not create information from nothing – but at least the (φ, λ) coordinates are preserved. The opposite problem, those of over-determined equation systems, is more subtile. An approach commonly applied by linear algebra libraries is to resolve over-determined systems by the least squares method. Such method applied to our example would compute a (φ₂, λ₂, h) → (φ₁, λ₁) operation that seems the best compromise for various φ₂, λ₂ and h values, while being (except special cases) an exact solution for no-one. Furthermore linear combinations between those three variables may be an issue because of heterogenous units of measurement, for instance with h in metres and (φ, λ) in degrees. Apache SIS rather proceeds in the same way than for under-determined systems: by requiring that some dimensions are independent from other dimensions, otherwise the matrix is considered non-invertible. Consequently in over-determined systems case, SIS may refuse to perform some matrix inversions that linear algebra libraries can do, but in case of success the resulting coordinate operation is guaranteed to be exact (ignoring rounding errors).

Apache SIS matrix library

In summary, Apache SIS provides its own matrix library for the following reasons:

Lightweight objects for small matrices, especially the 3×3 ones.
Improved accuracy with “double-double” arithmetic, accepting performance cost in codes where performance is not critical.
Special handling of non-square matrices on the assumption that those matrices represent coordinate operations.

This library is provided in the org.apache.sis.matrix package of the sis-referencing module.