Specificities of a matrix library for GIS

GIS make an extensive usage of matrices for displaying maps or for transforming coordinates. There is many excellent open source or commercial matrix libraries available. However, GIS have some specific needs that differ a little bit from the goals of many existent libraries. Matrix operations like those described in the affine transform chapter appear in almost all coordinate operations applied by Apache SIS. But the analysis of those operations reveal some patterns:

As a consequence of above points, accuracy of a matrix library is more important than performance for a GIS like Apache SIS. Paradoxically, a good way to improve performance is to invest more CPU time for more accurate matrix operations when preparing (not executing) the coordinate operation, because it increases the chances to correctly detect which operations cancel each other. This investment can save execution time at the place where it matters most: in the code looping over millions of coordinates to transform.

However matrix libraries are often designed for high performances with large matrices, sometimes containing thousands of rows and columns. Those libraries can efficiently resolve systems of linear equations with hundreds of unknown variables. Those libraries resolve difficult problems, but not of the same kind than the problems that Apache SIS needs to solve. For that reason, and also for another reason described in next paragraphs, Apache SIS uses its own matrix implementation. This implementation addresses the accuracy issue by using “double-double” arithmetic (a technic for simulating the accuracy of approximately 120 bits wide floating point numbers) at the cost of performance in a phase (transform preparation) where performance is not considered critical.

What to do with non-square matrices (and why)

Apache SIS often needs to inverse matrices, in order to perform a coordinate operation in reverse direction. Matrix inversions are typically performed on square matrices, but SIS also needs to inverse non-square matrices. Depending on whether we have more lines than columns:

To illustrate the issues caused by direct use of libraries designed for linear algebra, consider a (φ₁, λ₁, h) → (φ₂, λ₂) conversion from three-dimensional points to two-dimensional points on a surface. The φ terms are latitudes, the λ terms are longitudes and (φ₂, λ₂) may be different than (φ₁, λ₁) if h axis is not perpendicular to the surface.

For linear algebra libraries, the above non-square matrix represents an under-determined system of equations and may be considered unresolvable. Indeed the above coordinate operation cannot be inverted as a (φ₂, λ₂) → (φ₁, λ₁, h) operation because we do not know which value to assign to h. Ignoring h implies that we cannot assign values to (φ₁, λ₁) neither since those values may depend on h. However in GIS case, the ellipsoidal h axis is perpendicular to the ellipsoid surface on which the geodetic latitudes and longitudes (φ, λ) are represented (note that this statement is not true for geocentric latitudes and longitudes). This perpendicularity makes φ₁ and λ₁ independent of h. In such cases, we can can still do some processing.

Apache SIS proceeds by checking if h values are independent of φ and λ values. We identify such cases by checking which matrix coefficients are zero. If SIS can identify independent dimensions, it can temporarily exclude those dimensions and invert the matrix using only the remaining dimensions. If SIS does not found a sufficient amount of independent dimensions, an exception is thrown. But if a matrix inversion has been possible, then we need to decide which value to assign to the dimensions that SIS temporarily excluded. By default, SIS assigns the NaN (Not-a-Number) value to those dimensions. However in the particular case of ellipsoidal height h in a (φ₂, λ₂) → (φ₁, λ₁, h) operation, the zero value may also be appropriate on the assumption that the coordinates are usually close to the ellipsoid surface. In any case, the coefficients that Apache SIS sets to zero or NaN is based on the assumption that the matrix represents a coordinate operation; this is not something that can be done with arbitrary matrices.

The above-described approach allows Apache SIS to resolve some under-determined equation systems commonly found in GIS. In our example using NaN as the default value, the h ordinate stay unknown – we do not create information from nothing – but at least the (φ, λ) coordinates are preserved. The opposite problem, those of over-determined equation systems, is more subtile. An approach commonly applied by linear algebra libraries is to resolve over-determined systems by the least squares method. Such method applied to our example would compute a (φ₂, λ₂, h) → (φ₁, λ₁) operation that seems the best compromise for various φ₂, λ₂ and h values, while being (except special cases) an exact solution for no-one. Furthermore linear combinations between those three variables may be an issue because of heterogenous units of measurement, for instance with h in metres and (φ, λ) in degrees. Apache SIS rather proceeds in the same way than for under-determined systems: by requiring that some dimensions are independent from other dimensions, otherwise the matrix is considered non-invertible. Consequently in over-determined systems case, SIS may refuse to perform some matrix inversions that linear algebra libraries can do, but in case of success the resulting coordinate operation is guaranteed to be exact (ignoring rounding errors).

Apache SIS matrix library

In summary, Apache SIS provides its own matrix library for the following reasons:

This library is provided in the org.apache.sis.matrix package of the sis-referencing module.