content/developer-guide/annexes/design/MatrixLibrary.html

<?xml version="1.0" encoding="UTF-8" ?> <!DOCTYPE html>  <html xmlns="http://www.w3.org/1999/xhtml" xmlns:xi="http://www.w3.org/2001/XInclude" xml:lang="en"> <head> <title>MatrixLibrary</title> <meta charset="UTF-8"/> <link rel="stylesheet" type="text/css" href="../../book.css"/> </head> <body>  <section> <header> <h3 id="MatrixLibrary">Specificities of a matrix library for <abbr>GIS</abbr></h3> </header> <p> <abbr>GIS</abbr> make an extensive usage of matrices for displaying maps or for transforming coordinates. There is many excellent open source or commercial matrix libraries available. However, <abbr>GIS</abbr> have some specific needs that differ a little bit from the goals of many existent libraries. Matrix operations like those described in the <a href="#AffineTransform">affine transform chapter</a> appear in almost all coordinate operations applied by Apache <abbr>SIS</abbr>. But the analysis of those operations reveal some patterns: </p> <ul> <li><p>Those matrices are almost always of small size, rarely more than 5 rows and 5 columns.</p></li> <li><p>“Heavy” matrix operations (matrix inversions or multiplications) do not happen in performance-critical code. In almost every cases, those heavy operations are executed only once, for example when a data file is read or in preparation steps before to transform coordinates. Those heavy operations rarely happen in the loop that apply the coordinate operation after we finished to prepare it.</p></li> <li><p>In a sequence of matrix multiplications or inversions, rounding errors accumulate and grow fast. If the sequence of matrix operations is long enough, rounding errors can become indistinguishable from real operations like datum shifts. This ambiguity can happen because the matrix representation of some datum shifts has small values, with scale factors of a few parts per million and rotation terms of a few arc-seconds.</p></li> <li><p>It is quite common that two affine transforms cancel each other when they are concatenated, i.e. that matrix multiplications result in some or all scale factors equal to 1 and some or all translation terms equal to 0. However because of rounding errors the results are rarely exact, but are rather some values like 0,9999…97 instead of 1. The usual workaround is to compare floating point numbers with some tolerance threshold. Unfortunately it is difficult to choose a threshold that catch a wide range of rounding errors without wrongly considering legitimate datum shifts as rounding errors (see previous point).</p></li> </ul> <p> As a consequence of above points, accuracy of a matrix library is more important than performance for a <abbr>GIS</abbr> like Apache <abbr>SIS</abbr>. Paradoxically, a good way to improve performance is to invest more CPU time for more accurate matrix operations when <em>preparing</em> (not <em>executing</em>) the coordinate operation, because it increases the chances to correctly detect which operations cancel each other. This investment can save execution time at the place where it matters most: in the code looping over millions of coordinates to transform. </p><p> However matrix libraries are often designed for high performances with large matrices, sometimes containing thousands of rows and columns. Those libraries can efficiently resolve systems of linear equations with hundreds of unknown variables. Those libraries resolve difficult problems, but not of the same kind than the problems that Apache <abbr>SIS</abbr> needs to solve. For that reason, and also for another reason described in next paragraphs, Apache <abbr>SIS</abbr> uses its own matrix implementation. This implementation addresses the accuracy issue by using “double-double” arithmetic (a technic for simulating the accuracy of approximately 120 bits wide floating point numbers) at the cost of performance in a phase (transform <em>preparation</em>) where performance is not considered critical. </p> <h4 id="NonSquareMatrix">What to do with non-square matrices (and why)</h4> <p> Apache <abbr>SIS</abbr> often needs to inverse matrices, in order to perform a coordinate operation in reverse direction. Matrix inversions are typically performed on square matrices, but <abbr>SIS</abbr> also needs to inverse non-square matrices. Depending on whether we have more lines than columns: </p> <ul> <li>From Apache <abbr>SIS</abbr> library perspective, a non-square matrix is a coordinate operation that adds or removes a dimension.</li> <li>From linear algebra libraries perspective, a non-square matrix is an under-determined or over-determined system of equations.</li> </ul> <p> To illustrate the issues caused by direct use of libraries designed for linear algebra, consider a (<var>φ₁</var>, <var>λ₁</var>, <var>h</var>) → (<var>φ₂</var>, <var>λ₂</var>) conversion from three-dimensional points to two-dimensional points on a surface. The <var>φ</var> terms are latitudes, the <var>λ</var> terms are longitudes and (<var>φ₂</var>, <var>λ₂</var>) may be different than (<var>φ₁</var>, <var>λ₁</var>) if <var>h</var> axis is not perpendicular to the surface. </p><p> <xi:include href="../../../../layouts/partials/math/Geographic3Dto2D.html"/> </p><p> For linear algebra libraries, the above non-square matrix represents an under-determined system of equations and may be considered unresolvable. Indeed the above coordinate operation cannot be inverted as a (<var>φ₂</var>, <var>λ₂</var>) → (<var>φ₁</var>, <var>λ₁</var>, <var>h</var>) operation because we do not know which value to assign to <var>h</var>. Ignoring <var>h</var> implies that we cannot assign values to (<var>φ₁</var>, <var>λ₁</var>) neither since those values may depend on <var>h</var>. However in <abbr>GIS</abbr> case, the ellipsoidal <var>h</var> axis is perpendicular to the ellipsoid surface on which the <em>geodetic</em> latitudes and longitudes (<var>φ</var>, <var>λ</var>) are represented (note that this statement is not true for <em>geocentric</em> latitudes and longitudes). This perpendicularity makes <var>φ₁</var> and <var>λ₁</var> independent of <var>h</var>. In such cases, we can can still do some processing. </p><p> Apache <abbr>SIS</abbr> proceeds by checking if <var>h</var> values are independent of <var>φ</var> and <var>λ</var> values. We identify such cases by checking which matrix coefficients are zero. If <abbr>SIS</abbr> can identify independent dimensions, it can temporarily exclude those dimensions and invert the matrix using only the remaining dimensions. If <abbr>SIS</abbr> does not found a sufficient amount of independent dimensions, an exception is thrown. But if a matrix inversion has been possible, then we need to decide which value to assign to the dimensions that <abbr>SIS</abbr> temporarily excluded. By default, <abbr>SIS</abbr> assigns the <code>NaN</code> (<dfn>Not-a-Number</dfn>) value to those dimensions. However in the particular case of ellipsoidal height <var>h</var> in a (<var>φ₂</var>, <var>λ₂</var>) → (<var>φ₁</var>, <var>λ₁</var>, <var>h</var>) operation, the zero value may also be appropriate on the assumption that the coordinates are usually close to the ellipsoid surface. In any case, the coefficients that Apache <abbr>SIS</abbr> sets to zero or <code>NaN</code> is based on the assumption that the matrix represents a coordinate operation; this is not something that can be done with arbitrary matrices. </p><p> The above-described approach allows Apache <abbr>SIS</abbr> to resolve some under-determined equation systems commonly found in <abbr>GIS</abbr>. In our example using <code>NaN</code> as the default value, the <var>h</var> ordinate stay unknown – we do not create information from nothing – but at least the (<var>φ</var>, <var>λ</var>) coordinates are preserved. The opposite problem, those of over-determined equation systems, is more subtile. An approach commonly applied by linear algebra libraries is to resolve over-determined systems by the least squares method. Such method applied to our example would compute a (<var>φ₂</var>, <var>λ₂</var>, <var>h</var>) → (<var>φ₁</var>, <var>λ₁</var>) operation that seems the best compromise for various <var>φ₂</var>, <var>λ₂</var> and <var>h</var> values, while being (except special cases) an exact solution for no-one. Furthermore linear combinations between those three variables may be an issue because of heterogenous units of measurement, for instance with <var>h</var> in metres and (<var>φ</var>, <var>λ</var>) in degrees. Apache <abbr>SIS</abbr> rather proceeds in the same way than for under-determined systems: by requiring that some dimensions are independent from other dimensions, otherwise the matrix is considered non-invertible. Consequently in over-determined systems case, <abbr>SIS</abbr> may refuse to perform some matrix inversions that linear algebra libraries can do, but in case of success the resulting coordinate operation is guaranteed to be exact (ignoring rounding errors). </p> <h4 id="MatrixLibrarySummary">Apache <abbr>SIS</abbr> matrix library</h4> <p> In summary, Apache <abbr>SIS</abbr> provides its own matrix library for the following reasons: </p> <ul> <li>Lightweight objects for small matrices, especially the 3×3 ones.</li> <li>Improved accuracy with “double-double” arithmetic, accepting performance cost in codes where performance is not critical.</li> <li>Special handling of non-square matrices on the assumption that those matrices represent coordinate operations.</li> </ul> <p> This library is provided in the <code>org.apache.sis.matrix</code> package of the <code>sis-referencing</code> module. </p> </section> </body> </html>

content/developer-guide/annexes/design/MatrixLibrary.html (127 lines of code) (raw):