d3d/archive/D3D11_3_FunctionalSpec.htm [27310:27485]:
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Description:    Thread group sync and/or memory barrier.

Operation:      Sync has options _uglobal, _ugroup, _g and _t,
                described further below.

                In graphics shader stages, only sync_uglobal is allowed.

                In the Compute Shader, (_uglobal or _ugroup*)
                and/or _g must be specified. _t is
                optional in addition.

                *Note the _ugroup option will not be exposed
                to developers unless discovered to be critical;
                this is discussed further below.

                _uglobal:
                ---------

                Global u# (UAV) memory fence.

                All prior u# memory reads/writes by this thread in
                program order are made visible to all threads
                on the "entire GPU" before any subsequent u# memory
                accesses by this thread.  The "entire GPU" part
                of the definition is replaced by a less-than-global
                scope in one case though, described below.

                This applies to all UAV memory bound at the
                currently executing pipeline (graphics or compute).

                _uglobal is available in any shader
                stage.

                For any bound UAV that has not been
                declared by the shader as "Globally
                Coherent" (see the discussion of the
                Shader Memory Consistency Model), the
                _uglobal u# memory fence only has visibility
                within the current Compute Shader thread-group
                for that UAV (as if _ugroup had been specified
                instead of _uglobal).
                (This issue only applies to the Compute Shader,
                since the graphics shaders must declare all UAVs as
                Globally Coherent).
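
                The scope rule above can be restated as a small
                lookup. This is only an illustrative sketch in
                Python, not part of the instruction set; the
                function name, the stage strings and the returned
                labels are all hypothetical.

```python
def effective_uav_fence_scope(stage, globally_coherent):
    """Return the visibility scope of a _uglobal fence for one bound UAV.

    stage: "compute" or any graphics stage name (hypothetical labels).
    globally_coherent: whether the shader declared the UAV Globally Coherent.
    """
    if stage != "compute":
        # Graphics shaders must declare every UAV as Globally Coherent.
        if not globally_coherent:
            raise ValueError("graphics shaders must declare UAVs Globally Coherent")
        return "entire_gpu"
    # Compute Shader: a non-coherent UAV narrows the fence to the thread group,
    # as if _ugroup had been requested.
    return "entire_gpu" if globally_coherent else "thread_group"
```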

                _ugroup:
                --------

                Thread group scope u# (UAV) memory fence.

                All prior u# memory reads/writes by this thread in
                program order are made visible to all threads
                in the thread group before any subsequent u# memory
                accesses by this thread.

                This applies to all UAV memory bound at the
                current Shader stage.

                _ugroup is available in the Compute Shader only.

                Note that _ugroup will initially not be exposed to
                developers, although drivers will be tested by
                Microsoft such that they handle the option correctly
                through test shaders. If missing the _ugroup option
                becomes a significant issue for developers, Microsoft
                will consider exposing it in the future via compiler
                update.

                If _ugroup were to be exposed, for some implementations,
                the advantage of specifying _ugroup when that
                is all that is needed (instead of _uglobal) is that
                the sync operation can complete more quickly.
                Other implementations do not distinguish _ugroup from
                _uglobal, so both operations are equivalent and behave
                like _uglobal.  Basically, it does not hurt for
                applications to specify their intent by requesting
                the narrowest scope of sync necessary.

                Note that even if a particular UAV is declared as
                "Globally Coherent" (see the discussion of the
                Shader Memory Consistency Model), a _ugroup sync
                operation could still function more efficiently on
                that UAV if a global barrier is not required.

                _g:
                ---

                g# (Thread Group Shared Memory) fence.

                All prior g# memory reads/writes by this thread in
                program order are made visible to all threads
                in the thread group before any subsequent g# memory
                accesses by this thread.

                This applies to all of the current Thread
                Group's g# Shared Memory.

                _g is available in the Compute Shader only.


                _t:
                ---

                Thread group sync. All threads within a
                single thread group (those that can share
                access to a common set of shared register
                space) will be executed up to the point
                where they reach this instruction before
                any thread can continue.

                _t cannot be placed in
                dynamic flow control (branches which could
                vary within a thread group), but can be
                present in uniform flow control, where all
                threads in the group pick the same path.

                _t is available in the Compute Shader only.
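
                As a rough CPU-side analogy only, the sketch below
                models a thread group with Python threads, using a
                list in place of g# shared memory and
                threading.Barrier in place of the _t sync. Python
                has no separate memory fence, so the barrier here
                plays the role of a combined sync_g_t. GROUP_SIZE
                and the function names are hypothetical. The second
                function shows why _t must sit in uniform flow
                control: if one thread skips the barrier, the rest
                never get released (the timeout exists only so the
                sketch terminates).

```python
import threading

GROUP_SIZE = 4  # hypothetical thread-group size


def run_group():
    shared = [0] * GROUP_SIZE                # stands in for g# shared memory
    out = [0] * GROUP_SIZE
    barrier = threading.Barrier(GROUP_SIZE)  # stands in for the _t group sync

    def thread_main(tid):
        shared[tid] = tid * 10               # each thread writes its own slot
        barrier.wait()                       # every thread arrives before any continues
        out[tid] = shared[(tid + 1) % GROUP_SIZE]  # read a neighbour's slot

    threads = [threading.Thread(target=thread_main, args=(t,))
               for t in range(GROUP_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def divergent_barrier_times_out(timeout=0.2):
    """Model a barrier inside a branch that only thread 0 takes."""
    barrier = threading.Barrier(2, timeout=timeout)
    timed_out = []

    def thread_main(tid):
        if tid == 0:                         # non-uniform branch
            try:
                barrier.wait()               # waits for a thread that never arrives
            except threading.BrokenBarrierError:
                timed_out.append(tid)

    threads = [threading.Thread(target=thread_main, args=(t,)) for t in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timed_out == [0]


if __name__ == "__main__":
    print(run_group())                       # [10, 20, 30, 0]
    print(divergent_barrier_times_out())     # True
```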

                --------

                Listing of Compute Shader "sync" variants:

                sync_g
                sync_ugroup*
                sync_uglobal
                sync_g_t
                sync_ugroup_t*
                sync_uglobal_t
                sync_ugroup_g*
                sync_uglobal_g
                sync_ugroup_g_t*
                sync_uglobal_g_t

                *Variants with _ugroup may not be targeted
                by the HLSL compiler, per the earlier discussion
                in the _ugroup section above.

                Listing of Graphics Shader "sync" variants:

                sync_uglobal only.
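
                The two listings follow mechanically from the
                option rules given under Operation. The Python
                sketch below is illustrative only: the helper names
                and the stage strings are hypothetical, and it
                assumes (as the listing suggests) that _uglobal and
                _ugroup are never combined in one instruction.

```python
from itertools import product


def compute_sync_variants():
    """Enumerate the Compute Shader sync variants permitted by the rules."""
    variants = []
    for u, g, t in product(("", "_ugroup", "_uglobal"), (False, True), (False, True)):
        if not u and not g:
            continue  # at least one memory fence (_uglobal/_ugroup and/or _g) is required
        variants.append("sync" + u + ("_g" if g else "") + ("_t" if t else ""))
    return variants


def is_valid_sync(stage, options):
    """Check an option set such as {"uglobal", "t"} for a given stage."""
    options = set(options)
    if not options <= {"uglobal", "ugroup", "g", "t"}:
        return False
    if stage != "compute":
        return options == {"uglobal"}      # graphics stages: sync_uglobal only
    if {"uglobal", "ugroup"} <= options:
        return False                       # assumed mutually exclusive
    return bool(options & {"uglobal", "ugroup", "g"})


if __name__ == "__main__":
    for name in compute_sync_variants():
        print(name)
```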


                Observations:
                -------------
                Memory fences prevent affected instructions
                from being reordered by compilers or hardware
                across the fence.

                Multiple reads from the same address
                by a shader invocation that are not separated
                by memory barriers or writes to the address
                can be collapsed together.  Likewise for writes.
                But accesses separated by a barrier cannot be
                merged or moved across the barrier.
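
                To make the merging rule concrete, here is a toy
                model in Python of a pass that collapses repeated
                reads within one shader invocation. It is a sketch
                under simplifying assumptions: the tuple encoding of
                operations is hypothetical, only reads are merged
                (the analogous rule for writes is not modeled), and
                any "sync" entry is treated as a fence covering
                every address.

```python
def collapse_redundant_loads(ops):
    """Drop repeated loads that no barrier or same-address store separates.

    ops is a list of tuples: ("load", addr), ("store", addr) or ("sync",).
    """
    still_valid = set()  # addresses already loaded and not yet invalidated
    result = []
    for op in ops:
        kind = op[0]
        if kind == "sync":
            still_valid.clear()            # nothing may be merged across the fence
            result.append(op)
        elif kind == "store":
            still_valid.discard(op[1])     # a write to the address separates reads
            result.append(op)
        elif kind == "load":
            if op[1] in still_valid:
                continue                   # collapse into the earlier load
            still_valid.add(op[1])
            result.append(op)
        else:
            raise ValueError("unknown operation: %r" % (op,))
    return result
```

                For example, the sequence load A, load A, sync,
                load A keeps the first and last loads: the second
                is collapsed, but the load after the barrier must
                be reissued.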

                Memory fences are not necessary for atomic operations
                to a given address by different threads to function
                correctly.  Fences are needed when atomics and/or
                load/store operations need to be synchronized with
                respect to each other as they appear in individual
                threads from the point of view of other threads.

                In the Pixel Shader, discard instructions imply
                a sync_uglobal fence, in that instructions cannot be
                reordered across the discard.  sync_uglobal in
                helper pixels (which run only to support
                derivatives) or discarded pixels may or may not
                have any effect.  Note it is disallowed
                for helper or discarded pixels to write to UAVs
                (in the case of discard, if the writes are issued
                after the discard), and returned values from UAVs
                are not allowed to contribute to derivative
                calculations.  Therefore whether sync_uglobal is
                honored for helper pixels, or when issued after a
                discard, is moot.
</pre>
<hr><!-- ********************************************************************** -->
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

