---
active_crumb: Intent Matching
layout: documentation
id: intent_matching
---
{% scaladoc NCModel NCModel %} processing logic is defined as a pipeline and a collection of one or more intents to be matched on. The sections below explain what an intent is, how to define it in your model, and how it works.
The goal of the data model implementation is to take the user input text, pass it through the processing pipeline and match the resulting variants to specific user-defined code that will execute for that input. The mechanism that provides this matching is called an intent.
The intent generally refers to the goal that the end-user had in mind when speaking or typing the input utterance. The intent has a declarative part, or template, written in IDL (Intent Definition Language) that strictly defines a particular form of the user input. An intent is also bound to a callback method that will be executed when that intent, i.e. its template, is detected as the best match for a given input. A typical data model will have multiple intents defined, one for each form of the expected user input that the model wants to react to.
For example, a data model for a banking chatbot or analytics application can have multiple intents, one for each domain-specific group of input such as opening an account, closing an account, transferring money, getting statements, etc.
Intents can be specific or generic in terms of what input they match. Multiple intents can overlap and NLPCraft will disambiguate such cases to select the intent with the overall best match. In general, the most specific intent match wins.
NLPCraft intents are written in Intent Definition Language (IDL).
IDL is a relatively straightforward declarative language. For example, here's a simple intent x with two terms a and b:

    /* Intent 'x' definition. */
    intent=x
        term(a)~{# == 'my_elm'} // Term 'a'.
        term(b)={has(ent_groups, "my_group")} // Term 'b'.

An IDL intent defines a match between the parsed user input, represented as a collection of {% scaladoc2 NCEntity entities %}, and the user-defined callback method. IDL intents are bound to their callbacks via Java annotations and can be located in the same Java annotations or in external *.idl files.
You can review the formal ANTLR4 grammar for IDL, but here are the general properties of IDL:
- IDL supports Java-style comments: single-line // Comment. as well as multi-line /* Comment. */.
- String literals can use either single quotes ('text') or double quotes ("text"), simplifying IDL usage in Scala: you don't have to escape double quotes. Both quotes can be escaped in a string, i.e. "text with \" quote" or 'text with \' quote'.
- Built-in literals true, false and null are used for boolean and null values.
- Numeric literals can use the '_' character for separation as in 200_000.
- Reserved keywords: flow fragment import intent meta options term true false null
- Function calls use mathematical notation: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().
- Expressions are evaluated with short-circuit semantics; the if and or_else functions also provide similar short-circuit evaluation.
IDL program consists of intent, fragment, or import statements in any order or combination:
intent statement
An intent is defined as one or more terms. Each term is a predicate over an instance of the {% scaladoc NCEntity NCEntity %} trait. For an intent to match, all of its terms have to evaluate to true. Intent definition can be informally explained using the following full-feature example:

    intent=xa
        flow="^(?:login)(^:logout)*$"
        meta={'enabled': true}
        term(a)={month >= 6 && !# != "z" && meta_intent('enabled') == true}[1,3]
        term(b)~{
            @usrTypes = meta_req('user_types')

            (# == 'order' || # == 'order_cancel') && has_all(@usrTypes, list(1, 2, 3))
        }

    intent=xb
        options={
            'ordered': false,
            'unused_free_words': true,
            'unused_entities': false,
            'allow_stm_only': false
        }
        term(a)={length("some text") > 0}
        fragment(frag, {'p1': 25, 'p2': {'a': false}})

NOTES:
- intent=xa (line 1), intent=xb (line 11)
xa and xb are the mandatory intent IDs. An intent ID is any arbitrary unique string matching the following lexer template:
(UNI_CHAR|UNDERSCORE|LETTER|DOLLAR)+(UNI_CHAR|DOLLAR|LETTER|[0-9]|COLON|MINUS|UNDERSCORE)*
- options={...} (line 12)
Option | Type | Description | Default Value |
---|---|---|---|
ordered | Boolean | Whether or not this intent is ordered. For an ordered intent the specified order of terms is important for matching. If the intent is unordered, its terms can be found in any order in the input text. Note that an ordered intent significantly limits the user input it can match. In most cases an ordered intent is only applicable to processing of a formal grammar (like a programming language) and is mostly unsuitable for natural language processing. | false |
unused_free_words | Boolean | Whether free words that are unused by intent matching should be ignored (value true) or cause the intent match to be rejected (value false). Free words are the words in the user input that were not recognized as any entity. Typically, for natural language comprehension it is safe to ignore free words. For a formal grammar, however, this could make the matching logic too loose. | true |
unused_entities | Boolean | Whether unused entities should be ignored (value true) or cause the intent match to be rejected (value false). By default, the unused entities are not ignored since it is assumed that the user would define a pipeline entity parser on purpose and construct the intent logic appropriately. | false |
allow_stm_only | Boolean | Whether or not the intent can match when all of the matching entities came from STM. By default, this special case is disabled (value false). However, in specific intents designed for a free-form language comprehension scenario, for example SMS messaging, you may want to enable this option. | false |
- flow="^(?:login)(^:logout)*$" (line 2)
Optional. Dialog flow is a history of previously matched intents to match on. If provided, the intent will first be matched, using a regular expression, against the history of the previously matched intents before its terms are processed. The dialog flow specification is a string with a standard Java regular expression. The history of previously matched intents is presented as a space-separated string of intent IDs that were selected as the best match during the current conversation, in chronological order with the most recently matched intent ID being the first element in the string. The dialog flow regular expression will be matched against that string of intent IDs.
In line 2, the ^(?:login)(^:logout)*$ dialog flow regular expression defines that the intent should only match when the immediately previous intent was login and no logout intents are in the history. If the history is "login order order", this intent will match. However, for the "login logout" or "order login" history this dialog flow will not match.
Note that if a dialog flow is defined and it doesn't match the history, the terms of the intent won't be tested at all.
- meta={'enabled': true} (line 3)
Optional. Just like most of the components in NLPCraft, the intent can have its own metadata. Intent metadata is defined as a standard JSON object which will be converted into a java.util.Map instance and can be accessed in the intent's terms via the meta_intent() IDL function. The typical use case for declarative intent metadata is to parameterize its behavior, i.e. the behavior of its terms, with clearly defined properties that are provided inside the intent definition itself.
- term(a)={month >= 6 && !# != "z" && meta_intent('enabled') == true}[1,3] (line 4), term(b)~{...} (line 5), term(a)={length("some text") > 0} (line 18)
A term is a building block of the intent. An intent must have at least one term. A term has an optional ID, an entity predicate and optional quantifiers. A term is conversational if it uses the '~' symbol and non-conversational if it uses the '=' symbol in its definition. For a conversational term the system will search for a match using entities from the current request as well as the entities from the conversation STM (short-term memory). For a non-conversational term only entities from the current request will be considered.
A term is matched if its entity predicate returns true. The matched term represents one or more entities, sequential or not, that were detected in the user input. An intent has a list of terms (always at least one) that all have to be matched in the user input for the intent to match. Note that a term can be optional if its min quantifier is zero. Whether the order of the terms is important for matching is governed by the intent's ordered parameter.
Term ID (a and b) is optional. It is only required by the @NCIntentTerm annotation to link the term's entities to a formal parameter of the callback method. Note that a term ID follows the same lexical rules as an intent ID.
Inside the curly brackets { } you can have an optional list of term variables and the mandatory term expression that must evaluate to a boolean value. A term variable name must start with the @ symbol and be unique within the scope of the current term. All term variables must be defined and initialized before the term expression, which must be the last statement in the term:

    term(b)~{
        @a = meta_req('a')
        @lst = list(1, 2, 3, 4)

        has_all(@lst, list(@a, 2))
    }

Term variable initialization expression as well as term's expression follow Java-like expression grammar including precedence rules, brackets and logical combinators, as well as built-in IDL functions calls:

    term={true} // Special case of 'constant' term.
    term={
        // Variable declarations.
        @a = round(1.25)
        @b = meta_req('my_prop')

        // Last expression must evaluate to boolean.
        (@a + 2) * @b > 0
    }
    term={
        // Variable declarations.
        @c = meta_ent('prop')
        @lst = list(1, 2, 3)

        // Last expression must evaluate to boolean.
        abs(@c) > 1 && size(@lst) != 5
    }

NOTE: while term variable initialization expressions can have any type - the term's expression itself, i.e. the last expression in the term's body, must evaluate to a boolean result only. Failure to do so will result in a runtime exception during intent evaluation. Note also that such errors cannot be detected during intent compilation phase.
- ? and [1,3] define an inclusive quantifier for the term, i.e. how many times a match for this term should be found. You can use the following abbreviations:
  - * is equal to [0,∞]
  - + is equal to [1,∞]
  - ? is equal to [0,1]
  - No quantifier defaults to [1,1]
As mentioned above, the quantifier is inclusive, i.e. [1,3] means that the term should appear once, two times or three times.
- fragment(frag, {'p1': 25, 'p2': {'a': false}}) (line 19)
A fragment reference inserts the terms defined by that fragment in place of the reference. A fragment reference has a mandatory fragment ID parameter and an optional JSON second parameter. The optional JSON parameter lets you parameterize the inserted terms' behavior and is available to the terms via the meta_frag() IDL function.
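The dialog flow check described in the notes above can be sketched with plain java.util.regex. This is a minimal illustration only: the helper flowMatches, the whole-string matching mode and the pattern itself (written with a negative lookahead) are assumptions made for the sketch, not NLPCraft code and not the pattern from the example.

```scala
import java.util.regex.Pattern

// Hypothetical helper, not an NLPCraft API: joins the intent IDs of the
// conversation history (most recent first) and matches the whole string.
def flowMatches(flow: String, history: Seq[String]): Boolean =
  Pattern.compile(flow).matcher(history.mkString(" ")).matches()

// Illustrative pattern (an assumption): the most recent intent is 'login'
// and no 'logout' ID appears anywhere else in the history.
val loginNoLogout = "^login(?: (?!logout(?: |$))\\S+)*$"

val accepted = flowMatches(loginNoLogout, Seq("login", "order", "order"))
val rejected = flowMatches(loginNoLogout, Seq("login", "logout"))
```

Under these assumptions accepted is true and rejected is false; a history such as "order login" is also rejected because the string does not start with login.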
fragment statement
Fragments let you group and name a set of reusable terms. Such groups can be further parameterized at the place of reference and enable the reuse of one or more terms by multiple intents. For example:

    // Fragments.
    fragment=buzz term~{# == meta_frag('id')}
    fragment=when
        term(nums)~{
            // Term variable.
            @type = meta_ent('num:unittype')
            @iseq = meta_ent('num:isequalcondition')

            # == 'num' && @type == 'datetime' && @iseq == true
        }[0,7]

    // Intents.
    intent=alarm
        // Insert parameterized terms from fragment 'buzz'.
        fragment(buzz, {"id": "x:alarm"})

        // Insert terms from fragment 'when'.
        fragment(when)

NOTES:
- Each fragment statement has an ID (buzz and when) and a list of terms.
import statement
The import statement lets you import IDL declarations from a local file, a classpath resource or a URL:

    // Import using absolute path.
    import('/opt/globals.idl')

    // Import using classpath resource.
    import('org/apache/nlpcraft/examples/alarm/intents.idl')

    // Import using URL.
    import('ftp://user:password@myhost:22/opt/globals.idl')

NOTES:
- The import statement starts with the import keyword and has a string parameter that indicates the location of the resource to import.
During {% scaladoc NCModelClient NCModelClient %} initialization it scans the provided model class for the intents. All found intents are compiled into an internal representation.
Note that not all intent-related problems can be detected at the compilation phase, and {% scaladoc NCModelClient NCModelClient %} can be initialized with intents not being completely validated. For example, each term in the intent must evaluate to a boolean result. This can only be checked at runtime. Another example is the number and the types of parameters passed into IDL function which is only checked at runtime as well.
Intents are compiled only once during the {% scaladoc NCModelClient NCModelClient %} initialization and cannot be re-compiled. Model logic, however, can affect the intent behavior through {% scaladoc NCModel NCModel %} callback methods and metadata all of which can change at runtime and are accessible through IDL functions.
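Here are a few intent examples with explanations:
Example 1:

    intent=a
        term~{# == 'x:type'}
        term(nums)~{# == 'num' && lowercase(meta_ent('num:unittype')) == 'datetime'}[0,2]

NOTES:
- Intent has ID a.
- Intent uses the default options, including the default term order (ordered is false).
- Intent has two conversational terms (~) that have to be found for the intent to match. Note that the second term is optional as it has the [0,2] quantifier.
- The first term matches an entity of type x:type.
- The second term matches up to two entities of type num with the num:unittype metadata property equal to the 'datetime' string.
- Note that the IDL function lowercase is used on the num:unittype metadata property value.
- Since the second term has an ID (nums) it can be referenced by the @NCIntentTerm annotation on the callback formal parameter.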
Example 2:

    intent=id2
        flow='id1 id2'
        term={# == 'myent' && signum(get(meta_ent('score'), 'best')) != -1}
        term={has_any(ent_groups, list('actors', 'owners'))}

NOTES:
- Intent has ID id2.
- Intent has the dialog flow pattern 'id1 id2'. It expects the sequence of intents id1 and id2 somewhere in the history of previously matched intents in the course of the current conversation.
- Intent has two non-conversational terms (=). Both terms have to be present only once (their implicit quantifiers are [1,1]).
- The first term should be an entity of type myent and have the metadata property score of type map. This map should have a value with the string key 'best'. The signum of this map value should not equal -1. Note that meta_ent(), get() and signum() are all built-in IDL functions.
- The second term should be an entity that belongs to either the actors or the owners group.
IDL provides over 100 built-in functions that can be used in IDL intent definitions.
An IDL function call takes the traditional fun_name(p1, p2, ... pk) syntax form. If a function has no parameters, the brackets are optional. An IDL function operates on a stack: its parameters are taken from the stack and its result is put back onto the stack, which in turn can become a parameter for the next function call and so on. IDL functions can have zero or more parameters and always have one result value. Some IDL functions support a variable number of parameters.
Special Shorthand #
The frequently used IDL function ent_type() has a special shorthand #. For example, the following expressions are all equal:

    ent_type() == 'type'
    ent_type == 'type' // Remember - empty parens are optional.
    # == 'type'

When chaining function calls IDL uses mathematical notation (a-la Python) rather than an object-oriented one: IDL length(trim(" text ")) vs. OOP-style " text ".trim().length().
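The stack-based evaluation of nested calls can be pictured with a toy evaluator. The sketch below is a hypothetical stand-in, not NLPCraft's implementation: evalChain and its two supported function names exist only for this illustration, and results are widened to Long on the assumption that IDL integers are represented as java.lang.Long.

```scala
import scala.collection.mutable

// Hypothetical mini-evaluator: each function pops its parameter from the
// stack and pushes its single result back, ready for the next call.
def evalChain(input: Any, fns: Seq[String]): Any = {
  val stack = mutable.Stack[Any](input)
  fns.foreach {
    case "trim"   => stack.push(stack.pop().toString.trim)
    case "length" => stack.push(stack.pop().toString.length.toLong)
    case other    => throw new IllegalArgumentException(s"Unknown function: $other")
  }
  stack.pop()
}

// length(trim(" text ")): the innermost call (trim) runs first.
val len = evalChain(" text ", Seq("trim", "length"))
```

Here len holds the Long value 4, the length of the trimmed string "text".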
IDL functions operate with the following types:
JVM Type | IDL Name | Notes |
---|---|---|
java.lang.String | String | |
java.lang.Long, java.lang.Integer, java.lang.Short, java.lang.Byte | Long | Smaller numerical types will be converted to java.lang.Long. |
java.lang.Double, java.lang.Float | Double | java.lang.Float will be converted to java.lang.Double. |
java.lang.Boolean | Boolean | You can use true or false literals. |
java.util.List<T> | List[T] | Use list(...) IDL function to create new list. |
java.util.Map<K,V> | Map[K,V] | |
{% scaladoc NCEntity NCEntity %} | Entity | |
java.lang.Object | Any | Any of the supported types above. Use null literal for null value. |
Some IDL functions are polymorphic, i.e. they can accept arguments and return result of multiple types. Encountering unsupported types will result in a runtime error during intent matching. It is especially important to watch out for the types when adding objects to various metadata containers and using that metadata in the IDL expressions.
Unsupported Types
Detection of the unsupported types by IDL functions cannot be done during IDL compilation and can only be done during runtime execution. This means that even though the model compiles IDL intents and {% scaladoc NCModelClient NCModelClient %} starts successfully - it does not guarantee that intents will operate correctly.
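The numeric widening rules from the type table, and the runtime nature of unsupported-type errors, can be sketched as follows. The function toIdlValue is a hypothetical illustration covering scalar values only (lists, maps and entities are left out for brevity); it is not part of NLPCraft.

```scala
// Hypothetical normalizer mirroring the scalar rows of the type table.
def toIdlValue(v: Any): Any = v match {
  case null     => null
  case b: Byte  => b.toLong    // Smaller integer types widen to Long.
  case s: Short => s.toLong
  case i: Int   => i.toLong
  case f: Float => f.toDouble  // Float widens to Double.
  case x @ (_: Long | _: Double | _: Boolean | _: String) => x
  case other =>
    // Only discovered when the value is actually seen, i.e. at runtime.
    throw new IllegalArgumentException(
      s"Unsupported IDL type: ${other.getClass.getName}")
}

val widened = toIdlValue(42)
val asDouble = toIdlValue(1.5f)
```

A value such as a Char placed into metadata would only fail when the expression using it is evaluated, which is why such problems surface during intent matching rather than at startup.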
All IDL functions are organized into the following groups: Entity, Text, Math, Collection, Metadata, Datetime, Request, and Other functions.
IDL declarations can be placed in different locations based on user preferences:
- The NCIntent Java annotation takes a string as its parameter that should be a valid IDL declaration. For example, a Scala code snippet:

    @NCIntent("import('/opt/myproj/global_fragments.idl')") // Importing.
    @NCIntent("intent=act term(act)={has(ent_groups, 'act')} fragment(f1)") // Defining in place.
    def onMatch(
        @NCIntentTerm("act") actEnt: NCEntity,
        @NCIntentTerm("loc") locEnts: List[NCEntity]
    ): NCResult = {
        ...
    }

- External *.idl files contain IDL declarations and can be imported in any other place where IDL declarations are allowed. See the import() statement explanation above. For example:

    /*
     * File 'my_intents.idl'.
     * ======================
     */

    import('/opt/globals.idl') // Import global intents and fragments.

    // Fragments.
    // ----------
    fragment=buzz term~{# == 'x:alarm'}
    fragment=when
        term(nums)~{
            // Term variables.
            @type = meta_ent('num:unittype')
            @iseq = meta_ent('num:isequalcondition')

            # == 'num' && @type != 'datetime' && @iseq == true
        }[0,7]

    // Intents.
    // --------
    intent=alarm
        fragment(buzz)
        fragment(when)

IDL intents must be bound to their callback methods. This binding is accomplished using the following Java annotations:
Annotation | Target | Description |
---|---|---|
@NCIntent | Callback method or model class | When applied to a method this annotation lets you define an IDL intent in place on the method serving as its callback. This annotation can also be applied to a model's class, in which case it will just declare the intent without binding it, and the callback method will need to use the @NCIntentRef annotation to bind to it. This method is ideal for simple intents and quick declaration right in the source code and has all the benefits of having IDL be part of the source code. However, a multi-line IDL declaration can be awkward to add and maintain depending on the language's multi-line string literal support. In such cases it is advisable to move IDL declarations into separate *.idl files. |
@NCIntentRef | Callback method | This annotation lets you reference an intent defined elsewhere, such as in an external *.idl file or in other @NCIntent annotations. In real applications, this is the most common way to bind an externally defined intent to its callback method. |
@NCIntentObject | Model class field | Marker annotation that can be applied to a class member of the main model. Field objects annotated with this annotation are scanned the same way as the main model. |
@NCIntentTerm | Callback method parameter | This annotation marks a formal callback method parameter to receive the term's entities when the intent to which this term belongs is selected as the best match. |
Here are a couple of examples of intent declarations to illustrate the basics of intent declaration and usage.
An intent from the Light Switch example:

    @NCIntent("intent=ls term(act)={has(ent_groups, 'act')} term(loc)={# == 'ls:loc'}*")
    def onMatch(
        ctx: NCContext,
        im: NCIntentMatch,
        @NCIntentTerm("act") actEnt: NCEntity,
        @NCIntentTerm("loc") locEnts: List[NCEntity]
    ): NCResult = {
        ...
    }

NOTES:
- The intent is defined in place using the @NCIntent annotation.
- Intent ls has two non-conversational terms, one mandatory term and another that can match zero or more entities, with method onMatch(...) as its callback.
- A term is conversational if it uses the '~' symbol and non-conversational if it uses the '=' symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to seek the matching entities for this term. Note that the terms that were fully or partially matched using entities from the conversation context will contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms will be matched using entities found only in the current user input without looking at the conversation context.
- Method onMatch(...) will be called if and when this intent is selected as the best match.
- Terms have min=1, max=1 quantifiers by default, i.e. one and only one.
- The first term matches any single entity that belongs to the group act. Note that model elements can belong to multiple groups.
- Both terms have IDs (act and loc) that are used in onMatch(...) method parameters to automatically assign terms' entities to the formal method parameters using @NCIntentTerm annotations.
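The difference between conversational and non-conversational terms described in the notes above can be sketched as a lookup over two entity sources. Everything here is hypothetical: Ent is a simplified stand-in for an entity, findForTerm is not an NLPCraft function, and preferring the current request over STM is an assumption made for the illustration.

```scala
// Hypothetical, simplified entity; the real NCEntity trait is richer.
final case class Ent(typ: String, text: String)

// Returns the first entity satisfying the term predicate. A conversational
// term ('~') may fall back to STM; a non-conversational one ('=') may not.
def findForTerm(
  pred: Ent => Boolean,
  conversational: Boolean,
  request: Seq[Ent],
  stm: Seq[Ent]
): Option[Ent] =
  request.find(pred).orElse(if (conversational) stm.find(pred) else None)

val req = Seq(Ent("ls:act", "turn off"))      // Current user input.
val stm = Seq(Ent("ls:loc", "kitchen"))       // Remembered from earlier input.
val isLoc = (e: Ent) => e.typ == "ls:loc"

val withStm = findForTerm(isLoc, conversational = true, request = req, stm = stm)
val withoutStm = findForTerm(isLoc, conversational = false, request = req, stm = stm)
```

In this sketch withStm finds the remembered kitchen entity while withoutStm is empty, because the location was not mentioned in the current input.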
In the following Time example the intent is defined on the model class and referenced in code using the @NCIntentRef annotation:

    @NCIntent("fragment=city term(city)~{# == 'opennlp:location'}")
    @NCIntent("intent=intent2 term~{# == 'x:time'} fragment(city)")
    class TimeModel extends NCModel(
        ...
        @NCIntentRef("intent2")
        private def onRemoteMatch(
            ctx: NCContext,
            im: NCIntentMatch,
            @NCIntentTerm("city") cityEnt: NCEntity
        ): NCResult = ...

NOTES:
- The intent is defined on the model class using the @NCIntent annotation, line 2.
- The intent is referenced using @NCIntentRef("intent2") with method onRemoteMatch(...) as its callback, line 5.
- A term is conversational if it uses the '~' symbol and non-conversational if it uses the '=' symbol in its definition. If a term is conversational, the matching algorithm will look into the conversation context short-term memory (STM) to seek the matching entities for this term. Note that the terms that were fully or partially matched using entities from the conversation context will contribute a smaller weight to the overall intent matching weight since these terms are less specific. Non-conversational terms will be matched using entities found only in the current user input without looking at the conversation context.
- Method onRemoteMatch(...) will be called when this intent is the best match detected.
- Terms have min=1, max=1 quantifiers by default.
- The first term matches a single mandatory (min=1, max=1) entity with ID x:time whose element is defined in the model.
- The second term matches a single mandatory (min=1, max=1) entity opennlp:location.
- Examples of input for this intent:
  - What time is it now in New York City?
  - Show me time of the day in London.
  - Can you please give me the Tokyo's current date and time.
{% scaladoc NCPipeline NCPipeline %} processing result is a collection of {% scaladoc NCVariant NCVariant %} instances.
The following example uses {% scaladoc nlp/parsers/NCSemanticEntityParser NCSemanticEntityParser %} configured via a JSON file.
Let's consider the input text 'A B C D' and the following elements defined in our model:

    "elements": [
        { "id": "elm1", "synonyms": ["A B"] },
        { "id": "elm2", "synonyms": ["B C"] },
        { "id": "elm3", "synonyms": ["D"] }
    ],

All of these elements will be detected but since two of them overlap (elm1 and elm2) there should be two parsing variants at the output of this step:
- elm1 ('A', 'B'), freeword ('C'), elm3 ('D')
- freeword ('A'), elm2 ('B', 'C'), elm3 ('D')
Note that initially the system cannot determine which of these variants is the best one for matching - there's simply not enough information at this stage. It can only be determined when each variant is matched against the model's intents. So, each parsing variant is matched against each intent. Each matching pair of a variant and an intent produces a match with a certain weight. If there are no matches at all, an error is returned. If matches were found, the match with the biggest weight is selected as the winning match. If multiple matches have the same weight, their respective variants' weights will be used to further sort them out. Finally, the intent's callback from the winning match is called.
Although the details of the exact weight calculation algorithm are too complex to cover here, the general guidelines for what determines the weight of a match between a parsing variant and an intent all coalesce around the principal idea that the more specific match always wins.
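The selection step described above (highest match weight wins, variant weight breaks ties, no matches means an error) can be sketched as follows. The Candidate record and its plain integer weights are assumptions for illustration; the actual weights in NLPCraft are computed by a more complex algorithm.

```scala
// Hypothetical record of one (variant, intent) match; names are assumptions.
final case class Candidate(intentId: String, matchWeight: Int, variantWeight: Int)

// Highest match weight wins; the variant weight breaks ties.
// None corresponds to the "no matches at all" error case.
def selectBest(cands: Seq[Candidate]): Option[Candidate] =
  cands.maxByOption(c => (c.matchWeight, c.variantWeight))

val best = selectBest(Seq(
  Candidate("a", matchWeight = 5, variantWeight = 1),
  Candidate("b", matchWeight = 5, variantWeight = 3),
  Candidate("c", matchWeight = 2, variantWeight = 9)
))
```

In this sketch intents a and b tie on match weight, so b is selected because its variant weight is higher; the callback of the selected candidate would then be invoked.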
Whether the intent is defined directly in the @NCIntent annotation or indirectly via the @NCIntentRef annotation, it is always bound to a callback method:
- Callback parameters that receive term entities are marked with the @NCIntentTerm annotation.
The @NCIntentTerm annotation marks a callback parameter to receive a term's entities. This annotation can only be used for the parameters of callbacks, i.e. methods that are annotated with @NCIntent or @NCIntentRef. @NCIntentTerm takes a term ID as its only mandatory parameter and should be applied to callback method parameters to get the entities associated with that term (if and when the intent was matched and that callback was invoked).
Depending on the term quantifier the method parameter type can only be one of the following types:
Quantifier | Scala Type |
---|---|
[1,1] | NCEntity |
[0,1] | Option[NCEntity] |
[1,∞] or [0,∞] | List[NCEntity] |
For example:

    @NCIntent("intent=id term(termId)~{# == 'my_ent'}?")
    private def onMatch(
        ctx: NCContext,
        im: NCIntentMatch,
        @NCIntentTerm("termId") myEnt: Option[NCEntity]
    ): NCResult = {
        ...
    }

NOTES:
- Conversational term termId has the [0,1] quantifier (it's optional).
- The formal parameter of the callback has type Option[NCEntity] because the term's quantifier is [0,1].
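The relationship between quantifier shorthands, their [min,max] ranges and the callback parameter type can be sketched as below. Both functions are hypothetical helpers, not NLPCraft APIs; None stands for the unbounded ∞ maximum, and treating every range other than [1,1] and [0,1] as a list is an assumption that goes beyond the rows of the table.

```scala
// Hypothetical parser for a term's quantifier suffix; None stands for ∞.
def parseQuantifier(q: String): (Int, Option[Int]) = {
  val Bounds = """\[\s*(\d+)\s*,\s*(\d+)\s*\]""".r
  q.trim match {
    case ""           => (1, Some(1)) // No quantifier defaults to [1,1].
    case "?"          => (0, Some(1))
    case "+"          => (1, None)
    case "*"          => (0, None)
    case Bounds(a, b) => (a.toInt, Some(b.toInt))
    case other        => throw new IllegalArgumentException(s"Bad quantifier: $other")
  }
}

// Callback parameter type for a quantifier, following the table above.
// Assumption: any other range is treated like the unbounded rows.
def paramType(q: String): String = parseQuantifier(q) match {
  case (1, Some(1)) => "NCEntity"
  case (0, Some(1)) => "Option[NCEntity]"
  case _            => "List[NCEntity]"
}

val optionalTerm = paramType("?")
```

For the example above, the ? quantifier maps to Option[NCEntity], which is the type used for the myEnt parameter.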
NCRejection and NCIntentSkip Exceptions
There are two exceptions that can be used by intent callback logic to control the intent matching process.
When {% scaladoc NCRejection NCRejection %} exception is thrown by the callback it indicates that user input cannot be processed as is. This exception typically indicates that user has not provided enough information in the input string to have it processed automatically. In most cases this means that the user's input is either too short or too simple, too long or too complex, missing required context, or is unrelated to the requested data model.
{% scaladoc NCIntentSkip NCIntentSkip %} is a control flow exception to skip current intent. This exception can be thrown by the intent callback to indicate that current intent should be skipped (even though it was matched and its callback was called). If there's more than one intent matched the next best matching intent will be selected and its callback will be called.
This exception becomes useful when it is hard or impossible to encode the entire matching logic using only declarative IDL. In these cases the intent definition can be relaxed and the "last mile" of intent matching can happen inside of the intent callback's user logic. If it is determined that intent in fact does not match then throwing this exception allows to try next best matching intent, if any.
Note that there's a significant difference between {% scaladoc NCIntentSkip NCIntentSkip %} exception and model's {% scaladoc NCModel#onMatchedIntent-946 NCModel#onMatchedIntent %} callback. Unlike this callback, the exception does not force re-matching of all intents, it simply picks the next best intent from the list of already matched ones. The model's callback can force a full reevaluation of all intents against the user input.
IDL Expressiveness
Note that usage of the NCIntentSkip exception (as well as the model's life-cycle callbacks) is a required technique when you cannot express the desired matching logic with IDL alone. IDL is a high-level declarative language and it does not support complex programmable logic or other types of sophisticated matching algorithms. In such cases, you can define a broad intent that would broadly match and then define the rest of the more complex matching logic in the callback, using the NCIntentSkip exception to indicate when the intent doesn't match and other intents, if any, have to be tried.
There are many use cases where IDL is not expressive enough. For example, if your intent matching depends on financial market conditions, weather, state from external systems, or details of the current user's geographical location or social network status, you will need to use NCIntentSkip-based logic or the model's callbacks to support that type of matching.
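The "try the next best intent" behavior can be sketched with stand-in types. IntentSkip and Matched below are hypothetical placeholders, not the NLPCraft classes, and the list is assumed to be already ranked from best to worst match.

```scala
// Hypothetical stand-ins, NOT the NLPCraft API.
final class IntentSkip(msg: String) extends RuntimeException(msg)
final case class Matched(intentId: String, callback: () => String)

// Tries callbacks from best to worst match, moving on to the next
// candidate whenever a callback signals a skip. No re-matching happens.
def invokeBest(ranked: List[Matched]): Option[String] = ranked match {
  case Nil => None
  case head :: tail =>
    try Some(head.callback())
    catch { case _: IntentSkip => invokeBest(tail) }
}

val ranked = List(
  // Broad intent whose "last mile" check fails inside the callback.
  Matched("broad", () => throw new IntentSkip("external state does not fit")),
  Matched("fallback", () => "handled by fallback")
)
val result = invokeBest(ranked)
```

Here the first callback skips, so the result comes from the second candidate; if every callback skipped, the sketch would return None.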
NCContext Trait
The {% scaladoc NCContext NCContext %} trait is passed into the intent callback as its first parameter. This trait provides runtime information about the model configuration, request, extracted tokens and all entity variants, as well as the conversation control trait {% scaladoc NCConversation NCConversation %}.
NCIntentMatch Trait
The {% scaladoc NCIntentMatch NCIntentMatch %} trait is passed into the intent callback as its second parameter. This trait provides runtime information about the intent that was matched (i.e. the intent with which this callback was annotated).
Entity Functions
Text Functions
Math Functions
Collection Functions
Metadata Functions
Datetime Functions
Request Functions
Other Functions