Text extraction - language issue
I am defining an entity extraction with a Ruta script. I defined my entity with support for ALL languages.
My rule detects a pattern like 12-1234-1234567-12. This works fine (the entity is detected in the test) only if I add some words to the test text. If I test just the pattern on its own, the entity is not recognized.
As an example:
Test 1: "12-1234-1234567-12" -> entity not detected
Test 2: "Account 12-1234-1234567-12" -> entity detected
This entity extraction will be used in an email channel, and users might just send an email containing the bank account number without any other words.
Let me know if it's clear or you need more information.
Yes, that is the expected behavior. We cannot expect natural language processing to detect a language from digits alone. The input has to be a meaningful statement in a given language.
However, there is a workaround for your problem. In your case, the Text Analyzer fails to detect a language. You can force the Text Analyzer to fall back to a specific language when the language is undetected. This setting is on the 'Advanced' tab of the Text Analyzer: 'Enable fallback language if the language is undetected'.
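To illustrate why the fallback helps, here is a minimal Python sketch of the behavior described above. It is not the Text Analyzer's actual implementation or API: `detect_language`, `extract_accounts`, and `fallback_language` are hypothetical names, the language detector is a toy stand-in, and the 2-4-7-2 digit layout is assumed from your sample value.

```python
import re

# Assumed account layout (2-4-7-2 digits), taken from the sample 12-1234-1234567-12.
ACCOUNT_PATTERN = re.compile(r"\b\d{2}-\d{4}-\d{7}-\d{2}\b")


def detect_language(text):
    """Toy stand-in for language detection: it needs at least one alphabetic word."""
    return "en" if re.search(r"[A-Za-z]{2,}", text) else None


def extract_accounts(text, fallback_language=None):
    """Apply the pattern only when a language was detected or a fallback is set."""
    language = detect_language(text) or fallback_language
    if language is None:
        # No language and no fallback: the extraction step is skipped entirely.
        return []
    return ACCOUNT_PATTERN.findall(text)


print(extract_accounts("12-1234-1234567-12"))
print(extract_accounts("Account 12-1234-1234567-12"))
print(extract_accounts("12-1234-1234567-12", fallback_language="en"))
```

In this sketch the pattern itself matches in all three cases. Only the language gate differs, which mirrors your Test 1 and Test 2 and shows why enabling a fallback language should let a digits-only email be processed.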