Question

Elastic Search for reporting

Hi all, this is in response to the posting at https://community1.pega.com/community/product-support/question/elastic-search-alternative-exposing-properties-reporting, which I can't comment on, hence this new post.

I have managed to get elasticsearch working on one of our more expensive reports. However, it's not working the way I want.

Basically, it keeps using the "contains" option for pySearchMethod regardless of what I set it to.

I really want to use "exact" or, failing that, "starts".

I am passing in param.pySearchMethod = "starts" or "exact", but the results returned are still those found with "contains".

Anyone got any ideas how to remedy this?

The report filter conditions are quite complex :-

E AND (A OR B OR (B AND C AND D) OR (B AND C AND F) OR (B AND D AND F) OR (C AND D AND F))

It does appear to work, except that, as mentioned, it uses "contains".

Thanks
Craig

***Edited by Moderator: Lochan to tag SR***

Comments

July 9, 2019 - 3:26am

Has no one got an answer to this issue?

July 11, 2019 - 4:26am

UPDATE

It seems pySearchMethod is ignored if we rely on the filter criteria in the report definition and pass pySearchString as empty. If we instead pass, say, a concatenation of the report definition filter params to pySearchString, then pySearchMethod is applied correctly.

e.g.

param.CaseID + " + " +  param.InvestorID + " + " + param.PostCode + " + " + param.LastName + " + " + param.FirstName + " + " + param.Reference + " + " + param.NINo
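
As a rough illustration only, the concatenation above can be mimicked outside Pega in Python. The helper name build_search_string and the sample values are assumptions made for this sketch; only the parameter names and the " + " separator come from the expression above.

```python
# Parameter names taken from the expression above, in the same order.
PARAM_NAMES = ["CaseID", "InvestorID", "PostCode", "LastName",
               "FirstName", "Reference", "NINo"]


def build_search_string(params):
    """Join the report filter params with " + ", as the expression does.

    Hypothetical helper: params that are missing are treated as empty strings.
    """
    return " + ".join(params.get(name, "") for name in PARAM_NAMES)


# Sample values are made up for illustration.
print(build_search_string({"PostCode": "M1 5EA", "LastName": "jones"}))
```

Note that empty params still contribute their separators, exactly as the original expression would.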

Is this a bug?  Should there be a better way of doing this?

July 14, 2019 - 7:09am

It seems that if pySearchString is not used, the filter criteria in the report definition aren't applied correctly.

For example, as per the above filter criteria: E AND (A OR B OR (B AND C AND D) OR (B AND C AND F) OR (B AND D AND F) OR (C AND D AND F))

When they all have a value except A, and the combinations come into play, I get results where only two params match: say D and F match, but B and C don't.
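
For reference, a minimal Python sketch of the filter logic exactly as written, treating each letter as "this filter matched the record". The function name matches is an assumption for the sketch. It shows what the expression itself says for a record where only D, E and F match.

```python
def matches(A, B, C, D, E, F):
    """Filter logic from the report definition, written out literally."""
    return bool(
        E and (
            A
            or B
            or (B and C and D)
            or (B and C and F)
            or (B and D and F)
            or (C and D and F)
        )
    )


# D, E and F match; A, B and C do not.
print(matches(A=False, B=False, C=False, D=True, E=True, F=True))  # False
```

So by the logic as written, a record where D and F match but B and C don't should not be returned.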

What gives?

Surely I don't need to create a string to cater for these and pass that to pySearchString!

July 18, 2019 - 11:46am

Further UPDATE

Having changed PegaSearch.Searcher.ESQueryGenerator to Debug, I can now see the query string generated for ElasticSearch.

Seems the problem is related to spaces in parameters that are passed into the query definition.

So in my example :-
E AND (A OR B OR (B AND C AND D) OR (B AND C AND F) OR (B AND D AND F) OR (C AND D AND F))

If C is, say, a UK postcode such as M1 5EA, it is treated as two separate strings, not one.

Therefore we end up with this elasticsearch query string :-

(  (  ( ( C:m1 OR C:5ea ) AND D:jones )  OR  ( ( C:m1 OR C:5ea ) AND F:mike )  OR  ( D:jones AND F:mike )  OR  ( ( C:m1 OR C:5ea ) AND D:jones AND F:mike )  )  )  AND pxObjClass:Work-Svc*

This is clearly wrong. Is this a bug? When the same query is sent to the database, parameters containing spaces are not split like this; they are wrapped in quotes. This is how elasticsearch should behave.
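
To make the difference concrete, here is a small Python sketch (not Pega or Elasticsearch code) comparing the two behaviours on a handful of made-up postcodes. The whitespace tokenizer and the function names are assumptions; the real tokenizer may split differently.

```python
def tokens(value):
    """Lowercase and split on whitespace (a stand-in for the real tokenizer)."""
    return value.lower().split()


def matches_any_token(field_value, search_value):
    """Tokenized OR semantics, as in ( C:m1 OR C:5ea )."""
    field_tokens = set(tokens(field_value))
    return any(tok in field_tokens for tok in tokens(search_value))


def matches_phrase(field_value, search_value):
    """Quoted-phrase semantics: all tokens present, adjacent and in order."""
    haystack, needle = tokens(field_value), tokens(search_value)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


# Only "M1 5EA" comes from the example above; the others are illustrative.
postcodes = ["M1 5EA", "M1 1AA", "SW1 5EA", "LS1 4DY"]
print([p for p in postcodes if matches_any_token(p, "M1 5EA")])
# ['M1 5EA', 'M1 1AA', 'SW1 5EA']
print([p for p in postcodes if matches_phrase(p, "M1 5EA")])
# ['M1 5EA']
```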

Anyone?

UPDATE
I tried circumventing this behaviour by wrapping the parameter in quotes, both single and double, but they are stripped away before the query string is created. Again, this seems deliberate, and from my perspective it is a bug.

July 19, 2019 - 8:37am
Response to CraigA52

Have raised SR-D31617

Pega
August 2, 2019 - 8:58am
Response to CraigA52

Hi Craig,

This is an expected behavior of Elasticsearch, as the filtering is designed to first tokenize a parameter and then construct an expression with OR between all tokens.

Thank You,

Pega
August 2, 2019 - 10:27am
Response to CraigA52

Hi Craig,

Observations: 

1. Each report definition defines its filters.

2. None of the filters sets the property pyUseTokenizer. It is null for each filter.

3. When the search engine analyzes the filters, each filter whose pyUseTokenizer is null is analyzed by the Pega tokenizer.

4. If a value has more than one token (e.g. "M1 5EA"), a query is created with the OR operator between tokens (pyCategory:M1 OR pyCategory:5EA).
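
The observations above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual query generator: build_clause is a hypothetical name, whitespace splitting stands in for the Pega tokenizer, and the quoted-phrase form of the non-tokenized clause is a guess at what "no OR operator" would look like.

```python
def build_clause(field, value, use_tokenizer=None):
    """Build one filter clause; an unset (None) flag means 'tokenize'."""
    if use_tokenizer is None or use_tokenizer:
        tokens = value.split() or [value]
        clause = " OR ".join(f"{field}:{tok}" for tok in tokens)
        return f"({clause})" if len(tokens) > 1 else clause
    # Assumption: without tokenizing, the value is kept as one quoted phrase.
    return f'{field}:"{value}"'


print(build_clause("pyCategory", "M1 5EA"))
# (pyCategory:M1 OR pyCategory:5EA)
print(build_clause("pyCategory", "M1 5EA", use_tokenizer=False))
# pyCategory:"M1 5EA"
```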

Solution:

We cannot change the default behavior because that could break other working apps.

Instead, we propose a workaround, which will be added to the documentation. A short note about the workaround:

Before sending a report to the pxRetrieveSearchData activity:

1. Load a report definition.

2. For all filters, set pyUseTokenizer=false.

3. Send the changed report to pxRetrieveSearchData.
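
The steps above can be pictured with a small Python model. This is only a sketch of the idea, not Pega code: the report definition is modelled as a plain dictionary, and the "pyFilter" list layout, the "name" key, and the function name disable_tokenizer are all assumptions. In Pega the equivalent would be done on the report definition page before calling the activity.

```python
import copy


def disable_tokenizer(report_page):
    """Return a copy of the report page with pyUseTokenizer=False on every filter."""
    changed = copy.deepcopy(report_page)       # step 1: work on a loaded copy
    for flt in changed.get("pyFilter", []):    # step 2: visit every filter
        flt["pyUseTokenizer"] = False
    return changed                             # step 3: pass this page on


# Hypothetical in-memory model of a report definition page.
report = {"pyFilter": [{"name": "pyCategory"}, {"name": "pxObjClass"}]}
virtual_report = disable_tokenizer(report)
print(virtual_report["pyFilter"])
```

Working on a copy keeps the original report definition untouched, so other callers still see the default behavior.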

Thank You,

August 7, 2019 - 6:46am
Response to dasn1

Hi Nandita,

That's an interesting workaround. Care to explain how exactly I am supposed to do that?

Thanks
Craig

Pega
August 7, 2019 - 7:31am
Response to CraigA52

Hi Craig,

Please set pyUseTokenizer=false for the pyFilter with name=pyCategory in the virtual report definition, and use that virtual report definition in pxRetrieveSearchData by setting the property pyReportPageName. A query will then be created without the OR operator, and you will get correct results.

Thank You,

August 7, 2019 - 8:00am

Hi Nandita,

The property pyUseTokenizer doesn't exist. I see that the activity pzSearchResolveReferences sets it in a Java step. Is there no other way of setting it?

Thanks
Craig