# Indexing Fields

As many already know, ElasticSearch is a document-based database. Its metadata is stored in indexed fields and is regularly updated by the indexing process.

Open up a document and check out the variety of fields:

``` elastic
"HASBROWSERCHILDREN_BOOL_INDEXED": false,
"DISPLAYTEMPLATE_STRING_INDEXED": "",
"BINNED_BOOL_INDEXED": false,
"CANMOVE_RID_INDEXED": [
    "_"
],
"WORKFLOWSTATUS_STRING_INDEXED": "CUSTOM_ST_NT",
"INDEXCOMPUTEDATE_DATE_INDEXED": "2024-06-27T02:52:06.300Z",
"PURPOSE_STRING_INDEXED": "OR1ND000001488161",
"CANDELETE_RID_INDEXED": [
    "_"
],
```

## Data types

A field’s data type determines how Elasticsearch indexes and searches the data.

* **BOOL**: for boolean values
* **STRING**: for whole-string values (e.g., ID, Type…)
* **DATE**: for datetime values
* **TEXT**: for processed text values. These texts are tokenized / stemmed / normalized according to the attached analyzer
    * Used for fields that support freetext search
    * Usually comes with a LANGUAGE suffix (e.g., TITLE_TEXT_INDEXED_ENGLISH) to support translation
* **RID**: for list-structured values
* **INT** / **LONG**: for numeric values
* **NGRAM**: for autocompletion purposes
* **FACET**: for filters
* Others

How is each data type processed? Check out the mappings & settings in

* GET `cortex_doc/_mappings`
* GET `cortex_doc/_settings`

``` elastic
"_STRING_INDEXED": {
    "match": "*_STRING_INDEXED",
    "mapping": {
        "normalizer": "ol_lowercase_normalizer",
        "type": "keyword"
    }
}
```

``` elastic
"ol_lowercase_normalizer": {
    "filter": [
        "lowercase"
    ],
    "type": "custom"
},
```

## CoreFields & Precomputed Fields

Many CoreFields and Precomputed fields (FT_DOCUMENTSFREETEXT_Common columns) are directly indexed in ElasticSearch.
The mappings between them can be derived from the Full Indexing query: `SearchIndexing_BO.Data.SQL.IndexingQueries.Full`

```mermaid
graph LR
    A(DO) ---> B((Indexing)) --->E[(ElasticSearch)]
    C(FT) --->B
    A-->C
```

Besides the definition in the Full Indexing query, CoreFields can also have **additional indexing flavors**, defined in the Field parameter structure

![Screenshot_1](https://hackmd.io/_uploads/B12UX_58A.png)

* You want a field to be available for freetext search? Give it a TEXT_INDEXED flavor. The field content will be tokenized / stemmed / normalized and ready for freetext search

## Custom fields

Quick recap: there is a variety of custom fields

* **Standard fields**: straightforward, index the values
* **Authority fields**: based on `KEYWORDSNATIVERECORDID_RID_INDEXED`
* **Linked fields**: index the RecordID
* **Computed fields / Smart fields**: no indexing

| Field | SQL | SearchIndexing | Facets |
| --------- | ------- | ------------------ | ------ |
| Standard field | FV | `X_INDEXED: [Value]` | `X_FACET_INDEXED: [Value]` |
| Linked field | FY | `X_INDEXED: [RID]` | `X_FACET_INDEXED: [RID]` |
| Authority field | DK | `X_INDEXED: [Label]`, `KEYWORDSNATIVERECORDID_RID_INDEXED` | `FACETS_RID_INDEXED` |

The indexing of custom fields is controlled by a handful of checkboxes in the field configuration panel.
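
As a rough sketch of how to read the Facets column of the table above, the snippet below maps a custom field kind to the index key that would hold its facet values. The `FieldKind` enum and the `facet_index_key` helper are hypothetical names made up for this illustration and are not part of the product code; only the key patterns, the SQL codes and the sample FieldIDs are taken from this page.

```python
from enum import Enum


class FieldKind(Enum):
    """Custom field kinds from the table (illustrative enum, not product code).

    Each value is the code listed in the table's SQL column.
    """
    STANDARD = "FV"
    LINKED = "FY"
    AUTHORITY = "DK"


def facet_index_key(kind: FieldKind, field_id: str) -> str:
    """Return the index key that holds the facet values for a custom field.

    Hypothetical helper: it only mirrors the 'Facets' column of the table.
    """
    if kind is FieldKind.AUTHORITY:
        # All Authority fields share a single facet key.
        return "FACETS_RID_INDEXED"
    # Standard and Linked fields each get their own X_FACET_INDEXED key.
    return f"{field_id}_FACET_INDEXED"


if __name__ == "__main__":
    print(facet_index_key(FieldKind.STANDARD, "X1VND000000019443"))
    print(facet_index_key(FieldKind.AUTHORITY, "GF3ND000000000038"))
```

The shared key for Authority fields is why their filters need extra parameters to tell fields apart, whereas Standard and Linked fields can be told apart by the key name alone.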
### Searchable Individually

* Index the content of the field => allow individual searching on this field
* Format: `[FieldID]_[DataType]`

| Field | UI | Index |
| -------- | -------- | -------- |
| Standard field | ![image](https://hackmd.io/_uploads/r1pjHd9I0.png)| `"X1VND000000019443_TEXT_INDEXED_ENGLISH": "xyz"` |
| Linked field | ![image](https://hackmd.io/_uploads/BkhSS_qIA.png)| `"X1VND000000008002_TEXT_INDEXED_ENGLISH": "X1VCT0001301"` |
| Authority field | ![image](https://hackmd.io/_uploads/S1JZ8OqLR.png)| `"GF3ND000000000038_TEXT_INDEXED_ENGLISH": "Persia"` |

*If your Authority field is being textually indexed with the RecordID, it could be due to a language issue

* Allow searching on Advanced Search

![image](https://hackmd.io/_uploads/S1-E8uqL0.png)

* Allow searching via Search API

``` http
query=Geography.Country:Persia AND test.TestString:xyz AND test.TestContact:X1VCT0001301
&fields=Geography.Country&fields=test.TestString&fields=test.TestContact
```

### Searchable In Freetext

* Index the display content of the field => allow freetext searching on this field

![image](https://hackmd.io/_uploads/B1L9p2jIR.png)

* We have a dedicated indexing field `USERFIELDSFREETEXT_TEXT_INDEXED_ENGLISH` for all fields with SearchableInFreetext = true.
* The display content depends on the nature of the field:
    * Authority field: use the label of the keyword(s)
    * Linked field / Related field: use the template of the object(s)

``` elastic
"USERFIELDSFREETEXT_TEXT_INDEXED_ENGLISH": "xyz Beijing Persia Stephen abcd Grant"
```

### Authority fields

* Authority fields have a dedicated indexing field: `KEYWORDSNATIVERECORDID_RID_INDEXED`

``` elastic
"KEYWORDSNATIVERECORDID_RID_INDEXED": [
    "X1VND000000019351X1VKW000000001651",
    "GO1ND000000004246X1VKW000000004201",
    "GO1ND000000003330X1VKW000000004351",
    "GO1ND000000001892X1VKW000000004352",
    "GO3ND000000008816X1VKW000000004353",
    "GF3ND000000000056X1VKW000000004751",
    "GF3ND000000000038X1VKW000000005152", # authority field
    "X1VKW000000005259", # tag
    "X1VKW000000005260"
],
```

* Format:
    * Tag: simply index the `KeywordRID`
    * Authority field: `[FieldID][KeywordRID]`
* Usage: allows users to search on different criteria
    * Search for a tagged keyword: ![image](https://hackmd.io/_uploads/ryVltYcI0.png)
    * Search for a keyword on a specific field: ![image](https://hackmd.io/_uploads/HJs3dYq80.png)

## Filters (aka Facets)

### How it works

Concepts to understand:

* SearchCriteriaContainer_VForm
* SearchCriterion_VForm

Filters work based on `SearchCriterionFacet_VForm`

```mermaid
classDiagram
    SearchCriteriaContainer_VForm <-- SearchCriterion_VForm
    SearchCriterion_VForm <|-- SearchCriterionFacet_VForm
    class SearchCriterion_VForm{
        +SearchCriterion
        +GetCriterionInternal()
    }
```

* SearchCriterionFacet_VForm.Data.**FieldsToSearch**
* SearchCriterionFacet_VForm.Data.**FieldsType**

![image](https://hackmd.io/_uploads/rJRO3FcU0.png)

* Each facet applies its filter value on a certain field.
``` elastic
{
    "query_string": {
        "query": "((FRONTENDMAINFACET_STRING_INDEXED:\"Containers>>Folders\"))",
        "default_operator": "AND"
    }
},
```

* Filter flow

``` mermaid
sequenceDiagram
    SearchCriterionFacet_VForm ->>Search_BO: InjectFacetConfigs
    rect rgb(191, 223, 255)
    Search_BO->>Search_BO: BuildQueryTable
    Search_BO->>ElasticSearch : ExecuteQuery
    end
    ElasticSearch->>Search_BO:results
    Search_BO->>Search_BO:Extract Facets & Statistics
    SearchCriterionFacet_VForm ->>Search_BO: Get data for render
```

### Adding a field to Filter

A new `SearchCriterionFacet_XXX_VForm` will be generated, with its parameters (particularly **FieldsToSearch**) configured to match the field we just added

```mermaid
classDiagram
    class SearchCriterionFacet_WorkflowStatus_AAA_VForm{
        <<CoreField>>
        +FieldsToSearch: WORKFLOWSTATUS
        +FieldsType: _STRING_INDEXED
    }
    class SearchCriterionFacet_BBB_VForm{
        <<Standard / Linked Field>>
        +FieldsToSearch: [FieldID]
        +FieldsType: _FACET_INDEXED
    }
    class SearchCriterionFacet_CCC_VForm{
        <<Authority Field>>
        +FieldsToSearch: FACETS
        +FieldsType: _RID_INDEXED
        +KeywordLinkType: [FieldID]
        +KeywordCategoryCode: FilterC
    }
```

#### CoreFields

* Adding a CoreField to Filter is equivalent to turning ON an indexing flavor in its configuration
* This flavor will then be used in the newly created `SearchCriterionFacet_VForm`
* Some facets have dedicated logic (e.g., `SearchCriterionFacet_WorkflowStatus_VForm`). It allows these facets to collect / parse / present their data differently.

#### Standard fields & Linked fields

* Generate a new field with name `X_FACET_INDEXED` (e.g., `X1VND000000019443_FACET_INDEXED`)

#### Authority fields

On filters, Authority fields have a dedicated indexing field: `FACETS_RID_INDEXED`. Its structure is similar to `KEYWORDSNATIVERECORDID_RID_INDEXED`, but it only includes fields that are added to the Filter sidepanel.
``` elastic
"FACETS_RID_INDEXED": [
    "X1VND000000019621X1VKW000000005401",
    "X1VKW000000005401",
    "GF3ND000000000038X1VKW000000005152",
    "X1VKW000000005152",
],
```

Thus, all filters of Authority fields target `FACETS_RID_INDEXED` (FieldsToSearch = `FACETS`, FieldsType = `_RID_INDEXED`). They have 2 more parameters to help them group the keywords by FieldID

* SearchCriterionFacet_VForm.Data.**KeywordLinkType**: contains the FieldID
* SearchCriterionFacet_VForm.Data.**KeywordCategoryCode**: allows even more granular control

### Indexing status

As described above, adding a field to a filter creates new indexing fields. Thus, we have to wait for all assets to be reindexed (by the weekend full indexing, or by the whole-week continuous indexing) before all the data is available.

- The indexing status is stored in `I2_INDEXINGFIELDSTATUS`. Its CRUD is controlled by `IIndexingFieldStatusRepository`.
- FieldID is used for Core/Standard/Linked fields. FacetCategory is used for Authority fields.

![image](https://hackmd.io/_uploads/BJEHywGDC.png)

- Fields that haven't been used for a long while will stop being indexed

# Questions

1. Q: How about **Inherited fields**?
   A: They are indexed similarly to a Standard field.
2. Q: How about **Related fields**?
   A: They are indexed similarly to a Standard field, in most cases as a list, since values from multiple related assets are combined.
3. Q: I updated the field Status of a Document. I reindexed it. Indexing was successful. But the indexed content of the document still does not have the updated value. Why?
   A: Each CoreField has a property **CheckExistenceBeforeIndexing**. If it is ON, that field will not be indexed if it does not exist on the subtype of the Document.
4. Q: Besides Search & Filters, what other modules depend on indexing fields?
   A: Permission, Shares, Visibility (aka Purposes), Stack, CMS...
5. Q: How does **Seethru** work?
   A: Based on 2 indexing fields:
    * When ON: using `FOLDERALL@RECORDID_RID_INDEXED`. This field stores all ancestors of an asset.
    * When OFF: adding `LINKEDRECORDSWITHTYPEANDDIRECTION_RID_INDEXED`. This field stores all relations of an asset.
6. Q: What controls the **Sort Order** of search results?
   A: `View & Sort` provides the existing sort options. If we need a new sort order: first, make sure that field is indexed; second, add new entries in DO_DOCUMENTS_TBO.SortOrders.TableContent
7. Q: If I **remove a field**, or even permanently delete it, what happens to the indexed content?
   A: Upon removal, a field is moved to the Legacy panel. All data is still intact, so we can still search on this field, on both UI and API. Upon permanent deletion, the field is no longer available on either UI or API. Still, all data remains intact.
8. Q: **Linked / Related field conundrum**: object X has a linked field `X.MyField = [Document.Title]`, referring to object A. When object A is modified (e.g., renamed), X should also be reindexed.
   A: The mechanism for this is **ExpandObjectsToObjectsToCompute**, which works based on the OE queue.
   ![image](https://hackmd.io/_uploads/SktHzTo8C.png)
9. Q: Why add `X_FACET_INDEXED`? Why don't we just use `X_TEXT_INDEXED_ENGLISH`?
   A: We cannot. `FACET_INDEXED` is similar to `RID_INDEXED`: it contains whole-string values. It allows grouping, while `TEXT_INDEXED` does not, and the filter mechanism is based on grouping (aka **aggregations** in ElasticSearch).
10. Q: The mappings of `STRING_INDEXED` & `RID_INDEXED` are exactly the same. Why these 2 different field types?
    A: ElasticSearch detects the input value (**string vs array**) and structures it automatically. Basically these 2 field types are not different: sending an array to a `STRING_INDEXED` field gives the same result (indexed as an array).

Have more questions? Feel free to raise them to the Foxes!!