Elasticsearch Morphological Search

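# Elasticsearch Morphological Search

###### tags: `tech sharing`

### Installing nori to use the Korean morphological analyzer

- Install the nori plugin into Elasticsearch and build an image from the following Dockerfile.

```
FROM docker.elastic.co/elasticsearch/elasticsearch:7.6.2

ENV ES_BIN=/usr/share/elasticsearch/bin

# --batch skips the interactive confirmation prompt during the image build
RUN $ES_BIN/elasticsearch-plugin install --batch analysis-nori
```

### Index settings

- Configure the index settings as follows.

```
"analysis": {
  "analyzer": {
    "korean": {
      "type": "custom",
      "tokenizer": "nori_user_dict"
    }
  },
  "tokenizer": {
    "nori_user_dict": {
      "type": "nori_tokenizer",
      "decompound_mode": "mixed",
      "user_dictionary": "userdict_ko.txt"
    }
  }
}
```

#### decompound_mode

![decompound_mode options](https://i.imgur.com/Rf34yN0.png)

#### user_dictionary

- The dictionary file must be located inside the Elasticsearch config directory.
- Dictionary words have priorities. In the phrase "동해물과", "동해" (East Sea) has the highest priority, so "동해" is extracted first, followed by "물" (water) and "과" (and), giving "동해" + "물" + "과".
- If the word "해물" (seafood) is added to userdict_ko.txt, "해물" takes the highest priority instead, and the phrase is analyzed as "동", "해물", "과".

### Verifying the morphological analysis

```
GET nori_sample/_analyze
{
  "analyzer": "korean",
  "text": "세종시"
}
```

- Result: 세종, 시, 세종시

### Index mapping

- Set the `korean` analyzer defined above as the `analyzer` of each property that should be morphologically analyzed. Searches on that property then match against its analyzed morphemes. Elasticsearch 7.x also requires an explicit field `type`, so `title` is declared as `text` here.

```
"mappings": {
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "korean"
    }
  }
}
```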
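
To show how the settings and mapping fragments above fit together, here is a minimal Python sketch that assembles them into a single create-index request body. The index name `nori_sample` is taken from the `_analyze` example. The `http://localhost:9200` address and the helper names `build_index_body` and `create_index` are assumptions made for illustration, not part of the original setup. The `title` field is declared with `"type": "text"` because Elasticsearch 7.x rejects a field mapping without a type.

```python
import json
import urllib.request


def build_index_body(user_dictionary="userdict_ko.txt"):
    """Combine the analysis settings and the title mapping into one request body."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "korean": {"type": "custom", "tokenizer": "nori_user_dict"}
                },
                "tokenizer": {
                    "nori_user_dict": {
                        "type": "nori_tokenizer",
                        "decompound_mode": "mixed",
                        # Resolved relative to the Elasticsearch config directory.
                        "user_dictionary": user_dictionary,
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {"type": "text", "analyzer": "korean"}
            }
        },
    }


def create_index(index_name, base_url="http://localhost:9200"):
    """Send PUT /<index_name> with the body above.

    base_url is a hypothetical local node without authentication.
    """
    request = urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(build_index_body()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only print the body here; call create_index("nori_sample") against a running node.
    print(json.dumps(build_index_body(), indent=2, ensure_ascii=False))
```

Keeping the body in a plain function makes it easy to inspect or diff before sending it, and `create_index("nori_sample")` would only be called once a node built from the Dockerfile above is running with `userdict_ko.txt` in its config directory.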