# Django Search Engine: Haystack + Elasticsearch

Haystack is a third-party search integration package for Django. It lets you define a Search Index with Model-like syntax, and its search API resembles the QuerySet syntax built into Django's db.models.

- Original QuerySet

```python
course_result = Course.objects.filter(condition)
```

- SearchQuerySet

```python
sqs = SearchQuerySet().models(Course)
sqs.filter(condition)
```

Before using Elasticsearch, make sure Java is installed. The officially recommended version:

recommended that you use the Oracle JDK version 1.8.0_73

`java -version`
`echo $JAVA_HOME`

- [How to install JAVA with apt-get on Ubuntu 16.04](https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04)

## Download Elasticsearch

https://www.elastic.co/downloads/past-releases/elasticsearch-2-4-5

:::warning
Haystack currently supports Elasticsearch (ES) 1.X and 2.X only; 5.X is not yet supported.
http://django-haystack.readthedocs.io/en/v2.6.0/backend_support.html#id2
:::

## Install Python and Django Packages

- `pip install django-haystack`
- `pip install elasticsearch`

:::warning
Be careful not to run pip install haystack by mistake; that installs a different package and causes an ImportError.
:::

## Execute Elasticsearch

`cd elasticsearch-2.4.5/bin`
`./elasticsearch`

#### Elasticsearch Basic Configuration

- PORT: 9200

Also note the line marked http with information about the HTTP address (192.168.8.112) and port (9200) that our node is reachable from. By default, Elasticsearch uses port 9200 to provide access to its REST API. This port is configurable if necessary.

## Steps

- Configure Haystack and the Search Engine Backend in settings.py ([reference](http://django-haystack.readthedocs.io/en/v2.6.0/tutorial.html#configuration))
- Start the Search Engine

  ```
  $ cd elasticsearch-2.4.5/bin
  $ ./elasticsearch
  ```

- Write an Index class for each Model that needs to be searchable
- Build the Index the first time you use the Search Engine. The Index is built from the DB, so repeat this step whenever you switch databases.

  ```
  $ ./manage.py rebuild_index
  ```

- You can then start writing your application

## Programming Example

1. Using a Template with the built-in search feature: https://techstricks.com/django-haystack-and-elasticsearch-tutorial/
2. The official Example Project: https://github.com/django-haystack/django-haystack/tree/master/example_project
   The important part of this project is that it shows how to write an Index for a ForeignKey.
3. Using a Template with the built-in search feature: https://krzysztofzuraw.com/blog/2016/haystack-elasticsearch-part-two.html

With this package, you can write a custom Template for the search page to display the results: http://django-haystack.readthedocs.io/en/v2.4.1/tutorial.html#search-template

You can also work with the objects through SearchQuerySet. I personally prefer this approach because it can be combined with an existing API and is more flexible.

You may also need special filter conditions on a SearchQuerySet, for example to search more than one field. Suppose we want to search for keywords in both the `title` field and the `materials` field of the Index:

```python
# search_indexes.py
from haystack import indexes

from .models import Course  # assumed import path for the Course model


class CourseIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    author = indexes.CharField(model_attr='owner')
    materials = indexes.MultiValueField()
    category = indexes.MultiValueField()

    # Sorting fields
    upload_time = indexes.DateTimeField(model_attr='create_time')
    comment = indexes.IntegerField(model_attr='num_of_comments')
    collect = indexes.IntegerField(model_attr='num_of_collectors')

    def get_model(self):
        return Course

    def index_queryset(self, using=None):
        """Used when the entire index for model is updated."""
        return self.get_model().objects.all()

    def prepare_materials(self, obj):
        # Index the names of all teaching materials attached to the course
        return [material.material_name for material in obj.teachingmaterial_set.all()]

    def prepare_category(self, obj):
        # Index the ids of all categories the course belongs to
        return [category.id for category in obj.category.all()]
```

```python
import operator
from functools import reduce

from haystack.query import SQ

# query_items is a list of keywords
# result_sqs is an existing SearchQuerySet, e.g. SearchQuerySet().models(Course)
title_filter = reduce(operator.or_, (SQ(title__contains=x) for x in query_items))
materials_filter = reduce(operator.or_, (SQ(materials__contains=x) for x in query_items))
result_sqs = result_sqs.filter(title_filter | materials_filter)
```

## Real-Time Search

Real-Time Search means that whenever an indexed Model changes, the Search Engine automatically updates the index so the change becomes searchable.

reference: http://django-haystack.readthedocs.io/en/v2.6.0/signal_processors.html?highlight=RealtimeSignalProcessor#realtime-realtimesignalprocessor
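
As a rough sketch of what enabling real-time updates might look like in settings.py, the fragment below sets `HAYSTACK_SIGNAL_PROCESSOR` to Haystack's `RealtimeSignalProcessor`. The connection block is an assumption for illustration: it presumes a local Elasticsearch 2.X node on the default port 9200, the Elasticsearch 2 backend shipped with Haystack 2.6, and an index name of `haystack`. Adjust these to match your own setup.

```python
# settings.py (fragment) -- a minimal sketch, not a complete settings file

# Assumed connection: local Elasticsearch 2.X node on the default port.
HAYSTACK_CONNECTIONS = {
    "default": {
        # Backend for Elasticsearch 2.X; for 1.X the Haystack docs list
        # "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine".
        "ENGINE": "haystack.backends.elasticsearch2_backend.Elasticsearch2SearchEngine",
        "URL": "http://127.0.0.1:9200/",
        "INDEX_NAME": "haystack",  # hypothetical index name
    },
}

# Update the index whenever an indexed model is saved or deleted.
HAYSTACK_SIGNAL_PROCESSOR = "haystack.signals.RealtimeSignalProcessor"
```

Because this processor touches the index on every save and delete, it is convenient for small projects, but it adds a search-engine round trip to each write.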