# Solr dev notes

- Current release: 8.8.2
- Documentation: https://solr.apache.org/guide/8_8/index.html
- Solr schema design: https://solr.apache.org/guide/8_8/documents-fields-and-schema-design.html

:::info
**Important:** Solr is not a database. It is a search index. Content in Solr should generally be treated as ephemeral.
:::

## General

- Two operating modes: standalone and cloud.
  - Certain types of advanced query operations are only available on cloud installs
  - iSamples installations will use Solr Cloud
- Interaction is through a web API with messaging in JSON (preferred) or XML.
- A "collection" is basically the same as a database in SQL.
- A "collection" is defined by a combination of service configuration and a schema.
- The schema defines the field types and fields that are available in a collection. Fields may be configured to be created on demand, though this is generally discouraged for our work.

## Collections

- Collections can be created using the [collections API](https://solr.apache.org/guide/8_8/collections-api.html) or from the command line
- Collections may be spread over multiple shards (per server) and across multiple servers (cloud)
- Collections may be replicated across data centers
- Collection structure can impact certain types of query operations (e.g. joins require single shard collections)

## Fields

- Generally stick with the [recommended field types](https://solr.apache.org/guide/8_8/field-types-included-with-solr.html)
- `String` fields are handled literally; `text` fields have various NLP analyzer actions applied (tokenization, stop words, synonym filter, etc.)
- Fields may be single or multiple valued
- Field values may be stored or not, indexed or not, have default values or not, ...
- Field types of initial interest to iSamples:
  - StrField (e.g. identifiers)
  - TextField (e.g. abstract text, descriptions)
  - DatePointField
  - DateRangeField
  - FloatPointField (possibly DoublePointField if high precision is required)
  - LatLonPointSpatialField
  - Possibly also:
    - BBoxField
    - SpatialRecursivePrefixTreeFieldType
- Fields may be [copy fields](https://solr.apache.org/guide/8_8/copying-fields.html). e.g. to store a literal title (`title_str`) and a text title (`title_txt`), the text field can be set as a copy field using `title_str` as source.

## Adding / editing documents (records)

- Every document must have a unique identifier
- Documents are sent to the Solr `update/` endpoint.
- Multiple documents can be sent per request
- Documents are stored immediately but are searchable only after a commit
- Generally commits are best managed by the server (after a certain elapsed time, number of docs, or size of backlog)
- See tutorials and some code in [isb_lib](https://github.com/isamplesorg/isamples_inabox/blob/main/isb_lib/core.py#L44) and also the `scripts/sesar_things.py` and `scripts/geome_things.py` scripts in the [iSamples in a box repo](https://github.com/isamplesorg/isamples_inabox)
- Deletes can be per document or by matching query, see the [tutorial example](https://solr.apache.org/guide/8_8/solr-tutorial.html#deleting-data)

## Searching

Many options are available for searching Solr collections. Most will be simple queries using the [common query parameters](https://solr.apache.org/guide/8_8/common-query-parameters.html).

[Faceting](https://solr.apache.org/guide/8_8/faceting.html) will also be commonly used. Faceting basically provides the unique values and their occurrence counts for a field.

Other types of searches likely to be important to iSamples include:

- [Spatial search](https://solr.apache.org/guide/8_8/spatial-search.html)
- [Graph traversal](https://solr.apache.org/guide/8_8/graph-traversal.html)
- [Parallel SQL](https://solr.apache.org/guide/8_8/parallel-sql-interface.html)

## OS X install

Use [homebrew](https://brew.sh/). Basically:

```
brew install solr
```

Solr admin user interface: http://localhost:8983/

Start / stop (`-f`: foreground terminal, `-c`: cloud mode, `-p`: port, defaults to 8983):

```
solr start -f -c
```

Start / stop (as a service):

```
brew services start solr
```

OS X Solr startup properties, in:

```
/usr/local/Cellar/solr/8.8.2/homebrew.mxcl.solr.plist
```

Default configuration:

```
/usr/local/Cellar/solr/8.8.2/server/solr/configsets/_default/conf
```

Create a core or collection (solr cloud mode) using default configuration:

```
solr create -c collection_name
```

## Ubuntu setup

Solr config:

```
/etc/default/solr.in.sh

ZK_HOST="localhost:2181/solr"
ZK_CLIENT_TIMEOUT="30000"
SOLR_HOST="127.0.0.1"
SOLR_WAIT_FOR_ZK="30"
SOLR_PID_DIR="/var/solr"
SOLR_HOME="/var/solr/data"
LOG4J_PROPS="/var/solr/log4j2.xml"
SOLR_LOGS_DIR="/var/solr/logs"
SOLR_PORT="8983"
```

Setup zookeeper:

```
bin/solr zk mkroot /solr -z localhost:2181

server/scripts/cloud-scripts/zkcli.sh \
  -z localhost:2181 \
  -cmd bootstrap \
  -solrhome /var/solr/data
```

Create a core:

```
sudo su - solr
/opt/solr/bin/solr create -c isb_rel
```

Then set autocreate fields off, with:

```
solr config -c isb_rel -p 8983 \
  -action set-user-property \
  -property update.autoCreateFields \
  -value false
```
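
With `update.autoCreateFields` turned off, fields have to be declared in the schema before documents that use them can be indexed. The following is a minimal sketch of doing that through the Solr Schema API (`add-field` and `add-copy-field` commands), not the approach used by iSamples. It reuses the `title_str` / `title_txt` copy field idea from the Fields section. Assumptions: the `string` and `text_general` field type names come from the `_default` configset, the `stored` / `indexed` choices are illustrative only, and the helper names `build_schema_request` and `post_schema` are hypothetical.

```python
import json
import urllib.request

# Illustrative field definitions only; these are not taken from an
# iSamples schema. "string" and "text_general" are assumed to exist
# as field types, as they do in the _default configset.
FIELDS = [
    {"name": "title_str", "type": "string", "stored": True, "indexed": True},
    {"name": "title_txt", "type": "text_general", "stored": False, "indexed": True},
]

# Copy the literal title into the analyzed text field at index time.
COPY_FIELDS = [{"source": "title_str", "dest": ["title_txt"]}]


def build_schema_request(fields, copy_fields):
    """Build a Schema API request body.

    The fields are listed before the copy fields so that the copy field
    source and destination are already defined when Solr processes the
    add-copy-field command.
    """
    return {"add-field": list(fields), "add-copy-field": list(copy_fields)}


def post_schema(payload, collection="isb_rel", base_url="http://localhost:8983/solr"):
    """POST the payload to <base_url>/<collection>/schema.

    Hypothetical helper: assumes a Solr instance is reachable at base_url
    and that no authentication is required.
    """
    request = urllib.request.Request(
        f"{base_url}/{collection}/schema",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_schema_request(FIELDS, COPY_FIELDS)
    # Print the request body for review instead of sending it.
    print(json.dumps(payload, indent=2))
    # To apply it against a running Solr instance:
    # print(post_schema(payload))
```

Printing the payload first makes it easy to review the field definitions before they are applied; calling `post_schema(payload)` sends them to the `isb_rel` collection created above.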