## Match Instructions
Here is a rough sketch of our model architecture when hosting

The main scaling targets, in order, are `QDRANT`, `EMBEDDING SERVER`, and `SERVER`.
## Docker images
| SERVICE | DOCKER HUB LINK |
| --- | --- |
| Embedding| https://hub.docker.com/repository/docker/arguflow/embeddings/
| Chat UI| https://hub.docker.com/repository/docker/arguflow/chat/
| Search UI| https://hub.docker.com/repository/docker/arguflow/search/
| Server| https://hub.docker.com/repository/docker/arguflow/server/
| Local AI LLAMA SERVER| https://hub.docker.com/repository/docker/arguflow/local-ai
## ENV Flags
ENV flags are evaluated at runtime for all, they can be evaluated via a .env file or by adding environment variables before the runtime.
### SERVER
| ENV | DESCRIPTION |
|-----------| ---- |
| REDIS_URL | REDIS URI including port |
| QDRANT_URL| QDRANT URI should be port on 6333 |
| QDRANT_API_KEY| API_KEY for qdrant |
| DATABASE_URL| DATABASE URI for postgresql |
| SENDGRID_API_KEY| NOT REQUIRED to be correct, used for password reset |
| SENDGRID_EMAIL_ADDRESS| The email address to send emails from|
| OPENAI_API_KEY| api key if using openai or localai |
| SECRET_KEY| A 64 character long string used for the argon2 password hashing algorithm|
| SALT|Used to salt the password in the database, should be a relatively secure string|
| S3_ENDPOINT| The url endpoint for s3 or s3 compatible service |
| S3_ACCESS_KEY| The access key for s3 or s3 compatible service |
| S3_SECRET_KEY| The secret key for s3 or s3 compatible service|
| S3_BUCKET| The name of the s3 bucket to store data files |
| COOKIE_SECURE| "true" or "false", if "true" cookies require https |
| ALERT_EMAIL| Email to send alerts |
| DOCUMENT_UPLOAD_FEATURE| "on" or "off" if off Documents will not be able to be uploaded from the UI |
| DOCUMENT_DOWNLOAD_FEATURE| "on" or "off" if off Download will be disabled for documents|
| ONLY_ADMIN_CAN_CREATE_CARDS| "on" or "off" if on only the admin user defined below will be allowed to create cards |
| RAG_PROMPT| OPTIONAL, by default this is `"Write a 1-2 sentence semantic search query along the lines of a hypothetical response to: \n\n"`. Feel free to change for your use-case.|
| N_RETRIEVALS_TO_INCLUDE| How many retrivals to fetch during RAG search |
| QDRANT_COLLECTION| The qdrant collection name |
| ALWAYS_REQUIRE_AUTH| "on" or "off" Requires that the user must be signed in to do a search. Server will return 401 on all routes except login routes |
| ADMIN_USER_EMAIL| Default account to be created |
| ADMIN_USER_PASSWORD| Default account password to be created |
| DUPLICATE_DISTANCE_THRESHOLD| OPTIONAL, number between `0` and `1`. This defaults to `0.95` if not set. We don't recommend setting it to be higher than `0.99` |
| PARSER_COMMAND| Command to run for the parser script, expects a relative path from the root. May need to extend Dockerfile if script is not nodejs |
| USE_EMBED_SERVER| 0 or 1, If 0 then the server will use the embedding server as described in `EMBEDDING_SERVER_CALL` |
| EMBEDDING_SERVER_CALL| Route to call the server, with our embedding server this is http://localhost:7070/encode |
| EMBEDDING_SIZE| Size of the embeddings |
| OPENAI_BASE_URL| BASE url for LLM inference, for local AI it is "http://localhost:8080/v1" |
|MINIMUM_CARD_CHAR_LENGTH | OPTIONAL, The min requirement length for a chunk to be accepted in characters. Default 0 characters |
|MAXIMUM_CARD_CHAR_LENGTH | OPTIONAL, The largest that a chunk can be in characters. Default 29000 characters |
|MINIMUM_CARD_WORD_LENGTH | OPTIONAL, The min requirement length for a chunk to be accepted. Default 0 words |
|MAXIMUM_CARD_WORD_LENGTH | OPTIONAL, The largest that a chunk can be in characters. Default 5000 words |
### SEARCH
*note: process takes ~7 seconds to startup after changing ENV's due to having to call `yarn build` to set ENV*
| ENV | DESCRIPTION |
|-----|-------------|
| PUBLIC_API_HOST|The URL for the API (ex: https://api.arguflow.ai/api), must include `/api`|
| PLAUSIBLE_HOST|OPTIONAL, if you are using plausible tracking then this would be the host name of your plausible domain|
| PUBLIC_FILTER_ITEMS|OPTIONAL, this is hard to set if you don't know what the metadata of a doc chunk looks like. Here is an example though `'[{"name":"Tag Set","comboboxItems":[{"name":"email"}]},{"name":"email_cc","comboboxItems":[]},{"name":"email_from","comboboxItems":[]},{"name":"email_to","comboboxItems":[]}]'`|
| PUBLIC_CREATE_EVIDENCE_FEATURE|can be `"on"` or `"off"`, we recommend setting this to `"off"`|
| PUBLIC_DOCUMENT_UPLOAD_FEATURE|can be `"on"` or `"off"`, we recommend setting this to `"off"`|
| PUBLIC_SEARCH_QUERIES|array of strings like `["hello world"]` use for typewriter and delete effect in search bar, we recommend setting this to `[]`|
| PUBLIC_LUCKY_ITEMS|array of strings like `["good", "bad"]`, we recommend setting to `""` to disable|
| PUBLIC_FRONTMATTER_VALS|keys of the JSON metadata for a document chunk (aka `card`) that you want to display in the UI, i.e. `["author,email_sent,email_subject,email_cc]`|
| PUBLIC_LINES_BEFORE_SHOW_MORE|this sets how many lines of a document chunk to show in results, we recommend `4`|
| PUBLIC_SHOW_GITHUB_STARS|this determines whether or not our "star us" button will show, we recommend setting this to `"off"`|
| GITHUB_TOKEN|OPTIONAL, you only need this is you want our "star us" button to work|
| DATASET|OPTIONAL, you likely want to set this to the name of your dataset i.e. `"enron"`|
| OG_DESCRIPTION|OPTIONAL, the opengraph description. Can likely set this to `""`|
| PUBLIC_ALWAYS_REQUIRE_AUTH|This determines whether or not SSR will apply, we recommend setting this to `"on"`|
| PUBLIC_IMAGE_RANGE_START_KEY|If you have a BATES range for a given document chunk, this is the key of the JSON metadata for where that range starts, i.e. `"doc_beg"` . The value of the key can be anything that contains a number like `MATCH00033`.|
| PUBLIC_IMAGE_RANGE_END_KEY|If you have a BATES range for a given document chunk, this is the key of the JSON metadata for where that range ends, i.e. `"doc_end"` . The value of the key can be anything that contains a number like `MATCH00036`.|
| PUBLIC_DATE_RANGE_VALUE|If your document chunks, have dates, this is the key of the JSON metadata for where that date is contained, i.e. `send_date`. The value of the key must have a value like MM/DD/YY for it to be picked up.|
### CHAT
*note: process takes ~7 seconds to startup after changing ENV's due to having to call `yarn build` to update ENVs*
| ENV | DESCRIPTION |
| --- | ----------- |
| VITE_FRONTMATTER_VALS|keys of the JSON metadata for a document chunk (aka `card`) that you want to display in the UI, i.e. `["author,email_sent,email_subject,email_cc]`|
| VITE_LINES_BEFORE_SHOW_MORE|this sets how many lines of a document chunk to show in results, we recommend `4`|
| VITE_DATASET|OPTIONAL, you likely want to set this to the name of your dataset i.e. `"enron"`|
| VITE_SEARCH_URL|The URL of the search UI, i.e. `search.arguflow.ai`|
| VITE_API_HOST|The URL for the API (ex: https://api.arguflow.ai/api), must include `/api`|
| VITE_YOUTUBE_EMBED_URL|OPTIONAL, if you have a demo video, you can place it here. We recommend setting to `""` if anything so it disables|
| VITE_PLAUSIBLE_HOST|OPTIONAL, if you are using plausible tracking then this would be the host name of your plausible domain|
| VITE_SHOW_GITHUB_STARS|This determines whether or not it shows our "star us" button, we recommend setting it to `"off"`|
## Additional Notes
### Server
The server can also store images if you are going to use `PUBLIC_IMAGE_RANGE_START_KEY` and `PUBLIC_IMAGE_RANGE_END_KEY`, the server expects for the images to be located mounted in /app/images.