# OCR - Implementation plan

## The goals of OCR implementation phase

- Being able to search through uploaded documents and their attachments the same way as we do for CB native documents.
- Have uploaded documents in text format, so we can later automatically analyze them and extract data points.

## Current state

- we don't have OCR currently
- we don't provide search for uploaded document bodies
- we don't provide search for the bodies of uploaded documents' attachments
- we have a full-text search mechanism implemented for CB native documents using PostgreSQL tsvector, which we are going to reuse
- we don't have a data points extraction tool or solution implemented yet, but we should keep it in mind during OCR implementation

## Acceptance criteria

### Phase 1 → MVP

1. OCR should work for newly uploaded and already stored:
    - Digital documents (.pdf and .docx)
    - Scanned documents (.pdf, images, etc.)
    - Annotated .pdf files (already scanned, with some annotations)
    - Mix of text and images
2. The result of OCR should be good enough to make it possible to perform a full-text search and to use it later for data points extraction.
3. Adding OCR during document upload shouldn't extend this process by any significant time (ideally, it should not extend it at all).
4. Performing OCR shouldn't be too expensive, as we are going to have it enabled as a standard feature.
5. OCRed documents should have a tsvector field generated, so we can perform a full-text search on them in the same way as we do for CB native document bodies.
6. It should be possible to turn OCR on/off in support-tools (per user).
7. Users should be informed about OCR process progress and failure (simple indicator: "going to start"/"in progress"/"done or failure").
8. Already stored documents should also be OCRed using a migration.

### Phase 2

1. OCR should work for document attachments:
    - Digital documents (.pdf and .docx)
    - Scanned documents (.pdf and .docx)
    - Annotated .pdf
    - Other types of files (images?)
2. It should be possible to search through attachments.
3. More advanced information about OCR process progress.

## Work plan

1. OCR document on upload save button (Denis and Danijel)
    - deeper research of shortlisted providers
    - sync/discuss with the rest of the company about the provider we choose
    - connect provider API with our codebase (send, retry, return)
    - find the best way to store and fetch OCRed text
2. Setting up groundwork for search with attachments (Youenn and Michał)
    - feature flag
    - composing tsvector for OCRed documents (maybe concatenating bodies)
3. Frontend work (Jan)
    - informing the user about OCR
    - adding OCR process information
4. FTS improvements (Youenn and Michał)
    - switch to combined search
    - improve quality of snippets (backend and frontend)

Team split idea (5 developers + 1 on tech support):

- OCR provider connectors (2 devs)
- FTS finishers (2 devs)
- Frontend beautifiers (1 dev)

## Roadmap

[Untitled](https://www.notion.so/d34c6efb0d3840989fca6d50db2c28cb)

-------------

# Implementation Plan

# 1. First iteration - Create invoice

## General flow

1. User connects with QuickBooks integration through automation build page
    1. Create a new company with `platformKey` param
    2. Display QuickBooks OAuth popup window (link is in response payload from create new company)
    3. Create an automation for the automation builder (`Contractbook.Domain.Automation`)
    4. QuickBooks connected with Contractbook
2. Create new automation flow with QuickBooks - Create an invoice
    1. Create new trigger → Contract fully signed
    2. Create new action → Create an invoice

## Frontend

1. Detect which automation is implemented through Codat
    1. Change the logic behind displaying the OAuth popup
2. Implement Codat CreateInvoice Form
    1. Requirements
        1. Should have two views for a customer field
            1. Find
                ![Zrzut ekranu 2022-05-10 o 13.52.08.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/fa81ddf7-c5ce-44eb-a264-2ee911538b69/Zrzut_ekranu_2022-05-10_o_13.52.08.png)
                ![Zrzut ekranu 2022-05-10 o 13.51.56.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f5fd48b1-ad25-495e-97ea-08f958b20a8f/Zrzut_ekranu_2022-05-10_o_13.51.56.png)
            2. Create
                ![Zrzut ekranu 2022-05-10 o 13.52.23.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/c458eef3-3e94-4357-aafc-850c6f3e75da/Zrzut_ekranu_2022-05-10_o_13.52.23.png)
        2. Should have fields similar to the ones in QuickBooks for creating an invoice
            ![Zrzut ekranu 2022-05-10 o 13.52.35.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a41432bc-4c01-4606-bfb4-0763e26324df/Zrzut_ekranu_2022-05-10_o_13.52.35.png)
    2. TBD after the designs

## Backend

1. Configure Codat authentication environment variables
    1. As part of this task we should configure all needed environment variables on BE (`CODAT_API_KEY`)
    2. Configure redirect_url in Codat portal → [https://docs.codat.io/docs/authentication-redirect](https://docs.codat.io/docs/authentication-redirect)
2. Add scaffolding for the QuickBooksCreateCustomer action in the backend
    1. `get_variables` should initially return [], `is_configuration_valid` should initially return `true` and `execute` should initially not have its logic implemented.
    2. Example: https://github.com/Contractbook/api/pull/3702
3. Add new table `codat_integrations`
    1. Columns
        1. profile_id
        2. application
        3. company_id
        4. status
        5. connection_id
    2. Modules
        1. `Contractbook.Domain.CodatIntegration`
        2. `Contractbook.Domain.CodatIntegration.Loader`
            1. `fetch_by_application_and_profile_id/2`
        3. `Contractbook.Domain.Integration.Mutator`
            1. `create/3`
            2. `update/2`
            3. `changeset/2`
4. Implement `Contractbook.Domain.CodatIntegration.UseCase`
    1. `run_api` → similar to `run_with_access_token` in `Contractbook.Domain.Integration.UseCase`
    2. `create_or_update_integration`
5. Implement `POST /codat-integrations`
6. Implement `GET /codat-integrations`
7. Implement action `is_configuration_valid?` function
    1. The function should check if config is correct.
    2. https://github.com/Contractbook/api/pull/3741
8. Implement `get_variables` function
    1. TBD
9. Implement `UseCase.create_accounting_item`
    1. [https://docs.codat.io/reference/createitems](https://docs.codat.io/reference/createitems)
10. Implement `UseCase.all_accounting_items`
    1. [https://docs.codat.io/reference/listitemspaged](https://docs.codat.io/reference/listitemspaged)
11. Implement `UseCase.find_customer`
    1. call Codat endpoint to find customer based on name (like in QuickBooks)
    2. return customers data → filter out with query param
    3. [https://docs.codat.io/reference/listcustomerspaged](https://docs.codat.io/reference/listcustomerspaged)
12. Implement `UseCase.create_customer`
    1. prepare params for create or find customer
    2. call Codat endpoint to create customer
    3. return customer data
    4. [https://docs.codat.io/reference/createcustomers](https://docs.codat.io/reference/createcustomers)
13. Implement `UseCase.create_invoice`
    1. prepare params for create invoice
    2. For invoice creation we need a Customer object first
        1. Use `UseCase.find_or_create_customer`
    3. For invoice creation we need a list of accounting items
        1. Use `UseCase.all_accounting_items`
        2. Use `UseCase.create_accounting_item`
    4. call Codat endpoint to create invoice based on the email
    5. [https://docs.codat.io/reference/createinvoices](https://docs.codat.io/reference/createinvoices)
14. Implement action `execute` function
    1. resolve params
    2. create customer via Codat API
    3. resolve the variables that should be available to further actions
    4. Example: https://github.com/Contractbook/api/pull/3780

# Links