Wordwise Documentation

# Wordwise Documentation ## Overview Wordwise is a versatile product designed to facilitate communication with databases and documents using chat-like interactions. It enables users within organizations to interact seamlessly with their data and documents, leveraging advanced language models and machine learning capabilities without exposing sensitive information to external APIs. The product caters to both business and technical users, offering functionalities tailored to their respective needs. ### Key Features - **Informative and Analytical Questions:** Wordwise supports answering both informative questions, which retrieve data from uploaded documents or the company's data architecture, and analytical questions that involve data analysis or prediction using machine learning models. - **User Roles:** - **Business User:** Senior managers or executives who utilize Wordwise for data-driven decision-making. - **Technical User:** Personnel responsible for technical aspects such as integrating data sources, uploading machine learning models, and managing document uploads. - **Workflow Capabilities:** - **ML Model Upload:** Allows technical users to upload pre-trained machine learning models via a REST API interface, facilitating integration into Wordwise with Docker containerization. - **Database Integration:** Enables technical users to connect databases securely, extract metadata, and utilize the structured data within Wordwise. - **Document Upload:** Supports uploading and processing of documents, converting them into text for further analysis and integration. - **Granular Access Control:** Facilitates project-based access control, ensuring that business users only access resources relevant to their assigned projects. - **Virtual Data Analyst:** Utilizes advanced language models to generate Python code for answering analytical questions, incorporating rule-based natural language generation (NLG) for user-friendly outputs. - **Technical Stack:** - **Backend:** Developed using Django framework for robustness and scalability. - **Frontend:** Utilizes React for a modern and responsive user interface. - **Databases:** PostgreSQL for application data and Weaviate for vectorized document storage. - **Integration and Scripts:** Python scripts manage various backend functionalities. - **AI Models:** Powered by GPT-4-turbo for language understanding and code generation. - **Supporting Technologies:** Django Channels for WebSocket communication, Llama Index for PDF indexing, and Phi3 for semantic search. ## Technology Rationale In developing Wordwise, we carefully selected a suite of advanced technologies to ensure robustness, efficiency, and scalability. Below is an explanation of why we chose specific tools like GPT-4-turbo, Llama Index, Phi3, and Weaviate, comparing them with other competing technologies where relevant. ### GPT-4-turbo **Rationale:** - **Advanced Language Understanding:** GPT-4-turbo offers state-of-the-art natural language understanding, which is essential for accurately interpreting and responding to user queries. - **Code Generation Capabilities:** Its ability to generate Python code from natural language descriptions significantly enhances our Virtual Data Analyst functionality, allowing for complex data analysis and predictive modeling. - **Cost-Effectiveness:** Compared to other models, GPT-4-turbo provides a balanced mix of performance and cost, making it suitable for large-scale deployments within organizations. **Comparison:** - **Versus GPT-3:** While GPT-3 is powerful, GPT-4-turbo provides improved performance, especially in understanding context and generating precise code. - **Versus Other LLMs (like BERT or RoBERTa):** While these models excel in specific tasks, GPT-4-turbo’s versatility and capability to handle both language understanding and generation tasks make it more suitable for our diverse requirements. ### Llama Index **Rationale:** - **Efficient PDF Indexing:** Llama Index is specialized for indexing and querying large documents, making it ideal for handling the company policies and reports uploaded to Wordwise. - **Scalability:** It scales well with the size of the documents, ensuring quick and efficient retrieval of relevant information. - **Seamless Integration:** It integrates smoothly with our backend, enhancing the overall document processing workflow. **Comparison:** - **Versus Apache Lucene/Solr:** While Lucene and Solr are robust for general text search, Llama Index is tailored for complex document structures like PDFs, providing better performance in our specific use case. ### Phi3 **Rationale:** - **Local Semantic Search:** Phi3 is a locally deployed semantic search technology, reducing the need for multiple external API calls and thereby saving costs and reducing latency. - **High Precision:** It delivers high precision in identifying relevant information, which is crucial for accurately answering both informative and analytical questions. - **Customizability:** Being locally deployable, it allows for greater customization and fine-tuning to meet our specific search requirements. **Comparison:** - **Versus Elasticsearch:** While Elasticsearch is powerful for general search, Phi3’s specialization in semantic search provides a more nuanced understanding of user queries, enhancing the relevance of search results. ### Weaviate **Rationale:** - **Vector-Based Storage:** Weaviate’s vector database is optimized for storing and querying high-dimensional vectors, which is essential for effective semantic search and retrieval of document segments. - **Scalability:** It handles large volumes of data efficiently, ensuring quick access to relevant document paragraphs. - **Integration Capabilities:** Weaviate integrates well with our existing tech stack, facilitating seamless data flow and query execution. **Comparison:** - **Versus Traditional SQL Databases:** Traditional SQL databases are not optimized for vector-based storage and search, making Weaviate a superior choice for our needs. - **Versus Other Vector Databases (like Pinecone):** While other vector databases offer similar functionalities, Weaviate’s open-source nature and strong community support provide added flexibility and cost benefits. ## User Guide ### Getting Started To begin using Wordwise, users need appropriate credentials and access rights provided by their organization's administrators. Below are the steps for different user actions: #### Business User Actions 1. **Asking Questions:** - Log into Wordwise using provided credentials. - Select a project or let the system suggest relevant projects based on access rights. - Ask questions using natural language. Wordwise will determine if it's an informative or analytical question. 2. **Viewing Answers:** - Receive answers directly from Wordwise, which may involve retrieving information from documents or databases, or running analytical models. 3. **Feedback and Review:** - Provide feedback on answers to help improve future responses. - Request reviews for unsatisfactory answers, triggering action by technical users. #### Technical User Actions 1. **Integration Tasks:** - Integrate data sources by providing necessary credentials and configuring connections securely within the Wordwise UI. - Upload pre-trained machine learning models as zip files containing necessary code, Dockerfiles, and API request formats. 2. **Managing Documents:** - Upload documents through the UI, ensuring they are converted into searchable formats and stored in the vector database (Weaviate). 3. **Creating Projects and Managing Access:** - Define projects and assign resources (databases, documents, ML models) to each project. - Grant business users access to specific projects based on their roles and responsibilities. ### Advanced Features #### Virtual Data Analyst 1. **Automated Code Generation:** - Upon receiving analytical questions, Wordwise initiates the Virtual Data Analyst process. - Generates Python code using GPT-4-turbo based on the question, database metadata, and available ML models. - Incorporates rule-based NLG for generating human-readable outputs alongside code execution. 2. **Monitoring and Management:** - Allows business users to monitor the progress of code generation and execution. - Handles queuing of analytical questions requiring Virtual Data Analyst intervention. ### Support and Troubleshooting For technical support and troubleshooting, users can refer to the following resources: - **Documentation:** Detailed user manuals and technical documentation available within the Wordwise platform. - **Support Contact:** Reach out to designated support channels provided by your organization or system administrators. ## Conclusion Wordwise empowers organizations by enabling secure, efficient, and insightful interactions with their data and documents. Designed to meet the needs of both business and technical users, Wordwise leverages advanced technologies to deliver actionable insights and support informed decision-making.