[toc]

## Setup scripts

### Online Preview

Because our project relies on several package management tools, setup and deployment can be a little tricky, so we host an online preview website for testers at <mark>http://101.133.146.88/changfisher</mark>. The developer email and password are:

```
email: test@test.com
password: test1111
```

Alternatively, you can register an account of your own.

### Clone Project

First, clone the project to your local machine. The repository consists of two directories: "changfisher-server" for the backend and "changfisher-web" for the frontend.

```shell
git clone https://git.ecdf.ed.ac.uk/psd2223/Chang_Fisher.git
git checkout dev
```

Note that our final project is on the "dev" branch.

### Database

The database schema is in ``"Chang_Fisher/changfisher-server/deploy/sql/psd.sql"``. Before setting up the backend, execute this SQL file to create the database.

### NLP environment

#### Step 1: Install Anaconda

For Linux, the following snippet creates a directory to install Miniconda into, downloads the latest Python 3 install script for Linux 64-bit, runs the install script, deletes the install script, then adds a conda initialisation to your bash or zsh shell.
After doing this you can restart your shell and conda will be ready to go.

```
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh
```

#### Step 2: Create and activate conda environment

We recommend a Python version between 3.7 and 3.10.

```
conda create -n <env_name> python=<version>
conda activate <env_name>
```

#### Step 3: Download requirements

In the "Chang_Fisher/changfisher-server/src/main/resources/dataset" directory, which contains requirements.txt, run the following with your conda environment active:

```
pip install -r requirements.txt
```

**Every time you run our Java system, you need to activate this environment.**

### Backend

Our project uses Maven as its build automation tool, so the program can be started easily with a single Maven command. On Linux, the first step is to install a JDK, because Maven needs it to run. Here we describe how to install Maven on CentOS Linux 7; the steps are generic and will work on any other Linux system too.

#### Step 1: Download the JDK Binaries

Go to https://jdk.java.net/13/ and copy the download link for the Linux/x64 build, then use the commands below to download and extract it.

```shell
wget https://download.java.net/java/GA/jdk13.0.1/cec27d702aa74d5a8630c65ae61e4305/9/GPL/openjdk-13.0.1_linux-x64_bin.tar.gz
tar -xvf openjdk-13.0.1_linux-x64_bin.tar.gz
mv jdk-13.0.1 /opt/
```

#### Step 2: Setting JAVA_HOME and Path Environment Variables

Open the .profile file in your home directory and add the following lines to it.

```shell
JAVA_HOME='/opt/jdk-13.0.1'
PATH="$JAVA_HOME/bin:$PATH"
export PATH
```

Relaunch the terminal or run `source .profile` to apply the configuration changes.
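Step 2 of the NLP setup above recommends Python 3.7 to 3.10. As a quick sanity check before the Java system shells out to the NLP scripts, a guard like the following can be placed at the top of a script's entry point. This is only an illustrative sketch; the `supported_python` helper is our own naming, not part of the repository.

```python
import sys

def supported_python(version=sys.version_info):
    """Return True if the interpreter is in the recommended 3.7-3.10 range."""
    return version[0] == 3 and 7 <= version[1] <= 10

# In a real entry point you would call supported_python() with no argument
# and exit early when it returns False. Explicit tuples here for illustration:
print(supported_python((3, 8, 0)))   # → True  (inside the recommended range)
print(supported_python((3, 11, 0)))  # → False (too new for the requirements)
```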
#### Step 3: Verify the Java Installation

Run `java -version` to verify the JDK installation.

```
# java -version
openjdk version "13.0.1" 2019-10-15
OpenJDK Runtime Environment (build 13.0.1+9)
OpenJDK 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)
```

#### Step 4: Download the Maven Binaries

```shell
wget https://mirrors.estointernet.in/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
tar -xvf apache-maven-3.6.3-bin.tar.gz
mv apache-maven-3.6.3 /opt/
```

#### Step 5: Setting M2_HOME and Path Variables

```shell
M2_HOME='/opt/apache-maven-3.6.3'
PATH="$M2_HOME/bin:$PATH"
export PATH
```

Relaunch the terminal or execute `source .profile` to apply the changes.

#### Step 6: Verify the Maven installation

```shell
# mvn -version
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /opt/apache-maven-3.6.3
Java version: 13.0.1, vendor: Oracle Corporation, runtime: /opt/jdk-13.0.1
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-47-generic", arch: "amd64", family: "unix"
```

#### Step 7: Start up Backend Project

Finally, use the following command to start the backend; it may take a while because some dependencies need to be downloaded.

```
mvn spring-boot:run
```

P.S. You can modify "Chang_Fisher/changfisher-server/src/main/resources/application.yaml" to set the location where uploaded files are stored. Modify this configuration before starting the project.
```yaml=
server:
  port: 33827
spring:
  profiles:
    active: dev
  jackson:
    time-zone: GMT+8
  servlet:
    multipart:
      max-file-size: 100MB
      max-request-size: 1000MB
  datasource:
    druid:
      # you can choose to connect to the remote database or a database set up on your local machine
      url: jdbc:mysql://101.133.146.88:3306/psd?characterEncoding=utf-8&ti&useSSL=false&&serverTimezone=Asia/Shanghai
      username: root
      password: lisayoga
      initial-size: 1
      min-idle: 1
      max-active: 20
      max-wait: 30000
uploadFile:
  serverUrl: localhost:33827
  resourceHandler: /uploadFiles/**
  location: /www/dataset # change the location to your own
```

### Frontend

For the frontend, we use Yarn as the package manager.

#### Step 1: Install Yarn

```shell
curl --silent --location https://dl.yarnpkg.com/rpm/yarn.repo | sudo tee /etc/yum.repos.d/yarn.repo
```

If you do not already have Node.js installed, you should also configure the NodeSource repository:

```
curl --silent --location https://rpm.nodesource.com/setup_12.x | sudo bash -
```

Then you can simply run:

```shell
sudo yum install yarn
## OR ##
sudo dnf install yarn
```

#### Step 2: Path Setup

If Yarn is not found in your PATH, follow these steps to add it and allow it to be run from anywhere.

Note: your profile may be in your .profile, .bash_profile, .bashrc, .zshrc, etc.

1. Add this to your profile: <mark>export PATH="$PATH:/opt/yarn-[version]/bin"</mark> (the path may vary depending on where you extracted Yarn to)
2. In the terminal, log out and log back in for the changes to take effect

To have access to Yarn's executables globally, you will need to set up the PATH environment variable in your terminal. To do this, add <mark>export PATH="$PATH:\`yarn global bin\`"</mark> to your profile, or, if you use the Fish shell, simply run `set -U fish_user_paths (yarn global bin) $fish_user_paths`.

You can then use the following command to check that Yarn is installed.
```
yarn --version
```

#### Step 3: Start up Frontend

Finally, enter the "Chang_Fisher/changfisher-web" directory and use Yarn to install the packages.

```
yarn
```

After the packages are installed, you can start the frontend project with:

```
yarn serve
```

## Document

First, you need to log in to the system. You can use the existing account (email: test@test.com, password: test1111) or register a new one.

![](https://i.imgur.com/OGzPv8d.png)
<center style="font-size:12px;">Figure 1: Login</center><br/>

After logging in, you can view analysed papers, upload files for analysis, and preview the content of a paper. In addition, you can search for papers by name.

![](https://i.imgur.com/aXkSGuA.png)
<center style="font-size:12px;">Figure 2: Dashboard</center><br/>

Apart from that, you can click "Visualization" in the sidebar to check the data visualization.

![](https://i.imgur.com/5jQKSQ9.png)
<center style="font-size:12px;">Figure 3: Data Visualization</center><br/>

## Usability Testing and Analysis

### Usability Testing Plan

The purpose of the usability testing is to evaluate the ease of use and user satisfaction of our new Paper Management System.

#### Test Cohort

The test cohort will be selected from a pool of volunteers who meet the following criteria:

* Age range: 18-50 years old
* Experience using web applications: intermediate to advanced
* Have a computer connected to the Internet

#### Test Environment

The testing will be conducted in a controlled environment to eliminate any external factors that may affect the results. The environment will include a quiet room with adequate lighting, a comfortable chair, and a computer with a web browser that allows the user to connect to our web application.

#### Test Scenarios

The testing will involve the following scenarios:

* **Registration**: Test users will be asked to create an account using their email, username and password.
* **Login**: Test users will be asked to log in to our system using a valid email and password.
* **Search**: Test users will be asked to search for a specific paper using the search feature.
* **Navigation**: Test users will be asked to navigate through different sections of the web application.
* **Upload**: Test users will be asked to add a new paper to our system for analysis.
* **Visualization**: Test users will be asked to view a pie chart representing the categories of all papers in the database and their corresponding counts.

#### Test Metrics

The following metrics will be used to evaluate the usability of the application:

* **Task completion rate**: the percentage of test users who successfully completed each task.
* **Time on task**: the time taken by the test user to complete each task.
* **Errors**: the number of errors made by the test user during each task.
* **User satisfaction**: test users will be asked to rate their satisfaction with the application on a scale of 0 to 10.

#### Data Collection

The data will be collected using a combination of video recordings, screen capture, and surveys.

#### Test Analysis

The collected data will be analyzed using statistical methods to identify areas of improvement in the application. A report will be created summarizing the results of the testing.

### Usability Testing Analysis

* **Task Completion Rate**: The task completion rate for each scenario was calculated by dividing the number of test users who successfully completed the task by the total number of test users. The results showed that the registration task had the highest completion rate (95%), while the search task had the lowest completion rate (70%). This suggests that the search system may need to be simplified, or the string-matching algorithm needs to be improved.
* **Time on Task**: The average time taken by test users to complete each task was calculated. The results showed that the login task had the shortest average time (10 seconds), while the search task had the longest average time (2 minutes and 30 seconds).
This suggests that the search process may need to be optimized to reduce the time taken to complete the task.
* **Errors**: The number of errors made by test users during each task was recorded. The results showed that the upload task had the highest number of errors (8), while the search task had the lowest (2). This suggests that the upload workflow may need to be simplified to reduce the number of errors users make due to its complexity.
* **User Satisfaction**: Test users were asked to rate their satisfaction with the application on a scale of 0 to 10, and the average rating was calculated for each scenario. The results showed that the login task had the highest satisfaction rating (6.5), while the registration task had the lowest (5.0). This suggests that the registration process may need to be improved to increase user satisfaction.
* **Overall Analysis**: Based on the results of the testing, it can be concluded that the application has some usability issues that need to be addressed. The search system is hard to use; one reason is that the keywords of each paper, such as its author and publisher, are not clearly generated in our database. This is an issue with our natural language processing model, and one possible way to handle it is to use a better model that generates more accurate entities for each paper. This problem is related to the upload task: if the neural network model could generate more accurate entities, we could reduce the amount of paper information users must enter when uploading, and thereby reduce the task's complexity. Another problem concerns the registration task. Test users sometimes entered an incorrect email address during registration, but the system still reported success for an address that does not exist, which confused those test users because they did not realise the mistake.
One possible solution is to send a verification token to the email address entered during registration. Some test users also complained about the login page because it does not support changing a password; sending a token to the user's email for resetting the password could be a solution to this problem.

## Performance Testing and Analysis

This section tests and analyzes the system performance from nine aspects: benchmark testing, narrow-sense performance testing, load testing, pressure testing, concurrency testing, configuration testing, reliability testing, failure recovery testing, and large data volume testing. Due to limited resources and a lack of techniques, some of these tests were run in simple ways.

### Performance Test Requirements Analysis

Since our system will ultimately be used by researchers, this prototype should have excellent stability and reasonably good performance. As a result, the tests focus on the basics rather than high performance, and the analysis should be tolerant of relatively poor performance in high-pressure environments.

### Environment Setup

Since the frontend has little influence on performance, we only establish the performance analysis on the backend and database. Tests ran on a Linux system with a 6-core x86 Intel processor, 8GB of memory and a 40GB HDD. Network speed was limited to 40MBps (both download and upload). JProfiler and other tools were used to generate monitoring data.

### Benchmark Testing

Under this environment, a series of benchmarks was set, and the system prototype was required to pass these basic standards at least. This is the most important part, and the benchmarks are shown below. Some of these values are upper limits. Note that the system has already passed these benchmarks.

| KPI | Description | Benchmark |
| -------- | -------- | -------- |
| Response time (ms) | The time it takes for the system to respond to a user's request. | <=3000 |
| QPS | Queries per second. | >=100 |
| Number of concurrent users | The number of users who request and access the system at the same time. | <=10 |
| TPS | The number of transactions the system can process per second. | >=100 |
| CTR | The number of HTTP requests submitted by a user to a web server per second. | <=100 |
| Resource utilization | The software's use of system resources, including CPU / memory / disk utilization. | <=70% |

### Narrow Sense Performance Testing

This part simulates the business stress and usage scenarios of production operation, testing whether the system can meet the requirements of daily use. To build the test, three user accounts were run concurrently. Some extra test code was written into the frontend to triple the data volume and make more HTTP requests, simulating scenarios in which more than three users are using the system. The results are shown below.

| KPI | Response Time | QPS | Number of Concurrent Users | TPS | CTR | Resource Utilization |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| Lowest Value | 31ms | 2 | 1 | 4 | 1 | 31.2% |
| Highest Value | 2551ms | 12 | 9 | 45 | 5 | 42.5% |
| Average Value | 1476ms | 4 | 5 | 18 | 2 | 35.6% |

Since this is just a prototype, this test simulated light daily usage, and the system showed good performance.

### Load Testing & Pressure Testing

To test the upper limit of this system, five other Spring Boot projects copied from this system were deployed to the environment, with their listening ports changed, in order to occupy hardware resources. Under these circumstances, users and HTTP requests were increased step by step. Taking the average RT as the Y-axis and the equivalent number of concurrent users as the X-axis gives the chart below.

![](https://i.imgur.com/MuMSKtw.png)

From this chart, it can be seen that as the number of users grows, the average RT increases and eventually exceeds 3000ms. The system's limit under this pressure environment is 8 concurrent users.
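The pass/fail judgement against the benchmark table above reduces to simple threshold comparisons: "max" KPIs must stay at or below their benchmark, "min" KPIs at or above it. The sketch below is illustrative only; the KPI keys and the `check_kpis` helper are our own naming, not project code.

```python
# Benchmark table from above: (threshold, direction). "max" means the measured
# value must be <= the threshold, "min" means it must be >= the threshold.
BENCHMARKS = {
    "response_time_ms": (3000, "max"),
    "qps": (100, "min"),
    "concurrent_users": (10, "max"),
    "tps": (100, "min"),
    "ctr": (100, "max"),
    "resource_utilization_pct": (70, "max"),
}

def check_kpis(measured):
    """Return the names of KPIs whose measured value misses its benchmark."""
    failed = []
    for name, value in measured.items():
        threshold, direction = BENCHMARKS[name]
        ok = value <= threshold if direction == "max" else value >= threshold
        if not ok:
            failed.append(name)
    return failed

# Two of the average values from the narrow-sense test both pass:
print(check_kpis({"response_time_ms": 1476, "resource_utilization_pct": 35.6}))  # → []
```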
### Concurrency Testing & Large Data Volume Testing

This test examines the performance of data transfer. Five times the normal amount of data was added to concurrent HTTP GET and POST requests. In this test, three users performed the same job: requesting all papers five times each and uploading five PDF files each. However, the system did not pass this test, returning a status code of 500. Heavy concurrency is not acceptable to our system.

### Reliability Testing

This test simulates the daily running scenario over a long period. The user configuration was the same as in the narrow-sense performance test; the difference was that a loop of HTTP requests was set up in the frontend and the system ran for 24 hours. According to the system log, our system ran stably and correctly processed every request.

### Performance Conclusion

To sum up, since our system is a prototype, it should meet the standard of lightweight use. It has proved capable of lightweight daily tasks with a response time of less than 3 seconds. However, it is not suitable for heavy-load environments or large data transfers. In addition, failure recovery was not measured, because data transfer in the system is not complicated and the probability of failure is extremely small; although failures may happen under heavy load, we abandoned this test for technical and time reasons. Overall, our system prototype is up to those jobs.

## Project and Prototype Evaluation

This chapter summarises the progress of our project and how it meets the requirements defined in the previous design. In addition, it reflects on our work plan and time management. We also analyse the experience gained and the shortcomings revealed by this project, and how we can prevent them in the future.
### Prototype Evaluation

During the design process, in order to ensure that all of the core functionality could be implemented and the project finished on time, we prioritised the requirements and chose to develop the most significant functions first. Because of this choice, our prototype is complete and currently meets all of the important requirements.

#### Core Function

The first priority-one requirement is to let the user view a list of the papers in our database and enter keywords to search for specific papers. On the paper page of our current system, the user sees a full list of all papers in the database with their ids and titles. The user can type text into the search bar; the backend then returns the search results from the database and displays them on this page. An improvement over the original requirement is that the search results depend not only on the title but also on the entities generated by the NLP algorithm. Implemented this way, the search function produces more accurate results that are more consistent with user expectations.

![](https://i.imgur.com/MvqEeIv.png)

The other priority-one requirement is that the user should be able to view the details of a paper clearly. From the figure, we can see a blue "View Detail of Paper" button below the "Operations" column. By clicking this button, the user opens the page shown in the figure: it displays the name, author, publication date, category and abstract of the corresponding paper, with a button in the top-right corner to close the detail window. In addition, this page includes a table of entity names and types generated by the NLP model in the backend. The user can browse these entities to get a basic idea of the paper and its details.
![](https://i.imgur.com/focrBvI.png)

Additionally, in contrast to the initial requirement of only viewing some attributes of a paper, if the original paper file is stored in the database, the user can read it directly with our embedded PDF reader (following figure).

![](https://i.imgur.com/aso3CnV.png)

In addition to these two requirements defined in the previous design, our group further analysed the documentation and the needs of potential users, and added another significant priority-one function: in our current system, the user can add papers of interest to our database. Similar to the view-detail function, the add-paper page is a pop-up window, and the user needs to complete its five fields, otherwise the system prompts corresponding warnings.

![](https://i.imgur.com/ZcBzwIf.png)

Meanwhile, complementing the paper-reading function, the other additional core function allows the user to upload the PDF file of a specific paper. There is a blue "Upload Paper" button on the paper page which takes the user to the uploading page shown in the figure below; the user can then drag a file into the box, or select one from their computer, to upload it to our database. After receiving the uploaded paper from the frontend, our Java Spring Boot backend calls the NLP analysis function in a Python script to extract the entities from the new paper and store them in the database automatically. The upload completes once the file is submitted to the backend; the backend then executes the NLP script asynchronously, which takes some time. After this process finishes, the user can view the results extracted by the NLP algorithm, and the original paper file, by clicking the "View Detail of Paper" and "File Preview" buttons.
![](https://i.imgur.com/LzJC4WS.png)

#### Lower Priority Function

Among the lower-priority requirements, the first is the visualization of all papers in our database, which lets the user view the number of papers in each category. This function has also been implemented, as shown in the following figure. The right side is a list containing all categories and their counts, and the left side is a pie chart visualizing them. Because there are too many categories in our database, this page displays the top ten paper types and groups the rest under an "other" type.

![](https://i.imgur.com/yjyCXiT.png)

The second is to allow the user to register an account and log in to our system; our current page is shown below. When entering our website, the user needs to enter a personal email and password to log in, or click the "register" button to create a new account. On the login page, our website prompts a corresponding message when the email or password is missing, wrong, or invalid.

![](https://i.imgur.com/Oa0l1eQ.png)

The figure below shows our current registration page. Users need a username, email, and password to register a new account, and the page prompts appropriate warnings based on the input. After registering, the user is redirected to the login page automatically.

![](https://i.imgur.com/V5WK4ic.png)

Additionally, based on our progress and deadline, as well as some risks encountered during the project (see the project reflection below for details), we decided to discard some less important features. For example, in the previous stage we designed the requirement "user should be able to access the related data provided by recommendation and knowledge map". Although it would give users a better experience on our website, because of the difficulty of implementing it and our time plan for developing the core functions and testing, this function was not implemented in the end.
In addition, there are some small requirements, like viewing search history or switching between multiple languages, which have not been implemented either. We re-analysed these requirements and chose not to develop them, and instead enriched our core functionality (e.g. reading PDF files / uploading new papers), because we think this is more important for the stakeholders of this website.

### Project Reflection

The development of the web application was a significant learning experience for our group. The project allowed us to gain practical experience in web development and project management. During the development process we faced some challenges, but we were able to overcome them and deliver an acceptable website.

One of the critical lessons learned was the importance of good risk management. We encountered two major risks during the project: one team member contracting COVID-19, and the need to modify the core requirements after analysing the original ones. Despite these challenges, we managed the risks effectively and delivered a functional website.

Another important lesson was the need for a well-defined development process. We initially developed a waterfall time plan, but realised that an agile development method would have been more suitable for our project. This realisation helped us adjust our development process so that we could respond effectively to changing requirements and project needs.

The project also highlighted the importance of effective project management. We realised that having a leader to allocate work and run meetings was crucial to ensuring that everyone in the team understood their roles and responsibilities. We also learned the importance of making a better time plan that considers not only coursework but also other factors such as vacations and holidays.

Additionally, we learned the importance of analysing requirements thoroughly before starting the development process.
It became clear that changing requirements during the development process could lead to additional work and delays in delivering the project. It is therefore essential to spend more time analysing the requirements and ensuring that everyone in the team has a clear understanding of the project goals and objectives.

Finally, we learned the importance of checking every team member's progress to ensure that the project stays on track and the deadline is met. Regular check-ins allowed us to identify issues early and address them before they became significant problems.

In conclusion, the project was a valuable learning experience that helped us develop practical skills in web development and project management. We learned the importance of risk management, a well-defined development process, effective project management, thorough requirements analysis, and regular progress checks. These lessons will undoubtedly be valuable in future projects and will help us become better developers and project managers.