Alice Kham
# 50.043 Database Project Documentation

## Group 22

**Chow Jia Yi | 1003597**
**Nan Shing Kham Shing Alice | 1003747**
**Tiffany Goh Yi Lin**
**Xavier Tan De Jun**

## Contents

1. Tech Stack
2. How To Run Production Scripts (via Amazon CloudFormation)
3. How To Tear Down Production System
4. Application Features
5. Flask API
6. Scraper
7. Production System Databases
8. Analytics System
9. Analytics Tasks
10. Automation Scripts

## Tech Stack

This application consists of a React-Redux frontend and a Flask backend. To build the frontend, change directory to `Book-Review-Web-Application/website` and run

```bash
npm run production
```

to compile and minify the React code into a bundle. You can see the bundle files at `bookReviewAPI/src/templates`. When the Flask application is served, it reads the static files from that folder.

## Project Architecture on Github

- **automationScripts** contains the scripts to set up the production and analytics systems.
  - **hdfsScripts** contains the scripts to set up the Hadoop clusters.
  - **mongodbScripts** contains helper code to set up MongoDB.
  - **mysqlScripts** contains scripts to set up and load data into MySQL.
- **bookReviewAPI** contains the Flask code for the APIs.
  - **src** contains Flask code.
    - Directory **templates** contains the HTML pages.
    - Directory **static** contains the minified React code, which is served from the Flask backend. This directory is created when `npm run production` is run.
    - **config.py** contains the URIs of the databases.
    - **controllers.py** contains all of the API calls.
    - **error.py** contains all the helper functions to handle errors.
    - **models.py** contains the MySQL database schema.
  - **requirements.txt** contains all the dependencies needed for the Flask API. When the production server is set up, `pip3 install -r requirements.txt` is run to install all required dependencies for the backend.
- **website** contains the frontend React application, which is compiled into static files in production.

## How To Run Production Scripts (via Amazon CloudFormation)

Get the full file URI of `production_template.json` (a `file://` path) and pass it as the `--template-body` value. Run the following:

```bash
aws cloudformation create-stack --stack-name [stack name] \
  --template-body [file URI] \
  --parameters ParameterKey=InstanceType,ParameterValue=[instance type] \
               ParameterKey=ImageId,ParameterValue=t2.medium \
               ParameterKey=KeyName,ParameterValue=[name of private key]
```

Expect to wait 5 to 10 minutes for the production system to be fully loaded. To find the address of the webpage, run

```bash
aws cloudformation describe-stacks --stack-name [stack name]
```

Once the stack is fully set up, the output lists the IP addresses of the backend server (`BEPublicIP`), MongoDB, and MySQL. Copy the `BEPublicIP` value into a browser to access the application.

## How To Tear Down Production System

1. Run the following:

```bash
aws cloudformation delete-stack --stack-name [stack name]
```

## Application Features

Website visitors can register an account to become a user.

1. **Home Page**
   On the Home Page, visitors can search for a book in three ways:
   1. By ++Genre section++
      - The visitor clicks on one of the genres provided.
      - The visitor is redirected to the Genre Page.
   2. By ++Ratings++
      - The visitor clicks on a Rating checkbox (e.g. `4 and higher`, which selects books with an average rating of 4 stars and above).
      - Results are displayed below on the same page.
   3. By ++Title++
      - The visitor enters the search keyword in the search bar and clicks the `Search` button.
      - Results are displayed below on the same page.

   On the Navigation Bar, visitors can choose to register for an account or log in to an existing account.
2. **Login Page**
   Users can log in to their account and view their user details and bookshelf on the Profile Page. Logged-in users can also leave a book review on the Book Page of a particular book.
3. **Register Page**
   Visitors can register for an account to become a user.
4. **Genre Page**
   The Genre Page displays books of the selected genre. To improve efficiency, only the first 1000 books are displayed initially. Visitors can click on a book to view it; this redirects them to the Book Page.
5. **Book Page**
   The Book Page displays the details and reviews of a book. A logged-in user can leave a review in the ++My Review++ section. Logged-in users can also save books that they are interested in under 'Want to Read', and books that they have read under 'History'. They can then view the saved books on the Profile Page.
6. **Profile Page**
   The Profile Page displays the user's details and bookshelf. The bookshelf contains books the user is currently reading or has read. The Bookshelf section is currently not working; we did not manage to debug it.
7. **Add Book Page**
   Users can submit a book to the website admin if the book is not available on the site.

## Flask API

We used Flask, a web application framework written in Python and based on the Werkzeug WSGI toolkit and the Jinja2 template engine, to build our application. The files are in `/bookReviewAPI/src` in our Github repository.

`controllers.py` (`/bookReviewAPI/src/controllers.py`) is the main file that connects to the backend; it contains the routes and the functions that fetch the database instance and collection. `controllers.py` also sends the data to the frontend and contains the main code that renders the HTML templates present in the static folder.

## Scraper

The provided databases contain limited data, and many of the book titles are missing. We conducted the scraping as shown in the `/web_scrapper_title.py` file.

## Production System Databases

### MongoDB

Our MongoDB server is hosted on a separate EC2 instance. It contains two databases, one named `book_metadata` and the other named `web_logs`.

The `book_metadata` database stores the metadata of all books, including the title, related books, price, ASIN and so on. All book metadata is stored in a collection named `metadata`.

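As a rough illustration of the data described above, the sketch below shows what one document in the `metadata` collection might look like, and how the books without a title (the ones the scraper targets) could be picked out. The key names `asin`, `title`, `price` and `related`, the sample values, and the helper `books_missing_title` are all assumptions made for this example; they are not taken from the project code, and the sketch uses plain Python dictionaries rather than a live MongoDB connection.

```python
# Hypothetical documents shaped like entries in the `metadata` collection.
# All keys and values here are placeholders for illustration only.
sample_docs = [
    {
        "asin": "B000000001",
        "title": "Example Title",
        "price": 9.99,
        "related": {"also_bought": ["B000000002"]},
    },
    {
        # A record with no title, i.e. a candidate for scraping.
        "asin": "B000000002",
        "price": 4.5,
    },
]


def books_missing_title(docs):
    """Return the asin of every document that has no usable title."""
    return [doc["asin"] for doc in docs if not doc.get("title")]


if __name__ == "__main__":
    print(books_missing_title(sample_docs))
```

In a real deployment the same filter would be expressed as a query against the `metadata` collection instead of a list comprehension.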
The `web_logs` database records user activity. Whenever a user makes an API request to the backend through the website, information about the request is stored in the `logs` collection of the `web_logs` database. Each log record has the following information:

* Request method (GET, POST, etc.)
* Request path
* Response status code (404, 200, 500, etc.)
* Duration
* Timestamp
* IP address
* Host
* Request parameters

### SQL

Our MySQL server is hosted on a separate EC2 instance. The MySQL database contains the following tables:

* User ( uid varchar not null primary key, password text )
* Review ( (uid, asin) primary key, helpful varchar, overall int, reviewText varchar, summary varchar, reviewTime datetime, unixReviewTime bigint )
* Book ( asin varchar not null )
* ReadHistory ( uid varchar, asin varchar )
* Interest ( uid varchar, asin varchar )

## Analytics System

## Analytics Tasks

We did not manage to get the tools that ingest data from the respective databases into our Hadoop clusters working in time. However, the analytics task scripts are finalised and can be run by following the commands in `AnalyticsScript.txt` on the namenode EC2 instance as the user `hadoop`.

### Pearson Correlation: Between price and average review length

The value is calculated using the formula:

![](https://i.imgur.com/AZKsjzL.png)

* r = Pearson Correlation value
* n = total number of values
* x = values in the first dataset (the prices of the books)
* y = values in the second dataset (the average review lengths of the books)

We have a class `PearsonCorrelationCalculator` defined in `pearson_correlation.py`. There are five map-reduce tasks defined in the `calculate_pearson_correlation` function. The correlation value is then calculated from the outputs of the map-reduce tasks using the formula.

The calculated Pearson Correlation value is 0.0228853646.

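The calculation described above can be sketched in plain Python. This is a minimal sketch, not the code in `pearson_correlation.py`: it assumes the formula in the image is the standard computational form of Pearson's r, and that the five map-reduce tasks produce the five sums Σx, Σy, Σxy, Σx² and Σy². The function names `aggregate_sums` and `pearson_from_sums` are hypothetical.

```python
import math


def aggregate_sums(pairs):
    """Reduce (x, y) pairs to the count and the five sums the formula needs.

    In the project these sums would come from map-reduce jobs; here a
    plain loop stands in for them (an assumption for illustration).
    """
    n = sum_x = sum_y = sum_xy = sum_x2 = sum_y2 = 0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xy += x * y
        sum_x2 += x * x
        sum_y2 += y * y
    return n, sum_x, sum_y, sum_xy, sum_x2, sum_y2


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Compute r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when x or y is constant")
    return numerator / denominator


if __name__ == "__main__":
    # x = book prices, y = average review lengths (made-up numbers).
    pairs = [(1, 2), (2, 4), (3, 6)]
    print(pearson_from_sums(*aggregate_sums(pairs)))
```

Working from sums rather than raw values is what makes the calculation suit map-reduce: each sum can be accumulated independently across partitions, and only six numbers are needed for the final step.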
### TF-IDF: Computing the term frequency inverse document frequency metric on the review text

## Automation Scripts

###### tags: `NodeJS` `React` `Redux` `Flask` `MongoDB` `SQL` `Documentation` `HDFS` `Spark` `Javascript` `Python`
