Sasha Rudan
# D015 - Tracking the video status/processing - the Project Proposal

*draft 0.2*

Origin: https://github.com/mprinc/terra-zontik/issues/37

We start the document with **Aspects/Strategic Ideas/Approaches** and then suggest concrete **Tasks** according to those aspects.

# Aspects

## Development Paradigms

We need to support healthy development paradigms:

1. ***`Unit-testing`*** - (i) to be sure that regular and some important boundary cases are covered, (ii) to avoid regressions in future development, and (iii) to provide confidence in future development and have documented examples of code usage
2. ***`E2E-testing`*** - a technique for testing the entire software product from beginning to end from the users’ perspective, so we know that all components fit together properly
3. **`Separation of concerns`** - as we achieved with the `backend-mockup`, where the frontend can be developed without the backend. Similarly, we want to support backend development without *"going through"* the frontend, i.e. to provide a `frontend-mockup`.
4. ***`Modular development`*** - each functional section of code is articulated as a ***`task`*** that can be better understood, tested, and manipulated inside the Terra business logic, and that maps cleanly onto a visual workflow.
5. ***`Scalability & Load balancing`*** - the possibility that a request can be handled by more than one server or service instance. This provides resilience, scalability, and a better user experience.

## Monitoring

**NOTE**: The following acronyms are not well-established ones.

### M-C-A: Monitoring customer activities

M-C-A would help us track each unique customer activity and (i) identify specific issues with services, (ii) improve user satisfaction, and (iii) learn about customer behavior.
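
As a rough illustration of M-C-A, the sketch below records one event per customer action and derives a per-service failure rate from them. Everything in it is an assumption made for illustration: the names `ActivityEvent` and `ActivityLog`, the fields, and the in-memory storage are hypothetical and are not part of the existing Terra code.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class ActivityEvent:
    """One customer action. All field names are illustrative assumptions."""
    customer_id: str
    service: str  # hypothetical service names, e.g. "upload"
    succeeded: bool
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActivityLog:
    """In-memory event log; a real system would persist events instead."""

    def __init__(self) -> None:
        self._events = []

    def record(self, event: ActivityEvent) -> None:
        self._events.append(event)

    def failure_rate_by_service(self) -> Dict[str, float]:
        """Share of failed events per service, to spot problematic services."""
        total = Counter()
        failed = Counter()
        for event in self._events:
            total[event.service] += 1
            if not event.succeeded:
                failed[event.service] += 1
        return {service: failed[service] / total[service] for service in total}

    def history(self, customer_id: str) -> List[ActivityEvent]:
        """All recorded events of one customer, in insertion order."""
        return [e for e in self._events if e.customer_id == customer_id]


log = ActivityLog()
log.record(ActivityEvent("alice", "upload", True))
log.record(ActivityEvent("alice", "video-processing", False))
log.record(ActivityEvent("bob", "video-processing", True))
print(log.failure_rate_by_service())
```

The per-service failure rate serves goal (i), while the per-customer history is the raw material for goals (ii) and (iii).
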
### M-S-R: Monitor Server Resources

We need to track all the critical resources of our servers so that we do not end up in either (i) a **non-responsive** scenario or (ii) a scenario where responses are so slow that **usability** is unacceptable.

### M-S-A: Monitor Services Availability

We need to track all the Terra services to understand their availability, and either

1. send notifications on their unavailability or
2. recover/restart the service if possible

### M-S-F: Monitor Services Functionality

This is *"deeper"* monitoring and investigation than the former M-S-A aspect. Here we challenge the **service correctness** with all the tasks it commits to provide.

### E2E-M: End-to-End Monitoring

Uploading a small video every few minutes and checking its status in the DB, or even checking whether the email is received. This solution **deeply monitors every aspect** of user functionality.

### M-S-L: Monitoring-Server Load

Monitoring whether too many tasks/videos are in the processing queue, or whether their processing gets too slow, allows us to:

+ **(i)** **increase resources** dynamically, based on needs, and/or
+ **(ii)** **inform** users upfront to try later instead

This leads us to

## Measure requests/resources ratio

Measure processing speed and understand the ratio between the number of users (requests) and necessary resources. (It should also take into account how heavy a load each user brings: **how big the videos are** and how many of them there are.)

# Notifying

This is a category of work (tasks) where we provide more information to the end user or administrator about the system (activity) status.

## Notifying Users

### Upfront, prior to users’ actions

+ Set of messages on the production server that are displayed during the **maintenance period**. Possible notifications:
    + our website/service is being updated with new features, be aware of possible malfunctions/glitches during this process (in the period of XX till YY)
    + ... click here to be informed when it is UP
+ if the monitor detects that the number/frequency of issues (videos not being successfully processed) is above some threshold, we might even **AUTOMATICALLY** add a warning on the website (in addition to warning admins) about the temporary possibility of glitches.
+ if we experience/schedule longer or heavier maintenance/upgrading blackouts, or ones that **disable the whole website** (leaving no option for putting up these messages on the website), we could even **inform users by email upfront**.

### Upon user actions' failures

+ **RETRYING**: informing the user that s/he should retry
+ **POSTPONING**: informing the user that the service is down, and that s/he will be informed when it is up again to resend
+ **AUTOMATIC**: informing users that their video is saved and will be automatically retried when the service is up, and that s/he will get a success or permanent failure email.
+ we are already detecting some errors (like UNDEFINED, TIMEOUT). In these cases, users and admins might be informed of the failure.
  We could choose to retry later instead of sending a failure email after 30 minutes (as happens now).

## Notifying Admins

+ Warning of individual/temporary processing failure
+ EMERGENCY ERROR: permanently down, reset required
+ Warning: service/server temporarily down, with later UP notification

# Recovery

We can try to recover (retry) all the video processing requests that are recoverable

+ assuming the uploaded video was successfully saved at the beginning of the processing workflow, if the processing crashes, we can still automatically redo the same processing workflow

# Resilience

+ In the future, we should have mirroring servers, where the 2nd one is up while the main one is under reconstruction/upgrading/crashed/under attack by hackers

# Tasks

## Existing Tasks

TASK: **F016 - Storage management, delete old videos**

+ This task addresses a "fragile" balance between caching video material and its subproducts long enough to be available for all the processing tasks, while still performing all the garbage collection necessary to avoid disk space issues

TASK: **D002 - Monitor servers availability**

Support monitoring for:

+ SUBTASK: **resources**
    + currently we have installed server services that observe server resources and implemented a dashboard to visualize some of them
    + we need to operationalize it to send alarms when the resources get out of the preset boundaries
+ SUBTASK: **single components**
    + video-processing async tasks
    + high-level async tasks
    + 3rd party services (mail, DB, key-store, broker, ...)
+ SUBTASK: **high-level (e2e) - testing the system by performing "user-like" requests**
    + calling some short FFMPEG task to check if it finishes successfully
    + checking for email confirmations

TASK: **D003 - Install Video Processing Servers - non-elastic but manually scalable**

This task will help make the infrastructure scalable, responsive, and resilient, as we would have multiple servers separate from the main backend server.

1. SUBTASK: **install and provide the procedure for installing additional video processing servers**
2. SUBTASK: **install workflow support** - this will provide a *networking* infrastructure for handshaking tasks and results between the backend and video processing servers

TASK: **D007 - beta.welcometerra.com**

+ Implement the beta server, as currently we have just the staging and production servers
+ Having all 3 of them set up properly, we can have safer Terra scenarios:
    1. `production` - final server, not used for testing except monitoring
    2. `beta` - the next code release, ready for heavy testing inside the Terra organization together with beta users
    3. `staging` - (i) demonstrates new features, (ii) tests new features in a "real world scenario"

TASK: **D008 - Backup of code, data and database**

+ We need to back up `code` versions to be able to quickly roll back if we notice an issue on the production server
+ We need to back up `data` (like original videos) so that users can trust Terra as video management storage
+ We need to back up the `database` to not risk loss of users' data (accounts, video info, ...)

TASK: **D009 - Implementing Testing Infrastructure with some basic/crucial tests**

+ This is an old task that should be reorganized to address:
    1. SUBTASK: **providing unit-test framework and guidance**
    1. SUBTASK: **providing e2e framework and guidance**
    1. SUBTASK: **cover some critical terra infrastructure with the initial unit-tests**
    1. SUBTASK: **cover some critical terra infrastructure with the initial e2e tests**

TASK: **D012 - Fix deployment scripts**

TODO: Rename into "**D012 - Automated DevOps scripts**"

They will automate the handling of:

1. SUBTASK: **resources (instances, volumes, backups)**
1. SUBTASK: **installation (servers)**
1. SUBTASK: **configuration (services, scaling)**
1. SUBTASK: **building (frontend, services)**
1. SUBTASK: **deployment (frontend, backend, services)**

TASK: **D014 - Mock-up servers**

1. SUBTASK: **Mock-up frontend** - this will significantly speed up the backend development and testing cycle and demonstrate the API usage
1. SUBTASK: **Mock-up services** - this will speed up backend development

TASK: **D015 - Tracking the video status/processing**

TODO: extend with descriptions

1. SUBTASK: **Identify Failure Cases that are recoverable - document**
1. SUBTASK: **Create event checkpoints to track video failures - Backend implementation**
    + this utilizes ColaboFlow passively to audit the workflow progress
1. SUBTASK: **Set up an alert to ourselves when the server is down or a process errors**
1. SUBTASK: **Migrate away from Celery to ColaboFlow**
    + this utilizes ColaboFlow actively to control the workflow execution
1. SUBTASK: **If the system is down, alert S/S to restart ASAP (within 30 mins)**
1. SUBTASK: **If the system is fine, but certain videos are not processed, track the status and email the user to try again**
10. SUBTASK: **Identify and document the project proposal**
    + the outcome of this subtask is the current document

## New Tasks

TASK: **E2E-M**

automatic video uploading and testing success

TASK: **PING-ing FFMPEG**

calling some quick FFMPEG task to check if it finishes successfully (live test of async messaging and FFMPEG failures)

TASK: **Proper error handling**

+ We should provide a systematic approach for reporting, handling and presenting errors in the system, as the current platform very often doesn't even **react** to an error, leaving the user "***deaf***". We handled some of the most critical cases like *"network errors"*, *"access errors"*, etc.

TASK: **Notifying errors**

1. SUBTASK: **Notifying Users Upfront**
1. SUBTASK: **Notifying Users Upon user actions’ failures**
1. SUBTASK: **Notifying Admins**

TASK: **Modularizing code**

We need to transform sections of the current code into modular tasks to both audit and control it more properly through the Terra workflow

TASK: **Recovery**

We recover (retry) all the video processing requests that are recoverable

TASK: **Resilience**

+ Provide mirroring servers that are immediately available or possible to install and boot on a server failure
+ provide autodetection and autoinitialization of such a scenario

TASK: **Scalability and load balancing**

We need to provide scalability to our platform to

1. properly handle changes in user demand
2. reduce service costs
3. support system resilience
4. support system recovery

### Finished Tasks (need an extension)

TASK: **D001 - multitasking/multiusers monitoring infrastructure**

+ This task needs to be rewritten to support the new workflow framework
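
To make the checkpoint and recovery subtasks of D015 a bit more concrete, here is a minimal sketch of a job that records event checkpoints and is retried only for recoverable failures. It rests on several assumptions: `VideoJob`, `ProcessingError`, `process_with_recovery` and the `CORRUPT_INPUT` failure kind are hypothetical names; treating `TIMEOUT` and `UNDEFINED` as recoverable is a guess until the failure cases are documented; and the retry is immediate, whereas a real system would wait before retrying. It does not use Celery or ColaboFlow.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class FailureKind(Enum):
    TIMEOUT = "TIMEOUT"              # error kind named in this proposal
    UNDEFINED = "UNDEFINED"          # error kind named in this proposal
    CORRUPT_INPUT = "CORRUPT_INPUT"  # hypothetical non-recoverable case


# Assumption: the real set of recoverable failures is still to be documented.
RECOVERABLE = {FailureKind.TIMEOUT, FailureKind.UNDEFINED}


class ProcessingError(Exception):
    """Raised by a workflow step; carries the kind of failure."""

    def __init__(self, kind: FailureKind) -> None:
        super().__init__(kind.value)
        self.kind = kind


@dataclass
class VideoJob:
    video_id: str
    checkpoints: List[str] = field(default_factory=list)  # audit trail

    def checkpoint(self, name: str) -> None:
        self.checkpoints.append(name)


def process_with_recovery(
    job: VideoJob,
    workflow: Callable[[VideoJob], None],
    max_attempts: int = 3,
) -> str:
    """Run the workflow, retrying recoverable failures; return final status."""
    for attempt in range(1, max_attempts + 1):
        job.checkpoint(f"attempt-{attempt}-started")
        try:
            workflow(job)
        except ProcessingError as err:
            job.checkpoint(f"attempt-{attempt}-failed-{err.kind.value}")
            if err.kind not in RECOVERABLE:
                return "permanent-failure"
            continue  # recoverable: try the whole workflow again
        job.checkpoint(f"attempt-{attempt}-succeeded")
        return "success"
    return "permanent-failure"  # retries exhausted
```

The returned status is what would drive the success or permanent failure email, and the `checkpoints` list is the kind of passive audit trail the checkpoint subtask describes.
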
