
20180314 Meeting Notes

Development roadmap

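"List the articles a person has submitted" feature

A reason must be provided when submitting an article

Similarity analysis for "Which of the following is the article you are looking for?"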

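When a user answers "Which of the following is the article you are looking for?":

  • For users who chose 0: what was the highest similarity shown to them at that time?
  • For users who chose an article: what was the similarity of the article they picked?

This can also be used to compute a hit rate: (chose an article / (chose an article + chose to submit as new))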
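
The hit rate formula above can be sketched as follows. The function name and the two counters are hypothetical; the notes only give the ratio itself, and returning 0.0 when there are no answers is an assumption.

```python
def hit_rate(chose_article: int, chose_new: int) -> float:
    """Share of answers where the user picked an existing article.

    chose_article: number of users who picked one of the listed articles.
    chose_new: number of users who chose to submit a new article instead.
    """
    total = chose_article + chose_new
    if total == 0:
        # No answers recorded yet; returning 0.0 is an assumption.
        return 0.0
    return chose_article / total
```

For example, 30 users picking an article and 10 submitting a new one gives `hit_rate(30, 10)`, that is 30 / 40.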

"Article similarity" criteria

Let editors change their nicknames

Landing page

  • Ref: No. But it should be ready ASAP; there are conferences in April and May
  • Web: (Orz)

Template: https://github.com/g0v/grants-landing-template
(Sample: https://g0v.github.io/grants-landing-template/ )

Build automation on Travis CI

I tidied up the build process during the schema revamp. Because the new build process uses --build-arg, we can no longer use Docker Cloud's automated builds. However, we can switch to Travis to run the builds for us.

  • API: TBD
  • Website: TBD

Designing URL preview mechanisms

Reason

We have plenty of URLs in the text messages.

Extracting text from these URLs can enrich the data in our database,
which increases the query matching rate and adds material for future machine learning work.

Text summarization performance

https://docs.google.com/spreadsheets/d/1y1GGc04HBhpU76D6LvX5hqt877X_Rfatc1vSQNJmZ6Q/edit#gid=0

We should run a puppeteer instance in the API server so that pages that depend on JavaScript can be rendered.

Implementation

No message queues are required. Use shell directly.

References

Proposal

New elasticsearch index "urls"

Each document holds the fetched record of a given URL.

Fields:

  • url: exact URL found in the articles
  • canonicalUrl: The canonical URL fetched from the page.
  • title: Title of the page
  • summary: Extracted summary text using Goose3
  • html: Fetched raw html input
  • topImageUrl: Image URL for preview. Optional.
  • fetchedAt: Timestamp
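
As a rough illustration of the field list above, a record in the urls index could be modelled like this. The class name `UrlRecord`, the Python types, and the ISO 8601 string used for `fetchedAt` are assumptions; the notes only name the fields.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class UrlRecord:
    """One fetched page, mirroring the proposed fields of the urls index."""

    url: str            # exact URL found in the article
    canonicalUrl: str   # canonical URL reported by the page
    title: str          # page title
    summary: str        # summary text extracted from the page
    html: str           # raw fetched HTML
    fetchedAt: str      # timestamp; ISO 8601 string is an assumption
    topImageUrl: Optional[str] = None  # preview image, optional


# Example values are made up for illustration.
example_record = UrlRecord(
    url="http://example.com/a?ref=1",
    canonicalUrl="http://example.com/a",
    title="Example page",
    summary="Short extracted summary.",
    html="<html>...</html>",
    fetchedAt="2018-03-14T12:00:00Z",
)
example_doc = asdict(example_record)  # plain dict, ready to be indexed
```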

New fields in index "articles" and "replies"

Add a hyperlinks field as a nested object in "articles" and "replies", with the following fields:

  • url: exact URL found in this article
  • canonicalUrl: canonical URL fetched from the page.
  • title: Copy of title from urls index, for query purposes
  • summary: Copy of summary from urls index, for query purposes
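
A minimal sketch of how a hyperlinks entry might be derived from a urls record, copying only the four fields listed above. The helper name `to_hyperlink` and the dict-based record are assumptions for illustration.

```python
def to_hyperlink(url_record: dict) -> dict:
    """Copy the query-relevant fields of a urls record into a hyperlinks entry.

    Fields such as html and fetchedAt stay in the urls index only.
    """
    return {
        "url": url_record["url"],
        "canonicalUrl": url_record["canonicalUrl"],
        "title": url_record["title"],
        "summary": url_record["summary"],
    }
```

Keeping the copy this small means articles and replies carry only what the moreLikeThis query needs, while the raw HTML is stored once in the urls index.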

User flow & data flow

  1. When the user queries an article using ListArticles's moreLikeThis filter and the text contains URLs (detected by URL extraction):
    1. First check whether the URL or its canonical URL already exists in the urls index. If it exists, add its "summary" to the moreLikeThis text query.
    2. If the URL is not found, fetch and render the page using rendertron and insert a new entry into the urls index. Invoke goose3 to extract the "summary" and add it to the moreLikeThis text query.
    3. Perform the moreLikeThis query on the articles' text, hyperlinks.summary and hyperlinks.title fields.

The more_like_this query's like parameter can refer to documents in a different index, so "adding to the query" can be implemented by providing the ID of the matching document in the urls index:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
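
A rough sketch of the query body this implies, mixing free text with references to documents in the urls index. The function name `build_mlt_query` is hypothetical. How the referenced urls documents map onto the listed fields, and whether a `nested` wrapper is needed if hyperlinks is mapped with the `nested` type, are not settled in these notes and should be checked against the linked documentation.

```python
from typing import Dict, List


def build_mlt_query(text: str, url_doc_ids: List[str]) -> Dict:
    """Build a more_like_this query body from message text and urls doc IDs."""
    # "like" accepts both free text and references to indexed documents.
    like: List = [text]
    like += [{"_index": "urls", "_id": doc_id} for doc_id in url_doc_ids]
    return {
        "query": {
            "more_like_this": {
                # Fields named in the user flow above.
                "fields": ["text", "hyperlinks.summary", "hyperlinks.title"],
                "like": like,
            }
        }
    }
```

The returned dict is only the request body; sending it to the articles index is left to whichever Elasticsearch client the API server uses.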

  2. When a new article is submitted to the database and the text contains URLs (detected by URL extraction):

    1. First check whether the URL or its canonical URL already exists in the urls index. If it exists, add its "summary" and title to hyperlinks.
    2. If the URL is not found, fetch and render the page using rendertron and insert a new entry into the urls index. Invoke goose3 to extract the "summary" and title and add them to the new article's hyperlinks.
  3. When a reply is submitted, apply 2-1 and 2-2.

  4. On the website, URL preview pop-ups containing the URL's title and part of the summary are shown in a box in the article and reply. (Design TBD. Ask Luciennnnn)

  5. Opt-in/opt-out options: ListArticles and CreateArticle can choose to always fetch the latest page (which creates new entries in urls), or to use only the cached entries in urls (which speeds up the query).
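
The check-then-fetch flow and the fetch options above can be sketched as follows. Everything here is an illustrative assumption: a plain dict stands in for the urls index, `fetch_page` stands in for rendertron plus goose3, the `mode` values are hypothetical names, the URL pattern is deliberately crude, and the canonical URL lookup is omitted.

```python
import re
from typing import Callable, Dict, List

# Crude pattern for illustration only; real URL extraction needs more care.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(text: str) -> List[str]:
    """Return all http(s) URLs found in the text."""
    return URL_PATTERN.findall(text)


def resolve_hyperlinks(
    text: str,
    urls_index: Dict[str, dict],
    fetch_page: Callable[[str], dict],
    mode: str = "default",
) -> List[dict]:
    """Build hyperlinks entries for every URL in the text.

    mode (hypothetical names):
      "default"      use the cached record, fetch only when missing
      "always_fetch" always fetch the latest page and store it
      "cache_only"   never fetch; skip URLs without a cached record
    """
    hyperlinks = []
    for url in extract_urls(text):
        record = None if mode == "always_fetch" else urls_index.get(url)
        if record is None and mode != "cache_only":
            record = fetch_page(url)  # stand-in for rendertron + goose3
            urls_index[url] = record
        if record is not None:
            hyperlinks.append(
                {"url": url, "title": record["title"], "summary": record["summary"]}
            )
    return hyperlinks
```

Passing the fetcher in as a parameter keeps the flow testable without network access and leaves the choice of renderer and extractor open.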

Discussions

(silence)

Line

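Progress update:

  • The Mission Sticker feature cannot be made available at the moment.
  • Once the official account is confirmed, going through the regular process one more time will complete it.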

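Items to fix

  • How to amend the sticker team contract
    • The sticker development team publishes the stickers; revenue goes directly to the sticker team
    • Add to the contract: sticker publishing, revenue ownership
  • How to gift stickers
    • Register a new account and send the stickers to editors directly as gifts
  • Tutorial on publishing from a personal account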