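When asking users "Which of the following is the article you were looking for?":
- For users who chose 0, what was the highest similarity score they saw at that time?
- For users who chose an article, what was the similarity score of the article they picked?
This also gives us a hit rate: (chose an article) / (chose an article + chose to submit a new one).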
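Both metrics can be computed from the selection logs along these lines. This is a minimal sketch that assumes a hypothetical log format. Each entry records the number the user replied with (0 means none of the listed articles, i.e. submitting a new one) and the similarity scores of the listed articles in display order. Selection and analyze are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Selection:
    """One hypothetical log entry from the "which article did you mean" prompt."""
    choice: int                # 0 = none of these (submit new); k >= 1 = k-th listed article
    similarities: List[float]  # similarity scores of the listed articles, in display order


def analyze(selections: List[Selection]) -> Tuple[List[float], List[float], float]:
    """Return (max similarity seen by users who chose 0,
    similarity of the article picked by the others, hit rate)."""
    max_sim_when_zero: List[float] = []
    picked_sim: List[float] = []
    for s in selections:
        if s.choice == 0:
            # Highest score the user saw before deciding nothing matched
            max_sim_when_zero.append(max(s.similarities))
        else:
            # Score of the article the user actually picked (choice is 1-based)
            picked_sim.append(s.similarities[s.choice - 1])
    total = len(max_sim_when_zero) + len(picked_sim)
    # hit rate = chose an article / (chose an article + chose to submit a new one)
    hit_rate = len(picked_sim) / total if total else 0.0
    return max_sim_when_zero, picked_sim, hit_rate
```

Plotting the two score lists side by side should show whether a similarity threshold cleanly separates the two groups.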
Template: https://github.com/g0v/grants-landing-template
(Sample: https://g0v.github.io/grants-landing-template/ )
When revamping the schema, I documented the build process. Because the new build process uses --build-arg, Docker Cloud can no longer build the images automatically.
However, we can use Travis to run the build instead.
We have plenty of URLs in the text messages.
Extracting text from these URLs can enrich the data in our database,
thus increasing the query matching rate and providing material for future machine learning work.
https://docs.google.com/spreadsheets/d/1y1GGc04HBhpU76D6LvX5hqt877X_Rfatc1vSQNJmZ6Q/edit#gid=0
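Both the ListArticles and CreateArticle flows described here start by extracting URLs from the message text. Below is a minimal regex-based sketch of that step. The pattern, the excluded full-width punctuation and the name extract_urls are assumptions for illustration, not the project's actual extractor.

```python
import re
from typing import List

# Deliberately simple: http(s):// followed by characters that are not whitespace,
# quotes, angle brackets or common full-width punctuation used in Chinese text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"'，。！？、「」（）]+")

# ASCII punctuation that often trails a URL in chat text but is rarely part of it
TRAILING_PUNCTUATION = ".,;:!?)]}"


def extract_urls(text: str) -> List[str]:
    """Return URLs found in text, de-duplicated, in order of first appearance."""
    seen = set()
    urls: List[str] = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(TRAILING_PUNCTUATION)
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

For example, extract_urls("請看https://example.com/news，謝謝") yields ["https://example.com/news"].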
We should run a Puppeteer instance in the API server so that pages that rely on JavaScript can be rendered.
No message queue is required. Use the shell directly.
Each document in the urls index is a fetched record of a given URL.
Fields:
- url: exact URL found in the articles
- canonicalUrl: canonical URL fetched from the page
- title: title of the page
- summary: summary text extracted with Goose3
- html: fetched raw HTML
- topImageUrl: image URL for preview (optional)
- fetchedAt: timestamp
Add a hyperlinks field as a nested object in articles and replies, which includes the following fields:
- url: exact URL found in this article
- canonicalUrl: canonical URL fetched from the page
- title: copy of the title from the urls index, for query purposes
- summary: copy of the summary text from the urls index, for query purposes
When ListArticles' moreLikeThis filter is used and the text contains URLs (checked by URL extraction):
1-1. Look up the URL in the urls index. If it exists, add its "summary" to the moreLikeThis text query.
1-2. If it is not in the urls index, invoke goose3 to fetch the "summary" and add it to the moreLikeThis text query.
1-3. Run the query against the articles' text, hyperlinks.summary and hyperlinks.title fields.
The more_like_this query's like parameter can reference documents in a different index, so "adding to the query" can be implemented by providing the IDs of documents in the urls index:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html
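The lookup-then-fetch steps above could look roughly like this when building the Elasticsearch request body. This is a sketch under two assumptions. First, cached urls entries are available as a URL-to-summary dict. Second, fetch_summary is a hypothetical wrapper around goose3. The sketch puts summaries into like as plain text, which is the simpler of the two options in the notes. The ID-based variant would put {"_index": "urls", "_id": ...} items into like instead. Because more_like_this reads terms from the fields named in fields, it is worth checking that the urls documents expose matching field names before relying on that variant.

```python
from typing import Any, Callable, Dict, List


def build_list_articles_query(
    text: str,
    urls: List[str],
    cached: Dict[str, str],
    fetch_summary: Callable[[str], str],
) -> Dict[str, Any]:
    """Build a more_like_this request body for ListArticles (sketch).

    cached maps URL -> summary for entries assumed to be in the urls index;
    fetch_summary is a hypothetical hook that runs goose3 on an unseen URL.
    """
    summaries: List[str] = []
    for url in urls:
        if url in cached:
            # Already in the urls index: reuse the stored summary
            summaries.append(cached[url])
        else:
            # Not cached: fetch it and keep it for next time
            summary = fetch_summary(url)
            cached[url] = summary
            summaries.append(summary)
    return {
        "query": {
            "more_like_this": {
                # Match against article text and the copied link fields
                "fields": ["text", "hyperlinks.title", "hyperlinks.summary"],
                # Plain strings in "like" are treated as free text to analyze
                "like": [text] + [s for s in summaries if s],
                # Short chat messages rarely repeat words; the default is 2
                "min_term_freq": 1,
            }
        }
    }
```

One design point to keep in mind: if hyperlinks is mapped with the nested type, its sub-fields are indexed as separate hidden documents. A top-level more_like_this will therefore not match hyperlinks.title or hyperlinks.summary. Either wrap that part in a nested query or map hyperlinks as a plain object.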
When a new article is submitted to the database and its text contains URLs (checked by URL extraction):
2-1. Look up the URL in the urls index. If it exists, add its "summary" and title to hyperlinks.
2-2. If it is not in the urls index, invoke goose3 to fetch the "summary" and title and add them to the new article's hyperlinks.
When a reply is submitted, apply 2-1 and 2-2 as well.
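Steps 2-1 and 2-2, which also apply to replies, can be sketched as one helper that builds the hyperlinks field. The sketch assumes an in-memory dict standing in for the urls index. It also assumes a hypothetical fetch_page hook wrapping goose3 that returns canonicalUrl, title and summary. Neither name comes from the codebase.

```python
from typing import Callable, Dict, List

UrlDoc = Dict[str, str]  # subset of a urls-index document: canonicalUrl, title, summary


def build_hyperlinks(
    urls: List[str],
    cached: Dict[str, UrlDoc],
    fetch_page: Callable[[str], UrlDoc],
) -> List[Dict[str, str]]:
    """Return the hyperlinks field for a new article or reply (sketch).

    cached stands in for the urls index; fetch_page is a hypothetical hook
    that runs goose3 and returns canonicalUrl, title and summary.
    """
    hyperlinks: List[Dict[str, str]] = []
    for url in urls:
        doc = cached.get(url)
        if doc is None:
            # 2-2: not in urls yet, so fetch the page and store the result
            doc = fetch_page(url)
            cached[url] = doc
        # 2-1 / 2-2: copy title and summary so they can be queried with the text
        hyperlinks.append({
            "url": url,
            "canonicalUrl": doc.get("canonicalUrl", url),
            "title": doc.get("title", ""),
            "summary": doc.get("summary", ""),
        })
    return hyperlinks
```

The same function can serve both CreateArticle and reply creation, since both only need the list of extracted URLs.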
On the website, URL preview pop-ups containing the URL's title and part of its summary are shown in a box within the article and reply. (Design TBD. Ask Luciennnnn)
Opt-in/opt-out options: ListArticles and CreateArticle can choose either to always fetch the latest page (which creates new entries in urls) or to use only the cached entries in urls (which speeds up the query).
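The two options differ only in whether the cache is consulted first. Below is a minimal sketch of that switch. It assumes a hypothetical in-memory UrlStore that keeps every fetch record per URL, so that "always fetch" appends a new entry as the notes describe. It also assumes fetch is a hypothetical goose3 wrapper. Storing fetchedAt as an ISO 8601 string is another assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional


@dataclass
class UrlStore:
    """In-memory stand-in for the urls index: every fetch record, per URL."""
    records: Dict[str, List[dict]] = field(default_factory=dict)

    def latest(self, url: str) -> Optional[dict]:
        entries = self.records.get(url)
        return entries[-1] if entries else None

    def add(self, url: str, record: dict) -> None:
        self.records.setdefault(url, []).append(record)


def resolve(url: str, store: UrlStore, fetch: Callable[[str], dict],
            always_fetch: bool) -> dict:
    """Return page data for url under the chosen option (sketch).

    always_fetch=True: fetch the live page every time, adding a new urls entry.
    always_fetch=False: reuse the latest cached entry, fetching only on a miss.
    """
    if not always_fetch:
        cached = store.latest(url)
        if cached is not None:
            return cached  # fast path: no network request
    record = dict(fetch(url), fetchedAt=datetime.now(timezone.utc).isoformat())
    store.add(url, record)
    return record
```

Keeping every record, rather than overwriting, preserves the history of how a page changed over time, at the cost of a growing index.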
(silence)
Progress update:
Items to fix