# 7-1. News Use Case

## Introduction

In this topic, we are going to introduce how to (1) parse XML, (2) use the file storage agent, and (3) emit and search events.

Briefly, we are going to parse ++Headline XML++ through the file storage agent and emit the parsed data as events. Afterwards, we will search these emitted events by the category code of the headlines (queried via API) and extract the filtered file names from them. With these file names, we will compose URLs and use the file storage agent again to retrieve the news contents from ++News Content XML++ as our final output.

The reason why we only keep the headlines as events is that, in this way, we reduce the burden of querying the existing system whenever there is a need to search for certain news.

Therefore, there will be ++2 data processes++ to reach these goals.

\
**++Explanations of the 2-Data-Process Design++**

![](https://i.imgur.com/zf6G0nI.png)

:::spoiler **<font color=indigo>Additional Information</font>**
:::info
There are several ways to put data into LOC; the two main ones are through the **HTTP Agent** and the **File Storage Agent**. The difference between them is that

- the **file storage agent** directly fetches the information from the designated URL, while
- the **HTTP agent** simulates the behaviour of requesting an API to get the response/data input.
:::
:::

---

## Design Data Process 1: Get ++Headline++ XML

To begin with, you need to get the ++Headline++ XML through the file storage agent and keep the data in the event store.

\
**++Logic Design of Data Process 1++**

![](https://i.imgur.com/J0dAoaV.png)

### API Design

For Data Process 1, below is the proposed API spec for your reference, where you might need to use

- the **GET** method
- **Async** mode

to simulate the real world, where it is more common to have a [*Scheduler*](https://en.wikipedia.org/wiki/Job_scheduler) handle this process.
Thus, this API response will be *202*.

\
**++API Spec for Data Process 1++**

![](https://i.imgur.com/MNolZBs.png)

*If you would like to ensure this data process runs successfully, you can go to the event store to check out the emitted events.

### Event Design: Record ++Headline++ XML

In Data Process 1, this set of events will be emitted to record the necessary data from ++Headline++ XML.

- Source DID: category code (`gccd;gcnm`)
- Target DID: file name (`filenm`)
- Label Name: action + time (`action;time`)
- Meta: headline

\
**++Event Schema++**

![](https://i.imgur.com/iHWoaPs.png)

### Generic Logic

#### Get Headline XML

In order to feed XML into LOC, this time we use the package ++*fast-xml-parser*++. You can install it through [npm](https://www.npmjs.com/package/fast-xml-parser) or [yarn](https://yarnpkg.com/package/fast-xml-parser). For more information on how to use it, please refer to [GitHub](https://github.com/NaturalIntelligence/fast-xml-parser).

With the *fast-xml-parser* package installed, it is suggested to

- import the *fast-xml-parser* package,
- define a `parser`,
- use the file storage agent (`ctx.agents.fileStorage.simpleGet`) to acquire the XML, and
- decode and parse the XML via `TextDecoder` and `parser`.

Following these suggestions, you will have the XML on board for this data process.
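Note that the decoding step matters here: the headline feed is Shift-JIS encoded, so the raw bytes must be decoded with the `shift-jis` label before parsing, not read as UTF-8. A minimal standalone illustration of this step (the byte values below are hypothetical ASCII-range sample data, not actual feed content):

```javascript
// Decode a byte buffer as Shift-JIS before handing the text to the XML
// parser. TextDecoder is a global in Node.js (v11+) and in browsers.
const bytes = new Uint8Array([0x3c, 0x6e, 0x65, 0x77, 0x73, 0x2f, 0x3e]); // "<news/>" in ASCII
const text = new TextDecoder('shift-jis').decode(bytes);
console.log(text); // "<news/>"
```

Shift-JIS is ASCII-compatible in this byte range, which is why the sample round-trips cleanly; Japanese characters in the real feed occupy two bytes each and would be mangled without the explicit `shift-jis` label.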
:::spoiler **<font color=indigo>Sample Code Snippets of Parsing XML</font>**
```javascript=
import { XMLParser, XMLBuilder, XMLValidator } from "fast-xml-parser";

export async function run(ctx) {
    const parser = new XMLParser({
        parseAttributeValue: true,
        ignoreAttributes: false,
        attributeNamePrefix: "",
    });
    const receivedNews = await ctx.agents.fileStorage.simpleGet("http://eprofitfxsmartphone:f67v15Ue18gH38462c3@newsweb.ovalnext.co.jp/eprocx_xml_news/news_headline.xml");
    const decodedHeadlines = new TextDecoder('shift-jis').decode(receivedNews);
    const rawHeadlines = parser.parse(decodedHeadlines);
    await ctx.agents.sessionStorage.putJson("rawHeadlines", rawHeadlines);
}

export async function handleError(ctx, error) {
    ctx.agents.logging.error(error.message);
}
```
:::

#### Record as Headline Event

As soon as you obtain the headlines, following the event design above, you may want to use `ctx.agents.eventStore.emit` to emit events. Here is the pseudo-code of one emitted event for your reference:

```javascript=
sourceDID: `${gccd};${gcnm}`,
targetDID: `${filenm}`,
labelName: `${action};${time}`,
meta: `${headline}`,
type: 'default'
```

### Aggregator Logic

As usual, you can compile your result here. Yet, Data Process 1 is designed with `Async` mode to simulate this data process being triggered by a [*Scheduler*](https://en.wikipedia.org/wiki/Job_scheduler).

:::spoiler **<font color=indigo>Sample Code Snippets of Aggregator Logic</font>**
```javascript=
export async function run(ctx) {
    let result = {
        Message: 'In progress.',
    };
    ctx.agents.result.finalize(result);
}

export async function handleError(ctx, error) {
    ctx.agents.logging.error(error.message);
    let result = {
        status: 500,
        errorMessage: `An error occurred when calling the API.
Error: ${error.message}`,
    };
    ctx.agents.result.finalize(result);
}
```
:::

---

## Design Data Process 2: Get ++News Content++ XML

In Data Process 2, you need the events emitted from [Data Process 1](#Event-Design-Record-Headline-XML) so as to filter the headlines of your interest. The filtered headlines will then be used to get the corresponding news content file names. Lastly, you can use the file storage agent with the filtered file names to acquire ++News Content++ XML.

\
**++Logic Design of Data Process 2++**

![](https://i.imgur.com/yXzmnJp.png)

### API Design

As for Data Process 2, below is the proposed API spec for your reference, where you might need to use

- the **POST** method
- **Sync** mode

to simulate the real world, where you need to input a search value (category code) to filter out your desired headlines, so that this data process returns the corresponding news contents for you.

\
**++API Spec for Data Process 2++**

![](https://i.imgur.com/sCUC5vL.png)

*You can compile the news contents of your interest in the aggregator logic, so whenever you trigger this API, you will get the queried news contents.

### Generic Logic

#### Parse API payload & Search Headline Events

First things first, you need to get

1. either `gccd1`+`gccd2` or `gcnm1`+`gcnm2`, and
2. (optional) `ddate`

by parsing these values from the payload that triggers Data Process 2.

:::info
Only if ++`ddate`++ has an input will you get the headlines filtered by both the input time frame (`ddate`) and `gccd1`+`gccd2` or `gcnm1`+`gcnm2`. Otherwise, all headlines matching `gccd1`+`gccd2` or `gcnm1`+`gcnm2` will be returned.
:::

:::spoiler **<font color=indigo>Sample Code Snippets of Parsing Payload</font>**
```javascript=
function UTF8ArrToStr(aBytes) {
    let utf8decoder = new TextDecoder();
    return utf8decoder.decode(new Uint8Array(aBytes));
}

const data_payload = JSON.parse(UTF8ArrToStr(ctx.payload.http.body));
let gccd1 = data_payload.gccd1 ??
"";
let gccd2 = data_payload.gccd2 ?? "";
let gcnm1 = data_payload.gcnm1 ?? "";
let gcnm2 = data_payload.gcnm2 ?? "";
let dateInput = data_payload.ddate ?? "";
let ddate = dateInput.replace(/(\d{4})(\d{2})(\d{2})/, '$1-$2-$3');
```
:::

\
With these parsed inputs, you can next search the [headline events](#Event-Design-Record-Headline-XML) accordingly. In other words, the parsed values from the payload are used as key values to look up headline events. Here is the pseudo-code of a search request for your reference:

```javascript=
const searchReq = {
    queries: [],
    excludes: [],
    filters: [{ Wildcard: { field: "source_digital_identity", value: `${gccd1}*` } }],
    from: 0,
    size: 1000,
    sorts: [
        { field: "timestamp", orderBy: "Desc" },
    ],
};
```

:::info
In the `filters` of the provided pseudo-code, it is suggested to use `Wildcard` for fuzzy search, in the sense that the searched value is not exactly the same as the stored value. To elaborate, given the event schema below, you can use `Wildcard` in the filters (when there is an input for `gccd` from the payload) to search all the events whose source DID matches `${gccd}*`. On the other hand, if there is an input for `gcnm` from the payload, the searched source DID becomes `*${gcnm}`.

| Source DID | Target DID | Label Name  |   Meta   |
|:----------:|:----------:|:-----------:|:--------:|
| gccd;gcnm  |   filenm   | action;time | headline |
:::

#### Get Content XML

With the headline events searched in the [previous logic](https://hackmd.io/QabxWp8RRauBGus9RfvFnA?view#Parse-API-payload-amp-Search-Headline-Events), you can extract the target DID from each searched event to get the file names (`filenm`). Next, you can use `forEach` to

1. compose a URL with each searched file name, like `http://eprofitfxsmartphone:f67v15Ue18gH38462c3@newsweb.ovalnext.co.jp/eprocx_xml_news/${filtered-file-name}`, and
2. call the file storage agent `ctx.agents.fileStorage.simpleGet` to get the news contents.
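These steps could be sketched as follows. The URL composition is plain JavaScript, while the download loop is only pseudo-code bound to the LOC context (`ctx`): `searchResult.events` and `targetDID` are assumed names for the search result's shape, and the file name passed in at the end is purely hypothetical. A `for…of` loop is used instead of `forEach` so that `await` behaves as expected:

```javascript
// Compose the news-content URL for one file name taken from a headline
// event's target DID (simple concatenation onto the feed's base URL).
function composeNewsUrl(filenm) {
  const base = "http://eprofitfxsmartphone:f67v15Ue18gH38462c3@newsweb.ovalnext.co.jp/eprocx_xml_news/";
  return `${base}${filenm}`;
}

// Pseudo-code for the download loop (LOC-bound; field names assumed):
//
// const newsContents = [];
// for (const event of searchResult.events) {
//   const xml = await ctx.agents.fileStorage.simpleGet(composeNewsUrl(event.targetDID));
//   newsContents.push(new TextDecoder('shift-jis').decode(xml));
// }
// await ctx.agents.sessionStorage.putJson("newsContents", newsContents);

console.log(composeNewsUrl("sample.xml"));
```

Storing the decoded contents under the `"newsContents"` session-storage key lines up with the aggregator logic below, which reads that key to build the API response.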
### Aggregator Logic

Different from the `Async` mode used in Data Process 1, it is suggested to use `Sync` mode in Data Process 2, so you can design a response that returns the desired results. For instance, you will be getting news contents through the file storage agent; the returned information is suitable to put in this aggregator logic.

:::spoiler **<font color=indigo>Sample Code Snippets of Aggregator Logic</font>**
```javascript=
export async function run(ctx) {
    const newsContents = await ctx.agents.sessionStorage.get("newsContents");
    let result = {
        Status: 200,
        Message: 'Successful',
        News: newsContents,
    };
    ctx.agents.result.finalize(result);
}

export async function handleError(ctx, error) {
    ctx.agents.logging.error(error.message);
    let result = {
        status: 500,
        errorMessage: `An error occurred when calling the API. Error: ${error.message}`,
    };
    ctx.agents.result.finalize(result);
}
```
:::

## Summary

Through these 2 data processes, you should be able to acquire news contents by querying ++Headline XML++ and ++News Content XML++. Meanwhile, you could utilise some of our common agents to parse XML and extract information as events.

Therefore, the expected events will look like this:

![](https://i.imgur.com/5s5zvyw.png)

with the news contents returned in the API response shown below.

![](https://i.imgur.com/Yu70XRG.jpg)

---

###### tags: `Workshop`