Conversation Log for Analytics
===
**Author: Po-Wen (Steven) Fang**
---
###### tags: `Dragon Cloud AI`
[toc]
## Objective
To perform analytics on users' English speaking records, we should store any data that might be helpful for further analysis. This includes, but is not limited to, conversation transcripts and voice files.
## Context
Users have conversations with the chatbot every day to improve their English speaking, and we should try to discover insights from those conversation logs. This database could not only serve as an internal resource for improving the relevant ML models, but also provide teachers a much closer look at the performance of each student.
Although Lex itself provides conversation logs in CloudWatch and S3, we might develop our own chatbot in the future, so we should build the data storage independently.
Note: the built-in text log for Lex is not working
## Goals
* Store every conversation from every user, with an index in DynamoDB pointing to the voice file in S3
* userName (the user's email) as the primary partition key, timestamp as the sort key
* Design a Global Secondary Index that allows analysis on a certain group of users (this might be done in the user information table)
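The goals above can be sketched as a table definition. This is a minimal, hypothetical sketch: the table name `ConversationLog`, the GSI name `institute-timestamp-index`, and the GSI partition key `institute` are assumptions for illustration, not confirmed names from this project.

```python
def conversation_log_table_definition(table_name: str = "ConversationLog") -> dict:
    """Build the boto3 create_table parameters for the conversation log."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "userName", "KeyType": "HASH"},    # partition key
            {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "userName", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
            {"AttributeName": "institute", "AttributeType": "S"},
        ],
        # Hypothetical GSI so we can pull every log for one institution.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "institute-timestamp-index",
                "KeySchema": [
                    {"AttributeName": "institute", "KeyType": "HASH"},
                    {"AttributeName": "timestamp", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_conversation_log_table():
    import boto3  # deferred import so the module loads without AWS credentials

    client = boto3.client("dynamodb")
    return client.create_table(**conversation_log_table_definition())
```

If group-level analysis ends up living in the user information table instead, the GSI here can be dropped and only the two primary keys kept.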
## Implementation
Currently we are using Lex as our chatbot service, and we connect the Unity frontend to it via a Lambda function **(abc123_From-Unity-To-Lex)**. To keep latency as low as possible, the logging should run in a separate asynchronous Lambda function alongside the one that connects the frontend and the chatbot. This could be implemented by:
1. Invoking another Lambda function asynchronously from **(abc123_From-Unity-To-Lex)** [[instruction link]](https://stackoverflow.com/questions/31714788/can-an-aws-lambda-function-call-another)
2. Using AWS Step Functions to chain Lambdas [[instruction link]](https://www.refinery.io/post/how-to-chain-serverless-functions-call-invoke-a-lambda-from-another-lambda)
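Option 1 can be sketched as a fire-and-forget invocation. The logging function name `writeConversationLog` and the payload fields are assumptions for illustration; only the attribute names mirror the example table below.

```python
import json


def build_log_payload(user_name: str, timestamp: str, transcript: str, s3_key: str) -> str:
    """Serialize one conversation turn for the async logging Lambda."""
    return json.dumps(
        {
            "userName": user_name,
            "timestamp": timestamp,
            "inputTranscript": transcript,
            "keyToS3": s3_key,
        }
    )


def log_conversation_async(payload: str):
    import boto3  # deferred import so the module loads without AWS credentials

    client = boto3.client("lambda")
    # InvocationType="Event" returns immediately, so the user-facing
    # Lex round trip does not wait for the logging write to finish.
    return client.invoke(
        FunctionName="writeConversationLog",  # placeholder name
        InvocationType="Event",
        Payload=payload,
    )
```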
## Example Table
| userName | timestamp | keyToS3 | inputTranscript | score |
| ------ | --------- | ---------------------------------------------------------------------------------------- | --------------- | --- |
| dragoncloud.test@gmail.com | 2020-05-18T14:52:19.598Z | [country]/[institute]/[year]-[month]-[day]/[userName]-[timestamp].ogg | hi how are you | 60 |
Primary Partition Key: **userName**
Sort Key: **timestamp**
We have limited all users to using their email account as their username, so we can easily query their conversation logs by email.
For individual users without an institution, we should gather them into a dedicated group and assign a specific institution name to that group.
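Building **keyToS3** from the template above can be sketched as follows; the country and institute values used in the example are made up.

```python
def build_s3_key(country: str, institute: str, user_name: str, timestamp: str) -> str:
    """Build [country]/[institute]/[year]-[month]-[day]/[userName]-[timestamp].ogg.

    timestamp is an ISO-8601 string such as 2020-05-18T14:52:19.598Z.
    """
    date = timestamp.split("T")[0]  # year-month-day segment, enables time-range scans
    return f"{country}/{institute}/{date}/{user_name}-{timestamp}.ogg"
```

Keeping the date as its own path segment is what makes the prefix-based time filtering in scenario 2 below cheap.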
### Analytics scenarios
1. To analyze the performance and behavior of a user
:point_right: We can efficiently query all of this user's data within a given period by using the primary partition key together with the sort key.
2. Observe all users
:point_right: Use the date segment embedded in **keyToS3** to limit the time range of interest, then query the data of all users.
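Scenario 1 maps directly onto a DynamoDB `Query`: the partition key pins the user, and a `BETWEEN` condition on the sort key bounds the period. A minimal sketch of the request parameters, assuming the table is named `ConversationLog` (a placeholder):

```python
def user_period_query(user_name: str, start_ts: str, end_ts: str) -> dict:
    """Build query parameters for one user's logs within [start_ts, end_ts]."""
    return {
        "TableName": "ConversationLog",  # placeholder table name
        "KeyConditionExpression": "userName = :u AND #ts BETWEEN :start AND :end",
        # TIMESTAMP is a DynamoDB reserved word, so the attribute is aliased.
        "ExpressionAttributeNames": {"#ts": "timestamp"},
        "ExpressionAttributeValues": {
            ":u": {"S": user_name},
            ":start": {"S": start_ts},
            ":end": {"S": end_ts},
        },
    }
```

The dict would then be passed to `boto3.client("dynamodb").query(**params)`; ISO-8601 timestamps sort lexicographically, which is why `BETWEEN` works on string sort keys here.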
## Future Development
1. **Score on each syllable**
In the future, the table should be able to store a score for each syllable pronounced by the user. The attribute **score** can store a JSON document holding the per-syllable scores. Each word maps to a list of syllable/score objects rather than an object keyed by syllable, because repeated syllables (such as the two "na" in "banana") would otherwise collide as duplicate JSON keys. For example:
```json=
{
    "words": {
        "apple": [
            { "syllable": "ap", "score": 60 },
            { "syllable": "ple", "score": 70 }
        ],
        "banana": [
            { "syllable": "ba", "score": 80 },
            { "syllable": "na", "score": 92 },
            { "syllable": "na", "score": 53 }
        ]
    },
    "averageScore": 71
}
```
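A small sketch of deriving **averageScore** from the per-syllable scores, assuming each word maps to a list of `{syllable, score}` objects (a plain object keyed by syllable would be lossy for duplicate syllables like the two "na" in "banana"):

```python
import json


def average_score(score_json: str) -> int:
    """Average all syllable scores across all words, rounded to the nearest int."""
    doc = json.loads(score_json)
    scores = [entry["score"] for syllables in doc["words"].values() for entry in syllables]
    return round(sum(scores) / len(scores))


# Sample record matching the scores in the example document.
record = json.dumps(
    {
        "words": {
            "apple": [
                {"syllable": "ap", "score": 60},
                {"syllable": "ple", "score": 70},
            ],
            "banana": [
                {"syllable": "ba", "score": 80},
                {"syllable": "na", "score": 92},
                {"syllable": "na", "score": 53},
            ],
        }
    }
)
```

With the sample scores (60, 70, 80, 92, 53) this yields 71, matching the `averageScore` field in the example.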
2. **English materials from users**
If users practice their own English materials on our platform, the table should have another attribute containing the S3 key of that text file.
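Attaching such a material to an existing log entry could be done with a DynamoDB `UpdateItem`; the attribute name `materialKeyToS3` and the table name are assumptions for illustration.

```python
def attach_material_params(user_name: str, timestamp: str, material_key: str) -> dict:
    """Build update_item parameters that add the material's S3 key to one log entry."""
    return {
        "TableName": "ConversationLog",  # placeholder table name
        "Key": {
            "userName": {"S": user_name},
            "timestamp": {"S": timestamp},
        },
        # materialKeyToS3 is a hypothetical attribute name for the text file's S3 key.
        "UpdateExpression": "SET materialKeyToS3 = :k",
        "ExpressionAttributeValues": {":k": {"S": material_key}},
    }
```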