# Project Plan

*TODO: add a short introduction, a table of contents, and our student numbers and names.*

## 1. Problem Analysis

KPN's iTV service receives daily reviews from its website, the Play Store and the App Store, each consisting of a rating and a text review. Each review also carries extra metadata, including the user's operating system, the time of submission, the browser used, and more. These reviews give KPN an indication of which services are performing well (many positive reviews) and which need attention (poor reviews). Currently, however, KPN has to spend hours categorizing the comments and deciding whether each one is positive or negative. KPN therefore wants the data to be represented automatically in a way that makes clear which categories are currently receiving the most attention, and whether that attention is positive or negative. The necessary programming can be done in any programming language deemed fit.

### 1.1 Problem Statement

The problem consists of two parts.

1. Identify the most reported topics/complaints - Identifying the topics of every review will be the most challenging task. Since the existing data carries no category labels, we cannot currently apply supervised learning techniques. In theory, we could label every review ourselves, but the entire dataset consists of fewer than 5000 reviews, which is far too little for effective supervised learning. We therefore have to consider pre-trained models, data mining techniques, effective preprocessing and unsupervised learning, and experiment to find which model or combination of models performs best.
2. Make a prioritization based on the sentiment - Sentiment analysis, although seemingly easier, still carries a number of challenges. It will work most effectively with a pre-trained model, but these models are trained on English while our data is in Dutch, which adds a whole layer of difficulty. We need to consider whether the reviews lose much of their meaning when translated, and how long translating the entire dataset takes. For the prioritization itself, we have to decide which factors determine the significance of a review.

### 1.2 Project Goals

This is a tough task, as it deals with Natural Language Processing, with which we have little experience. However, many of us are part of the Epoch team, which competes in machine learning competitions, so we do have experience with many other models and aim to use this knowledge to create a very effective data representation. Our goals for this project are:

1. Categorize every review into multiple categories:
   - Achieving this is essential for the data to be well represented. Without an assigned category for every review, we cannot compute statistics such as the total count and average sentiment per category, which is what KPN is ultimately after. Reviews covering multiple categories (e.g. audio and connection issues in one review) will also need to be separated and analyzed as separate reviews. This process must be as accurate as possible, as it may impact the decisions KPN makes.
2. Apply sentiment analysis to every review:
   - Sentiment analysis is another critical step: without it we cannot judge whether certain topics are talked about positively or negatively. Once it works, we will integrate it into our data representation to display the users' sentiment much more effectively.
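To make goal 1 concrete, the splitting step can be sketched in pure Python. The category names and keyword sets below are illustrative assumptions, not KPN's final list; the real categories still have to be agreed with the client:

```python
# Illustrative sketch of goal 1: split a multi-topic review into one
# record per matched category, using a hand-made keyword map.
# Category names and keywords are assumptions, not KPN's final list.
CATEGORY_KEYWORDS = {
    "audio": {"audio", "sound", "volume"},
    "connection": {"connection", "wifi", "disconnect"},
    "video": {"video", "lag", "freeze"},
}

def split_by_category(review: str) -> list[dict]:
    """Return one record per category mentioned in the review."""
    words = set(review.lower().split())
    records = [
        {"category": cat, "text": review}
        for cat, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    ]
    # Fall back to an explicit "uncategorized" record so no review is lost.
    return records or [{"category": "uncategorized", "text": review}]

if __name__ == "__main__":
    print(split_by_category("The audio cuts out and the video starts to lag"))
```

A review mentioning both audio and video would thus yield two records, each of which can then be counted and sentiment-scored independently.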
### 1.3 Research on Existing Products

This product falls primarily under machine learning, so we will research products in that area.

#### 1.3.1 Categorization - Latent Dirichlet Allocation

Latent Dirichlet Allocation (LDA) is an unsupervised topic modeling method.

**Pros:** LDA finds clusters of words that can be identified as specific topics without any labeled data. Moreover, we can specify how many topics (clusters) the model should create, allowing us to select the number requested by KPN.

**Cons:** Since the model is unsupervised, the topics it returns are not guaranteed to be the ones we are looking for. Certain topics we want may not appear at all if they are too niche and specific.

#### 1.3.2 Sentiment Analysis - PyTorch Transformers

PyTorch Transformers let us use pre-trained models to perform sentiment analysis on strings.

**Pros:** Using such a model is relatively simple: we install and use the provided pipeline, feed in the review text, and receive an output indicating positive or negative sentiment.

**Cons:** The pre-trained model may have been trained on a dataset that does not capture the sentiment behind our specific iTV reviews. Moreover, the sentiment models are trained on English data, so our data will need to be translated, risking the loss of valuable information. Lastly, if certain results do not meet our expectations, it is difficult, if not impossible, to alter the model in any way, since we are only given a function that accepts a string.

### 1.4 Use Case Analysis

As a use case, consider a situation where a new update is pushed to iTV.
With every update, despite testing, there is some uncertainty as to whether everything will work correctly in every situation. If an update were to break the audio and cause video lag, many people would likely give negative feedback in the form of reviews. In the current state, KPN has to read through a vast number of comments manually to judge whether this is a commonly occurring issue, or just a handful of people with a poor internet connection or a broken headset. With the new system, all reviews from any period of time (for example, after the update) can be analyzed automatically, returning a visual representation of which topics have priority and whether they are positive or negative. If the audio and video-lag topics are both high in priority and negative in sentiment, KPN can instantly take action.

## 2. Feasibility Study

After establishing the requirements with our client, we conducted a feasibility study of the proposed product. Our research is divided into two parts: the feasibility of employing machine learning in this project, and the feasibility of the final product, assuming we have a functional model. The separate machine learning study is necessary because it is hard for a team to estimate how difficult building a model will be without experimenting with the available resources; a feasibility study conducted only on the final product would inevitably remain blurry. We therefore treat the two as separate, and evaluate the machine learning feasibility analysis as a living document that will evolve and solidify as we learn more about the data and algorithms.
### 2.1 Machine Learning Feasibility

We will conduct our machine learning feasibility study according to Microsoft's requirements list [6] and Jared Rand's discussion post [7], following the overall approach of [5].

#### 2.1.1 ML problem definition and desired outcome

The machine learning problem is framed as follows: we need to deploy a pipeline in which:

1. A comment from a KPN user is translated from Dutch to English, because the pre-trained sentiment model used in step 3 is trained on English words and can therefore only interpret English.
2. The comment is classified into categories of keywords, reflecting the general topics of the comments.
3. A model predicts the satisfaction expressed towards each keyword of that comment.

There is currently no time-effective alternative to machine-learning-based models, since the task requires understanding both the content of phrases and the meaning carried by word order. As a result, any algorithm based purely on a list of keywords and rules will not be as effective as a trained machine learning pipeline.

#### 2.1.2 Data

As stated in Jared Rand's article [7], the main questions regarding data concern training data, predictive features, data sources and production. We focus on these because following Microsoft's full guideline for data analysis would be too time-consuming given the time limitations of this project. We will therefore answer the following questions:

1. Training Data
   a. *Does training data need to be collected?* Yes. KPN will provide it to us, but some of the data still needs to be collected.
   b. *If so, how much time and money will it cost?* We expect to receive all the data by next week, for free.
2. Predictive Features
   a.
*According to domain experts, what factors are likely to predict the target variable?* According to the KPN development team, it depends heavily on the keywords they want to extract; it has yet to be determined which keywords to select and whether that list is feasible.
3. Data Sources
   a. *What data sources will you need to gain access to?* We will need data for the Dutch-to-English translator, as well as the KPN data.
   b. *If internal, do you have support from data engineers?* Some help is provided for extracting the data; however, it is up to us to come up with a working pipeline.
   c. *If external, how much will vendor data cost?* We were able to find reliable translation data online for free.
4. Production
   a. *What is the level of effort to develop, deploy, and maintain your model in production?* Development and deployment will take ten weeks, in which we are confident we can create a successful basic pipeline. However, no one will update the code afterwards, so if a significant breakthrough in Natural Language Processing occurs, our code will become outdated, possibly causing KPN to draw worse conclusions than its competitors.

#### 2.1.3 Responsible AI

*TODO: this section still needs to be written.*

#### 2.1.4 Summary of Machine Learning feasibility

To summarize, the viability of machine learning will be determined by how good our baseline is; whether we can develop stronger models over time will depend fully on public resources. If this proves infeasible, we will present two alternatives to the client:

1. We re-evaluate, together with the client, an easier list of keywords that the model can categorize.
2. We let the model derive the categories by itself. This would give an indication of the key topics in the reviews, but we would not be able to guarantee their relevance to the client. Such a model has already been created and is therefore evaluated as feasible; however, we consider this the last option.
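The three-step pipeline framed in section 2.1.1 can be sketched as follows. Every function body below is a placeholder assumption for illustration only; the real steps would use a translation model, the topic classifier, and a pre-trained sentiment model:

```python
# Sketch of the pipeline from section 2.1.1: translate -> categorize -> score.
# All three step implementations are stubs, not the eventual models.

def translate_nl_to_en(comment: str) -> str:
    """Step 1: Dutch -> English. Stubbed as the identity for illustration."""
    return comment

def categorize(comment: str) -> list[str]:
    """Step 2: map the comment to topic keywords (stub: naive substring match)."""
    topics = ["audio", "video", "connection"]
    return [t for t in topics if t in comment.lower()] or ["other"]

def sentiment(comment: str) -> str:
    """Step 3: positive/negative label (stub: a tiny negative-word list)."""
    negative = {"broken", "lag", "crash", "bad"}
    return "negative" if negative & set(comment.lower().split()) else "positive"

def analyze(comment: str) -> dict:
    """Run the full pipeline on one review."""
    english = translate_nl_to_en(comment)
    return {
        "text": english,
        "categories": categorize(english),
        "sentiment": sentiment(english),
    }

if __name__ == "__main__":
    print(analyze("The audio is broken since the update"))
```

The value of sketching it this way is that each step can later be swapped for a real model without changing the surrounding code.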
### 2.2 Feasibility of the Final Product

Assuming we have a working model, we still need to render all the answers in a human-readable way. Here we will make use of multiple technologies to display the information in an interactive webpage.

#### 2.2.1 Technical Feasibility

In terms of technical feasibility, we believe the level of technology currently available for software development is sufficient to produce the proposed product. The libraries we will use for this project are open source and well known among computer scientists, so plenty of documentation is available. Taking everything into account, we are confident that the present level of technology is adequate to build the proposed product.

#### 2.2.2 Schedule Feasibility

Another factor is the time constraint imposed by the course. With both the client and the developing party aware of these constraints, we organized and prioritized the requirements accordingly with the MoSCoW method, creating a must-have subset of requirements that will receive priority in the upcoming quarter. Each team member is expected to work approximately 40 hours per week, which we believe is enough to at least deliver the minimal product. Furthermore, because most of us are already familiar with React and Django, there is no need for a learning phase between research and programming, compensating for the time spent investigating the models indicated in the machine learning study.

#### 2.2.3 Summary of Feasibility

To summarize our final product feasibility analysis: we believe the product is technically and operationally feasible within the time restrictions, provided a functional machine learning model is available.

## 3. Risk Analysis

Here we explain the potential risks to our project's completion.
We divide them into content-centered risks, which concern the project output itself, and external risks, which concern the circumstances surrounding the project.

### 3.1 Content-centered risks

**Lack of an existing model.** There might not exist a model that does specifically what we want, or, if it does, it might not be open source. Another thing to consider is that our data is not labeled, and we will have to find a way around this.

**Lack of experience with software development.** Since we have not been active in the computer science world for long, we might struggle with finding and setting up the right frameworks and libraries. For example, we could pick a framework that everyone learns, only to realize it does not fit the project. This can be avoided by researching the frameworks beforehand.

**The system takes a long time to execute our task.** If it is not possible to create a system that does what we want within a short time frame, this will cause problems with the client and may endanger the feasibility of the product we are trying to implement.

**Lazy code.** At the beginning of software projects we tend to idealize our code style. When implementing, however, it can be hard for developers to follow these rules, which can lead to the closing and slower integration of pull requests [2].

**Data sources going private.** Since our models depend heavily on public data sources, those sources might go private or become outdated with new versions of Python, leaving us unable to run the code without some maintenance.

**New models becoming available.** As mentioned in the feasibility study, new Natural Language Processing techniques might arise, making our model outdated. With little or no maintenance, this is a serious long-term threat to the usefulness of the product, since the product owner would make worse decisions than their competitors.

**The set of keywords being too hard.** The list of keywords requested by the client may prove too difficult for the model to categorize accurately, as discussed in the machine learning feasibility summary.

### 3.2 External risks

**Covid unexpectedly makes a comeback.** The world is uncertain, and we are currently not anticipating a scenario where covid takes over again. If it returns to our daily lives and measures are reintroduced, we will have to go back to working online. This would significantly affect the morale and motivation of the team, and there is no real way around it.

**The requirements change.** Managing requirements change is hard [1]. Even after a complete requirements elicitation phase, we might still have to reconsider the requirements later. This should be expected, and we should be flexible enough to deal with it.

**KPN changes personnel.** If our clients at KPN decide to leave the company, we would be left with no one to deliver the product to.

**Not enough time.** If we start building the wrong thing, there might not be enough time to complete the project. This can also be a consequence of changing requirements or unexpected problems that take a long time to resolve.

**People having to quit.** If someone is forced to quit the project, we will have to continue with fewer people. If more team members quit, the project becomes infeasible.

**Resits.** Many of us will have the CPL resit, which may cause a drop in efficiency; we will try to compensate for that loss **before** the resit.

## 4. List of Requirements

After a focused meeting with the client, the requirements for this project were drawn up using the MoSCoW process. Although the client had no precise requirements for the project, we followed this technique by describing and expanding on what they desired. In addition to classifying the requirements into "must have", "should have", "could have" and non-functional, the list follows a prioritized order.
This will not only save us time later by requiring us to prioritize only once, but will also help us achieve more balanced sprints and a better-planned schedule. It is worth noting that "won't have" is not included in our list, because the "could have" section already contains all the idealistic features that we believe would be useful for the problem but are out of scope.

### 4.1 Requirements

Please find below the actual list of requirements.

>## Must have
>- The user is able to pass the algorithm a csv file containing the reviews and get a report in return.
>- The user is able to visualize in the report the following:
>  - A general overview of the data containing:
>    - Amount of negative comments per category
>    - General average rating over time
>    - Overall positive/negative ratio of all reviews
>    - Average rating of the category over time
>  - Per category:
>    - Current ratio of negative comments of the category
>    - List of comments from the category
>    - The user should be able to visualize a subset of the list of comments of that specific category, over the past month.
>  - Per platform:
>    - Percentage of comments over time
>    - Ratio of negative comments
>    - Percentage of present categories
>- The report is saved to a csv file.
>## Should have
>- The user should be able to visualize the static report in an interactive local webpage.
>- In the webpage, the user should be able to:
>  - Filter the data
>  - Modify the graphs to be based on a desired period of time.
>  - Download the static version of the report directly from the webpage.
>## Could have
>- The user could be able to modify, directly from the webpage, the list of categories into which the comments are classified.
>- The user could be able to filter by iTV webpage. For instance, the user could see which feedback was given on a 404 error page, or in the actual feedback section.
>- The reviews are stored in a database.
>- The reviews are automatically collected from the different sources.
>- The visualization is updated live on the webpage.
>## Non-functional requirements
>- The model is able to translate the Dutch comments into English. English comments must be left unchanged by the translator.
>- The model is able to give a sentiment for each comment.
>- The model is able to classify comments based on a hard-coded list of keywords.
>- The framework used for the backend is Django.
>- The framework used for the frontend is React.
>- The backend and frontend should communicate through API calls.

## 5. Project Approach

After discussing the project with our client and within our team, we decided to adopt the following architecture. It is based on our current knowledge of the data (the reviews), and as the timeline progresses it is quite possible our plan will have to pivot. We therefore explain our current best envisioned solution, discuss the possible drawbacks and give alternative strategies where possible. We can divide the project into three main tranches: the Model (the algorithm that will tackle the classification task), the Back-End, and the Front-End.

### 5.1 Model

This will be an unsupervised learning model, as the data is not labelled. We will use a mixture of text preprocessing techniques and embeddings to clean our data and extract patterns from it. Subsequently, we will create a rule-based topic modeller that categorizes the reviews into the predefined classes we are interested in. As this model requires us to manually create conceptual rules that map sentences to the predefined classes, it will be difficult to capture the full complexity of the task. To tackle this, we will collaborate with the KPN experts in an attempt to absorb as much domain knowledge as possible and implement it in our model's decisions.
If this pipeline fails, we will try a mixture of unsupervised and supervised learning, or even ask for the data to be fully labelled and attempt a fully supervised approach. If machine learning proves not to be the right approach, we will use different data mining techniques (for example TF-IDF) as a last effort to solve the problem.

### 5.2 Backend

Our backend will host the model and serve as the connection point to the web platform. We will use the Django framework. *TODO: explain why Django was chosen.*

### 5.3 Frontend

Our frontend will serve as the main user interface. It will be locally hosted and connected to the backend through API calls. The framework chosen is React, as our team is already comfortable with it.

## 6. Development Methodology

Quoting from [3], a methodology is "a system of practices, techniques, procedures and rules used by those who work in a discipline". Among the different methodologies that suit our project and would have allowed us to reach a quality result, our team opted to follow the Scrum procedure based on Agile principles, as it is not only the best known among the team members but also the best fit for an eight-week project.

### 6.1 Scrum

The goal of Scrum is to divide the work and organize the team's meetings so that self-management, communication, quality and speed of development improve. It is based on an iterative plan of attack: at the start of each iteration (i.e. sprint) the team meets to choose the tasks for that iteration (the sprint planning) from the backlog (the collection of requirements), and at the end of the iteration a sprint retrospective reviews all tasks and discussions covered during that period and confirms that the Definition of Done has been met. More specifically, a Definition of Done is produced after a task has been assigned.
This definition outlines what engineers must strive for in order to successfully finish a task. For example, the Definition of Done for the task "Drafting a first design of the report" would include the following elements:

- The report includes all of the client's requested visualizations.
- The format of the draft is straightforward and understandable.
- The report is contained in a file.

By creating a DoD for every task, we make sure we can achieve the expected quality. Furthermore, in between iterations, daily scrums will help us ensure that everyone on the team is doing their part and keeping up to date. Because we only have eight weeks to complete this project, our team's schedule is broken into one-week sprints, with the sprint retrospective and planning taking place on the same day as the client meeting. As a result, client input can help us optimize and improve sprint by sprint. For a more visual representation, please find below our weekly schedule.

| Day       | Schedule                                   |
|:----------|--------------------------------------------|
| Monday    | Halfway-sprint meeting 18:00               |
| Tuesday   | TA meeting 09:30, standup straight after   |
| Wednesday | Standup 09:30                              |
| Thursday  | Sprint retrospective before the client meeting, client meeting at 11:00, sprint planning right after |
| Friday    | Standup 09:30                              |

### 6.2 Agile Methodology

This process follows the Agile methodology, whose four core values are, as stated by the Agile Manifesto [4]:

1. Individuals and interactions over processes and tools; as evidenced by our frequent collaboration and by making the Daily Scrum not just a matter of technical progress, but also of ensuring that whoever requires assistance receives it.
2. Working software over comprehensive documentation; we lean on this value somewhat less, because we document our reflections and progress every week.
We do, however, intend to honor both of these at a similar degree.
3. Customer collaboration over contract negotiation; already reflected in our project not being about money, but about delivering a high-quality product approved by the client.
4. Responding to change over following a plan; even though we made a project planning, our team will have to adapt to unforeseen situations or adjust the planning based on the results of our models (see section 3, "Risk Analysis").

We chose Agile over the Waterfall methodology (a one-iteration process that consists of, in order, gathering requirements, designing, implementing, testing, and maintaining the product), since Agile allows us to parallelize work and not be completely reliant on the workability of one segment before beginning the next. It is worth noting that, because we use an iterative approach, testing is taken care of sprint by sprint. Each team member is responsible for testing their own code, ensuring that mistakes are identified early in the project.

### 6.3 General Project Plan

Based on these values, we developed a representative planning, subject to future adjustments:

![](https://i.imgur.com/UxdhEl5.png)

## References

[1] Jayatilleke, S., & Lai, R. (2018). A systematic review of requirements change management. Information and Software Technology, 93, 163-185.

[2] Zou, W., Xuan, J., Xie, X., Chen, Z., & Xu, B. (2019). How does code style inconsistency affect pull request integration? An exploratory study on 117 GitHub projects. Empirical Software Engineering, 24(6), 3871-3903.
[3] The Digital Project Manager. Project management methodologies made simple. https://thedigitalprojectmanager.com/project-management-methodologies-made-simple/

[4] Beck, K., et al. (2001). Manifesto for Agile Software Development. https://agilemanifesto.org/

[5] Borealis AI. Feasibility study (blog post). https://www.borealisai.com/en/blog/feasibility-study/

[6] Microsoft. ML feasibility study. Code With Engineering Playbook. https://microsoft.github.io/code-with-engineering-playbook/machine-learning/ml-feasibility-study/

[7] Rand, J. Assessing the feasibility of a machine learning model. Towards Data Science. https://towardsdatascience.com/assessing-the-feasibility-of-a-machine-learning-model-ae36f4180f8

## Appendix
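As a concrete illustration of the prioritization described in section 1.1 (part 2) and the per-category statistics listed in the requirements, the sketch below aggregates categorized, sentiment-scored reviews into per-category totals and negative ratios and ranks the categories. The field names and the ranking rule (negative count first, negative ratio as tiebreaker) are assumptions for this sketch, not the agreed design:

```python
# Illustrative prioritization sketch: aggregate categorized,
# sentiment-scored reviews and rank categories by negative feedback.
# Field names and the ranking rule are assumptions for this sketch.
from collections import defaultdict

def prioritize(reviews: list[dict]) -> list[dict]:
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for r in reviews:
        totals[r["category"]] += 1
        if r["sentiment"] == "negative":
            negatives[r["category"]] += 1
    report = [
        {
            "category": cat,
            "total": totals[cat],
            "negative_ratio": negatives[cat] / totals[cat],
        }
        for cat in totals
    ]
    # Highest number of negative comments first, ties broken by ratio.
    report.sort(
        key=lambda row: (negatives[row["category"]], row["negative_ratio"]),
        reverse=True,
    )
    return report

if __name__ == "__main__":
    sample = [
        {"category": "audio", "sentiment": "negative"},
        {"category": "audio", "sentiment": "negative"},
        {"category": "video", "sentiment": "negative"},
        {"category": "video", "sentiment": "positive"},
    ]
    print(prioritize(sample))
```

In the audio/video-lag use case of section 1.4, such a ranking would immediately surface the broken categories at the top of the report.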