# NLP Data Visualization Framework
This data visualization framework targets media scripts, such as those from movies, plays, and shows. A customizable data plugin imports a script into the framework, which then runs Natural Language Processing on it using the stanford-corenlp library. The framework produces an object containing key information about the script, such as sentiment analysis, the most notable topics, and a gender breakdown. A customizable visualization plugin can then take the analyzed data and display it in a meaningful way.
# Steps to create your own plugins
1. Create a new Gradle project for your plugins
2. Put this into your `build.gradle`
``` java
apply plugin: 'application'
mainClassName = 'edu.cmu.cs.cs214.hw5.framework.Main'
dependencies {
compile project(':framework')
}
```
3. Put this into your `settings.gradle`
``` java
include ':framework'
project(':framework').projectDir = file('PATH_TO_FRAMEWORK_PROJECT')
```
4. Create `resources/META-INF/services/edu.cmu.cs.cs214.hw5.framework.core.DataPlugin` and `resources/META-INF/services/edu.cmu.cs.cs214.hw5.framework.core.VisualPlugin`.
   Each file should list the fully qualified class name of each of your `DataPlugin` or `VisualPlugin` implementations, one per line, so the framework's `ServiceLoader` can load them.
5. Get your own API key for the gender recognition API that the framework uses:
   1. Navigate to https://v2.namsor.com/NamSorAPIv2/index.html
   2. Click on “Try it Out” and sign up for a free key
   3. Inside the `GenderRecognition` class in the framework package, find the method `initializeGenderMap(Set<String> nameSet)` and put your API key into the line `api_key.setApiKey("YOUR API KEY")`
6. Run the framework with your plugins by executing `gradle run` from your plugins project
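The service files from step 4 are what `java.util.ServiceLoader` reads at run time. For example, the `DataPlugin` file might contain the single line `com.example.plugins.MyDataPlugin` (a hypothetical class name). The sketch below shows the discovery mechanism in isolation. It is not the framework's actual loading code: `PluginDiscovery`, `loadAll`, and the nested `ExamplePlugin` interface are illustrative names that stand in for the real framework types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class PluginDiscovery {

    // Hypothetical stand-in for the framework's DataPlugin interface,
    // included only so this sketch compiles on its own.
    public interface ExamplePlugin {
        String getName();
    }

    private PluginDiscovery() { }

    /**
     * Collects every provider of the given service type that is registered in a
     * META-INF/services/<fully.qualified.InterfaceName> file on the classpath.
     */
    static <T> List<T> loadAll(Class<T> service) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(service)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        List<ExamplePlugin> plugins = loadAll(ExamplePlugin.class);
        System.out.println("Discovered " + plugins.size() + " plugin(s)");
        for (ExamplePlugin plugin : plugins) {
            System.out.println(" - " + plugin.getName());
        }
    }
}
```

If no service file names an implementation, the returned list is simply empty. That is the usual symptom of a misspelled file name or class name in step 4.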
# Plugin APIs
To write your own plugins you must implement two interfaces, `DataPlugin` and `VisualPlugin`. A `DataPlugin` creates a `DataObject`, and a `VisualPlugin` creates a `JPanel` from a `ProcessedData` object. The framework sits between the two, converting a `DataObject` into a `ProcessedData`. Here are the key data structures:
The `DataPlugin` needs to create a `DataObject`, which holds the script text and the script's name, both as Strings:
``` java
public class DataObject {
    private String scriptString;
    private String name;

    public DataObject(String script) {
        this.scriptString = script;
        this.name = "";
    }

    public void setName(String name) {
        this.name = name;
    }

    // Accessors
    public String getScriptString() {
        return scriptString;
    }

    public String getScriptName() {
        return name;
    }
}
```
The `VisualPlugin` takes a `ProcessedData` and creates a `JPanel` from it:
``` java
import java.util.HashMap;
import java.util.Map;

public class ProcessedData {
    String name;
    Map<String, Integer> topicPrevalence;
    Map<String, Integer> sentimentMap;
    Map<String, Integer> genderBalance;

    public ProcessedData(String name) {
        this.name = name;
        topicPrevalence = new HashMap<>();
        sentimentMap = new HashMap<>();
        genderBalance = new HashMap<>();
    }

    public String getName() {
        return name;
    }

    public Map<String, Integer> getTopicPrevalence() {
        return topicPrevalence;
    }

    public Map<String, Integer> getSentimentMap() {
        return sentimentMap;
    }

    public Map<String, Integer> getGenderBalance() {
        return genderBalance;
    }

    public void setTopicPrevalence(Map<String, Integer> topicPrevalence) {
        this.topicPrevalence = topicPrevalence;
    }

    public void setSentimentMap(Map<String, Integer> sentimentMap) {
        this.sentimentMap = sentimentMap;
    }

    public void setGenderBalance(Map<String, Integer> genderBalance) {
        this.genderBalance = genderBalance;
    }
}
```
The framework performs NLP analysis on the `DataObject` returned by the `DataPlugin` and creates a `ProcessedData`, which is then passed to the `VisualPlugin`. The `VisualPlugin` can extract the topic, sentiment, and gender maps from the `ProcessedData` and visualize them.
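As a rough illustration of working with these maps, a visualization plugin might first reduce `getTopicPrevalence()` to its most frequent entries. The helper below is only a sketch: `TopicRanking` and `topTopics` are hypothetical names, not part of the framework, and the sample topics in `main` are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TopicRanking {

    private TopicRanking() { }

    /**
     * Returns up to {@code limit} topic names, highest count first.
     * Ties are broken alphabetically so the result is deterministic.
     */
    static List<String> topTopics(Map<String, Integer> topicPrevalence, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(topicPrevalence.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real plugin would pass pd.getTopicPrevalence().
        Map<String, Integer> sample = new HashMap<>();
        sample.put("love", 9);
        sample.put("war", 5);
        sample.put("ship", 5);
        sample.put("sea", 1);
        System.out.println(topTopics(sample, 3));
    }
}
```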
**Data Plugin**
Here is the interface that must be implemented for a Data Plugin:
``` java
public interface DataPlugin {
    /**
     * Creates a DataObject in a customizable way from the origin file location
     * @param origin Location of script (File path, URL, etc)
     * @return Created raw data object, null if origin invalid
     */
    public DataObject getData(String origin);

    /**
     * The data prompt to display to user when inputting a data origin
     * @return String to prompt user with
     */
    public String getDataPrompt();

    /**
     * Name of data plugin shown to user in GUI when selecting
     * @return Plugin name as String
     */
    public String getName();

    /**
     * If the file origin is a local file path or not (e.g. HTTP URL would be false)
     * @return True if local file false otherwise
     */
    public boolean isLocalFile();
}
```
Here is one possible implementation that scrapes https://www.imsdb.com for a movie script using Jsoup:
``` java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HTMLPlugin implements DataPlugin {
    // HTML tag expected to contain the script text
    private static final String TAG = "pre";
    private static final String DATA_PROMPT = "Input a URL to an HTML Script";
    private static final String NAME = "HTML";

    @Override
    public DataObject getData(String origin) {
        try {
            Document doc = Jsoup.connect(origin).get();
            // Use the first matching element as the script body
            Element script = doc.select(TAG).first();
            return new DataObject(script.text());
        } catch (Exception e) {
            // Invalid URL, network failure, or no matching element on the page
            return null;
        }
    }

    @Override
    public String getDataPrompt() { return DATA_PROMPT; }

    @Override
    public String getName() { return NAME; }

    @Override
    public boolean isLocalFile() { return false; }
}
```
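For comparison, a plugin whose `isLocalFile()` returns true could read a plain-text script from disk. This is a minimal sketch under stated assumptions: `TextFilePlugin` is a hypothetical class, the script is assumed to be UTF-8 text, and the nested `DataObject` and `DataPlugin` types are trimmed stand-ins for the framework types so the example compiles on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalFilePluginSketch {

    // Trimmed stand-ins for the framework types shown above.
    static class DataObject {
        private final String scriptString;
        DataObject(String script) { this.scriptString = script; }
        String getScriptString() { return scriptString; }
    }

    interface DataPlugin {
        DataObject getData(String origin);
        String getDataPrompt();
        String getName();
        boolean isLocalFile();
    }

    /** Hypothetical plugin that reads a UTF-8 text file from the local file system. */
    static class TextFilePlugin implements DataPlugin {
        @Override
        public DataObject getData(String origin) {
            try {
                Path path = Paths.get(origin);
                byte[] bytes = Files.readAllBytes(path);
                return new DataObject(new String(bytes, StandardCharsets.UTF_8));
            } catch (IOException | RuntimeException e) {
                // Follows the interface contract: null when the origin is invalid
                return null;
            }
        }

        @Override
        public String getDataPrompt() { return "Input a path to a text script"; }

        @Override
        public String getName() { return "Text File"; }

        @Override
        public boolean isLocalFile() { return true; }
    }

    public static void main(String[] args) throws IOException {
        // Write a throwaway script file, then read it back through the plugin.
        Path tmp = Files.createTempFile("script", ".txt");
        Files.write(tmp, "INT. KITCHEN - DAY".getBytes(StandardCharsets.UTF_8));
        DataObject data = new TextFilePlugin().getData(tmp.toString());
        System.out.println(data.getScriptString());
        Files.delete(tmp);
    }
}
```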
**Visualization Plugin**
Here is the interface that must be implemented for a Visual Plugin:
``` java
public interface VisualPlugin {
    /**
     * Visualizes the ProcessedData in a custom way and makes visualization as JPanel
     * @param pd ProcessedData object to extract info from and visualize
     * @return JPanel to display in GUI
     */
    public JPanel visualize(ProcessedData pd);

    /**
     * Name of visualization plugin shown to user when selecting
     * @return Plugin name as String
     */
    public String getName();
}
```
Here is one possible implementation that creates a Word Cloud (Wordle) of all the important words in the imported movie script:
``` java
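import java.awt.Color;
import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.swing.ImageIcon;
import javax.swing.JLabel;
import javax.swing.JPanel;
// The word cloud classes match the Kumo library's API (assumed source of these imports)
import com.kennycason.kumo.CollisionMode;
import com.kennycason.kumo.WordCloud;
import com.kennycason.kumo.WordFrequency;
import com.kennycason.kumo.bg.CircleBackground;
import com.kennycason.kumo.font.scale.SqrtFontScalar;
import com.kennycason.kumo.nlp.FrequencyAnalyzer;
import com.kennycason.kumo.palette.ColorPalette;

public class WordCloudPlugin implements VisualPlugin {
    private static final String NAME = "Word Cloud";

    @Override
    public JPanel visualize(ProcessedData pd) {
        Map<String, Integer> allWords = pd.getTopicPrevalence();
        // Repeat each word once per occurrence so the analyzer can recount its frequency
        List<String> onlyNouns = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : allWords.entrySet()) {
            for (int i = 0; i < entry.getValue(); i++) {
                onlyNouns.add(entry.getKey());
            }
        }
        final FrequencyAnalyzer frequencyAnalyzer = new FrequencyAnalyzer();
        List<WordFrequency> wordFrequencies = frequencyAnalyzer.load(onlyNouns);
        // Render a circular 600x600 cloud in shades of blue
        final Dimension dimension = new Dimension(600, 600);
        final WordCloud wordCloud =
                new WordCloud(dimension, CollisionMode.PIXEL_PERFECT);
        wordCloud.setPadding(2);
        wordCloud.setBackground(new CircleBackground(300));
        wordCloud.setColorPalette(new ColorPalette(new Color(0x4055F1),
                new Color(0x408DF1), new Color(0x40AAF1), new Color(0x40C5F1),
                new Color(0x40D3F1), new Color(0xFFFFFF)));
        wordCloud.setFontScalar(new SqrtFontScalar(10, 40));
        wordCloud.build(wordFrequencies);
        // Wrap the rendered image in a Swing panel for the GUI
        BufferedImage bf = wordCloud.getBufferedImage();
        JLabel label = new JLabel(new ImageIcon(bf));
        JPanel panel = new JPanel();
        panel.add(label);
        return panel;
    }

    @Override
    public String getName() {
        return NAME;
    }
}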
```
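A visualization does not need an external library. As a dependency-free sketch, the helper below turns a sentiment map into rows of labelled bars using only Swing. `SentimentBarsSketch` and `buildPanel` are hypothetical names; a `VisualPlugin` could return `buildPanel(pd.getSentimentMap())` from its `visualize` method. The sketch makes no assumption about which keys the framework puts in the map.

```java
import java.awt.GridLayout;
import java.util.Map;
import java.util.TreeMap;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.JProgressBar;

final class SentimentBarsSketch {

    private SentimentBarsSketch() { }

    /** Builds one label and one bar per map entry, scaled to the largest count. */
    static JPanel buildPanel(Map<String, Integer> sentimentMap) {
        // Two columns: category label on the left, bar on the right
        JPanel panel = new JPanel(new GridLayout(0, 2, 8, 4));

        // Largest count defines the full bar length (at least 1 to keep the range valid)
        int max = 1;
        for (int count : sentimentMap.values()) {
            max = Math.max(max, count);
        }

        // TreeMap gives a stable, alphabetical row order
        for (Map.Entry<String, Integer> entry : new TreeMap<>(sentimentMap).entrySet()) {
            panel.add(new JLabel(entry.getKey()));
            JProgressBar bar = new JProgressBar(0, max);
            bar.setValue(entry.getValue());
            bar.setString(String.valueOf(entry.getValue()));
            bar.setStringPainted(true);
            panel.add(bar);
        }
        return panel;
    }
}
```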
# Framework
**Build Time and Run Time Notes**
The first build of the framework may take a long time because Gradle downloads the NLP libraries from Maven, including a large file (about 350MB) that holds the machine learning models needed for NLP.
On longer scripts the NLP analysis may take some time, possibly up to 10 minutes. Unfortunately, this is unavoidable.
**API**
The framework requires a `DataObject` from a `DataPlugin` in order to function. The plugin is invoked in `FrameworkImpl`'s `importData`, as shown below:
```java
/**
 * Allows user to input a file path and plugin to import a DataObject into the framework
 * @param dp Data Plugin to import data with
 * @param path file origin to get data from
 * @param name name of this imported data object chosen by user
 */
public boolean importData(DataPlugin dp, String path, String name) {
    DataObject newData = dp.getData(path); // import data with chosen data plugin
    // ... rest of the method omitted from this excerpt
}
```
The framework then passes the generated `ProcessedData` to the chosen `VisualPlugin` and gets back a JPanel, which is sent to the GUI tool to be displayed:
``` java
/**
 * Gets the JPanel that the given visual plugin creates from the data object
 * @param visual Visual plugin to create JPanel visualization with
 * @param dataName Name of imported data to visualize
 * @return The JPanel to display in the GUI
 */
public JPanel visualizeData(VisualPlugin visual, String dataName) {
    DataObject d = dataObjects.get(dataName);
    return visual.visualize(processedDataMap.get(d));
}
```
The `ProcessedData` object is the output of the framework: it contains all the analysis of the script provided by the `DataObject`, and it is the main input to the `VisualPlugin`.
In short: a `DataPlugin` creates a `DataObject` that is loaded into the framework. The framework analyzes the script in this `DataObject` and stores the results in a `ProcessedData` object, which gives the `VisualPlugin` everything it needs to render.
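The flow can be sketched end to end in a few lines. This is a simplified illustration, not the actual `FrameworkImpl`: plain word counting stands in for the CoreNLP analysis, `importData` takes an already-created `DataObject` instead of a plugin and path, its `false` return for missing data is an assumption, and `PipelineSketch`, `analyze`, and `processedFor` are hypothetical names. Only the two maps mirror the roles of `dataObjects` and `processedDataMap` from `visualizeData`.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class PipelineSketch {

    // Trimmed stand-ins for the framework types, included so the sketch compiles alone.
    static class DataObject {
        final String scriptString;
        DataObject(String script) { this.scriptString = script; }
    }

    static class ProcessedData {
        final String name;
        final Map<String, Integer> topicPrevalence = new HashMap<>();
        ProcessedData(String name) { this.name = name; }
    }

    // Same roles as the maps used by visualizeData
    private final Map<String, DataObject> dataObjects = new HashMap<>();
    private final Map<DataObject, ProcessedData> processedDataMap = new HashMap<>();

    /** ASSUMPTION: returns false when the plugin produced no data (null). */
    boolean importData(DataObject newData, String name) {
        if (newData == null) {
            return false;
        }
        dataObjects.put(name, newData);
        processedDataMap.put(newData, analyze(newData, name));
        return true;
    }

    /** Stand-in analysis: plain word counts instead of the real NLP pipeline. */
    private static ProcessedData analyze(DataObject data, String name) {
        ProcessedData pd = new ProcessedData(name);
        for (String word : data.scriptString.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                pd.topicPrevalence.merge(word, 1, Integer::sum);
            }
        }
        return pd;
    }

    /** Mirrors the lookup in visualizeData: name -> DataObject -> ProcessedData. */
    ProcessedData processedFor(String dataName) {
        DataObject d = dataObjects.get(dataName);
        return d == null ? null : processedDataMap.get(d);
    }

    public static void main(String[] args) {
        PipelineSketch framework = new PipelineSketch();
        framework.importData(new DataObject("The ship. The sea."), "demo");
        System.out.println(framework.processedFor("demo").topicPrevalence);
    }
}
```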