# Text Analysis Framework

This framework imports documents from various sources, runs analyses on them, and displays the resulting analytics as visualizations. It does so through three plugin types, described in detail below: `DataPlugin`, `AnalysisPlugin`, and `DisplayPlugin`.

## Usage

1. Run `gradle run` in the `plugins` directory to start the program.
2. In the GUI, a list of buttons for importing documents appears at the top right.
3. Click a button to import documents. Once imported, you can select one or more documents from the document pane.
4. Finally, to display your documents and their analytics, select a visualization at the bottom right.

## Screenshots

![Importing tweets from Twitter](https://i.imgur.com/qZEGQDI.png)

![Creating a sentiment analysis time series](https://i.imgur.com/c6EuRqS.png)

![Checking analytics of different news media](https://i.imgur.com/L9ghJGa.png)

![Comparing word clouds of presidential candidates](https://i.imgur.com/2E23loR.png)

## Important Classes

### Document

The `Document` class is the primary unit for storing a body of plain text. Beyond the mandatory text field, a document can store more information through attributes attached to it, such as its `title` or `author`, or the results of an analysis such as its `word_count` or `sentiment_score`.

In the GUI, a Document's name (see `toString`) is a preview of its text *unless* the Document has a `title` attribute, in which case the value of that attribute is used instead.

### DataPlugin

Data plugins add `Document` objects to the framework by accepting user input and parsing the relevant data source into plain text. Where applicable, data plugins can also attach attributes during the import step.

Examples: `LocalFilePlugin`, `TwitterPlugin`.
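The naming rule for a `Document` (title attribute first, text preview otherwise) can be illustrated with a small stand-in class. This is only a sketch: `SketchDocument`, its `String`-valued `putAttribute`, and the preview length of 20 characters are assumptions made for illustration, not the framework's actual `Document` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the framework's Document; not the real class. */
class SketchDocument {
    // Assumed preview length; the framework's actual value may differ.
    private static final int PREVIEW_LENGTH = 20;

    private final String text;
    // Simplification: attribute values are stored as plain Strings keyed by name.
    private final Map<String, String> attributes = new LinkedHashMap<>();

    SketchDocument(String text) {
        this.text = text;
    }

    /** Attaches an attribute and returns this document so calls can be chained. */
    SketchDocument putAttribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    /** Uses the "title" attribute if present, otherwise a preview of the text. */
    @Override
    public String toString() {
        String title = attributes.get("title");
        if (title != null) {
            return title;
        }
        return text.length() <= PREVIEW_LENGTH
                ? text
                : text.substring(0, PREVIEW_LENGTH) + "...";
    }
}
```

A data plugin that knows a sensible title (a file name, for instance) would attach it at import time so the GUI shows that title instead of the preview.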
### AnalysisPlugin

Analysis plugins are the heart of the framework: they take in any `Document`, analyze its text, and attach the relevant attributes to the document. The framework provides several analytics out of the box:

* Word/Character/Sentence Count
* Word Frequency
* Sentiment Analysis (Score, Magnitude, Description)
* Readability (Automated Readability Index, Flesch-Kincaid Grade Level)

For examples of `AnalysisPlugin`, see the built-in classes that implement it in `framework/analysis` (such as `WordFreqAnalysis` and `SentimentAnalysis`).

### DisplayPlugin

Combined with attributes, display plugins create visualizations for a collection of documents. They can prompt the user for custom input and let the user select from the attributes in the framework that match a certain type of data (more below), which allows type-safe creation of data-driven charts and more.

Examples: `Table`, `WordCloud`.

### DataType

The custom interface `DataType` makes the data flow extensible. Internally, all documents store their attribute values as Strings. `DataType` keeps track of which type of value each attribute is meant to store, and it lets the framework check user inputs against a plugin's required parameter types. Each data type has a `canParse(String)` method, which returns true if the provided value can be parsed into the expected type and false otherwise.

You can use the predefined types in the enum `CoreType`:

* `STRING`
* `BOOLEAN`\*
* `INTEGER`
* `NUMERICAL`
* `DATE`
* `FILE`\*
* `COLOR`\*
* `ANY`

\* *These types have custom integration in our packaged GUI.*

Developers can also implement `DataType` to create and display custom, advanced types, from flattened JSON maps to text manipulation. See `WordFreqType.WORD_FREQ_MAP` for an example of how to create and parse an advanced type.
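To give a rough idea of what a custom type's `canParse(String)` might look like, here is a minimal sketch. `SketchDataType` is a hypothetical stand-in for the framework's `DataType` interface (the real interface may declare more methods), and `PercentType` is an invented example type, not something shipped with the framework.

```java
/** Hypothetical stand-in for the framework's DataType; the real interface may differ. */
interface SketchDataType {
    boolean canParse(String value);
}

/** Invented example type: a percentage between 0 and 100, such as "42.5". */
enum PercentType implements SketchDataType {
    PERCENT;

    @Override
    public boolean canParse(String value) {
        if (value == null) {
            return false;
        }
        try {
            double parsed = Double.parseDouble(value.trim());
            // NaN fails both comparisons, so it is rejected as well.
            return parsed >= 0.0 && parsed <= 100.0;
        } catch (NumberFormatException e) {
            // Not a number at all, so it cannot be a percentage.
            return false;
        }
    }
}
```

Because values are stored as Strings, a check like this is what would let the framework decide whether an attribute value or a user input is acceptable for a given type.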
### Parameter

The `Parameter` class is a container that holds a string `displayName` (a user-friendly identifier meant to be used by GUIs) and a data type `type`. It serves two purposes in the framework:

1. To facilitate user input through the framework interface's `getParams` method. A GUI can use the parameter's `type` to display different input methods, even though internally every input boils down to a `String`.
2. To tell the framework which attributes an `AnalysisPlugin` provides (described in detail below).

### Attribute

The `Attribute` class is an immutable class that can be attached to Documents to store information about them. An attribute may be attached when the Document is first imported OR when an analysis is run on the Document.

Each `Attribute` has a `displayName` that is used in the GUI, a `value` that contains the stored information (as a String), and a `type` that describes which `DataType` the value String must satisfy. The `type` is essential: when a `DisplayPlugin` requests a certain type (for example, a line chart plugin requesting a `CoreType.NUMERICAL` for the y-axis), the framework returns only compatible attributes.

### DocumentCollection

The `DocumentCollection` class is a container that pairs a `String` display name with a `Collection<Document>`. It works like a named directory: the user can select multiple documents in the UI at once by selecting the display name.

If a `DocumentCollection` contains only one document, the GUI instead defers to the name (`toString`) of that document. This means a plugin can safely use `null` or an empty value as the name of a singleton `DocumentCollection`.

## Creating your own plugins

### Data Plugin

* Create your key constants and a map that links the keys to parameters, as shown below.
  * The first argument passed to the `Parameter` constructor is the prompt label that the framework displays to the user.
  * Using a `LinkedHashMap` ensures that the framework displays the parameters to the user in the order you define.

```java
private static final String PLUGIN_NAME = "Twitter",
        USER_KEY = "username",
        NUM_KEY = "numTweet",
        // Assumed value: this key is used below but was not defined in the original snippet.
        INCLUDE_RTS = "includeRetweets";

// Maps each key to the parameter the user is prompted for, in display order.
private final Map<String, Parameter> parameters = new LinkedHashMap<>() {{
    put(USER_KEY, new Parameter("Twitter Handle (e.g. @joshbloch)", CoreType.STRING));
    put(NUM_KEY, new Parameter("Number of Tweets", CoreType.INTEGER));
    put(INCLUDE_RTS, new Parameter("Display Retweets?", CoreType.BOOLEAN));
}};
```

* Return the map you created from the `getParameters()` function.
  * The framework calls `getParameters()` to learn which user inputs the plugin requires and which data type each parameter expects.

```java
@Override
public Map<String, Parameter> getParameters() {
    return parameters;
}
```

* In the `getDocument` function, the plugin should first ensure that the given `paramMap` has all the keys it expects. Throw an `IllegalArgumentException` if any key is missing.

```java
// Make sure all parameters are provided
for (String key : parameters.keySet()) {
    if (!paramMap.containsKey(key)) {
        throw new IllegalArgumentException("Missing parameter: " + key);
    }
}
```

* In the `getDocument` function, the plugin should then parse the `paramMap` argument to extract the user inputs it needs. The framework already ensures that each user input is of the specified type, so the input can be parsed safely.

```java
String username = paramMap.get(USER_KEY);
int numTweet = Integer.parseInt(paramMap.get(NUM_KEY));
```

* The `getDocument` function should create a `DocumentCollection` object to return.
  Like so: `return new DocumentCollection(collectionName, docs);`
  * `collectionName` is used by the framework as the title under which this collection of documents is displayed.
  * `docs` is the collection of documents.
* The `toString` method provides the name of the plugin; this name should be unique among all the data plugins.

### Display Plugin

* Create a `<String, Parameter>` map and a `<String, DataType>` map.
  * The `Parameter` map specifies any user inputs (e.g. "Chart Title", "Number of Columns").
  * The `DataType` map specifies which attributes the plugin requires and the types of those attributes (e.g. "y-axis" may require `CoreType.NUMERICAL`).
  * Note: A plugin may require no parameters, no attributes, or neither. In that case, the corresponding maps should be empty rather than `null`.
* Return the `DataType` map from the `getDataTypes` function, and the `Parameter` map from `getUserParameters`.
  * The framework calls `getDataTypes` to create a list of compatible attributes by indexing each document's available attributes and the plugins loaded.
  * The framework calls `getUserParameters` to learn which user input parameters the DisplayPlugin needs.

```java
private static final String TITLE_KEY = "title",
        COLUMN_ONE = "Attribute 1",
        COLUMN_TWO = "Attribute 2 (optional)",
        COLUMN_THREE = "Attribute 3 (optional)";

// The first argument given to Parameter will be displayed.
private final Map<String, Parameter> userParameterMap = Map.of(
        TITLE_KEY, new Parameter("Table Name (optional)", CoreType.STRING)
);

// The keys of this map will be displayed as well.
// Using a LinkedHashMap preserves order in the GUI later.
private final Map<String, DataType> dataTypeMap = new LinkedHashMap<>() {{
    put(COLUMN_ONE, CoreType.ANY);
    put(COLUMN_TWO, CoreType.ANY);
    put(COLUMN_THREE, CoreType.ANY);
}};
```

* In the `visualize` function, the plugin should first parse the expected inputs as shown below.
  * `col1`, `col2`, and `col3` are the data inputs.
    They contain the keys that the plugin can use to obtain a specific Attribute in a Document.
  * You can access the Attribute like so: `Attribute attr = document.getAttribute(col1)`, and then read the value from the attribute.

```java
String col1 = parameters.get(COLUMN_ONE),
        col2 = parameters.get(COLUMN_TWO),
        col3 = parameters.get(COLUMN_THREE);
```

* After obtaining all the values you need, create the visualization in a `JFrame` and return that `JFrame` to the framework. The framework handles displaying the `JFrame` (i.e., there is no need to call `setVisible(true)` on the frame).

### Analysis Plugin

To perform an analysis that the framework does not already provide, you can implement your own analysis plugin by following these steps:

* Create a `Collection<Parameter>` and return it from the `getParameters()` function.
  * The framework calls `getParameters()` to learn which Attributes (and, importantly, their DataTypes) this plugin will add to the Documents when `analyze` is called.
  * The Parameter's display name should also be the key used when calling `putAttribute` on the Documents.
  * The Parameter's type allows the framework to show this Attribute as selectable for appropriate DisplayPlugins.
  * Think about how you will store your information, and use a `CoreType` if possible. If not, you can implement your own `DataType` as explained above.

```java
private static final String PLUGIN_NAME = "Sentiment Analysis",
        SCORE_NAME = "Sentiment Score",
        MAG_NAME = "Sentiment Magnitude",
        TAG_NAME = "Description of Sentiment";

// Collection of all the attributes this plugin adds
private final Collection<Parameter> attributeList = List.of(
        new Parameter(SCORE_NAME, CoreType.NUMERICAL),
        new Parameter(MAG_NAME, CoreType.NUMERICAL),
        new Parameter(TAG_NAME, CoreType.STRING)
);

@Override
public Collection<Parameter> getParameters() {
    return attributeList;
}
```

* In the `analyze` function, the plugin should perform the analysis and attach all the promised Attributes to each Document in the given Collection.
  * Use the function `putAttribute(key, Attribute)` on the Document to add the Attribute to the document.
  * Documents retain their Attributes throughout their lifetime in the framework. To avoid duplicate work, especially if the analysis is expensive, check whether the Document already has the desired Attributes before performing the computation.
  * Make sure that ALL of the promised Attributes (from `getParameters`) are present on ALL of the Documents when `analyze` completes; otherwise you will get an error.

```java
@Override
public void analyze(Collection<Document> documents) {
    for (Document doc : documents) {
        Sentiment sentiment = null;
        // Only perform the API call if the document does not already have the attributes.
        if (!doc.hasAttributes(Set.of(SCORE_NAME, MAG_NAME, TAG_NAME))) {
            sentiment = getSentiment(doc.getText());
        }
        if (sentiment != null) {
            // Attribute values are stored as Strings, so convert each result first.
            String scoreStr = Float.toString(sentiment.getScore());
            String magStr = Float.toString(sentiment.getMagnitude());
            String tagStr = sentimentTag(sentiment.getScore(), sentiment.getMagnitude());
            Attribute scoreAttr = new Attribute(SCORE_NAME, CoreType.NUMERICAL, scoreStr);
            Attribute magAttr = new Attribute(MAG_NAME, CoreType.NUMERICAL, magStr);
            Attribute tagAttr = new Attribute(TAG_NAME, CoreType.STRING, tagStr);
            // Attach all promised attributes to the document.
            doc.putAttribute(SCORE_NAME, scoreAttr)
               .putAttribute(MAG_NAME, magAttr)
               .putAttribute(TAG_NAME, tagAttr);
        }
    }
}
```

* That's it! You should now be able to add your Analysis Plugin's fully qualified class name to the file `resources\META-INF\services\edu.cmu.cs.cs214.hw5.framework.core.AnalysisPlugin`.

## Getting Plugin Keys

Some plugins access third-party APIs that require credentials. The plugins listed below are the ones that need them.

### SentimentAnalysis (Google NLP)

* Go to the [Google Cloud Documentation](https://cloud.google.com/docs/authentication/production#manually).
* Follow the instructions under "Passing credentials manually":
  1. Create a service account.
  2. Obtain the service account credentials file (`.json`).
  3. Set your `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the file path of your credentials file.

### TwitterPlugin (TwitterAPI)

* Go to the [Twitter Developer Portal](https://developer.twitter.com/en/apply-for-access) and apply for a developer account.
* Enter the portal and find your credentials.
* Create a `twitterConfig.json` file like the one below, replacing each placeholder with your own value:

```json
{
  "consumerKey": "<your API Key>",
  "consumerSecret": "<your API Secret>",
  "accessToken": "<your Access Token>",
  "accessTokenSecret": "<your Access Secret>"
}
```

* Lastly, place the `twitterConfig.json` you created in the relative file path `src/main/resources/config/`. In other words, the TwitterPlugin expects to read your API key from `src/main/resources/config/twitterConfig.json`.
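
Before starting the program, it can help to confirm that both credential files are where the plugins expect them. The helper below is a hypothetical sketch and not part of the framework: `CredentialCheck` and its method names are invented here, and the Twitter path is resolved relative to the current working directory, which is assumed to be the project directory.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for checking credential files; not part of the framework. */
class CredentialCheck {
    // Relative path from the section above; assumes the working directory is the project directory.
    static final String TWITTER_CONFIG = "src/main/resources/config/twitterConfig.json";

    /** Returns true only if the path is non-empty and points at a readable regular file. */
    static boolean isReadableFile(String path) {
        if (path == null || path.trim().isEmpty()) {
            return false;
        }
        Path p = Paths.get(path);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    /** Checks the file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable. */
    static boolean googleCredentialsPresent() {
        return isReadableFile(System.getenv("GOOGLE_APPLICATION_CREDENTIALS"));
    }

    /** Checks the Twitter configuration file at its expected relative path. */
    static boolean twitterConfigPresent() {
        return isReadableFile(TWITTER_CONFIG);
    }
}
```

This only verifies that the files exist and are readable; it does not validate the keys themselves.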
