# Joint NLIWOD and PROFILES workshop @ ISWC 2020
Websites:
* https://www.nliwod.org/program
* http://profiles2020.l3s.uni-hannover.de/index.php/programme-2
## Program and Notes
* *NLIWOD* - **Keynote**: [Bhaskar Mitra](https://twitter.com/UnderdogGeek) - Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
    * If you use proprietary datasets, no one can reproduce your results
    * Neural models now appear in 79% of SIGIR papers
    * We still lack public IR benchmarks with large-scale training data
    * Even industrial teams now use BERT on a day-to-day basis
    * Challenges:
        * Single-deadline/single-submission challenges (such as TREC)
        * Leaderboard benchmarking leads to overfitting
        * Approaches have to work on more than one dataset
        * Bender Rule: English is not the only language
        * Cross-flow between communities is needed!
    * Questions:
        * Are we hitting a glass ceiling with current ML models? A: More general-purpose models.
        * What advice would you give to researchers working on other languages, where building benchmarks is even harder? A: Start building benchmarks and gather a community around them.
* *NLIWOD* - Chatbot for Interacting with SDMX Databases - Guillaume Thiry, Ioana Manolescu and Leo Liberti
    * Ranking of queries/datasets can be supported by metadata
    * Data cubes are still relevant
    * Real-world use is on the horizon (OECD)
    * Questions:
        * What is important for your approach to generalize?
* *NLIWOD* - Verbalizing the Evolution of Knowledge Graphs with Formal Concept Analysis - Martin Arispe, Mayesha Tasnim, Damien Graux, Fabrizio Orlandi and Diego Collarana
    * Formal Concept Analysis (FCA) is used to find hierarchies in real-world KGs
    * Questions:
        * Which verbalisation functions did you use? A: We are currently trying out different ones.
        * What is the performance? A: FCA can already deal with big data.
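
To make the FCA idea concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes that a KG snapshot has already been flattened into a formal context mapping each entity (object) to the set of its types (attributes); the entities, types, and the helpers `concept_intents`, `formal_concepts` and `concept_diff` are hypothetical. Every concept intent is an intersection of object intents, so the sketch builds the intents by repeated intersection and then derives each extent. Comparing the intents of two snapshots is one simple way, assumed here, to surface how the hierarchy evolved before a verbalization step turns the changes into text.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Context = Dict[str, FrozenSet[str]]              # object -> its attributes
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent, intent)


def extent_of(intent: FrozenSet[str], context: Context) -> FrozenSet[str]:
    """Objects that have every attribute in `intent`."""
    return frozenset(obj for obj, attrs in context.items() if intent <= attrs)


def concept_intents(context: Context) -> Set[FrozenSet[str]]:
    """All concept intents: intersections of object intents, plus the full
    attribute set (the intersection of no object intents)."""
    intents: Set[FrozenSet[str]] = {frozenset().union(*context.values())}
    for attrs in context.values():
        # The comprehension reads the current intents before |= updates them.
        intents |= {intent & attrs for intent in intents}
    return intents


def formal_concepts(context: Context) -> List[Concept]:
    """(extent, intent) pairs, most general concepts (largest extents) first."""
    concepts = [(extent_of(i, context), i) for i in concept_intents(context)]
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def concept_diff(old: Context, new: Context) -> Tuple[Set[FrozenSet[str]], Set[FrozenSet[str]]]:
    """Intents that appear in / disappear from the hierarchy between snapshots."""
    old_intents, new_intents = concept_intents(old), concept_intents(new)
    return new_intents - old_intents, old_intents - new_intents


# Hypothetical snapshots of a tiny KG: entity -> types.
v1: Context = {
    "Berlin": frozenset({"City", "Capital"}),
    "Hamburg": frozenset({"City", "Port"}),
    "Paris": frozenset({"City", "Capital"}),
}
v2: Context = {**v1, "Tokyo": frozenset({"Capital"})}

for extent, intent in formal_concepts(v1):
    print(sorted(intent), "->", sorted(extent))

added, removed = concept_diff(v1, v2)
print("new concept intents:", [sorted(i) for i in added])
```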
----
* *PROFILES* - **Keynote**: Prof. Dr. Felix Naumann - Data Profiling in the Relational World
    * Commercial tools are not there yet
    * How to efficiently find good dependencies? Algorithms!
    * Questions:
        * Have you considered how users can be involved to quickly reduce the search space? A: Show results to users as early as possible, as soon as they are created.
        * Databases usually follow the closed-world assumption. What needs to be considered for your proposed algorithms if that assumption does not hold?
        * What happens in the presence of NULLs? A: The algorithms can deal with them, but there are challenges!
* *PROFILES* - An Architecture for Cell-Centric Indexing of Datasets - Lixuan Qiu, Haiyan Jia, Brian Davison and Jeff Heflin
    * Table indexes are typically created at the table or column level
    * Uses a cell-centric index that combines metadata, the cell value, and the other values in the same row (context)
    * Question:
        * How flexible is your cell indexing approach when it comes to enriching the set of indexed fields (title, context, ...), in particular with dataset profiles? A: Elasticsearch easily allows adding further search fields.
* *PROFILES* - A Template-Based Approach for Annotating Long-Tail Datasets - Daniel Garijo, Ke-Thia Yao, Amandeep Singh and Pedro Szekely
    * Table annotation typically requires expertise in semantic technologies
    * Users add metadata to the table to support its transformation into a KG
    * Question:
        * Which Wikifier do you use? How do you understand columns? A: An external service based on Wikidata, but that is not the bottleneck; the bottleneck lies elsewhere, e.g. in property linking.
* *NLIWOD* - Generating Knowledge Graphs from Unstructured Texts: Experiences in the eCommerce Field for Question Answering - Diogo Sant'Anna, Rodrigo Caus, Lucas Ramos, Victor Hochgreb and Julio Cesar Dos Reis
    * Question answering in GoBots can increase sales by 120%
    * Entity- and intent-based QA systems
    * Questions:
        * What do you use for training? A: Proprietary data and the Rasa framework.
        * Your precision is really high; what about recall? A: We measured it; please see the paper.
* *NLIWOD* - Generating Grammars from lemon lexica for Question Answering over Linked Data: a Preliminary Analysis - Philipp Cimiano, Basil Ell, Viktoria Benz and Mohammad Fazleh Elahi
    * Generation of question grammars from a lemon lexicon, based on LTAG grammars and ideas from LexInfo
    * Advantages: portability across domains without training data, and auto-completion
    * Question:
        * Have you thought of combining a word embedding model with the lemon lexicon? A: We will look into it, since embeddings could also contribute synonyms from the high-dimensional space.
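
The dependency discovery problem from Felix Naumann's keynote, together with the question about NULLs, can be illustrated with a minimal brute-force Python sketch; it is not one of the algorithms presented in the talk. It checks whether a functional dependency `lhs -> rhs` holds on a list of rows and enumerates minimal dependencies with a single right-hand-side column under two common NULL semantics: NULL equals NULL, or NULL never equals anything. The helpers `holds` and `discover`, the column names, and the sample rows are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[str]]    # None stands for NULL
FD = Tuple[Tuple[str, ...], str]  # (lhs columns, rhs column)


def _agree(a: Optional[str], b: Optional[str], null_equals_null: bool) -> bool:
    """Compare two values under the chosen NULL semantics."""
    if a is None and b is None:
        return null_equals_null
    return a == b


def holds(rows: Sequence[Row], lhs: Tuple[str, ...], rhs: str,
          null_equals_null: bool = True) -> bool:
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    first_rhs: Dict[Tuple[Optional[str], ...], Optional[str]] = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        if not null_equals_null and None in key:
            # A NULL on the left never agrees with any other row,
            # so this row cannot take part in a violation.
            continue
        if key not in first_rhs:
            first_rhs[key] = row[rhs]
        elif not _agree(first_rhs[key], row[rhs], null_equals_null):
            return False
    return True


def discover(rows: Sequence[Row], columns: Sequence[str], max_lhs: int = 2,
             null_equals_null: bool = True) -> List[FD]:
    """Brute-force search for minimal FDs with a single rhs column.
    Left-hand sides of size 0 (constant columns) are not considered."""
    found: List[FD] = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        for size in range(1, max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip non-minimal candidates: a smaller lhs already determines rhs.
                if any(found_rhs == rhs and set(found_lhs) <= set(lhs)
                       for found_lhs, found_rhs in found):
                    continue
                if holds(rows, lhs, rhs, null_equals_null):
                    found.append((lhs, rhs))
    return found


rows: List[Row] = [
    {"zip": "10115", "city": "Berlin", "country": "DE"},
    {"zip": "10115", "city": "Berlin", "country": "DE"},
    {"zip": "20095", "city": "Hamburg", "country": "DE"},
    {"zip": None, "city": "Berlin", "country": "DE"},
    {"zip": None, "city": "Hamburg", "country": "DE"},
]
columns = ["zip", "city", "country"]

for semantics in (True, False):
    print("null = null" if semantics else "null != null",
          discover(rows, columns, null_equals_null=semantics))
```

The candidate space grows exponentially with the number of columns, which is why dedicated discovery algorithms such as TANE or HyFD prune it instead of enumerating every candidate. The toy rows also show the NULL issue from the Q&A: `zip -> city` is rejected when NULLs compare equal and accepted when they do not.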
## Community Discussion and Best Presentation Awards
To vote for the best presentation, please go to www.menti.com and use the code 71 88 38 7.
Feedback:
* More people can participate due to the lower entry barrier
* Split it into two events so that people from all time zones can participate more easily
* How can we open up the mic so that more people ask more questions?
* A live conference format is preferred over pre-recorded videos in the workshops