<human> EXTRA wdt:P31 {
  wdt:P22 @<father> * ;
}
<father> EXTRA wdt:P31 {
  wdt:P735 [<human>/P734/P31/PQ642] * ; # given name
  wdt:P734 [<human>/P734] * ;           # family name
}
The example in real ShEx would be:
Background
In this report, we describe the activities that we carried out during the Biohackathon 2023, held in Shodoshima, Japan. The main goal of the project was to identify approaches and issues relevant to integrating large RDF datasets by creating subsets described by Shape Expressions [@EricSemantics2014].
We have recently submitted a publication on creating subsets from Wikidata [@SeyedWikidataSubsetting23].
Wikidata is a knowledge graph that is constantly in flux and is reaching a size that makes it hard to replicate locally. By creating topical subsets, we can extract a manageable portion that can be loaded into local RDF stores for further processing.
However, this subsetting approach relies on Wikidata daily dumps, which are available in JSON format. For this hackathon, we specifically chose to extend the subsetting mechanisms to work on RDF dumps or SPARQL endpoints.
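As a rough illustration of what subsetting directly on an RDF dump could look like, the sketch below filters a few N-Triples lines in two passes: first it collects the subjects that are instances (wdt:P31) of a target class, then it keeps every triple whose subject was selected. This is not the project's tooling. The function names, the regular expression and the sample triples are assumptions made for the example, and the sample is toy data rather than real Wikidata statements.

```python
import re

WD = "http://www.wikidata.org/entity/"
WDT = "http://www.wikidata.org/prop/direct/"

# Toy N-Triples sample (assumption: illustrative statements, not real data).
SAMPLE = f"""\
<{WD}Q1> <{WDT}P31> <{WD}Q5> .
<{WD}Q1> <{WDT}P22> <{WD}Q2> .
<{WD}Q2> <{WDT}P31> <{WD}Q5> .
<{WD}Q3> <{WDT}P31> <{WD}Q11424> .
<{WD}Q3> <{WDT}P57> <{WD}Q1> .
"""

# Simplified pattern: IRI subject, IRI predicate, anything as object.
# Blank-node subjects and comment lines are skipped by this sketch.
TRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+(.+?)\s*\.\s*$")


def parse(lines):
    """Yield (subject, predicate, object) for each line that matches."""
    for line in lines:
        match = TRIPLE.match(line.strip())
        if match:
            yield match.group(1), match.group(2), match.group(3)


def subset_by_class(lines, target_class):
    """Keep all triples whose subject is an instance of target_class."""
    lines = list(lines)
    wanted = f"<{target_class}>"
    # Pass 1: subjects declared as instances (wdt:P31) of the target class.
    subjects = {s for s, p, o in parse(lines) if p == WDT + "P31" and o == wanted}
    # Pass 2: every triple whose subject was selected in pass 1.
    return [(s, p, o) for s, p, o in parse(lines) if s in subjects]


subset = subset_by_class(SAMPLE.splitlines(), WD + "Q5")
for s, p, o in subset:
    print(f"<{s}> <{p}> {o} .")
```

A real dump is far too large to hold in memory as a list, so a practical version would stream the file twice or persist the selected subjects between passes. The two-pass structure is the only point the sketch is meant to show.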
Enhancement and Reusage of Biomedical Knowledge Graph Subsets
Abstract
Knowledge Graphs (KGs) such as Wikidata act as hubs of information from multiple domains and disciplines, and are crowdsourced by multiple stakeholders. The vast amount of available information makes it difficult for researchers to manage the entire KG, whose content is also continually being edited. It is necessary to develop tools that extract snapshots and subsets for specific domains of interest. These subsets help researchers by reducing costs and easing access to the data of interest. In the last two biohackathons, we identified this issue, created prototypes that extract subsets and are readily applicable to Wikidata, and mapped the different approaches used to tackle this problem. Building on those outcomes, we aim to enhance subsetting in both its definitions, using Entity Schemas based on Shape Expressions, and its extraction algorithms, with a special focus on the biomedical domain captured by entity schemas like the one defined in the GeneWiki project. Our first aim is to develop complex subsetting patterns that cover subsetting based on qualifiers and references, to enhance the credibility of datasets. Our second aim is to establish a faster subset extraction platform by applying new algorithms based on Apache Spark and new tools like a document-oriented DBMS platform. During this biohackathon, we aim to explore reuse workflows for Wikidata subsets, specifically with respect to drug repurposing. The biohackathon will help us evaluate the existing nodes and edges for drug-target interaction categories within Wikidata, and whether they need updates or deeper annotation. We also aim to deliver machine-readable schemas of drug-target interactions in Wikidata for future data reuse.
Report about project 11 at Biohackathon-Europe 2022
Links
HackMD notes
notes from 2021 (Barcelona)
Abstract
Introduction (what is serverless, microservices)
introduction: what serverless and microservices are
whether the initial state is a monolith, maybe
Background and motivations (advantages and adoption of serverless -> trouble migrating legacy software -> serverless)
3.1 Advantages and adoption of serverless (it does not fit all use cases)
3.2 Trouble migrating legacy software
3.3 Cloud provider options, why AWS and the consequences of choosing one or another...
Links
The contents of this repo are also created on HackMD
Github repo: https://github.com/kg-subsetting/biohackathon2021
Notes of 8th Nov 2021
TODOs
List existing subsetting tools & methods
Create Gene Wiki subset from the eLife paper