Neo4j Doc
===
:::info
We can test on Ken's demo @ http://35.187.240.245:2251/
[Set-Up](#Set-Up)
[Creation](#Creation)
[Update]
[Fundamental Query](#Fundamental-Query)
[Composite Query](#Composite-Query)
[Sample Queries and Translation](#Sample-Queries-and-Translation)
:::
### Set Up
#### DB visualisations
[Arcade Analytics](https://arcadeanalytics.com/)
[neo4j like querying](https://github.com/AdrianInsua/neo4j-dashboard)
[JQA](https://github.com/softvis-research/jqa-dashboard) with [demo]()
#### Start VM and Neo4j Server
Ensure the Capstone ASKIE project is selected in GCP
GCP -> VM instance -> start VM
GCP -> Deployment Manager -> get address and password
Either:
Connect via web/local neo4j app using the address and password
Or:
```
pip install neo4j
```
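```python=
from neo4j import GraphDatabase

# Replace address, username and password with the values from Deployment Manager
driver = GraphDatabase.driver("bolt://address:7687", auth=("username", "password"), encrypted=False)
query = "MATCH (n) RETURN n.name LIMIT 5"  # example query, swap in your own
with driver.session() as session:
    result = session.run(query)
    for record in result:
        # each record is one result row; print its first column
        print(record.values()[0])
driver.close()
```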
Results are returned as a driver result object (`neo4j.Result` in driver 4.x and later) that yields one record per row
A detailed breakdown of the graph data types inside each record is [here](https://neo4j.com/docs/api/python-driver/current/types/graph.html)
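The connection pattern above can be wrapped in a small helper so that queries take parameters instead of concatenated strings. This is a minimal sketch, assuming `driver` is the object returned by `GraphDatabase.driver(...)`; the name `fetch_first_column` is hypothetical and not part of the neo4j package.

```python
def fetch_first_column(driver, query, parameters=None):
    """Run a Cypher query and return the first column of every record.

    `driver` is assumed to be the object returned by GraphDatabase.driver(...).
    This helper is illustrative only, not part of the neo4j package.
    """
    with driver.session() as session:
        result = session.run(query, parameters or {})
        # Consume the result inside the session, before it is closed.
        return [record.values()[0] for record in result]


# Example call (needs a running server, so it is left commented out):
# fetch_first_column(driver, "MATCH (n) WHERE n.name = $name RETURN n.name", {"name": "U.S."})
```

Passing values through `parameters` keeps quoting problems out of the query text, which matters for names such as `U.S.` that contain punctuation.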
Useful packages
[py2neo](https://py2neo.org/) and neomodel
```
pip install py2neo
pip install neomodel
```
### Creation
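Create each node with its properties first, then create the relationship between the two nodes
The query below reads `head`, `relation` and `tail` from each CSV row and uses the APOC procedure `apoc.create.relationship` so that the relationship type can come from the `relation` column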
```
load csv with headers from "file:///val.csv" AS row
MERGE (p1:Node {name: row.head})
MERGE (p2:Node {name: row.tail})
WITH p1, p2, row
CALL apoc.create.relationship(p1, row.relation, {}, p2) YIELD rel
RETURN rel
```
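The `LOAD CSV` query above expects a header row with the columns `head`, `relation` and `tail`. As a rough sketch, a file in that shape could be produced from a list of triples as follows; the function name `write_triples_csv` and the sample triples are made up for illustration, and the resulting `val.csv` would still have to be placed where the server can read `file:///` URLs (by default the Neo4j import directory).

```python
import csv
import io


def write_triples_csv(triples, fileobj):
    """Write (head, relation, tail) triples with the header LOAD CSV expects."""
    writer = csv.DictWriter(fileobj, fieldnames=["head", "relation", "tail"])
    writer.writeheader()
    for head, relation, tail in triples:
        writer.writerow({"head": head, "relation": relation, "tail": tail})


# Hypothetical sample data, written to an in-memory buffer for demonstration.
sample_triples = [("A", "conflict", "B"), ("B", "ally", "C")]
buffer = io.StringIO()
write_triples_csv(sample_triples, buffer)

# To create the real file instead:
# with open("val.csv", "w", newline="") as f:
#     write_triples_csv(sample_triples, f)
```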
### Fundamental Query
#### 1st Degree by name
Returns the 1st degree links of a particular node, matched by name
The output is a JSON-style list of maps; use `RETURN x` to get the neighbouring nodes instead
```
MATCH (n)-[r]-(x) WHERE (n.name = 'U.S.')
RETURN COLLECT({head:n.name ,tail:x.name,relation:type(r)}) AS jsonOutput
```
#### by relationship
Returns every node-node link that has a particular relationship type
The output is a JSON-style list of maps
```
MATCH (n)-[r:conflict]-(x)
RETURN COLLECT({head:n.name ,tail:x.name,relation:type(r)}) AS jsonOutput
```
#### Multiple options for properties
```
MATCH (n)-[r]-(x) WHERE (n.name in ['U.S.','California'])
RETURN COLLECT({head:n.name ,tail:x.name,relation:type(r)}) AS jsonOutput
```
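The by-name and multiple-options queries differ only in the list of names, so one parameterised query can cover both. A minimal sketch, assuming the helper name `first_degree_request` (hypothetical) and a `$names` parameter that is passed to `session.run` alongside the query text:

```python
# One query text for both cases; $names is filled in by the driver at run time.
FIRST_DEGREE_QUERY = (
    "MATCH (n)-[r]-(x) WHERE n.name IN $names "
    "RETURN COLLECT({head: n.name, tail: x.name, relation: type(r)}) AS jsonOutput"
)


def first_degree_request(names):
    """Return (query, parameters) for the 1st degree links of one or more names."""
    if isinstance(names, str):
        names = [names]  # a single name becomes a one-element list
    return FIRST_DEGREE_QUERY, {"names": list(names)}


# Usage with an open session (not executed here):
# query, params = first_degree_request(["U.S.", "California"])
# result = session.run(query, params)
```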
### Composite Query
#### 2nd degree
```
match (x)-[r]-(y) where (x.name = "United States")
with y
Match (y)-[r]-(z)
return COLLECT(distinct {head:y.name ,tail:z.name,relation:type(r)}) AS jsonOutput
```
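To make clear what the 2nd degree query returns, the same logic can be sketched in plain Python over a list of `(head, relation, tail)` triples. This is an illustration only, not how Neo4j evaluates the query; the function name and the sample edges are hypothetical. Note that, like the Cypher version, it also returns the links that lead back to the start node.

```python
def second_degree_links(edges, start):
    """Return the set of (head, tail, relation) links of every neighbour of `start`.

    Edges are treated as undirected, mirroring the -[r]- pattern in the query.
    """
    # Step 1: find y, the direct neighbours of the start node.
    neighbours = set()
    for head, _relation, tail in edges:
        if head == start:
            neighbours.add(tail)
        if tail == start:
            neighbours.add(head)

    # Step 2: collect every link (y)-[r]-(z), with y reported as the head.
    links = set()
    for head, relation, tail in edges:
        if head in neighbours:
            links.add((head, tail, relation))
        if tail in neighbours:
            links.add((tail, head, relation))
    return links


# Hypothetical sample graph: A - B - C
sample_edges = [("A", "linked", "B"), ("B", "linked", "C")]
sample_links = second_degree_links(sample_edges, "A")
```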
### Ideas
https://github.com/FerreroJeremy/ln2sql
### UI Ideas
Left panel holds the tools (collapsible) (create, delete, amend, search)
Middle panel for the main graph (or largest cluster) - on create, delete, amend -> zoom into the appropriate nodes
Right panel for information display (closed by default) - on clicking a node in the middle graph, open the right panel and open a new tab
Scrub CSS, pending massive improvement


## Sample Queries and Translation
Zhou Zhi & Qing Ze: Familiarize yourselves with the current knowledge graph and work with Sam/Martin to educate/discover examples of queries useful to them
E.g. 5 queries/questions each for “Startup News” and “E-Commerce”
E.g. “How many entities invested in company X?”
E.g. “For X brand, what is their cheapest product?”
Resources: “GraphPage” on the streamlit demo (http://35.187.240.245:2251/)
It shows what relation types there are, e.g. Investor, FoundedBy, Price etc.
It shows what attributes each node and each edge has (click “Schema”)
These are useful for working out what kinds of queries are possible
Click “CustomQuery”: this lets you enter any Cypher query and see the result
Another option: build the docker system locally and use the Neo4j browser to try queries
https://git.reddragon.ai/RedDragonAI/ASKIE/src/master/ent_link_flask_api/wikidata
1. Popularity of X item. Search for all instances X item appears
```
Match (n) Where n.label ="Google" Return * Limit 5
// or
MATCH (n)-[r]-(x) WHERE (n.label = 'Google') Return n,type(r),x
```
2. Aggregation of X items' entity/relations across multiple entities.
```
MATCH (n)-[r]-(x) WHERE (n.label = 'Google')
Return n,type(r),x,count(*)
```
3. Importance of X items, counting in-degrees and out-degrees
```
// in-degree: relationships pointing at n
MATCH (n)<-[r]-() WHERE n.label = 'Google' RETURN COUNT(r)
// out-degree: relationships leaving n
MATCH (n)-[r]->() WHERE n.label = 'Google' RETURN COUNT(r)
```
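The direction of the arrow decides which degree is counted: `(n)-[r]->()` matches relationships leaving `n` (out-degree) and `(n)<-[r]-()` matches relationships arriving at `n` (in-degree). The same counts can be sketched in plain Python over `(head, relation, tail)` triples, which is handy for checking query results on a small export. The function name and sample edges below are hypothetical.

```python
from collections import Counter


def degree_counts(edges):
    """Return (in_degree, out_degree) Counters keyed by node name.

    Each edge is a (head, relation, tail) triple directed from head to tail.
    """
    in_degree = Counter()
    out_degree = Counter()
    for head, _relation, tail in edges:
        out_degree[head] += 1  # the edge leaves its head
        in_degree[tail] += 1   # and arrives at its tail
    return in_degree, out_degree


# Hypothetical sample: two edges leave X, one edge arrives at X.
sample_edges = [("X", "Investor", "A"), ("X", "Investor", "B"), ("C", "FoundedBy", "X")]
in_deg, out_deg = degree_counts(sample_edges)
```

Comparing `in_deg[name]` and `out_deg[name]` across several names gives the kind of relative importance measure described in query 5.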
4. Given a list of items, what is the most popular/important/best funded
```
MATCH (n)
// WHERE n.label IN []   (optional: restrict to the given list of items)
WITH max(n.someproperty) AS p1
MATCH (b)
WHERE b.someproperty = p1 // AND b.label IN []
RETURN b
```
5. Relative importance of X items, comparing aggregated in-degree and out-degrees
```
// repeat qn 3 for each item, i.e.
MATCH (n)-[r]->() WHERE n.label = 'Google'
with count(r) as r1
MATCH (x)-[r]->() WHERE x.label = 'business'
return r1, count(r) as r2
```
6. Items most seen together, comparing the times items have the same entity/relation. Aids in clustering
```
// cluster
CALL gds.alpha.scc.stream({
nodeProjection: 'Entity',
relationshipProjection: 'InstanceOf' })
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).label AS Name, componentId AS Component
ORDER BY Component DESC
// return largest cluster (assumes componentId has been stored as a node property; the stream call above does not write it)
MATCH (u:Entity)
RETURN u.componentId AS Component, count(*) AS ComponentSize
ORDER BY ComponentSize DESC
LIMIT 1
```
7. Given list of items, compare based on a specific entity/relation
```
MATCH (n)
WHERE n.label IN ["a","b"]
RETURN n
```
8. Comparing similarity of entities. Checking which entities have the most similar kind of relations to other entities.
```
```
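One possible way to score "similar kind of relations" is the Jaccard similarity between the sets of relation types each entity takes part in. This is an assumption about what similarity could mean here, not a query from these notes, and it runs in plain Python over exported `(head, relation, tail)` triples rather than in Cypher. All names and sample data below are hypothetical.

```python
def relation_profile(edges, entity):
    """Return the set of relation types that `entity` takes part in."""
    return {relation for head, relation, tail in edges if entity in (head, tail)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical sample: A has two relation types, B shares one of them.
sample_edges = [("A", "Investor", "X"), ("A", "FoundedBy", "Y"), ("B", "Investor", "Z")]
similarity = jaccard(relation_profile(sample_edges, "A"), relation_profile(sample_edges, "B"))
```

A score of 1.0 means the two entities take part in exactly the same relation types, and 0.0 means they share none.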
9. Reconstructing a site from sitelinks/data etc
```
```
10. Temporal comparison of relation/entity across past data
```
```
Goal is to have a generic plug-and-play set of queries
[CB Insights report: Venture Capital Funding Report Q2 2020](https://www.cbinsights.com/research/report/venture-capital-q2-2020/)
**Types of queries**
- deal activity per quarter
- deals by geography, then compared
- number of IPO exits, compared QoQ
- highest quarter historically
[CB Insights report: 50 Future Unicorns](https://www.cbinsights.com/research/report/future-unicorn-startups-billion-dollar-companies/)

**Types of queries**
- company's financial health
- company product type (enterprise tools, search, customer protection etc)
- company by country
- type of industry (fintech, digital, security)
- type of market in country (emerging market, developed market)
- company by funding
- company by state
[CB Insights report: Here Are The Top AI Unicorns In Asia](https://www.cbinsights.com/research/asia-ai-unicorns-q1-20/)
**Types of queries**
- location of unicorns
- investor in unicorns
- type of product (computer vision for retail etc)
[CB Insights report: AI In Asia: The Impact Of Covid-19 On Funding, Exits, Valuations, And R&D](https://www.cbinsights.com/research/report/artificial-intelligence-asia/)
**Types of queries**
- acquisitions of specific companies (Intel spent X on Y)
- capital flows (companies from X country invested in Y country)
Arena - founded, last funded
Uber - what articles it's taken from, then give the answer, or provide half answers that make up the eventual answer
How to add relationships that are needed: do new relationships need to be added automatically?