# Eh, What's up Doc?
## Objective Summary
Trying to get Doc Brown 'feature complete' (i.e. able to do everything Raphtory does + more) straight away is biting off a bit more than we can chew. This document breaks down the stages of development for us to aim for.
Note that any features mentioned below are planned to be available in both Python and Rust unless specified otherwise. There is a discussion of each topic following this summary.
### 0.1
* No model change from what is currently available **-- DONE**
* Direct ingestion only (add_vertex, add_edge, etc. - see the sketch after this list) **-- DONE**
* Basic windowed graph, vertex and edge views built on top of GraphDB. **-- DONE**
* Functions on the graph for:
* Basic metrics (i.e. number of nodes, number of edges, whether a given vertex or edge exists) **-- DONE**
* Access to individual vertices and edges **-- DONE**
* Access to collection of all vertices and edges **-- DONE**
* Functions on the vertex for:
* Basic helper functions already present, e.g. degree **-- DONE**
* Access to neighbours (including out_neighbours().out_neighbours(), etc.) **-- DONE**
* Access to update history.
* Access to property history.
* Functions on the edge for:
* Accessing the source and destination vertex
* Accessing the update history.
* Accessing the property history.
* All collections of vertices/edges should be returned as iterators. **-- DONE**
* Saving and loading serialised graphs exposed to the user.
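As a rough illustration, here is a minimal sketch of how the 0.1 surface might look from Python. Only the function names listed above (add_vertex, add_edge, degree, out_neighbours) come from this document - the `Graph` constructor, argument order and everything else are assumptions, not the real API.

```python
# Hypothetical sketch of the 0.1 direct-ingestion and query API.
# Names not listed in this document are assumptions.
from docbrown import Graph  # assumed module/class name

g = Graph()

# Direct ingestion: each update carries a timestamp, IDs and properties
g.add_vertex(1, "alice", {"balance": 100})
g.add_vertex(2, "bob", {"balance": 50})
g.add_edge(3, "alice", "bob", {"amount": 10})

# Basic metrics
print(g.num_vertices(), g.num_edges())
print(g.has_vertex("alice"), g.has_edge("alice", "bob"))

# A basic windowed view; collections come back as iterators
w = g.window(0, 2)
for v in w.vertices():
    print(v.id(), v.degree())

# Chained neighbour access, e.g. two-hop out-neighbours
hops = g.vertex("alice").out_neighbours().out_neighbours()
```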
### 0.2
* Perspective APIs from Raphtory ported over (range, at, walk, etc.) **-- DONE**
* Model updated to include metadata and multi-layer edges
* Direct ingestion API updated to reflect new model features.
* Global state/accumulators available for algorithms. Depending on how this goes, having these hidden from Python is fine - i.e. I can call `connected_components(graphview)` and get an answer, but can't write my own algorithm in Python. (See the sketch after this list.)
* Temporal neighbour/history functions ported from Raphtory, e.g. `get_out_neighbours_after()`.
* Algorithms requiring state (e.g. connected_components, page_rank) that are currently in the backlog benchmarked.
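To make the global-state item concrete, here is a hedged sketch of calling a built-in stateful algorithm over different perspectives. `connected_components`, `at` and `walk` are named in this document; the signatures, module paths and result shape are assumptions.

```python
# Hypothetical sketch: perspectives plus a built-in stateful algorithm.
# Signatures and the returned data shape are assumptions.
from docbrown import Graph
from docbrown.algorithms import connected_components  # assumed module path

g = Graph()
g.add_edge(1, "a", "b")
g.add_edge(5, "b", "c")

# A single perspective at time 5; the accumulator machinery stays in Rust,
# Python just gets the answer back
components = connected_components(g.at(5))  # e.g. {vertex_id: component_id}

# A walk over rolling perspectives: window of 10, stepping by 5
for view in g.walk(start=0, end=100, window=10, step=5):
    print(connected_components(view))
```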
### 0.3
* Time APIs from Raphtory ported over - timestamp queries, increments and windows in string format (sketched below)
* If there was any difficulty providing accumulators in Python, this should be solved by now.
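For flavour, the string-format time API might read something like this - every format, name and argument below is an assumption about how the Raphtory port could look:

```python
# Hypothetical sketch of string-based timestamps, increments and windows.
from docbrown import Graph  # assumed import, as in the earlier sketches

g = Graph()
g.add_edge(1672531200, "a", "b")  # epoch timestamps still accepted

# A window expressed as timestamp strings rather than epochs
view = g.window("2023-01-01 00:00:00", "2023-02-01 00:00:00")

# Rolling windows expressed as human-readable increments
for view in g.walk(start="2023-01-01", end="2023-06-01",
                   window="1 week", step="1 day"):
    print(view.num_edges())
```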
I believe at this point we will be able to replace the Raphtory repo with the docbrown code.
### 0.4
* Dataframe-based ingestion available - both Polars and Pandas (Polars has a from_pandas() function, so the Pandas path can be just a wrapper). See the sketches in the Ingestion section below.
* Window views on the graph reworked as the base view, which can be extended with other views such as `undirected`, `reverse`, `layer selection`, etc. that require no additional functions over the basic view and just operate differently (sketched after this list).
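A hedged sketch of the composable views described above - none of these method names exist yet, they are assumptions about the design:

```python
# Hypothetical sketch: the window view as a base that other views extend.
# Stacked views change behaviour (direction, layer) without new functions.
from docbrown import Graph  # assumed import

g = Graph()
g.add_edge(1, "alice", "bob", {"amount": 10}, layer="transfers")

base = g.window(0, 100)
view = base.undirected().layer("transfers")

# The same query API works regardless of which views are stacked
for v in view.vertices():
    print(v.id(), v.degree())
```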
### 0.5
* If we have requests from customers for fast ingestors for specific sources, we will have started to build these and push them back into the repo - e.g. Neo4j with Bolt, nPlan with BigQuery.
* Complex views requiring new functions chainable with those previously defined - (temporal) multi-layer, signed, etc.
* PagedGraph picked back up - by this point we should have had some time to understand how people are using Doc, which can feed into this.
## Ingestion
We have discussed three ways to get data into Raphtory for the initial release of Doc Brown:
* Direct ingestion via calls to add_vertex(), add_edge(), etc.
* Ingestion via custom loaders - load_from_csv(), load_from_json(), etc.
* Ingestion via a dataframe - both Polars and Pandas
Of these, our current opinion is that the first option does the trick for testing in both Python and Rust (it is fast enough to load a couple of gigabytes of data) and will allow users to start using Doc Brown.
For the second option we run into several API questions. For example, we need to let users specify how to parse the IDs (string or int), how to parse the time (epoch or timestamp), what types the properties should be, and which fields are metadata and which are properties.
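To illustrate why this gets awkward, a hypothetical `load_from_csv` would need to answer all of those questions in its signature. Everything below is invented purely to show how many decisions the API has to expose - none of it exists:

```python
# Hypothetical loader signature - invented to show the decisions a
# custom loader API would have to surface.
from docbrown import Graph  # assumed import

g = Graph.load_from_csv(
    "transfers.csv",
    src_col="from",                   # which columns hold the IDs...
    dst_col="to",
    id_type="str",                    # ...and how to parse them (str or int)
    time_col="timestamp",
    time_format="%Y-%m-%d %H:%M:%S",  # epoch vs. timestamp string
    properties={"amount": "float"},   # property names and their types
    metadata=["currency"],            # which fields are metadata vs properties
)
```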
This led us to the third option: relying on a dataframe to do the heavy lifting for types (which has the secondary benefit that users are already comfortable with dataframes). We still have some of the API questions here, but it should just be a case of specifying which columns refer to what within the graph.
The third option also lets us leverage the connectors these dataframe APIs provide for different databases, file formats, etc.
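By contrast, with a dataframe the type questions are already settled and what remains is just column mapping. A hedged sketch - the `load_edges_from_df` name and its arguments are assumptions:

```python
# Hypothetical dataframe ingestion - the dataframe has already parsed
# the types, so the graph API only needs a column mapping.
import polars as pl

from docbrown import Graph  # assumed import

df = pl.read_csv("transfers.csv")  # Polars infers the column types

g = Graph.load_edges_from_df(
    df,
    src="from",
    dst="to",
    time="timestamp",
    properties=["amount"],
)

# Pandas rides the same path, since Polars can convert it:
# df = pl.from_pandas(pandas_df)
```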
### Outcome
With this in mind, the current plan is to pause work on anything other than direct ingestion. We will need to add functionality for metadata and edge layers, which allows users to start modeling with these concepts and gives us feedback for the dataframe/loader APIs.
Once we have some users modeling graphs with direct ingestion, we can continue work on ingestion via dataframes. This will predominantly be investigating what sort of data is being ingested into Raphtory, what the breakdown of files is (edge files vs vertex files, structural data vs metadata, etc.) and whether there are clear helper functions to be defined.
Finally, further down the line, once work in Doc Brown is being scaled up on specific use cases, we can look to build custom loaders as and when people require them/wish to pay for them.