csvs-nodejs post

# csvs

This autumn I decided to dedicate a portion of my time to an open source project. I chose to work on a NodeJS [app](https://github.com/fetsorn/csvs-nodejs) for csvs databases because I had previously participated in the project and know its stack well.

csvs is a plain text database that uses git for storage and rollbacks. The NodeJS CLI utility synchronizes csvs databases with each other and with data in other formats: JSON, filesystem directories, social media backups. I took up the task of integrating the org-mode data format and providing reports on database statistics.

## org-mode export

In early September I picked up the org-mode export feature. Internally, csvs-nodejs represents each database entry as a JSON object. So this entry

```
| datum     | uuid     | actdate    | saydate    | actname | sayname | files  |
| some text | 48c475bd | 1599-01-01 | 1599-01-01 | william | mary    | 7e7b8e |

| files  | uuid     | filename        | filehash  |
| 7e7b8e | b8c80611 | "firstpic.jpg"  | 8236ea5a0 |
| 7e7b8e | e92071cb | "secondpic.jpg" | 0aa477f97 |
```

is represented as this JSON object

```
{
  "_": "datum",
  "UUID": "48c475bd",
  "datum": "some text",
  "actdate": "1599-01-01",
  "saydate": "1599-01-01",
  "actname": "william",
  "sayname": "mary",
  "files": {
    "_": "files",
    "UUID": "7e7b8e",
    "items": [
      {
        "_": "file",
        "UUID": "b8c80611",
        "filename": "firstpic.jpg",
        "filehash": "8236ea5a0"
      },
      {
        "_": "file",
        "UUID": "e92071cb",
        "filename": "secondpic.jpg",
        "filehash": "0aa477f97"
      }
    ]
  }
}
```

Another way to represent database entries is the org-mode markup format. org-mode is a more complex alternative to markdown or YAML that allows storing multiple plain text entries, each with its own front matter metadata. The same entry in org-mode looks like this

```
*
:PROPERTIES:
:uuid: 48c475bd
:actdate: 1599-01-01
:saydate: 1599-01-01
:actname: william
:sayname: mary
:files: (:_ 'files' :UUID '7e7b8e' :items ((:_ 'file' :UUID 'b8c80611' :filename 'firstpic.jpg' :filehash '8236ea5a0')(:_ 'file' :UUID 'e92071cb' :filename 'secondpic.jpg' :filehash '0aa477f97')))
:END:
some text
```

According to the specification, this command has to read the database and output a valid org-mode file:

`csvs -i /path/to/database -t biorg`

The import of a csvs database had already been handled, and the CLI returned a NodeJS stream of JSON entries. To support org-mode export I needed to add a Writable stream that would turn each JSON entry into org-mode text and output it to stdout.

## org-mode import

Next I needed to parse that org-mode file and import it back into the CLI. The `org-mode-parser` library did well at converting org-mode entries into JSON, and before long [the first PR](https://github.com/fetsorn/csvs-nodejs/pull/1) was ready.

Arrays were stored as Emacs Lisp property lists, so I had to parse them separately. I learned from reading around the net that parsing involves tokenization and building an abstract syntax tree. I found functions for each step in [manila/node-lisp-parser](https://github.com/manila/node-lisp-parser) and adapted them to plists. I submitted the [second PR](https://github.com/fetsorn/csvs-nodejs/pull/2) and proceeded to statistics reports.

## csvs stats

In October I worked on reporting database schema and stats. The schema for each csvs database is stored in a `metadir.json` according to [fetsorn/csvs-spec](https://github.com/fetsorn/csvs-spec). Each key in the schema object represents a database entity (a branch), and if the entity has a parent, the parent's name is specified in the "trunk" field. A branch without a trunk is called a root. Multiple roots are allowed. A branch that has a trunk is called a leaf.

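The trunk relationship can be turned into a nested structure with a few lines of code. This is a minimal sketch, not the code from csvs-nodejs: it assumes the schema has already been parsed from `metadir.json` into a plain object and contains no cycles, and `buildSchemaTree` is a hypothetical name chosen for illustration.

```javascript
// Hypothetical helper, not taken from csvs-nodejs:
// nests every leaf under its trunk, starting from the roots.
// Assumes the schema has no cycles (a cycle would recurse forever).
function buildSchemaTree(schema) {
  // Group branch names by the trunk they point to.
  const leavesByTrunk = {};
  for (const [branch, { trunk }] of Object.entries(schema)) {
    if (trunk !== undefined) {
      if (!leavesByTrunk[trunk]) leavesByTrunk[trunk] = [];
      leavesByTrunk[trunk].push(branch);
    }
  }

  // Recursively attach the leaves of a branch as nested keys.
  const nest = (branch) =>
    Object.fromEntries(
      (leavesByTrunk[branch] || []).map((leaf) => [leaf, nest(leaf)])
    );

  // Roots are branches without a trunk; several are allowed.
  const roots = Object.keys(schema).filter(
    (branch) => schema[branch].trunk === undefined
  );

  return Object.fromEntries(roots.map((root) => [root, nest(root)]));
}

// Sample data for illustration only.
const sampleSchema = {
  datum: { type: "string" },
  saydate: { trunk: "datum" },
  files: { trunk: "datum", type: "array" },
  file: { trunk: "files", type: "object" },
  filename: { trunk: "file", type: "string" },
};

console.log(JSON.stringify(buildSchemaTree(sampleSchema), null, 2));
```

Grouping leaves by trunk first keeps the recursion simple: each branch only has to look up its own leaves, and a schema with several roots needs no special handling.
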
```json
{
  "datum": { "type": "string" },
  "saydate": { "trunk": "datum" },
  "sayname": { "trunk": "datum" },
  "files": { "trunk": "datum", "type": "array" },
  "file": { "trunk": "files", "type": "object" },
  "filename": { "trunk": "file", "type": "string" }
}
```

Each branch has a type: "string", "number", "object" or "array". Values of type "array" have multiple leaves of type "object".

I took up the task of adding a `--stats` flag that would show the schema in the form of a tree.

```sh
csvs -i ./database --stats
entries: 65
datum
|- actdate
|- actname
|- files
   |- file
      |- filename
```

First I built a nested object representation of the schema, and then printed one tree line per branch, indented according to its level of nesting. The [PR](https://github.com/fetsorn/csvs-nodejs/pull/5) was accepted soon after.

## biorg stats

The org-mode format does not specify a database schema, but the schema can be inferred from the structure of the JSON objects returned by the parser. Since some entries may omit metadata, I looked for the entry with the largest number of properties, printed its structure as a tree, and submitted the [PR](https://github.com/fetsorn/csvs-nodejs/pull/6).

```
csvs -i ./data.bi.org --stats
datum
|- actdate
|- actname
|- files
   |- file
      |- filename
```

## Conclusion

While working on this project I learned to interact with the filesystem, got to know how to deal with NodeJS streams, and wrote my first parser. Thanks to my team at NobleScript for supporting the initiative to develop open source and for the help with finding solutions.