# Hyperreal Enterprises Backlog
## Abstract
This is an informal outline of the tactic state of Hyperreal Enterprises. The “Previous Work” section describes things that have been achieved and save-points for things that were not yet finished. The “Roadmap” section outlines specific directions for further work.
Previous work
=============
## Preface (2020)
Many of the ideas we discussed about "PlanetMath" can be revisited in new contexts.
- Some of the comments on a reorientation around *projects* would apply at the level of the **Peeragogy project**, and affiliated projects.
- It is not essential to have a separate place for Q&A right now, since Stack Exchange works well for that. What would be valuable is a way to connect Q&A to other material. We are also interested in exploring Q&A using computational methods within **Hyperreal Enterprises**.
## Introduction (2013)
Previews of our ideas about things to develop for PlanetMath can serve several related purposes:
1. They can show members of the PlanetMath community what sorts of exciting and innovative features are in the works.
2. They can interest potential donors and volunteers by showing them what their contributions would help accomplish.
3. If the pace of progress on PlanetMath seems too slow, the previews can potentially indicate where we\'re stuck and what can be done to bring it back up to speed.
4. The previews can also help to explain to newcomers what PlanetMath \"is\" -- and what it can become.
For example:
- **PlanetMath is a peer-produced mathematics encyclopedia.** **mature**
- **PlanetMath is a math wiki that blends ownership with universal editability using a distributed revision control system.** **complete**
- **PlanetMath is a testbed for the Planetary software project.** **complete**
- PlanetMath is a collection of peer-produced exercise workbooks and user-created learning pathways. **evolving**
- PlanetMath is a place to ask questions and build a shared knowledge base. **evolving**
- PlanetMath is an incubator for websites using Planetary software. **evolving**
- PlanetMath is a place to get free or for-fee mathematics teaching and tutoring. **evolving**
- PlanetMath is a guide to the mathematics literature. **proto**
- PlanetMath is a collection of public domain or liberally licensed books. **proto**
- PlanetMath is a free/open research repository. **proto**
- PlanetMath is an integration hub for other free/open math projects. **proto**
- PlanetMath is an interactive mathematics MUD/MUSH. **proto**
- PlanetMath is the mathematics component of an \"indie education bundle\" **proto**
- PlanetMath.org, Ltd. is a nonprofit specializing in mathematical
software services, mathematical communication, hypertext research,
and consulting. **proto**
Several of the most interesting and important previews are described in
the sections that follow.
## "What's Next for PlanetMath/Planetary?"
**Lifecycle stage**: proto---***evolving***---complete---mature
The main platform, including:
- [NNexus](https://github.com/dginev/nnexus)
- [Planetary](https://github.com/MathHubInfo/Legacy-planetary)
has been doing quite well, but work is going slower now that the \"lead dev\" Joe Corneli has finished the work required for completing his thesis.
Nevertheless, the trials in this thesis show that the user interface
needs many improvements and a thorough usability review.
Furthermore, board member Deyan Ginev pointed out that the system is not really as easy to install or use as, say, MediaWiki, so a reasonable effort should be put into making a quick-start dev / deployment strategy.
In order to progress to stage 4, we need a solid developer group making regular commits. If PlanetMath itself is going to remain relevant, we
need to keep it as an integration platform for work.
### See also
See [planetary\#63](https://github.com/KWARC/planetary/issues/63), [planetary\#381](https://github.com/KWARC/planetary/issues/381), and [planetary\#88](https://github.com/KWARC/planetary/issues/88). There are a lot of other basic usability issues collected in this PM ticket: [planetmath-docs\#5](https://github.com/holtzermann17/planetmath-docs/issues/5) and in these Planetary milestones, which are currently past due: [PlanetMath Community](https://github.com/KWARC/planetary/issues?milestone=6&state=open) and [PlanetMath Community 2](https://github.com/KWARC/planetary/issues?milestone=9&state=open).
## Books
Lifecycle stage: ***proto***---evolving---complete---mature
Sources of content: (a) Retrodigitization, (b) importing content from other CC-By-SA or more liberally licensed sources, (c) re-using PlanetMath\'s existing content.
In more detail:
- Library of Congress + US Copyright office + Archive.org + Infty + some manual labour yields newly typeset out-of-copyright works (we\'ve successfully run this workflow with our Calculus book)
- Importing and enhancing material from websites such as StackExchange and Wikipedia that use the same license is quite feasible.
- Reusing existing material from the PlanetMath encyclopedia by building \"collections\" (e.g., exploiting the NNexus autolinker or other automated tools to assist with content assembly)
We have done a lot of background research on this. In order to get things moving and progress to Stage 3, we need some hands-on-the-keyboard time (mathematics background would be helpful throughout, programming experience helpful for b and c). By default, this will evolve slowly as we assemble new courses and make further experiments with NNexus.
However, an influx of volunteer time (or funding) could make things progress more rapidly. (Further details on the Books preview are presented in an Appendix at the end of this document. With a bit more work, any one of these previews could be expanded in a similar fashion.)
### Requirements
If we are going to work on the retrodigitization project, **we would need to get an OCR system up and running**. Ray has thought some about a specific approach to improving OCR for old books, though we don't currently know whether this approach is still relevant.
### See Also
- [PM Cross Index](https://github.com/holtzermann17/AsteroidMetaArchive/blob/e36ffa53b9a30d33655ad4be4d80c4dba2d500f1/org/PM-Xi.org)
- [Index of Category Theory](https://github.com/planetmath/18_Category_theory_homological_algebra/blob/master/18-01-IndexOfCategoryTheory.tex)
## PlanetMath Outline Series
Lifecycle stage: ***proto***---evolving---complete---mature
The idea of building approximate \"feature parity\" with the popular Schaum\'s Outline Series gives us a list of desirable topics to cover (see below).[^1] As a rough measure of what it would take to establish \"feature parity\", let\'s imagine that each expository section of one of these Outlines is equivalent to an encyclopaedia entry. One of the books selected at random has 97 sections and 877 problems, together with 420 worked solutions.
Assuming all the books are of roughly the same size, this comes to something like 6000 encyclopaedia entries and 60000 problems with 20000 solutions, in total. The number of entries is approximately half the number we currently have in the encyclopedia; however, I think we would
have to add a few thousand entries to be really comparable, since our coverage in most areas is somewhat spotty.
If we can draw additional material from old out-of-copyright textbooks,
that would help. We could also draw many problems and solutions from the questions and answers on StackExchange.
### Topics of the outlines
Abstract Algebra, Advanced Calculus, Advanced Mathematics for Engineers and Scientists, Astronomy, Basic Business Mathematics, Basic Electricity, Basic Mathematics for Electricity and Electronics, Basic Mathematics with Applications to Science and Technology, Beginning Calculus, Beginning Finite Mathematics, Bookkeeping and Accounting, Calculus, Calculus of Finite Differences and Difference Equations, College Algebra, College Mathematics, Differential Equations, Discrete Mathematics, Electronic Devices and Circuits, Elementary Algebra, Elementary Mathematics, Elements of Statistics, Essential Computer Mathematics, Financial Management, Formulas and Tables, Geometry, Intermediate Algebra, Introductory Surveying, Lagrangian Dynamics, Logic, Mathematics for Liberal Arts Majors, Mathematics for Nurses, Mathematics for Physics Students, Mathematics of Finance, Matrix Operations, Operating Systems, Partial Differential Equations, Physics for Engineering and Science, Precalculus, Principles of Accounting, Probability, Probability and Statistics, Software Engineering, Statistics, Statistics for Engineers, Thermodynamics.
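As a rough check on the size estimate, the sample book quoted earlier (97 sections, 877 problems, 420 worked solutions) can be scaled up by the number of books. The sketch below assumes one book per topic in the list above (45 topics) and that every book matches the sample; both are assumptions, and the names `OutlineStats` and `N_LISTED_TOPICS` are illustrative only. The totals in the text are order-of-magnitude figures, so the book count is left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutlineStats:
    """Counts for a single outline, or totals for a whole series."""
    sections: int
    problems: int
    solutions: int

    def scaled(self, n_books: int) -> "OutlineStats":
        """Extrapolate to n_books, assuming every book matches this one."""
        return OutlineStats(
            self.sections * n_books,
            self.problems * n_books,
            self.solutions * n_books,
        )


# Figures quoted in the text for one randomly selected book.
SAMPLE_BOOK = OutlineStats(sections=97, problems=877, solutions=420)

# Assumption: one book per topic in the list above (45 topics).
N_LISTED_TOPICS = 45

totals = SAMPLE_BOOK.scaled(N_LISTED_TOPICS)
print(totals)
```

Changing `N_LISTED_TOPICS` shows how sensitive the totals are to the size of the catalogue being matched.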
### Update (2020)
This could give some direction for investigating the contents of Math Stack Exchange.
- We could scan, OCR, and index/NNexusify the contents of Schaum\'s Outlines.
- We could then use these contents to query Math Stack Exchange, to see if we can reassemble its contents into a similar outline structure.
- If we can't do this, it would cast some doubt on whether we can use the material for any kind of educational content whatsoever!
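The reassembly experiment above could start from something as simple as word overlap between outline section titles and question titles. This is a minimal sketch under stated assumptions: the section and question titles are invented, the function names are hypothetical, and Jaccard similarity on word sets is a stand-in for whatever NNexus-style matching would really be used.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case word tokens, ignoring very short words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def assign_questions(
    sections: list[str], questions: list[str], threshold: float = 0.2
) -> dict[str, list[str]]:
    """File each question under its best-matching section, if the match is good enough."""
    outline: dict[str, list[str]] = {s: [] for s in sections}
    for q in questions:
        best = max(sections, key=lambda s: jaccard(tokens(s), tokens(q)))
        if jaccard(tokens(best), tokens(q)) >= threshold:
            outline[best].append(q)
    return outline


# Hypothetical data: invented section titles and question titles.
sections = ["Limits and continuity", "Integration by parts"]
questions = [
    "How do I prove continuity using limits?",
    "Integration by parts for a product of exponentials",
    "Best textbook for topology?",
]
outline = assign_questions(sections, questions)
for section, matched in outline.items():
    print(section, "<-", matched)
```

Questions that fall below the threshold are left unfiled; the fraction of unfiled questions and of empty sections would be one crude measure of how well the outline structure can be recovered.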
## Centralized Bibliographic Database
Lifecycle stage: ***proto***---evolving---complete---mature
Software: BibServer (Drupal and LaTeXML compatibility layers to be
added)
Content: Initial content to be imported from the Library of Congress
\"QA\" section (the library catalog is public domain since it is a US
Government publication; we have the MARC records on disk and should
hopefully have them imported into BibServer this summer)
In order to progress to Stage 3, we need some programmer time. We
potentially have some help with BibServer lined up at the Open
University.
### See also
See [planetary\#80](https://github.com/KWARC/planetary/issues/80).
## Virtual Classroom
Lifecycle stage: ***proto***---evolving---complete---mature
Software for Version 0.1: Skype or Google Hangout + MathIM (a math-enhanced chat) + PlanetMath forums, articles, collections, groups, and questions. (**This is working now.**)
Future versions:
- Improvements to PlanetMath to make it easier for teachers and students to use
- Tutorial booking and payments system
- More content (books, outlines, syllabi)
In order to progress to Stage 4, we need to run the course we are currently developing to see how that goes. We plan to start this in September and run the course for 10 weeks. This project has the benefit of having a built-in business model, so we\'re able to prioritize it. If it works, we can scale to meet demand by recruiting more teachers (and branching out to one-off tutoring).
### See also
See [planetary\#350](https://github.com/KWARC/planetary/issues/350).
## Re-orienting PlanetMath around \"Projects\"
Lifecycle stage: ***proto***---evolving---complete---mature
At the organizational level, the preview projects outlined here make a great start!
A full implementation would require significant development effort towards \"PlanetMath 3.0.\" One reasonable technical step would be to
add Git compatibility to Planetary, which would make it easier for collaborating groups of authors to build their own projects.
Further improvements, like issue tracking, would come later. Some of the relevant research background is outlined in Joe\'s thesis, but
the design specifics still need to be worked out. The conclusion to the thesis presents the following table, which summarizes a
tentative design (an Entity-Relation diagram) for \"PlanetMath 3.0\". This design would add support for projects, project updates, forks, outcomes,
conjectures, and ephemeral discussions to the existing prototype.
```
\begin{tabular}{c c c c c}
Context & Feedback & Quality & Structure & Heuristic \\
\hline
$\begin{array}{c}
A \leftarrow A \\
A \xleftarrow{\ell} A \\
X \hookuparrow \mathcal{X}
\end{array}$ &
$\begin{array}{c}
X \leftarrow T \\
S \leftarrow R \\
\mathcal{X} \rightarrow \mathcal{X}^{\sharp}
\end{array}$ &
$\begin{array}{c}
X \leftarrow Q \\
A \leftarrow C \\
X \rightarrow X' \\
\mathcal{X} \models \mathcal{X}^{\star}
\end{array}$ &
$\begin{array}{c}
A \leftarrow P \leftarrow \mathcal{J} \leftarrow S \\
L \leftarrow A, P \\
M \leftarrow A \\
Q \leftarrow A
\end{array}$ &
$\begin{array}{c}
G \hookleftarrow U \\
S \hookleftarrow H \\
Q, T \rightharpoonup C, W, P \\
G \hookrightarrow \mathcal{E}
\end{array}$ \\
$\begin{array}{r@{\hspace{2mm}}l}
A & \mathrm{article} \\
\ell & \mathrm{link} \\
X & \mathrm{object}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
T & \mathrm{post} \\
S & \mathrm{solution} \\
R & \mathrm{review}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
Q & \mathrm{question} \\
C & \mathrm{correction}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
P & \mathrm{problem} \\
L & \mathrm{collection} \\
M & \mathrm{classification}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
G & \mathrm{group} \\
U & \mathrm{user} \\
W & \mathrm{request} \\
H & \mathrm{heuristic}
\end{array}$ \\
$\begin{array}{r@{\hspace{2mm}}l}
\mathcal{X} & \mathrm{project}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
\sharp & \mathrm{update}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
{}' & \mathrm{fork} \\[-4pt]
\star & \mathrm{outcome}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
\mathcal{J} & \mathrm{conjecture}
\end{array}$ &
$\begin{array}{r@{\hspace{2mm}}l}
\mathcal{E} & \mathrm{ephemera}
\end{array}$ \\
\end{tabular}
```
In order to progress to Stage 3, we need to get our preview series out to the public on PlanetMath, making it clear how they can add new projects or assist with one of these.
### See also
See [planetary\#351](https://github.com/KWARC/planetary/issues/351).
## Internationalization
Lifecycle stage: ***proto***---evolving---complete---mature
We\'re not working on this at the moment but we recognize that it would be worthwhile! There\'s a technical component (internationalizing the
software) as well as a major content-porting initiative. This may intersect with the Books preview. There are possible connections with
current work at KWARC towards building an internationalized glossary. It could be useful to mirror internationalization at the organizational
level (with different computers + nonprofit orgs in different countries).
In order to progress to Stage 2 or 3 we would need some technical work, capacity building work, and hands-on-the-keyboard time. We can probably
import a lot of content from other sources that would give us a boost once we have the platform ready.
### See also
See [planetary\#309](https://github.com/KWARC/planetary/issues/309).
## Computer Math and KRR
Lifecycle stage: ***proto***---evolving---complete---mature
*NB.* This set of features is \"proto\" on PlanetMath, but our collaborators at [KWARC](http://kwarc.info) have various projects in these areas, at all stages of completion. Accordingly, it would be great to integrate more KWARC tools into PM (e.g., make sTeX/OMDoc an optional format).
MathML that is produced by LaTeXML from regular TeX isn\'t particularly
meaningful! `$f(x)$` reads as \"f times x\" in MathML. sTeX could be
used to make a canonical (non-ad hoc) coding, to enable a regular TeX
parse. These features would enable improved semantic searching and
similar functionality. See <http://kwarc.github.io/> for a partial list
of other projects that build on this infrastructure.
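To make the ambiguity concrete, the two readings of `$f(x)$` can be written out as Content MathML trees. The sketch below builds both with Python's standard `xml.etree.ElementTree`; the helper names `apply_node` and `ci` are illustrative only, and this is not a description of what LaTeXML or sTeX emit internally.

```python
import xml.etree.ElementTree as ET


def ci(name: str) -> ET.Element:
    """Content identifier, e.g. <ci>f</ci>."""
    el = ET.Element("ci")
    el.text = name
    return el


def apply_node(*children: ET.Element) -> ET.Element:
    """Build a Content MathML <apply> element from its children."""
    node = ET.Element("apply")
    node.extend(children)
    return node


# Reading 1: the function f applied to x (what an author usually means by f(x)).
application = apply_node(ci("f"), ci("x"))

# Reading 2: f multiplied by x (the "f times x" reading described above).
product = apply_node(ET.Element("times"), ci("f"), ci("x"))

print(ET.tostring(application, encoding="unicode"))
print(ET.tostring(product, encoding="unicode"))
```

A semantic markup layer is what lets the converter choose between these two trees instead of guessing.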
In order to progress to stage 2 or 3, there are some specific Github
tickets describing sTeX integration to deal with. There are benefits to
using PM as an integration platform.
### See also
See [planetary\#358](https://github.com/KWARC/planetary/issues/358), [planetary\#216](https://github.com/KWARC/planetary/issues/216), [planetary\#340](https://github.com/KWARC/planetary/issues/340), and [eMath 3.0: Building Blocks for a Social and Semantic Web for Online Mathematics & eLearning](http://kwarc.info/kohlhase/papers/malog10.pdf).
Experimental Math
-----------------
Lifecycle stage: ***proto***---evolving---complete---mature
We\'ve done nothing about integrating SAGE or Maxima into PlanetMath so far, but we think it would be a good idea. SAGE has a nice web format
already - maybe we could re-use it in some way on PlanetMath.
In order to progress to stage 2 or 3, we would want to make contact with
some people in the SAGE community and find a concrete project to work
on. Deploying SAGE notebooks as an alternative content type or
sub-object that can be used within PlanetMath articles would be great.
### See also
See [planetary\#417](https://github.com/KWARC/planetary/issues/417), [WorkingWiki](http://www.mediawiki.org/wiki/Extension:WorkingWiki).
Hypertext and Metamathematics
-----------------------------
Lifecycle stage: ***proto***---evolving---complete---mature
Often distinct from but sometimes overlapping with the ideas in the [Computer Math preview](https://github.com/holtzermann17/planetmath-docs/issues/76), we\'ve been actively looking at and working on issues like these, over
the past decade or so:
- Theoretical background work on representing mathematical knowledge as semantic hypergraphs and reasoning with it.
- Theoretical background work on linguistics and mathematics.
- Several versions of Arxana (an Emacs-based hypertext system) have been developed, which will eventually integrate with the above.
In order to progress to stage 4, we need: (a) round-tripping and exporting from
LaTeX documents and Emacs to/from PlanetMath; (b) some finished, fully
interactive examples of network programming / network math inside Emacs.
An overview of a possible computational agents approach.
--------------------------------------------------------
One approach would be to formalize the theory of peer learning that Joe
Corneli developed in his thesis in a system of computational agents that
can collaborate to solve mathematical problems.
There is some precedent for this sort of work within the knowledge-rich
artificial intelligence tradition (e.g. Marvin Minsky\'s *Society of
Mind*), although not typically with a mathematics focus.
In his thesis, Joe observed that paragogy has structural similarities to
Imre Lakatos\'s description of mathematical argumentation, as embodied
in the dialogs in his *Proofs and Refutations*. There are also parallels
to Martin Nowak\'s work on the evolution of cooperation in a game
theoretic setting. Previous research on Lakatos-style computational
agents was carried out by Alison Pease, a philosopher of mathematics, whom
Joe interviewed during the requirements-gathering phase of the thesis.
However, this prior work does not fall within the knowledge-rich AI
tradition, but instead focuses on theory construction from axioms.
The plan would connect the new theory of paragogy with my earlier
writing on hypertext and knowledge representation for mathematics
([Corneli & Krowne](http://metameso.org/~joe/docs/sbdm.html) (2005),
[Corneli & Puzio](http://arxana.net) (2013),
[Corneli](http://metameso.org/~joe/math/Xi.pdf) (2003, unpublished),
[Corneli](http://metameso.org/~joe/math/100.pdf) (2004, unpublished)).
This work provides both technical and theoretical foundations, but full
articulation of the project will take considerable time and effort. The
results could transform mathematical practice, by offering automated
assistance at many points in the learning and discovery process.
### See also
<http://arxana.net>, [planetary\#416](https://github.com/KWARC/planetary/issues/416), and the old [Hyperreal Dictionary of Mathematics](https://github.com/holtzermann17/AsteroidMetaArchive/blob/master/org/The_Hyperreal_Dictionary_of_Mathematics.org) wiki page.
Appendix: Further details for the books project
-----------------------------------------------
Mock-up/demo
------------
*Put in some artistic impression of what the books section might look
like.*
Detailed description
--------------------
The purpose of the book project is to make mathematical books in the
public domain accessible to the general public in the form of a
collaborative digital library. To accomplish this goal, we plan to
design and build a system composed of three interoperating components.
The first subsystem is a retrodigitization toolchain. When complete,
this system will allow one to start with a physical book on a library
shelf, scan it into a computer, then subject the result to a series of
processing steps which result in a TeX representation of the book\'s
contents. While the software for this already exists and has been
tested, there is room for improvement; by introducing image
preprocessing, clustering, and postprocessing, one should be able to
significantly improve the accuracy of the process. Given that
proofreading and correcting errors is a labor-intensive process, the
labor saved by improving the OCR process justifies the effort.
Since, even with these improvements, this process is not 100% accurate,
we need the next component, which is an editorial workflow. Based upon
the CBPP approach which has been in use for the last decade to produce
the PM encyclopaedia and inspired by predecessors such as the St. Pachomius Library and Project Gutenberg\'s Distributed Proofreaders,
this system will coordinate the proofreading of mathematical works by
members of the PM community. To participate, a member would start at a
page which lists the various works which have been processed but not yet
proofread. Upon picking a work, the member would be assigned a page. To
work on the page, there would be a webpage which displays the original
text, the computer output from the OCR suite, and the rendering of that
output. The proofreader\'s job is to ensure that the rendered output
agrees with the original text and, if not, to edit the output as
appropriate. Once this is done, an editor will double-check the result
and, once all pages have been satisfactorily edited, the system will
collect the results and collate them into a hypertext edition.
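The editorial workflow just described amounts to a small state machine per page: unread, proofread, approved, with collation once every page is approved. The following is a minimal sketch of that logic only; the class and method names are hypothetical and do not correspond to any existing Planetary or Noosphere code.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    UNREAD = "unread"
    PROOFREAD = "proofread"
    APPROVED = "approved"


@dataclass
class Page:
    number: int
    text: str                      # starts out as raw OCR output
    status: Status = Status.UNREAD


@dataclass
class Work:
    title: str
    pages: list[Page] = field(default_factory=list)

    def next_page(self) -> Optional[Page]:
        """Assign the first page nobody has proofread yet."""
        return next((p for p in self.pages if p.status is Status.UNREAD), None)

    def submit(self, page: Page, corrected: str) -> None:
        """A proofreader hands in corrected text for a page."""
        page.text = corrected
        page.status = Status.PROOFREAD

    def approve(self, page: Page) -> None:
        """An editor signs off on a proofread page."""
        if page.status is not Status.PROOFREAD:
            raise ValueError("only proofread pages can be approved")
        page.status = Status.APPROVED

    def collate(self) -> str:
        """Join all pages in page order, once every one has been approved."""
        if any(p.status is not Status.APPROVED for p in self.pages):
            raise ValueError("some pages still await proofreading or approval")
        return "\n".join(p.text for p in sorted(self.pages, key=lambda p: p.number))
```

In a real deployment the side-by-side display of scan, OCR output, and rendering would sit in front of `submit`, and `collate` would produce the hypertext edition rather than a plain string.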
The third and final component is a reading room which makes the results
available to the reading public. To locate books, there will be a
catalogue, search facility, and recommender. Once one has located a
book, one can read it in several forms. The primary form is hypertext
enhanced with links to the encyclopaedia, cross links to other books,
notes, reviews, problem solutions, and the like. There will also be
files of the book available for downloading and viewing on an e-book
reader or printing out. In line with the philosophy of library as a
social space, there will be plenty of opportunities for readers to
interact with the text and each other by making notes, reviewing books,
and participating in discussions.
In addition to these three components, there will also be an area for
supporting the project and the PlanetMath organization by sponsoring
books and purchasing hard copies.
Roadmap (2013)
--------------
- Install OCR program and process a first book.
- Conduct preliminary research on OCR techniques.
- Collect suite of samples for OCR evaluation.
- Examine effects of preprocessing strategies.
- Write utility to extract graphic images of individual characters from scans according to XML OCR output.
- Compare effects of different feature vectors, metrics, averaging techniques, and clustering algorithms.
- Determine statistical distributions of features and metrics and develop statistical models of identification.
- Study how to feed output of clustering and averaging back into training.
- Study distributions on lines and techniques for isolating characters and combining fragments of characters.
- Study postprocessing techniques.
- Study how to convert the \"visual\" TeX markup produced by Infty to more \"semantic\" TeX markup.
- Develop techniques for automatically extracting structure and
metadata for books.
- Research techniques for combining symbols into equations such as, say, hierarchical clustering.
- Figure out how to combine the various programs and techniques into a toolchain so as to maximize correctness.
- Improvise proofreading of first few books using the existing facility for editing encyclopaedia entries.
- Compare different strategies for presenting text to be proofread and highlighting questionable identifications.
- Revise 2005 specification from Noosphere to Planetary.
- Implement proper proofreading facility.
- Implement facility for keeping track of books and editorial workflow.
- Implement facility for outputting completed books.
- Test and document facilities for proofreading books.
- Collect and write converters to produce versions of books in various file formats.
- Present the first few books using collections facility.
- Enter in math books from Project Gutenberg.
- Incorporate books into indexing and search.
- Study and compare algorithms for recommending books.
- Implement reading room.
- Test and document the reading room.
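The clustering experiments in the roadmap above (feature vectors, metrics, averaging) could begin from a very small baseline. The sketch below uses greedy "leader" clustering with a Euclidean metric and mean centroids; this is a stand-in chosen for brevity, not the method the roadmap commits to, and the two-component feature vectors are invented toy values.

```python
import math


def distance(a: list[float], b: list[float]) -> float:
    """Euclidean metric between two feature vectors."""
    return math.dist(a, b)


def mean(vectors: list[list[float]]) -> list[float]:
    """Component-wise average of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leader_cluster(glyphs: list[list[float]], radius: float) -> list[list[list[float]]]:
    """Greedy one-pass clustering: a glyph joins the nearest cluster if that
    cluster's centroid is within `radius`, otherwise it starts a new cluster."""
    clusters: list[list[list[float]]] = []
    for g in glyphs:
        nearest = min(clusters, key=lambda c: distance(mean(c), g), default=None)
        if nearest is not None and distance(mean(nearest), g) <= radius:
            nearest.append(g)
        else:
            clusters.append([g])
    return clusters


# Toy feature vectors (invented), e.g. two measurements per character image.
glyphs = [[0.50, 0.30], [0.52, 0.31], [0.90, 0.70], [0.88, 0.72], [0.51, 0.29]]
clusters = leader_cluster(glyphs, radius=0.1)
print([len(c) for c in clusters])
```

Swapping `distance` or `mean` for other metrics and averaging schemes is then a one-function change, which is the kind of comparison the roadmap calls for.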
How to help
-----------
If you\'re a philanthropist, your donations will help move the research
and development process along:
- \$1000 will purchase an InftyOCR license.
- \$2000 will purchase a high-end computer for OCR and related processing.
- \$5000 will pay for an OCR research assistant.
- \$10000 will pay to implement the books section on PM.
If you\'re a Drupal dude, you can help implement the proofreading
facilities and reading room.
If you\'re a script kiddie, you can help us build our toolchain.
If you\'re into statistics, you can help us with identifying characters
by clustering.
If you\'re a proofreader, you can help us prepare the first few texts.
Acknowledgements
----------------
*Thank people who have helped with the initial steps in the roadmap.*
# Roadmap (2020)
This is a repository that synthesises a [Roadmap](http://www.peeragogy.org/pattern-roadmap.html) for Hyperreal Enterprises, Ltd.
## Method
The Roadmap is being written inside [Org Roam](https://github.com/org-roam/org-roam) (an
[Emacs](https://www.gnu.org/software/emacs/) package), and shared via Git on repo.or.cz.
1. Setup
Install Org Roam if needed (`M-x package-install RET org-roam RET`).
Clone the repo, using these instructions to switch to the mob branch
(which avoids the need for further permissioning).
<https://repo.or.cz/w/arxana.git/blob_plain/d187e244a8eb7d2208f1fe98db1ef895c6cd4a36:/README.mob>
Subsequently, add this to your Emacs configuration:
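``` {.elisp}
;; These command and variable names are from the Org Roam v1 API.
(require 'org-roam)
;; Notes live inside the cloned arxana repository.
(setq org-roam-directory (concat "/home/"
                                 (getenv "USER")
                                 "/arxana/org-roam/"))
;; Use Helm for completion prompts (requires the helm package).
(setq org-roam-completion-system 'helm)
;; Key bindings that are active while org-roam-mode is on.
(define-key org-roam-mode-map (kbd "C-c n l") #'org-roam)
(define-key org-roam-mode-map (kbd "C-c n f") #'org-roam-find-file)
(define-key org-roam-mode-map (kbd "C-c n b") #'org-roam-switch-to-buffer)
(define-key org-roam-mode-map (kbd "C-c n g") #'org-roam-graph)
;; Link insertion is bound in org-mode-map, so it works in any Org buffer.
(define-key org-mode-map (kbd "C-c n i") #'org-roam-insert)
;; Turn on the global minor mode.
(org-roam-mode +1)
```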
2. Interaction
Use the `C-c n f` keyboard command to add new disconnected nodes to
the graph, or use `C-c n i` to create a page and insert a wiki-style
link, like `[[New Page]]`. Follow links with `C-c C-o`. Display the
graph structure with `C-c n g`.
Add and commit new files, along with `org-roam.db`, and push them to
the repo.
3. Log
You can review commits to the mob branch here:
<https://repo.or.cz/arxana.git/shortlog/refs/heads/mob>
Subgoals:
- [Top](20200810132653-top.org)
## Top
- We want to make the knowledge economy accessible to everyone.
- Web 2.0 hasn't achieved that, though it has produced a large pool of
open data.
- We will use this data to bootstrap AI tools that support knowledge
workers.
- Our first product will be an AI tutor that helps people learn how to
program and connects them with practical projects.
- The next step will be an AI assistant for professional level teams.
- Our long-term vision is computational intelligence based on
collective intelligence.
## Representative Prior Work
1. PlanetMath
In our work on PlanetMath we came up with several previews. Some of
these may be concretely relevant.
- <https://github.com/holtzermann17/planetmath-docs/labels/PREVIEW>
- We wondered if there could be a specific opportunity with the
category theory community
2. Modelling the way mathematics is actually done
We looked at a relationship between storytelling
Even short sprints correlated as opposed to random behaviour
- \"Let\'s just say we will develop this course, because it\'s very
concrete.\"
- Could we agree that we want to develop some curriculum?
- Or is there a way to build up for this?
Customer
- This would come from expanding the group.
- Currently: read a lot, don\'t implement things enough.
- Z: If I implemented as I read things, it would be a pretty
interesting blog
- There would be a huge market of people interested in following
this, this would give a pool of people who know who we are
- This is a nice goal b/c it doesn\'t focus on the product... but
it\'s a deliverable, with smaller deliverables, and a benefit
Us
What if we say: Let\'s just for example, have a blog, and say, roughly
speaking we will try to develop the curriculum through the blog. Here\'s
our starting point in terms of resources. People might interject in ways
that aren\'t exactly a curriculum. But we would later see things can be
ordered.
- Rousseau: amour-propre / amour de soi
- Record discussions, post on a YouTube channel
- Try to record conversations, but don\'t constrain things to come out
in a purely structured curriculum.
\"In my next post I want to integrate something that I learned from you
about PL. I want to drive in the direction of synthesis, as hard as I
know how to right now. This depends on everyone having free time to
invest in this. Start a blog where we think about what\'s the overlap in
terms of learning?\"
Subgoals:
- [Research](20200811100157-research.org)
- [Small demos](20200810135103-small_demos.org)
- [Organisational
infrastructure](20200810135126-organisational_infrastructure.org)
- [Business development](20200814210243-business_development.org)
## Small demos
We talked about building several small demos that we could get in front
of users (to be specified).
1. In startups, there's a low burden of proof
- Just having a demo of some kind will get you a reasonable amount
of attention!
2. Subgoals
- [Visual Interfaces](20200810135457-visual_interfaces.org)
- [Arxana 2020](20200810135512-arxana_2020.org)
- [Visual code walk through](20200810135753-visual_code_walk_through.org)
- [Teach basic coding with IF](20200810135851-teach_basic_coding_with_if.org)
- [GPT trained on SO data](20200811185614-gpt_trained_on_so_data.org)
- [Generating small graphs](20200814195815-generating_small_graphs.org)
Build infra for generating graphs.
``` {.elisp}
(defun triangle (n)
  "Return the Nth triangular number, 0 + 1 + ... + N, for a non-negative integer N."
  (if (= n 0)
      0
    (+ n (triangle (- n 1)))))
```
3. Related
- Possibly link this to [Visual code walkthrough](20200810135753-visual_code_walk_through.org) so that we generate graphs based on code flow.
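One way to picture "graphs based on code flow" is to trace the recursive calls of a function like `triangle` and emit the result as Graphviz DOT text. This is a minimal sketch, assuming a Python port of the function; `trace_triangle` and `to_dot` are hypothetical names, and rendering the DOT output is left to whatever graph tooling is eventually chosen.

```python
def trace_triangle(n: int, edges: list[tuple[str, str]]) -> int:
    """Compute the n-th triangular number, recording each recursive call as an edge."""
    if n == 0:
        return 0
    edges.append((f"triangle({n})", f"triangle({n - 1})"))
    return n + trace_triangle(n - 1, edges)


def to_dot(edges: list[tuple[str, str]]) -> str:
    """Render caller -> callee edges as Graphviz DOT text."""
    lines = ["digraph calls {"]
    lines += [f'  "{a}" -> "{b}";' for a, b in edges]
    lines.append("}")
    return "\n".join(lines)


edges: list[tuple[str, str]] = []
result = trace_triangle(3, edges)
print(result)
print(to_dot(edges))
```

The same edge list could be fed to any other graph renderer; DOT is used here only because it is plain text.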
## Continuous integration demo
### Introduction
We had at one time the idea of using PlanetMath as a continuous integration server to show off new technologies. In principle we could get back to that. However, there may be more appeal for working mathematicians in other kinds of demos. So PlanetMath is only one option here. We sketch a few other possible continuous integration demos at various scales.
### "Small" (Mathematics)
In the past, we put the HoTT book online via PlanetMath and made a survey of the category theory literature. We could start by pulling up what we did back then and gradually build up to a "HDM for category theory".
- Find the old HoTT files on the PM git repo.
- Get the files to compile and have correct internal references.
- Put this version of the book online and announce to the HoTT community. (Who showed interest in this.)
- Add links to nlab, etc. using NNexus or something similar and double-check by hand.
- Using sTeX for HoTT or some such, get LaTeXML to produce correct parses.
- **Integrate with formalized version.**
We could potentially exhibit other small demos at the same level, like David Spivak's book.
### "Small" (Computer Mathematics)
There are corpora like Lean Mathlib. Could we in principle add any value there?
- We should see what ever happened with Formal Abstracts.
- "Abstract Wikipedia" is also worth following up on to see whether anything is happening there.
- And it's a good question to see whether our methods could add any value to this kind of work...
- Certainly we should have a look at the FARM paper and related projects where we talked about bridging between informal mathematics, argumentation, and formal mathematics.
- Computer algebra is another way to go here!
- Possible consumers for this include Ray's QFT class and what Cameron wanted to build for the modelling course.
### "Medium"
Repeat something similar to the "Small" demo for all of category theory.
- Get the survey of the category theory literature into a reference manager.
- Download the last 5 years of arXiv CT papers.
- Get the MR listing of category papers.
- Download category books for our use.
- Pull out the indexes from the books and put them into some standard format.
- Run programs on the text of the book in order to see what information they could extract.
- Using some combination of humans and machines, come up with a cross index of category theory.
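As a rough illustration of the "standard format" and "cross index" steps, the sketch below merges per-book indexes into one term-keyed structure. The data layout (book title mapped to term mapped to page numbers), the function name `build_cross_index`, and the sample books are all assumptions made for the example; real indexes would need far more careful term normalisation, which is where the human half of the "humans and machines" combination comes in.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BookIndex = Dict[str, List[int]]  # term -> page numbers


def build_cross_index(
    book_indexes: Dict[str, BookIndex],
) -> Dict[str, List[Tuple[str, List[int]]]]:
    """Merge per-book indexes into term -> [(book, pages), ...]."""
    cross = defaultdict(list)
    for book, index in book_indexes.items():
        for term, pages in index.items():
            # Naive normalisation: lower-case and collapse whitespace.
            key = " ".join(term.lower().split())
            cross[key].append((book, sorted(pages)))
    return {term: sorted(entries) for term, entries in sorted(cross.items())}


# Made-up sample data, standing in for indexes pulled out of real books.
SAMPLE = {
    "Book A": {"Adjoint functor": [40, 12], "Limit": [55]},
    "Book B": {"adjoint  functor": [7]},
}

if __name__ == "__main__":
    for term, entries in build_cross_index(SAMPLE).items():
        print(term, entries)
```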
### "Large"
We selected tasks in category theory because it's about 1/100th of the size of the full mathematics corpus. It is a new enough topic that we don't have to worry about retrodigitization. Category theory is a foundational subject, and so it is relatively self-contained. Unlike, say, differential geometry, where we need to build up concepts like topology and real numbers to talk about manifolds, in category theory all you need is logic.
Eventually after we have some sufficiently well-put-together small and medium demos, it would be good to try to do similar things with broader coverage.
## Visual Interfaces for cli programs
Here\'s an idea:
assuming we have enough text mining pixie dust (on corpora of Linux man
pages, and Stack Overflow questions/forum posts about Linux commands),
it might be possible to do:
`user:~$ make-gui-for ls --output ls.py`
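A first, much less magical step towards `make-gui-for` would be to pull the options out of a command's help text, since each option is a candidate checkbox or field in a generated form. The sketch below assumes help lines where flags and description are separated by two or more spaces. `parse_options`, `Option`, and `HELP_EXCERPT` are hypothetical names, and the excerpt is an abbreviated illustration in the style of `ls --help`, not its verbatim output.

```python
import re
from typing import List, NamedTuple, Optional


class Option(NamedTuple):
    short: Optional[str]  # e.g. "-a"
    long: Optional[str]   # e.g. "--all"
    description: str


def parse_options(help_text: str) -> List[Option]:
    """Extract option flags and descriptions from help-style text."""
    options = []
    for line in help_text.splitlines():
        # Assumption: flags and description are separated by 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        if len(parts) != 2 or not parts[0].startswith("-"):
            continue
        flags, description = parts
        short = re.match(r"-\w\b", flags)
        long = re.search(r"--[\w-]+", flags)
        options.append(
            Option(
                short.group(0) if short else None,
                long.group(0) if long else None,
                description,
            )
        )
    return options


# Illustrative excerpt only, not the exact output of any particular `ls`.
HELP_EXCERPT = """
Usage: ls [OPTION]... [FILE]...
  -a, --all                  do not ignore entries starting with .
      --author               with -l, print the author of each file
  -l                         use a long listing format
"""

if __name__ == "__main__":
    for option in parse_options(HELP_EXCERPT):
        print(option)
```

The text mining part of the idea would then be about filling in what this cannot see: which options matter, how they combine, and what people actually ask about them.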
## Teach basic coding with IF
Ray is working on a rule-based system for teaching programming, starting
with rules to model subtraction.
- Uses production rules
## Arxana 2020
Revisit Arxana and turn it into something that we can actually use.
## GPT trained on SO data
Could we set up a simple version of **GPT** trained on Stack Overflow
data, just to get it working?
*Then think about how to get a learning loop set up to improve the
results...*
1. Ideas
- Could this at least help a human navigate the questions on Stack
Exchange?
- Rather than just answering the question, generate the answer and
use that to guide search (by combining generation with document
similarity)
- Use a distance to set up a margin of tolerance
2. Precedent
In Google Books, they use crappy OCR which is good enough for
search, but you wouldn\'t want to read it. They use something like
edit distance, finding something \'within 5 errors\'.
3. Analogue
In parsing, edit distance alone is not enough: the measure has to
involve the grammar.
4. Case against going too deep here:
- Code generation is hard
5. Case against worrying about that:
- Worry instead about generating learning packets
- E.g., learn everything there is to know about `git` from
Stack Overflow in a nicely organised way
6. Related
- [Display SO with similarity
graph](20200814202658-display_so_with_similarity_graph.org)
E.g., use generated answers to help process.
7. Downstream
- [RECOMMENDER SYSTEM](20200817172825-recommender_system.org)
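The "margin of tolerance" and "within 5 errors" ideas above can be made concrete with plain Levenshtein edit distance. This is a minimal sketch, not a claim about how Google Books or any search engine actually works. The names `edit_distance` and `fuzzy_find` and the sample strings are hypothetical, and a real system would match a generated answer against documents rather than short titles.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete char_a
                    curr[j - 1] + 1,     # insert char_b
                    prev[j - 1] + cost,  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


def fuzzy_find(query: str, candidates: List[str], max_errors: int = 5) -> List[str]:
    """Return candidates within `max_errors` edits of `query`, closest first."""
    scored = [(edit_distance(query.lower(), c.lower()), c) for c in candidates]
    return [c for distance, c in sorted(scored) if distance <= max_errors]


if __name__ == "__main__":
    titles = ["git rebase", "git reset", "git commit"]
    print(fuzzy_find("git rebse", titles, max_errors=2))
```

The `max_errors` parameter plays the role of the tolerance margin: widening it trades precision for recall.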
## Research
We'll want to do some research.
- [Original Research](20200811100221-original_research.org)
- [Review articles](20200810135038-review_articles.org)
1. Original Research
We\'ll want to do some original research.
- [Replicating \"Scientific Statement Classification over arXiv\"](20200811100337-replicating_scientific_statement_classification_over_arxiv.org) **This has been largely explored and we're moving on to new directions.**
- **The other thing to do here is look through the Innovate UK grant and make sure to turn it into some stubs/tickets in this section, with the goal of writing a narrative blog post for Wednesday 2nd of August.**
2. Replicating \"Scientific Statement Classification over arXiv\"
Deyan and Bruce wrote a paper about classifying statements on arXiv.
Joe has a paper about building AI for mathematics by analysing Q&A
and then building Q&A agents. Can we make progress towards that
long-term goal by initially pulling out some structured information
from Stack Exchange?
- [Zulip thread](https://zulip.metameso.org/#narrow/stream/3-Demo/topic/Replicating.20.22Scientific.20Statement.20Classification.20over.20arXiv.22/near/283)
1. Downstream
- [RECOMMENDER SYSTEM](20200817172825-recommender_system.org)
3. Review articles
We outlined a couple of review articles to find and study, or to
write:
- (a) "Advances in tutoring systems for programming"
- (b) "Advances in knowledge mining from technical documents"
4. Subgoals:
- [Advances in tutoring systems for programming](20200810135325-advances_in_tutoring_systems_for_programming.org)
- [Advances in knowledge mining from technical documents](20200810135403-advances_in_knowledge_mining_from_technical_documents.org)
## Organisational infrastructure
We talked about the need to figure out the organisational infrastructure
itself: things like technologies for communication.
1. Possible way of organising things
- Maybe do this in small groups of people (e.g., 2 people working
for 2 weeks on something as a \"minimum unit\").
- Hand-coding of curriculum vs *making a general framework that
anyone can fill in*
- This would allow teachers to create curriculum items
2. Subgoals
- [OBS recordings](20200811185435-obs_recordings.org)
- [Discord server](20200810135619-discord_server.org)
- [Code sharing platform](20200814193042-code_sharing_platform.org)
- [Blog](20200814195259-blog.org)
- Would it be worthwhile to have a system for live co-editing Org Mode? Or any other real-time wiki?
- Should we investigate real-time editing in VS Code, maybe enhanced with Org Mode?
- Or use Roam / [Foam](https://foambubble.github.io/foam/), if these work, and export from them to Org Mode?
- Would it work well to use [CodiMD](https://github.com/hackmdio/codimd), quickly modified to use with Org files?
- We could in principle fix Rudel or Togetherly.el, but this is probably not a good use of time right now.
To what extent are any of these things actually relevant?
3. Discord server
Cameron proposed setting up a Discord server. This has been set up now, but only DG and CS are on it so far!
4. OBS recordings
We talked about creating asynchronous recordings (screencasts,
audio). We also talked about possibly putting the audio recordings
into a threaded voice mail forum but that\'s a somewhat different
application.
Public version of our experiments.
5. Related:
- [Code sharing platform](20200814193042-code_sharing_platform.org)
Cameron shared ideas about a code-sharing platform.
6. Comments
- Nextjournal is interesting.
- It\'s like a Jupyter notebook
- It\'s like Org Babel, so you can run code in any language within
the same environment
- If you need to add a bash cell to a Julia notebook, it adds a
kernel as needed at run time
- If you install a bunch of libraries, you can save the current
environment in a Docker container, to import it
- It doesn\'t yet have an easy way to make an app?
7. What if you had a browser-based version of Org Babel?
- You could have your notebook, backed by the ability to use Emacs
8. Examples
- Setting up a data science experiment
- Wadler et al. course in Agda in Nextjournal
- But you can\'t easily treat this as \'Org Roam\' (no
bi-directional things)
9. Consumer
- [DATA COURSE](20200814203551-data_course.org)
Business ideas:
- Develop a user interface on top of more advanced data analysis tools
- The focus is on the infrastructure that allows you to convert a graph into a neural network or whatever
10. Different kinds of users
- Advanced Stan users
- People who don\'t know how to do data analysis but who can make graphs
- Zans: General theory-informed algorithms (e.g., apply category theory to scientific models).
- K framework: Have transformations for any language you define in it.
- HtDP is similar applied to programming teaching. Start with PL theory and then find universal things.
- How can we define statistics in a general way and then derive things from it? (E.g., Anglican probabilistic programming?)
11. Related work
James Fairbanks (relate it to Betancourt)
## Business development
Look over all of the above from the point of view of business development, and prioritise them according to which ones are likely to pay out. If none of them are, then we need to rescope our approach.
Let's keep in mind that some forms of "traction" may not immediately result in money, and that can be OK!
- If we could help 10,000 people with their research, we should be quite willing to do that!
- If "18 months" is ambitious in scope for the Innovate UK grant with 3+ full-time staff, what can we do in 24 months with 5 people each working at 0.1 or 0.2 FTE?
- If we have 18 monthly cycles where we can possibly get something in front of people, what might those look like?
- And we can ask, will some tasks like the GPT-2 analysis Tim was working on take 1 month? 3 months?
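To put the staffing question in numbers, the comparison can be worked out in person-months. This is a back-of-the-envelope sketch: the helper `person_months` is hypothetical, and because the grant plan says "3+" staff, the grant figure is a lower bound, so the resulting fractions are upper bounds.

```python
def person_months(people: int, fte: float, months: int) -> float:
    """Total effort in person-months for a team at a given FTE fraction."""
    return people * fte * months


# Grant plan: at least 3 full-time staff for 18 months.
grant = person_months(people=3, fte=1.0, months=18)

# Part-time plan: 5 people at 0.1 to 0.2 FTE for 24 months.
part_time_low = person_months(people=5, fte=0.1, months=24)
part_time_high = person_months(people=5, fte=0.2, months=24)

if __name__ == "__main__":
    print(f"Grant plan:     {grant:.0f} person-months (lower bound)")
    print(f"Part-time plan: {part_time_low:.0f} to {part_time_high:.0f} person-months")
    print(f"Fraction of grant effort: {part_time_low / grant:.0%} to {part_time_high / grant:.0%}")
```

Under these assumptions the part-time plan has 12 to 24 person-months against at least 54 for the grant, so the scope would need to shrink to somewhere under half of what the grant proposed.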
### Analysis
- Can this lead to £16,000 per annum wages? Would anyone actually want to work for that much?