# Using LaTeXML to convert your latex files to accessible html

written by [Andy Tonks](https://www2.le.ac.uk/departments/mathematics/extranet/staff-material/staff-profiles/andy-tonks) and [Julia Goedecke](https://www2.le.ac.uk/departments/mathematics/extranet/staff-material/staff-profiles/julia-goedecke) (who are luckily no longer working at the University of Leicester)

#### :bookmark_tabs: Table of Contents

[TOC]

## Overview

:::info
:information_source: This page shows how to convert existing latex files to accessible html files. If you're writing new content, you may find it a lot easier to use a markdown language instead.
:::

Before we start, you may want to look at this [example of the output you can obtain](https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/LALatexmlNotes/index.html) from your tex files using latexml. As you can see, it is only *nearly* perfect --- polish is still needed. Also, Matthew Towers gives a good overview in his [blogpost on his use of latexml](https://www.homepages.ucl.ac.uk/~ucahmto/elearning/latex/2019/06/14/latexml.html).

If the $\LaTeX$ commands that you use in your notes are quite straightforward, then one simple command
> `latexmlc notes.tex --dest=notes.html`

could be enough to compile them first to XML and then to HTML.

The aim, however, is to produce an **accessible** form of HTML, for screen-readers for the partially sighted for example. For this we should add a javascript option to produce HTML5/MathML:
> `latexmlc notes.tex --dest=notes.html --javascript="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML"`

If this does not work, or you want more details on installing and using the software, read on.

## Installing LaTeXML

:warning: The current version of LaTeXML is 0.8.4, but as version 0.8.5 will be released ~~this month~~ **soon**, this document may need to be updated.

LaTeXML is written in Perl, and may use ImageMagick to convert images to different formats.
It is recommended that you install:
- TeX (via TeXlive, MacTeX, MikTeX, etc). You probably already have it installed.
- ImageMagick via [imagemagick.org](https://imagemagick.org).
- Perl (via [Strawberry](http://strawberryperl.com/) on Windows, via MacPorts or Homebrew on Apple, or via your package manager on Linux).
- Some additional Perl packages, such as PerlMagick, that LaTeXML relies on (via the CPAN distribution network. CPAN is to Perl what CTAN is to TeX and CRAN is to R).
- LaTeXML itself.

Some installation instructions are available from the LaTeXML homepage at [dlmf.nist.gov/LaTeXML](https://dlmf.nist.gov/LaTeXML/get.html), or see below.

### Windows 10

There are at least three distributions of Perl available for Windows: from [Strawberry](http://strawberryperl.com/), [ActiveState](https://www.activestate.com/) and [Chocolatey](https://chocolatey.org/). There is only one [source for ImageMagick](https://imagemagick.org), and the current version explicitly indicates it is compatible with version 5.20 of Strawberry Perl. One possible installation process is therefore the following:

1. Download and then install
    * [StrawberryPerlv5.20](http://strawberryperl.com/download/5.20.3.3/strawberry-perl-5.20.3.3-64bit.msi) [MSI installer, 64 bit]

   If you *do not have Administrator rights* on your PC (a university managed machine, for example) you may need to install from the [zip file](http://strawberryperl.com/download/5.20.3.3/strawberry-perl-5.20.3.3-64bit.zip) or the [portable edition](http://strawberryperl.com/download/5.20.3.3/strawberry-perl-5.20.3.3-64bit-portable.zip) which can even be run from USB.
2.
   Download and then install the current version of Image Magick:
    * [ImageMagick-7.0.10](ftp://ftp.imagemagick.org/pub/ImageMagick/binaries/ImageMagick-7.0.10-28-Q16-HDRI-x64-dll.exe)

   BUT make sure you tick the box to install PerlMagick too:
   ![](https://i.imgur.com/Z448Nkd.png =375x300)

   Again, there is a [portable zip edition](ftp://ftp.imagemagick.org/pub/ImageMagick/binaries/ImageMagick-7.0.10-28-portable-Q16-x64.zip) if you do not have admin rights on your PC.

:::info
:information_source: If you forget to tick that box you can re-run the installer, of course. *For future reference: if a newer version of ImageMagick is released, **check** the version of Perl it refers to*.
:::

3. Finally open the `CPAN Client` shell from the new Strawberry Perl folder in Windows Start, and use it to **`install LaTeXML`**:
   ![](https://i.imgur.com/JgBGGqx.png)
   ...etc --- this may take several minutes.

::: success
An easy test that the install was successful: let us compile [an extract from Alice](http://www.cs.cornell.edu/Info/Misc/LaTeX-Tutorial/Solutions/running-12pt.html)
![](https://i.imgur.com/sfCbVa2.png)
Immediately this produces an html file which you can open in your browser:
![](https://i.imgur.com/RJNHEP6.png)
:::

See below for a [more realistic workflow](#Using-LaTeXML)!

### MacOS

Installation of LaTeXML on MacOS should be done via Homebrew or Macports. Both of these environments rely on XCode (or just the XCode Command Line Tools) which you can get from https://developer.apple.com/xcode/resources/.

Then choose one of the following routes:
* either install [MacPorts](https://www.macports.org/install.php#installing), and then:
    * if you have MacTeX installed, then `sudo port install LaTeXML +mactex`
    * if you have TeXlive installed (or if you don't know) then `sudo port install LaTeXML`
* or install [homebrew](https://brew.sh/) and then just `brew install latexml`.

If you are unlucky (possibly just Catalina 10.15.4) and the installation runs into problems with a warning about `XML-LibXSLT` or `libxml` then follow [these instructions to get LaTeXML](https://github.com/brucemiller/LaTeXML/issues/929#issuecomment-621666222) and its dependencies.

:warning: Andy says: *I have not actually done the install on a Mac, so let me know if there are changes to make to this document. Homebrew also runs on linux, but casks are not supported. You could try the cpanm method below if all else fails!*

### Linux

Depending on your distribution, your package manager `XYZ` (where `XYZ` is `yum` or `apt` or ...) can probably install latexml and its dependencies in one command:
- `sudo XYZ install latexml`.

Check that your Linux distribution's package is current, though (version $\geq0.8.4$).

If you prefer to install the most up-to-date github pre-release, first check you have installed `tex`, `perl` and `imagemagick` in your package manager, then
* `git clone https://github.com/brucemiller/LaTeXML.git` and `cd LaTeXML`, (or download and unzip [LaTeXML-master](https://github.com/brucemiller/LaTeXML/archive/master.zip) and `cd LaTeXML-master`)

then the standard Perl make (configure build, compile, test, install) procedure:
```
perl Makefile.PL
make
make test
sudo make install
```
The `test` phase may take a long time, and it is probably safe to skip it (famous last words!)
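
To check whether an installed LaTeXML meets the 0.8.4 minimum mentioned above, something like the following sketch may help. The helper name `version_at_least` is our own invention, the comparison relies on `sort -V` (available in GNU coreutils and recent BSD/macOS), and we are assuming that `latexml --VERSION` prints a line containing a version number of the form `x.y.z`.

```bash
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on `sort -V` (version sort) putting the smaller version first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: `latexml --VERSION` prints something containing e.g. 0.8.4.
if command -v latexml >/dev/null 2>&1; then
    installed=$(latexml --VERSION 2>&1 | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    if version_at_least "$installed" 0.8.4; then
        echo "LaTeXML $installed is recent enough"
    else
        echo "LaTeXML $installed is older than 0.8.4; consider the github install"
    fi
else
    echo "latexml not found on PATH"
fi
```

If the packaged version turns out to be too old, the github route described above is the fallback.
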

If you are using a machine which already has TeX and Perl installed but you do not have superuser rights to install other software, try:
```
# Download and install cpanminus
curl -L http://cpanmin.us | perl - App::cpanminus

# Setup a user directory in ~/perl5 to contain all perl dependencies
~/perl5/bin/cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)

# Install the current snapshot of LaTeXML directly from github:
cpanm git://github.com/brucemiller/LaTeXML.git
```
I have successfully built and used LaTeXML on nyx/nyx2 using this installation technique. It failed one of the build tests on SPECTRE2 as our supercomputer still has the 2013 version of TeXLive installed, but did install with `cpanm --force git://github.com/brucemiller/LaTeXML.git`.

---

## Using LaTeXML

In the following, we're assuming you are converting lecture notes. Of course, you can easily adapt this to convert any latex files you like.

### Set up your files

If you have chapters included from separate tex files with `\include` or `\input`, then you can easily have two "main" files, say, `notes.tex` and `notes-latexml.tex`:
- one as you had before which you can use to compile to PDF as usual;
- one which you change a little so you can use latexml to compile to html.

This way, if you change the actual content of the chapters, you only have to do it once, and can compile it to the two different formats.

::: info
:information_source: Why keep both `notes.tex` and `notes-latexml.tex`? For many people, the PDF might actually be a lot easier. HTML is easier on phones (as the text reflows better), and for screen readers (as they understand mathML/mathJax). If we provide both formats, people can choose what they prefer/need in a given situation.
:::

An alternative to two "main" files is to outsource the preamble of your tex file. This can be useful e.g. if you have a shorter document with no chapters. Then you can let the compiler pick the appropriate preamble, e.g.
with
```
\iflatexml
  \input commandsLatexml.tex
\else
  \input commands.tex
\fi
```
See [use of `\iflatexml`](#Slightly-more-advanced-iflatexml) below.

### Prepare your latexml main file.

The vast majority of $\LaTeX$ and even plain $\TeX$ is understood by $\LaTeX ML$ but there are still a few things it struggles with. As discussed above, it is worth taking a copy of your main tex file, renaming it to `notes-latexml.tex`, say, and massaging the preamble a little. For example:

* If you use the `mathabx` package, comment it out.
* Remove any custom theorem styles: change to the standard `\theoremstyle{plain}` and `\theoremstyle{definition}`.
* If you use `\includegraphics` to include PDF images, it may be necessary to convert these to GIF, JPG or PNG images.
* $\LaTeX ML$ does not understand `\xymatrix`, and sometimes has problems with `tikz` pictures. Many of us use these a lot, but there are work-arounds --- see below.
* Some other 'wacky' packages may not work. For example, Julia uses `mdframed`, which does not work. But you can replace it in `notes-latexml.tex` by a dummy environment definition:
  `\newenvironment{mdframed}[1][]{}{}`
  Now latexml will effectively ignore the `mdframed` environment while your original "main" tex file can still use it in compilation to PDF. You don't need to touch the latex of the actual chapters! If you do want some [coloured boxes](#Coloured-boxes) round things, see below.
* Some [more advanced options](#Slightly-more-advanced-iflatexml) below.

:warning: We're only listing things here that we have already found out about. You may use other things which $\LaTeX ML$ doesn't like. Hopefully our suggestions on what we've found will give you an idea of how to deal with them. If you think it's something many people might use, do email us and we can add it here.

::: info
:information_source: Your `notes-latexml.tex` file does not have to compile without errors to pdf! It just has to compile to xml and then html, see below.
:::

### Get necessary css and javascript files

For an excellent navigation menu to jump to chapters and sections:
* Download [`andy-navbar.css`][navbarcss] and [`andy-navbar.js`][navbarjs].

For a cleaner layout and good fonts at a nice size (Sans Serif, recommended by Leicester Learning Institute):
* Download [`normalize.css`][normalize] :warning: updated link!

With all these files:
* Put them in the folder with your latex files.

Here are the links again (right click and `Save link as...`):

| File | Link |
|-----------------|:-----------------------|
| andy-navbar.js | [navbar js][navbarjs] |
| andy-navbar.css | [navbar css][navbarcss] |
| normalize.css | [normalize css][normalize] |

[navbarjs]: https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/LALatexmlNotes/andy-navbar.js
[navbarcss]: https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/LALatexmlNotes/andy-navbar.css
[normalize]: https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/LALatexmlNotes/normalize.css

### Compile latex to xml

* Open a terminal or command line in the folder which has your notes.
* Run `latexml notes-latexml.tex --dest=notes-latexml.xml`. (Obviously, change it to your actual file names.)
* This may take a while.
* If you get some errors, try the post-processing anyway, and see what comes out. It might be fine.

### Compile to html: post-processing

* In the terminal/command line, run `latexmlpost notes-latexml.xml --dest=YourSubfolderName/notes.html --split --splitat=section --navigationtoc=context --javascript="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML" --css=andy-navbar.css --css=normalize.css --urlstyle=file --timestamp=0 --javascript=andy-navbar.js`
* This will save all the html files and necessary images, javascript files, css files etc into YourSubfolderName inside the folder which contains your latex files.
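
If you rebuild your notes often, the two steps above can be wrapped in a small shell function. This is only a sketch: the names `build_notes`, `run` and `DRY_RUN` are our own, and the file and folder names passed in are placeholders for yours. The `latexml` and `latexmlpost` options are exactly the ones from the commands above. Setting `DRY_RUN=1` prints the commands instead of running them, which is handy for checking what would be executed.

```bash
#!/bin/sh
# Sketch of a two-step build wrapper; all names here are placeholders.
MATHJAX_URL="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

build_notes() {
    main="$1"      # main file without extension, e.g. notes-latexml
    outdir="$2"    # subfolder for the html, e.g. NotesLatexml

    # Step 1: latex to xml. Errors are not always fatal, so carry on.
    run latexml "$main.tex" --dest="$main.xml" ||
        echo "latexml reported errors; trying post-processing anyway" >&2

    # Step 2: xml to split html with mathjax, navigation menu and css.
    run latexmlpost "$main.xml" --dest="$outdir/notes.html" \
        --split --splitat=section --navigationtoc=context \
        --javascript="$MATHJAX_URL" \
        --css=andy-navbar.css --css=normalize.css \
        --urlstyle=file --timestamp=0 \
        --javascript=andy-navbar.js
}
```

After sourcing this file in the folder with your notes, `build_notes notes-latexml NotesLatexml` would run both steps, and `DRY_RUN=1 build_notes notes-latexml NotesLatexml` would only show them.
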

:::info
:information_source: Having a subfolder is useful, because then you can zip up that whole folder to upload it to Blackboard. (See our document [HowTo upload and host HTML files on Blackboard](https://hackmd.io/@UoL-IWG/BbHTML).)
:::

Explanation of the different elements of this command (for more info, see [LaTeXML manual](https://dlmf.nist.gov/LaTeXML/manual/)):
* `--dest=` sets the name of the "front" html page, i.e. the title page of your notes. You will then link to this on Blackboard (or wherever you want to use it). This could be `index.html`, or you could use `notes.html`, or `LAnotes.html` (for Linear Algebra say), or `index-split-by-section.html`, or what you prefer. Giving a subfolder is helpful as described above. You could use `NotesLatexml/` or some other name you prefer.
* `--split` splits the html output into several pages rather than putting it all in one. Probably useful for easier reading. But you might also want to make one that is "all in one file", so it is automatically searchable.
* `--splitat=section` tells it where to split. This could be chapter, section (the default), subsection or subsubsection.
* `--navigationtoc=context` makes a navigation menu. On its own it's at the top and not so great, but with Andy's css and js files, it's excellent.
* `--javascript='https...'` (the one which includes mathjax) is necessary so that the maths is actually made into mathjax which screen readers can read. This is the whole point of the conversion to html.
* `--css=andy-navbar.css` and `--javascript=andy-navbar.js` make Andy's excellent navigation menu.
* `--css=normalize.css` determines the look of the page. You could use different css files here if you want. But we chose this one for the fonts: sans serif is better for dyslexic people, and it is the recommended size. Be careful that the colours stay accessible: there should be enough contrast for colour-blind people.
* `--urlstyle=file` is a slightly safer alternative to the default `--urlstyle=server` which might misinterpret `index.html`.
* `--timestamp=0` removes the timestamp next to the "Generated by LaTeXML" (with cat picture) at the bottom of each html page.

## Accessibility features

* All maths content is made accessible to screen readers with this method, via mathjax.
* Images should have alt-text. If you use captions in a figure environment, latexml will automatically put the caption as alt-text.
  :warning: It won't put any maths content (in `$ $` or `\( \)`) into the alt-text. It just leaves that bit out. Same for hyperlinks. All this still shows up in the caption, but not in alt-text.
* (Still investigating whether there is another way to get alt-text from the latex, without using captions.)
* You will have to check colour contrasts yourself.

## Known bugs and work-arounds

:::warning
:construction: We found a bug: some maths fonts, like `\(\mathbb{Z}\)` or `\(\mathcal{U}\)`, stop working when inside a passage of bold or italicised text. No warning or error occurs, but it gives the wrong output (an ordinary *Z* or *U*).

:warning: This includes theorem environments, where all text is automatically italic.

We have [flagged this bug](https://github.com/brucemiller/LaTeXML/issues/1321#issuecomment-670096896 "Github link") with the latexml authors, but we don't know when/if it will be fixed.

:+1: **Good news**: the developers noticed our bug report... and it is now fixed in the github version which will soon (!) be released as LaTeXML 0.8.5.
:::

:::success
**Workaround** :hammer_and_wrench: Fix the mistakes in the XML file produced by latexml, before you run latexmlpost:
* If you are using `mathbb` in an (italic) theorem environment, then search and replace "blackboard upright" by just "blackboard" in the XML file;
* similarly for mathcal, mathscr, mathsf, mathtt, mathbf etc inside bold, sansserif, or italic text.

The general rule is: whenever the XML tag `<XMTok font="...... ......." .....>` occurs, and the font has **two words one of which is "upright", "medium" or "serif"**, then remove that word. For example:
* `blackboard upright` or `blackboard medium` :arrow_right: `blackboard`
* `caligraphic upright` or `caligraphic medium` :arrow_right: `caligraphic`
* `bold upright` or `serif bold` :arrow_right: `bold`
* `serif italic` or `medium italic` :arrow_right: `italic`

If you have the unix `sed` command (tested on GNU/linux and BSD/Mac dialects) you can do this automatically:
```bash
sed -E -e 's/font=\"([a-z]*) (upright|serif|medium)\"/font=\"\1\"/g' -e 's/font=\"(upright|serif|medium) ([a-z]*)\"/font=\"\2\"/g' -i'-old' notes.xml
```
If you don't know how to do it automatically, we recommend leaving it while you work on your notes, and just fixing it when you are ready to release them to your students. (Though you'll have to repeat it when you update your notes.)

If any windows ~~or mac~~(thanks Katrin) users want to tell us how they can do it automatically, we can add it here.
:::

:::spoiler [obsolete workaround]
Workaround: (old)
* If you're actually using `\emph{}` or `\textbf{}` or similar, just leave the maths parts out of it. The changed font is not applied to the maths part anyway.
* This is not possible for theorem-environments. ~~Two~~ Three options:
    * Put `\textup{ \(\mathbb{Z}\)}` or equivalent. Note the space at the front! That's crucial (for reasons we don't understand). This can be quite a lot of work.
      :information_source: This will not affect your pdf version, as the italic font is not applied to the maths part anyway.
    * We also had some success redefining the `mathbb` command as follows
      ```
      \let\oldbb\mathbb
      \renewcommand{\mathbb}[1]{\!\!\!\!\mbox{ \upshape{ $\oldbb{#1}$}}}
      ```
      A similar hack should work for `mathcal`.
    * Alternatively, just avoid situations where the bug occurs: change your theorem environment style to `\theoremstyle{definition}`, so the text is upright. Students won't know that italicised theorems are normal :slightly_smiling_face:. If you are worried about delineating the end of the theorem statement, see some options with [coloured background](#Coloured-boxes) below.
:::

## Slightly more advanced: `\iflatexml`

If you have some things in your chapters (rather than the main latex file) which you want to be different in the pdf version and the html version (e.g. `\xymatrix`), you can do the following:
* Put `\usepackage{latexml}` in the preamble of both your main files (`notes.tex` and `notes-latexml.tex`).
* In the chapters, put `\iflatexml (commands only for latexml) \else (commands only for pdf version) \fi`

Obviously the same works for any tex file you want to compile both to pdf and to html.

### Examples

#### Images of different sizes in pdf and html

You may need different scales for your images in pdf and html. So you could use
```
\iflatexml
  \includegraphics[scale = 0.6,keepaspectratio=true]{myimage.jpg}
\else
  \includegraphics[scale = 0.9,keepaspectratio=true]{myimage.jpg}
\fi
```
:::warning
:warning: The shorter ` \includegraphics[\iflatexml scale=0.6 \else scale=0.9 \fi,keepaspectratio=true]{myimage.jpg}` doesn't seem to work: when you compile to pdf, latex doesn't like it and gives an error.
:::

#### Work-around for `\xymatrix` (or other diagrams that might not work)

* In your pdf, take a screen-shot of the diagram. Crop it close to the actual diagram.
* Include the image, using `\iflatexml`, so that you still have the normal code for the pdf file.

:warning: You do have to re-do the screen-shot and update the image when you change the latex code. So maybe do this as late as possible.

:::spoiler Example 1: `\xymatrix` commutative diagram
:::success
```
\iflatexml
  \begin{figure}
  \begin{center}
  \includegraphics[scale=0.3,keepaspectratio=true]{DiagramCompositionfg.jpg}
  \caption{Diagram of function f composed with function g, displayed as arrows in a triangle.}
  \label{fig-composition-f-g}
  \end{center}
  \end{figure}
\else
  \[ \xymatrix{X \ar[r]^f \ar[dr]_{g\comp f} & Y \ar[d]^g\\ & Z} \]
\fi
```
for ![Diagram of function f composed with function g, displayed as arrows in a triangle.](https://i.imgur.com/FypG0Gm.jpg "Function composition" =103x100), including captions to get alt-text.
:::

:::spoiler Example 2: schematic for matrix multiplication
:::success
```
\iflatexml
  \begin{figure}
  \begin{center}
  \includegraphics[scale=0.7,keepaspectratio=true]{DiagramMatrixXVector.jpg}
  \caption{Schematic of matrix times vector, using lines to represent rows of the matrix and the column vector.}
  \label{fig-matrix-vector}
  \end{center}
  \end{figure}
\else
  \[
  \begin{pmatrix}
  \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}\\
  \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}\\
  \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}
  \end{pmatrix}\begin{pmatrix}\ \vline\ \\\ \vline\ \\\ \vline\ \end{pmatrix}
  =
  \begin{pmatrix}
  \raisebox{1.4ex}{\rule{.25cm}{0.5pt}} | \\
  \raisebox{.7ex}{\rule{.25cm}{0.5pt}} |\\
  \rule{.25cm}{0.5pt}|
  \end{pmatrix}
  \]
\fi
```
for ![Schematic of matrix times vector, using lines to represent rows of the matrix and the column vector.](https://i.imgur.com/Dl8LbI7.jpg "matrix mult" =243x98), including captions to get alt-text.
:::

#### Placement with `\hfill`

As a webpage has no intrinsic width, `\hfill` will not work, e.g. to place comments on the right-hand-side of the page.
A table can give the same effect in html:
```
\iflatexml
  \begin{tabular}{lr}
  \(\diamond\) \(0\in S\) & (zero vector is in the set)\\
  \(\diamond\) for any \(u,v\in S\), \(u+v\in S\) & (closed under vector addition)\\
  \(\diamond\) for any \(v\in S\) and any \(\lambda \in \R\), \(\lambda v\in S\) & (closed under scalar mult)
  \end{tabular}
\else
  \begin{itemize}
  \item \(0\in S\) \hfill (zero vector is in the set)
  \item for any \(u,v\in S\), \(u+v\in S\) \hfill (closed under vector addition)
  \item for any \(v\in S\) and any \(\lambda \in \R\), \(\lambda v\in S\) \hfill (closed under scalar mult)
  \end{itemize}
\fi
```
gives

![three lines with some text left aligned and other text right aligned in each line](https://i.imgur.com/e489bpr.png)

#### Placement with `Minipage`

Similar to `\hfill`, `\begin{minipage}` will have no effect in the html file. The content will just be one underneath the other. If you want it differently, you'll have to think creatively. You can use `\iflatexml` to keep your original placement in the pdf.

## Coloured boxes

$\LaTeX ML$ doesn't (yet) understand `mdframed` or other common ways to make coloured boxes or frames round text. It does however understand `\begin{shaded}` from the `framed` package.

:::info
:information_source: The boxes will not be as nice as these ones on this page. We are using markdown to make this page.
:::

This is [an example using these boxes](https://blackboard.le.ac.uk/bbcswebdav/users/jg418/ProofsLatexml/index.html).

Here are some suggestions that will allow you to keep the latex of the content of your chapters the same, and lets latexml use `shaded` and the usual pdflatex use `mdframed` (or what you have chosen).

:::success
If you have the same colour background everywhere, you can use
```
\definecolor{shadecolor}{rgb}{1.0 1.0 0}
\newenvironment{mdframed}[1][]{\begin{shaded*}}{\end{shaded*}}
```
in the preamble of your `notes-latexml.tex`.
(You should have commented out `\usepackage{mdframed}` in this version of your preamble.)
:::

:::success
If you use, for example,
```
\newmdtheoremenv[style=highlight]{definition}{Definition}
```
in `notes.tex` for compiling to pdf, then you can put the following in the preamble of your `notes-latexml.tex`:
```
\newtheorem{defi}{Definition}
\newenvironment{definition}
  {\definecolor{shadecolor}{rgb}{1.0 1.0 0} \begin{shaded*}\begin{defi}}
  {\end{defi}\end{shaded*}}
```
Then you can use `\begin{definition} ... \end{definition}` as normal throughout the content of your document. When you compile `notes.tex` to pdf it will still use `mdframed` around the definition environment, and when you compile `notes-latexml.tex` to html, it will use `shaded`.
:::

:::info
:information_source: You can see that you can do different colours for different environments. I found also the HTML colour codes useful, for example `\definecolor{shadecolor}{HTML}{FFCCCC}`.
:::

## Explore LaTeXML yourself

You can read the [$\LaTeX ML$ manual](https://dlmf.nist.gov/LaTeXML/manual/) with lots more details and info.

## Extra: Templates for embedded lecture video pages

If you use some pre-recorded videos for your lectures, you can see an [example of html pages with embedded videos](https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/SimpleVideoHTMLs/index.html), notes and related exercise questions.

If you like it: download the [template files (as zip)](https://blackboard.le.ac.uk/bbcswebdav/courses/MAX042/SimpleVideoHTMLsTemplate.zip). They have comments in the html on where to change what.

I also made one [structured by chapter](https://blackboard.le.ac.uk/bbcswebdav/courses/MA1114_2020-21_Y/StructuredVideoHTMLs/index.html), using my automatically generated latexml lecture notes as a base, but it's harder to translate to a different course.