Using LaTeXML to convert your latex files to accessible html

written by Andy Tonks and Julia Goedecke (who are luckily no longer working at the University of Leicester)

Overview

This page shows how to convert existing latex files to accessible html files. If you're writing new content, you may find it a lot easier to use a markdown language instead.

Before we start, you may want to look at this example of the output you can obtain from your tex files using latexml. As you can see, it is only nearly perfect - polish is still needed.

Also, Matthew Towers gives a good overview in his blogpost on his use of latexml.

If the LaTeX commands that you use in your notes are quite straightforward, then one simple command

latexmlc notes.tex --dest=notes.html

could be enough to compile them first to XML and then to HTML. The aim, however, is to produce an accessible form of HTML, for example one that screen readers used by partially sighted people can read. For this we should add a javascript option so that the HTML5/MathML output is rendered by MathJax:

latexmlc notes.tex --dest=notes.html --javascript="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML"

If this does not work, or you want more details on installing and using the software, read on.

Installing LaTeXML

The current version of LaTeXML is 0.8.4, but as version 0.8.5 will be released soon, this document may need to be updated.

LaTeXML is written in Perl, and may use ImageMagick to convert images to different formats. It is recommended that you install:

  • TeX (via TeX Live, MacTeX, MiKTeX, etc). You probably already have it installed.
  • ImageMagick via imagemagick.org.
  • Perl (via Strawberry on Windows, via MacPorts or Homebrew on Apple, or via your package manager on Linux).
  • Some additional Perl packages, such as PerlMagick, that LaTeXML relies on (via the CPAN distribution network. CPAN is to Perl what CTAN is to TeX and CRAN is to R).
  • LaTeXML itself.

Some installation instructions are available from the LaTeXML homepage at dlmf.nist.gov/LaTeXML, or see below.

Windows 10

There are at least three distributions of Perl available for Windows: from Strawberry, ActiveState and Chocolatey. There is only one source for ImageMagick, and the current version explicitly indicates that it is compatible with version 5.20 of Strawberry Perl. One possible installation process is therefore the following:

  1. Download and then install
  • Strawberry Perl v5.20 [MSI installer, 64 bit]
    If you do not have Administrator rights on your PC (a university managed machine, for example) you may need to install from the zip file or the portable edition, which can even be run from USB.
  2. Download and then install the current version of ImageMagick:
  • ImageMagick-7.0.10
    BUT make sure you tick the box to install PerlMagick too.
    [Screenshot: the PerlMagick option in the ImageMagick installer]

    Again, there is a portable zip edition if you do not have admin rights on your PC.

If you forget to tick that box you can re-run the installer, of course.
For future reference: if a newer version of ImageMagick is released, check the version of Perl it refers to.

  3. Finally open the CPAN Client shell from the new Strawberry Perl folder in Windows Start, and use it to install LaTeXML:
    [Screenshot: the CPAN shell session]

    This may take several minutes.

An easy test that the install was successful: let us compile an extract from Alice.

[Screenshot: compiling the extract on the command line]

This immediately produces an html file which you can open in your browser.

[Screenshot: the resulting html page in the browser]

See below for a more realistic workflow!

MacOS

Installation of LaTeXML on macOS should be done via Homebrew or MacPorts. Both of these environments rely on Xcode (or just the Xcode Command Line Tools) which you can get from https://developer.apple.com/xcode/resources/. Then choose one of the following routes:

  • either install MacPorts, and then:
    • if you have MacTeX installed, then sudo port install LaTeXML +mactex
    • if you have TeXlive installed (or if you don't know) then sudo port install LaTeXML
  • or install Homebrew and then just brew install latexml. If you are unlucky (possibly just on Catalina 10.15.4) and the installation runs into problems with a warning about XML-LibXSLT or libxml, then follow these instructions to get LaTeXML and its dependencies.

Andy says: I have not actually done the install on a Mac, so let me know if there are changes to make to this document. Homebrew also runs on linux, but casks are not supported. You could try the cpanm method below if all else fails!

Linux

Depending on your distribution, your package manager XYZ (where XYZ is yum or apt, for example) can probably install latexml and its dependencies in one command:

  • sudo XYZ install latexml.

Check that your Linux distribution's package is current, though (version 0.8.4).

If you prefer to install the most up-to-date github pre-release, first check you have installed tex, perl and imagemagick in your package manager, then

  • git clone https://github.com/brucemiller/LaTeXML.git and cd LaTeXML,
    (or download and unzip LaTeXML-master and cd LaTeXML-master)

then follow the standard Perl make procedure (configure, build, test, install):

perl Makefile.PL
make
make test
sudo make install

The test phase may take a long time, and it is probably safe to skip it (famous last words!)

If you are using a machine which already has TeX and Perl installed but you do not have superuser rights to install other software, try:

# Download and install cpanminus
curl -L https://cpanmin.us | perl - App::cpanminus
# Set up a user directory in ~/perl5 to contain all perl dependencies
~/perl5/bin/cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
# Install the current snapshot of LaTeXML directly from github
# (https URL, since GitHub no longer serves the unauthenticated git:// protocol):
cpanm https://github.com/brucemiller/LaTeXML.git

I have successfully built and used LaTeXML on nyx/nyx2 using this installation technique. It failed one of the build tests on SPECTRE2, as our supercomputer still has the 2013 version of TeXLive installed, but it did install once --force was added to the cpanm command.
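
Whichever installation route you took, a quick way to see whether the programs used in the rest of this page can be found is a small script like the following. This is only a sketch: the function name find_tools is made up for illustration, and it only checks that the executables are on your PATH, not that they actually work.

```python
import shutil

# Programs used on this page; adjust the tuple to your own setup.
TOOLS = ("perl", "latexml", "latexmlpost", "latexmlc")


def find_tools(names=TOOLS):
    """Map each program name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:12} {path or 'NOT FOUND'}")
```

If one of the LaTeXML programs is reported as not found although the install seemed to succeed, the most likely cause is that the folder containing it (for example ~/perl5/bin with the cpanm method) is not on your PATH.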


Using LaTeXML

In the following, we're assuming you are converting lecture notes. Of course, you can easily adapt this to convert any latex files you like.

Set up your files

If you have chapters included from separate tex files with \include or \input, then you can easily have two "main" files, say, notes.tex and notes-latexml.tex:

  • one as you had before which you can use to compile to PDF as usual;
  • one which you change a little so you can use latexml to compile to html.

This way, if you change the actual content of the chapters, you only have to do it once, and can compile it to the two different formats.

Why keep both notes.tex and notes-latexml.tex?
For many people, the PDF might actually be a lot easier. HTML is easier on phones (as the text reflows better), and for screen readers (as they understand MathML/MathJax). If we provide both formats, people can choose what they prefer or need in a given situation.

An alternative to two "main" files is to outsource the preamble of your tex file. This can be useful e.g. if you have a shorter document with no chapters. Then you can let the compiler pick the appropriate preamble, e.g. with

\iflatexml
\input commandsLatexml.tex
\else
\input commands.tex
\fi

See use of \iflatexml below.

Prepare your latexml main file.

The vast majority of LaTeX and even plain TeX is understood by LaTeXML, but there are still a few things it struggles with. As discussed above, it is worth taking a copy of your main tex file, renaming it to notes-latexml.tex, say, and massaging the preamble a little. For example:

  • If you use the mathabx package, comment it out.
  • Remove any custom theorem styles: change to the standard \theoremstyle{plain} and \theoremstyle{definition}.
  • If you use \includegraphics to include PDF images, it may be necessary to convert these to GIF, JPG or PNG images.
  • LaTeXML does not understand \xymatrix, and sometimes has problems with tikz pictures. Many of us use these a lot, but there are work-arounds - see below.
  • Some other 'wacky' packages may not work. For example, Julia uses mdframed, which does not work. But you can replace it in notes-latexml.tex by a dummy environment definition:
    \newenvironment{mdframed}[1][]{}{}
    Now latexml will effectively ignore the mdframed environment, while your original "main" tex file can still use it in compilation to PDF. You don't need to touch the latex of the actual chapters!
    If you do want some coloured boxes round things, see below.
  • Some more advanced options below.

We're only listing things here that we have already found out about. You may use other things which LaTeXML doesn't like. Hopefully our suggestions on what we've found will give you an idea of how to deal with them. If you think it's something many people might use, do email us and we can add it here.

Your notes-latexml.tex file does not have to compile without errors to pdf! It just has to compile to xml and then html, see below.

Get necessary css and javascript files

For an excellent navigation menu to jump to chapters and sections:

  • Download andy-navbar.js and andy-navbar.css

For a cleaner layout and good fonts at a nice size (sans serif, recommended by Leicester Learning Institute):

  • Download normalize.css (updated link!)

With all these files:

  • Put them in the folder with your latex files.

Here are the links again (right click and Save link as...):

File             Link
andy-navbar.js   navbar js
andy-navbar.css  navbar css
normalize.css    normalize css

Compile latex to xml

  • Open a terminal or command line in the folder which has your notes.
  • Run latexml notes-latexml.tex --dest=notes-latexml.xml. (Obviously, change it to your actual file names.)
  • This may take a while.
  • If you get some errors, try the post-processing anyway, and see what comes out. It might be fine.

Compile to html: post-processing

  • In the terminal/command line, run
    latexmlpost notes-latexml.xml --dest=YourSubfolderName/notes.html --split --splitat=section --navigationtoc=context --javascript="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML" --css=andy-navbar.css --css=normalize.css --urlstyle=file --timestamp=0 --javascript=andy-navbar.js
  • This will save all the html files and the necessary images, javascript files, css files etc into YourSubfolderName inside the folder which contains your latex files.

Having a subfolder is useful, because then you can zip up that whole folder to upload it to Blackboard. (See our document HowTo upload and host HTML files on Blackboard.)

Explanation of the different elements of this command (for more info, see LaTeXML manual):

  • --dest= sets the name of the "front" html page, i.e. the title page of your notes. You will then link to this on Blackboard (or wherever you want to use it). This could be index.html, or you could use notes.html, or LAnotes.html (for Linear Algebra say), or index-split-by-section.html, or whatever you prefer. Giving a subfolder is helpful as described above. You could use NotesLatexml/ or some other name you prefer.
  • --split splits the output into several html pages rather than putting it all in one. This is probably easier to read. But you might also want to make a version that is "all in one file", so it is easily searchable.
  • --splitat=section tells it where to split. This could be chapter, section (the default), subsection or subsubsection.
  • --navigationtoc=context makes a navigation menu. On its own it's at the top and not so great, but with Andy's css and js files, it's excellent.
  • --javascript='https...' (the one which includes mathjax) is necessary so that the maths is actually made into mathjax which screen readers can read. This is the whole point of the conversion to html.
  • --css=andy-navbar.css and --javascript=andy-navbar.js make Andy's excellent navigation menu.
  • --css=normalize.css determines the look of the page. You could use different css files here if you want. But we chose this one for the fonts: sans serif is better for dyslexic people, and it is the recommended size. Be careful that the colours stay accessible: there should be enough contrast for colour-blind people.
  • --urlstyle=file is a slightly safer alternative to the default --urlstyle=server which might misinterpret index.html.
  • --timestamp=0 removes the timestamp next to the "Generated by LaTeXML" (with cat picture) at the bottom of each html page.
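
If you convert your notes often, it can be convenient to keep the two commands in a small script. The following is a minimal sketch, assuming the file names used on this page; the function names build_commands and convert are made up for illustration, and only the options explained above are used.

```python
import subprocess
from pathlib import Path

MATHJAX_URL = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js?config=MML_HTMLorMML"


def build_commands(main_tex, out_dir, front_page="notes.html", split_at="section"):
    """Return the latexml and latexmlpost argument lists for one set of notes."""
    xml_file = str(Path(main_tex).with_suffix(".xml"))
    to_xml = ["latexml", main_tex, f"--dest={xml_file}"]
    to_html = ["latexmlpost", xml_file, f"--dest={out_dir}/{front_page}"]
    if split_at:  # pass split_at=None for a single all-in-one page
        to_html += ["--split", f"--splitat={split_at}"]
    to_html += [
        "--navigationtoc=context",
        f"--javascript={MATHJAX_URL}",
        "--css=andy-navbar.css",
        "--css=normalize.css",
        "--urlstyle=file",
        "--timestamp=0",
        "--javascript=andy-navbar.js",
    ]
    return to_xml, to_html


def convert(main_tex, out_dir):
    """Run both steps; post-processing is tried even if latexml reported errors."""
    to_xml, to_html = build_commands(main_tex, out_dir)
    subprocess.run(to_xml, check=False)
    subprocess.run(to_html, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, to check them first.
    for cmd in build_commands("notes-latexml.tex", "NotesLatexml"):
        print(" ".join(cmd))
```

Calling convert("notes-latexml.tex", "NotesLatexml") would then run both steps in the folder containing your notes. Passing the arguments as a list avoids any quoting problems with the long MathJax URL.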

Accessibility features

  • All maths content is made accessible to screen readers with this method, via mathjax.
  • Images should have alt-text. If you use captions in a figure environment, latexml will automatically put the caption as alt-text.
    It won't put any maths content (in $ $ or \( \)) into the alt-text. It just leaves that bit out. Same for hyperlinks. All this still shows up in the caption, but not in alt-text.
  • (Still investigating whether there is another way to get alt-text from the latex, without using captions.)
  • You will have to check colour contrasts yourself.
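
To find images that ended up without alt-text, you could scan the generated html files. The sketch below uses only the Python standard library and assumes that images appear as ordinary img elements in the output; the names AltTextChecker, images_without_alt and check_folder are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class AltTextChecker(HTMLParser):
    """Collect the src of every <img> whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing.append(attributes.get("src", "(no src)"))


def images_without_alt(html_text):
    """Return the src values of all images in html_text that lack alt-text."""
    checker = AltTextChecker()
    checker.feed(html_text)
    return checker.missing


def check_folder(folder):
    """Print a report for every .html file in the output folder."""
    for page in sorted(Path(folder).glob("*.html")):
        for src in images_without_alt(page.read_text(encoding="utf-8")):
            print(f"{page.name}: image {src} has no alt-text")
```

Running check_folder("YourSubfolderName") after latexmlpost lists the images that still need a caption. It cannot judge whether an existing alt-text is actually helpful, so that still needs a human check.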

Known bugs and work-arounds

We found a bug: some maths fonts, like \(\mathbb{Z}\) or \(\mathcal{U}\), stop working when inside a passage of bold or italicised text. No warning or error occurs, but it gives the wrong output (an ordinary Z or U).
This includes theorem environments, where all text is automatically italic.
We have flagged this bug with the latexml authors, but we don't know when/if it will be fixed.
Good news: the developers noticed our bug report and it is now fixed in the github version which will soon (!) be released as LaTeXML 0.8.5.

Workaround

Fix the mistakes in the XML file produced by latexml, before you run latexmlpost:

  • If you are using mathbb in an (italic) theorem environment,
    then search and replace "blackboard upright" by just "blackboard" in the XML file;
  • similarly for mathcal, mathscr, mathsf, mathtt, mathbf etc inside bold, sansserif, or italic text.

The general rule is: whenever the XML tag <XMTok font="...... ......." .....> occurs, and the font has two words one of which is "upright", "medium" or "serif", then remove that word. For example:

  • blackboard upright or blackboard medium → blackboard
  • caligraphic upright or caligraphic medium → caligraphic
  • bold upright or serif bold → bold
  • serif italic or medium italic → italic

If you have the unix sed command (tested on GNU/linux and BSD/Mac dialects) you can do this automatically:

sed -E -e 's/font=\"([a-z]*) (upright|serif|medium)\"/font=\"\1\"/g' -e 's/font=\"(upright|serif|medium) ([a-z]*)\"/font=\"\2\"/g' -i'-old' notes.xml

If you don't know how to do it automatically, we recommend leaving it while you work on your notes, and just fixing it when you are ready to release them to your students. (Though you'll have to repeat it whenever you update your notes.)
If any Windows users want to tell us how they can do it automatically, we can add it here. (Mac is now covered, thanks Katrin.)

[obsolete workaround]

Workaround: (old)

  • If you're actually using \emph{} or \textbf{} or similar, just leave the maths parts out of it. The changed font is not applied to the maths part anyway.
  • This is not possible for theorem-environments. Three options:
    • Put \textup{ \(\mathbb{Z}\)} or equivalent. Note the space at the front! That's crucial (for reasons we don't understand). This can be quite a lot of work.
      This will not affect your pdf version, as the italic font is not applied to the maths part anyway.
    • We also had some success redefining the mathbb command as follows
      \let\oldbb\mathbb
      \renewcommand{\mathbb}[1]{\!\!\!\!\mbox{ \upshape{ $\oldbb{#1}$}}}
      
      A similar hack should work for mathcal.
    • Alternatively, just avoid situations where the bug occurs: change your theorem environment style to \theoremstyle{definition}, so the text is upright. Students won't know that italicised theorems are normal. If you are worried about delineating the end of the theorem statement, see some options with coloured background below.

Slightly more advanced: \iflatexml

If you have some things in your chapters (rather than the main latex file) which you want to be different in the pdf version and the html version (e.g. \xymatrix), you can do the following:

  • Put \usepackage{latexml} in the preamble of both your main files (notes.tex and notes-latexml.tex).
  • In the chapters, put
    \iflatexml (commands only for latexml) \else (commands only for pdf version) \fi

Obviously the same works for any tex file you want to compile both to pdf and to html.

Examples

Images of different sizes in pdf and html

You may need different scales for your images in pdf and html. So you could use

\iflatexml
 \includegraphics[scale = 0.6,keepaspectratio=true]{myimage.jpg}
\else
 \includegraphics[scale = 0.9,keepaspectratio=true]{myimage.jpg}
 \fi

The shorter \includegraphics[\iflatexml scale=0.6 \else scale=0.9 \fi,keepaspectratio=true]{myimage.jpg} doesn't seem to work: when you compile to pdf, latex doesn't like it and gives an error.

Work-around for \xymatrix (or other diagrams that might not work)

  • In your pdf, take a screen-shot of the diagram. Crop it close to the actual diagram.
  • Include the image, using \iflatexml, so that you still have the normal code for the pdf file.
    You do have to re-do the screen-shot and update the image when you change the latex code. So maybe do this as late as possible.
Example 1: \xymatrix commutative diagram
\iflatexml
\begin{figure}
 \begin{center}
  \includegraphics[scale=0.3,keepaspectratio=true]{DiagramCompositionfg.jpg}
  \caption{Diagram of function f composed with function g, displayed as 
  arrows in a triangle.}
  \label{fig-composition-f-g}
 \end{center}
\end{figure}
\else
 \[
  \xymatrix{X \ar[r]^f \ar[dr]_{g\comp f} & Y \ar[d]^g\\ & Z}
 \]
\fi

This handles the diagram of function f composed with function g (displayed as arrows in a triangle), including a caption to get alt-text.

Example 2: schematic for matrix multiplication
\iflatexml
\begin{figure}
 \begin{center}
  \includegraphics[scale=0.7,keepaspectratio=true]{DiagramMatrixXVector.jpg}
  \caption{Schematic of matrix times vector, using lines to 
  represent rows of the matrix and the column vector.}
  \label{fig-matrix-vector}
 \end{center}
\end{figure}
\else
\[
 \begin{pmatrix}
  \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}\\
   \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}\\
   \raisebox{.7ex}{\rule{1.5cm}{0.5pt}}
 \end{pmatrix}\begin{pmatrix}\ \vline\ \\\ \vline\ \\\ \vline\ \end{pmatrix}
 = \begin{pmatrix}  
 \raisebox{1.4ex}{\rule{.25cm}{0.5pt}} | \\ 
 \raisebox{.7ex}{\rule{.25cm}{0.5pt}} |\\ 
 \rule{.25cm}{0.5pt}|  
 \end{pmatrix}
\]
\fi

This handles the schematic of matrix times vector (using lines to represent rows of the matrix and the column vector), including a caption to get alt-text.

Placement with \hfill

As a webpage has no intrinsic width, \hfill will not work, e.g. to place comments on the right-hand-side of the page. A table can give the same effect in html:

\iflatexml
 \begin{tabular}{lr}
   \(\diamond\) \(0\in S\)  & (zero vector is in the set)\\
  \(\diamond\) for any \(u,v\in S\), \(u+v\in S\)  & (closed under vector addition)\\
  \(\diamond\) for any \(v\in S\) and any \(\lambda \in \R\), \(\lambda v\in S\)  & (closed under scalar mult)
 \end{tabular}
 \else
 \begin{itemize}
   \item \(0\in S\) \hfill (zero vector is in the set)
  \item for any \(u,v\in S\), \(u+v\in S\) \hfill (closed under vector addition)
  \item for any \(v\in S\) and any \(\lambda \in \R\), \(\lambda v\in S\) \hfill (closed under scalar mult)
 \end{itemize}
 \fi

This gives three lines, each with some text left-aligned and other text right-aligned.

Placement with Minipage

Similar to \hfill, \begin{minipage} will have no effect in the html file. The content will just be one underneath the other. If you want it differently, you'll have to think creatively. You can use \iflatexml to keep your original placement in the pdf.

Coloured boxes

LaTeXML doesn't (yet) understand mdframed or other common ways to make coloured boxes or frames round text.

It does however understand \begin{shaded} from the framed package.

The boxes will not be as nice as these ones on this page. We are using markdown to make this page.

This is an example using these boxes.

Here are some suggestions that will allow you to keep the latex of the content of your chapters the same, and let latexml use shaded while the usual pdflatex uses mdframed (or what you have chosen).

If you have the same colour background everywhere, you can use

\definecolor{shadecolor}{rgb}{1.0,1.0,0}
\newenvironment{mdframed}[1][]{\begin{shaded*}}{\end{shaded*}}

in the preamble of your notes-latexml.tex. (You should have commented out \usepackage{mdframed} in this version of your preamble.)

If you use, for example,

\newmdtheoremenv[style=highlight]{definition}{Definition}

in notes.tex for compiling to pdf, then you can put the following in the preamble of your notes-latexml.tex:

\newtheorem{defi}{Definition}

\newenvironment{definition}
  {\definecolor{shadecolor}{rgb}{1.0,1.0,0}
\begin{shaded*}\begin{defi}}
{\end{defi}\end{shaded*}}

Then you can use \begin{definition} ... \end{definition} as normal throughout the content of your document. When you compile notes.tex to pdf it will still use mdframed around the definition environment, and when you compile notes-latexml.tex to html, it will use shaded.

You can see that you can use different colours for different environments. I also found the HTML colour codes useful, for example \definecolor{shadecolor}{HTML}{FFCCCC}.
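
When you pick background colours, you can check the contrast against your text colour yourself. The sketch below computes the contrast ratio as defined in the WCAG guidelines; that formula and the commonly quoted minimum of 4.5 for normal text come from WCAG and are our assumption here, not something LaTeXML checks for you. The function names are made up for illustration.

```python
def _linear(channel):
    """Convert one sRGB channel in 0..255 to linear light."""
    c = channel / 255
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(hex_colour):
    """Relative luminance of an HTML colour code such as 'FFCCCC'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(colour_a, colour_b):
    """Contrast ratio between two colours, from 1 (none) to 21 (black on white)."""
    lighter, darker = sorted((luminance(colour_a), luminance(colour_b)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Black text on the FFCCCC background from the example above.
    print(contrast_ratio("000000", "FFCCCC") >= 4.5)
```

The same colour codes that you pass to \definecolor with the HTML model can be passed to contrast_ratio, together with the colour of your text.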

Explore LaTeXML yourself

You can read the LaTeXML manual with lots more details and info.

Extra: Templates for embedded lecture video pages

If you use some pre-recorded videos for your lectures, you can see an example of html pages with embedded videos, notes and related exercise questions. If you like it: download the template files (as zip). They have comments in the html on where to change what. I also made one structured by chapter, using my automatically generated latexml lecture notes as a base, but it's harder to translate to a different course.