- `long-shebangs`: there are myriad bugs affecting configuration with single quotes in prefix. Even the parts which work are unlikely to work then if the path had a double-quote. Double-check whether this was covered by the old rejected pair of PRs on configure (for cycling `make reconfigure`) and either resurrect or redo this along with a CI test with a tortuous prefix.
- `-use-runtime` should actually conflict with `-custom` (this extends a fix in `long-shebangs` a bit further)
- `caml_parse_ld_conf` appears to have a totally broken read loop - there's no check that it actually read the whole file and `EINTR` is not handled?! The latter may not matter, but shouldn't we ensure that we've read to EOF?!
- `OCAMLLIB` and `CAMLLIB` when they're empty - this should pervade the whole distribution (i.e. an empty environment variable should be equivalent to no environment variable because of the impossibility on Windows)
- `-thread` or `-I +threads`
- `-noautolink` - you end up in the strange situation that you've built a runtime system with all the required primitives, but the linking with unix.cma still adds dllunix.so to the resulting bytecode!
- `ocamlc -custom -runtime-variant _shared -o foo` creates `foo` linked against `libcamlrun_shared.so`. However, this seems like a terrible hack both in bytecode and native. Why can't we have libcamlrund_shared.so, for example? It sounds as though shared and/or static runtime should be a separate option (both for bytecode and native code) with `dllcamlrun.so` and `dllasmrun.so`. `dllasmrun.a` can be the current `_pic` variant which IIUC is used to link the runtime statically but when compiling an OCaml .so.
- `PATH` (critically, they must have an incompatible runtime)
- `PATH` will succeed and neither will succeed if `PATH` is cleared.
- `CAML_LD_LIBRARY_PATH` - both should work.
- `CAML_LD_LIBRARY_PATH` to contain both stublib directories

22-Nov-2024 Notes
1–3, 5 covered by installation-tests branch. 4 is out of scope (no plans to add OCAMLLIB-ID - the restricted version was to harden the error messages).
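For the `caml_parse_ld_conf` item above, a minimal sketch of a read loop which retries on `EINTR` and keeps going until EOF. This assumes POSIX `read`; the name `read_fully` is hypothetical and this is not the runtime's actual code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until EOF or until buf is full,
   retrying when read() is interrupted by a signal.
   Returns the number of bytes read, or -1 on a genuine error. */
ssize_t read_fully(int fd, char *buf, size_t len)
{
  size_t total = 0;
  while (total < len) {
    ssize_t n = read(fd, buf + total, len - total);
    if (n == 0) break;              /* EOF: the whole file has been read */
    if (n < 0) {
      if (errno == EINTR) continue; /* interrupted: try again */
      return -1;
    }
    total += (size_t)n;             /* short read: keep going */
  }
  return (ssize_t)total;
}
```

A caller which sized the buffer from `stat` can compare the result with the expected size to detect a truncated read.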
(notes from the original presentation)
- `dllunix.so`; `CAML_LD_LIBRARY_PATH` abuse)
- ocamlrune somewhere else, e.g. local switch)
- `lib/ocaml` somewhere else, `OCAMLLIB` abuse)
- `stdlib.cma` should not be suffixed (Slide 10) - suffixing should be done using the directory it's found in. What is worth considering is whether the `.cma` and `.cmxa` format should embed the system ID for improved messages?
- `OCAMLLIB` points to clearly wrong `stdlib.cma`) then ocamlc can do the originally proposed ignoring of it and see what happens (i.e. `ocamlc` should determine if `OCAMLLIB` appears to be wrong and tell the user, but not just act as though it weren't there)
- `camlheader` should be folded into a linker option, meaning all systems will have the executable versions in stdlib dir (and will generate shebang inside the driver)
- `--enable-relative-libdir` should be the default with the absolute directory only being a fallback for when the full path to `$0` cannot be found or the relative path from `$0` cannot be found. There should be a new option not to embed the absolute path at all? Remember that ocamlrun uses the stdlib dir to locate ld.conf, so there is a case for keeping the absolute path in the search too. The big change in ocamlrun's behaviour is that ld.conf is loaded from $OCAMLLIB, $CAMLLIB and the preconfigured location (which may be relatively resolved). This is a separate PR.
- `Hashtbl` already) with an appropriate key for direct 32-bit?

Target commit is 1c5e748. Moving each branch in turn, but with relocatable-base-trunk still on d98fd80.
- `windows-ln-5.0` and `compiled-primitives-5.0` need to be rebased on to 5.0 and stack updated with @5.0
- `unified-target-bindir` needs to be rebased onto 5.0 (it's not needed post utils/Makefile merge on trunk). It should be renamed to target-bindir-5.0 at this point.
- `unified-enable-relative` not done yet (the unified PRs need to be done when the base commit moves)
- `unified-runtime-suffixing` not done yet (the unified PRs need to be done when the base commit moves)
- `ld-warning-temp` can be removed as soon as the base commit is shifted (it's only there to reduce diff noise)
- `runtime-id-temp` can be removed as soon as the base commit is shifted (it'll squash on to `runtime-id-5.0`)

Branches on `relocatable-base-trunk`:
- `ocamltest`
- `misc-win-fixes` <- PR 0
- `windows-ln` <- PR 1
- `one-camlheader` <- PR 2
- `target-bindir` <- PR 3
- `unified-target-bindir` dropped post-rebase
- `ld.conf-CRLF` <- PR 4
- `ld.conf-search` <- PR 5
- `ld.conf-relative` <- PR 6
- `compiled-primitives` <- PR 7
- `enable-relative` <- PR 8
- `unified-enable-relative`
- `ld-warning` <- PR 9
- `runtime-id` <- PR 10
- `runtime-suffixing` <- PR 11
- `unified-runtime-suffixing`
- `camlheader-search` <- PR 12
- `unified-camlheader-search`
Remaining rebase tasks:
- `target-bindir-5.0`, rebase onto upstream/5.0 and update stack

Notes for future rebases. Working a single branch at a time but keeping the `backport-trunk` branch on the old commit creates some unnecessary resolutions, but they are all trivial - at each conflict, `origin/backport-trunk` contains the correct commit, so additional work is only required for files which have no longer changed. It means throughout the process that all the origin branches remain identical. Completely avoid both rebasing and updating branches! When shifting the combined branches to the final commit, back-up the current combined set (which should match origin/backport-trunk) and use that for a diff-of-diff comparison with the patch on the new commit. The last check is important since some of the unified branches add missing code, so the branches may build even though they're technically incorrect. During this switch, it should only be necessary to re-combine to trunk and the latest release - as long as the release branch is identical, the backports should recompile by induction. Depending on the amount of time this takes to upstream, next time leave the old base as an "abandoned" target (i.e. continue to rebase to it). This would remove the need to merge the resolutions, at the cost of an extra target. At the following actual OCaml release, then consolidate the interim bases.
These are out-of-date from the 2021 version, but kept for now until the work is done. Note that CoW will be the way to do this - and we'll be identifying if the sources are the same based on the hash written by the ocaml-src package, not using runtime IDs (in other words, the configuration of the switch matching will be taken as the hash key)
There are already previous notes on ideas for ocaml-src. The aim here is to make all the vanilla packages in `ocaml-variants` and `ocaml-base-compiler` (apart from the actual forks) depend on `ocaml-src` and it's in `ocaml-src` that we'll embed patches.
The idea is that this package will use `git` trickery (if available) and a Windows+Unix script to generate a .install file and a sha1 sum for the sources in their actual state. The build process for the compilers will therefore begin by cloning those sources.
Our default mode could also include tarring these up? It's tempting to have the copy operation be really quite fast. We could also use caching for this??
Then we introduce `ocaml-compiler` as the core package - this uses `ocaml-options`, etc. `ocaml-base-compiler` and the existing `ocaml-variants` are then rewritten in terms of this package only. The `ocaml` package therefore begins to depend on `ocaml-system` and `ocaml-compiler` only.
- `configure` could provide a fast way to get the runtime ID for its configuration. In this idea we would hoist the detection of the runtime ID as high as it physically can be and then have an enable switch which bombs the configure script at that point. Alternatively, we put the generation in a separate script which is called by configure (which might be better!) and opam just has to know how to replicate that
- `CC` which is not captured in it. This is fine - and would form part of detecting whether a compiler is a suitable caching candidate. It's a packaging concern; OCaml only has to be compatible with it.
- `opam upgrade`/`opam reinstall` can fail!!
- `opam switch` since we should aim to include local switches in the cache.

File types in the installed tree:
- `bin/`
- `1` man-pages
- `a`/`lib` (C libraries)
- `o`/`obj` (C objects)
- `byte` (bytecode executables)
- `opt` (native executables)
- `so`/`dll` (Shared libraries)
- `cma`/`cmi`/`cmo`/`cmti`/`cmt`/`cmxa`/`cmx`/`cmxs` (OCaml artefacts)
- `h`/`tbl` (C headers)
- `ml`/`mli` (OCaml interfaces/installed code)
- `hva` (ocamldoc Hevea artefacts)

So the conclusion is that we can hard-link everything except `ld.conf` (not because we want to edit it, but because it could be edited) and, similarly, `Makefile.config` (not because it will be edited by hand, but because the cloning process will edit it to set `LIBDIR` and so forth to the correct values).
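The hard-link conclusion can be sketched as a predicate over installed paths. This is an illustration only: `must_copy` is a hypothetical name, and it assumes `/`-separated paths and that only those two basenames are special.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if an installed file must be copied when
   cloning a compiler (because it may be edited afterwards), 0 if it is
   safe to hard-link it. */
int must_copy(const char *path)
{
  const char *base = strrchr(path, '/');
  base = (base != NULL) ? base + 1 : path;   /* basename of the path */
  return strcmp(base, "ld.conf") == 0
      || strcmp(base, "Makefile.config") == 0;
}
```

Everything else in the tree (executables, `.cma`, `.cmi`, headers and so on) would take the hard-link path.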
With this in place, we can do the indistinguishable compiler test
XXX opam will re-package the libasm*.a and libcaml*.a files and re-link the .so files with the correct stdlib.o (the patches should install a helper Makefile for this)
- `ocaml` package's setting of `CAML_LD_LIBRARY_PATH` needs updating (or, rather, the `gen_ocaml_config.ml.in` implementation). At the moment it adds the relative paths to CAML_LD_LIBRARY_PATH. The simplest fix would be for it to process the paths in the same way as the runtime. Equally, it might be possible to manipulate the environment enough to be sure that `ocamlrun -config` gives the required answer? Equally, it could be sufficient to detect that the version is greater than the released version and not include the updates (because they are systematically not necessary)
- `ocaml-option-no-cache` which causes the compiler to be rebuilt without using the cache. The idea here is to allow opam to move towards being able to use the ln mode of caching (which we might do with `ocaml-option-ln-cache`?) - but this depends on there being no changes to the ocaml package. Something here which we could do, for example, might be to make graphics 5 available across all OCamls with the appropriate settings for the older ones.
- `ocaml-option-flambda` installed and getting an instant OCaml compiler - i.e. the work is only recompiling your actual switch.
- `_set_abort_behavior` is not strictly related to any of this work, but it's well worth considering.
- `-header` is introduced, since it uses `Config.shebang`
- `cp` and `ln`. See notes below on hardlinking the tree.
- `Arg.Symbol` for `-set-global-string` instead of parsing `name=value`?
- `caml_standard_library_default` - if it's absolute, then use it, otherwise do the computation (this should mean only shebangs betray the build prefix?)
- `--enable-relative` should fail if the checks for both `realpath` and `getcwd` fail.
- `Makefile.config` when installed will end up with the wrong LIBDIR and BINDIR - what to do about this? Partially depends on whether libraries areble stupid w.r.t. `Makefile.config` - possibly use Kate's opam-grep tool for this. It's possible to abuse $(MAKEFILE_LIST) in GNU make for this - but `Makefile.config` should be trying to be as parseable as possible. It's possibly better to be considering patching it in opam's cached model instead.
- `--enable-relative`, i.e. if installed in `/foo/bin` and cloned to `/bar/bin` the absolute runtime becomes `/bar/bin`
- `-make-runtime` must create a runtime which doesn't do relative lookup.
- `enable-relative`)
- `ocamlmklib` updated to generate stub libraries using the RuntimeID-suffixed
- `ocamlmklib` should search for a compiler in the same directory in `--enable-relative` mode
- `--enable-relative=compiler` and `--enable-relative=strict`. Normal `--enable-relative` mode keeps the configured absolute path - it can be observed both in bytecode versions of installed tools and in `caml_standard_library_default`. In `compiler` mode, the headers are all relative - so the installed tools will never look for ocamlrun in the configured location; additionally `caml_standard_library_default` is not set (which therefore relies on overriding the weak symbol). This is what opam will use. In `strict` mode, `caml_standard_library_default` is set to the relative location and the symbol is not overridden by either compiler - i.e. it is assumed that the resulting system understands the relative system. The aim is that this mode furthers reproducible builds.
- `-shared -id-suffix` or something?) but there should definitely be an option to `Dynlink.adapt_filename` (or probably a new function `Dynlink.adapt_filename_with_id` or something… although an optional parameter here is probably not so bad) to allow this trick. Adopting this for plugins would greatly ease the likelihood of library mismatch and it might be enough to close the original issue.
- `./configure --prefix ~ --libdir ~/lib/ocaml` reports no common prefix… looks like something went wrong with bindir expansion? It does work if you add `--bindir ~/bin`.

These branches are stacked at the moment.
Fixes the bug @damiendoligez identified in https://github.com/ocaml/ocaml/pull/8622#discussion_r328158394
This was incorporated into an additional fix in ocaml/ocaml#11112 for OCaml 5.0. It's included in the back-port.
Tests: none required
:bulb: All the work required from this branch has either been upstreamed or moved to other branches
- `SearchPath`. The argument gets clobbered - needs verifying and fixing separately
- `configure`-based fix for mingw-w64 (which defines the symbol in its headers, but uses a runtime which doesn't have it)
- `CAML_LD_LIBRARY_PATH` in the output of `ocamlrun -config` (general bug fix)
- `stdlib/hashbang` (unused file)
- `Camlheader` error (parameters the wrong way around)

Miscellaneous fixes which need to be in PRs. These need dispatching to separate PRs.
Windows has supported symbolic links since Windows Vista, however creating them requires elevation ("sudo"). Since Windows 10 1703, that need for elevation is removed by enabling Developer Mode, which is a common configuration choice for, um, developers.
The distribution uses symlinks on Unix to alias `ocamlc` to `ocamlc.byte`/`ocamlc.opt`, as appropriate. On Windows, this has always used `cp`, incurring a considerable cost for the duplicated executables in the `bin` directory.
This PR adds a test to configure to see if the `ln` command is able to create native Windows symlinks. Cygwin's `ln` is always able to create symlinks, as Cygwin emulates them if the native support is disabled. Cygwin has a mode `nativestrict` which causes `ln` to fail if Windows native symbolic links can't be created. However, that assumes we're controlling Cygwin's `ln` - the test here is slightly stronger (checking the output of cmd's `dir` command) in order to be paranoid that the symbolic links are only created if they will be readable outside the shell executing the script.
- `MSYS` variable as well as `CYGWIN`
- `CYGWIN` in the call (NB - work with Seb using `AC_CONFIG_LINKS` may supersede this aspect - `CYGWIN`/`MSYS` may be correctly set in the build regardless)

Relocatable OCaml - test harness
This is the first PR in the "Relocatable OCaml" series of changes. The primary motivation of this project is to allow compiler installations to be duplicated, both in opam and in Dune's package management feature (`dune pkg`). From the compiler's perspective, this boils down to being able to use both the compiler and the runtime after renaming the prefix in which the distribution has been installed.
The changes to achieve this shine light into various dark corners of our linking and execution strategies (some of which have already been tackled in #12751), especially in bytecode.
The goal of this PR is to add a test harness for Relocatable OCaml, which subsequent PRs then amend (and in general simplify). There are two key differences between this test harness and the main ocamltest-based testsuite:
A consequence of the "in-prefix" part is that this is not a test that should be run by default and the fact it needs to operate outside the build tree has led to an additional harness, rather than additional features in ocamltest.
The harness itself has revealed various bugs not related to Relocatable OCaml at all (cf. ocaml/flexdll#146, #13496, #13520, #13638, #13692 and #13693, in addition to a fault in the partial linker alluded to in #13692 which will be fixed in a subsequent PR).
The tests performed are covered in `testsuite/in_prefix/README.md` in this PR.
In terms of review, the first two commits alter the compiler:
- `Sys.argv.(0)` w.r.t. `Sys.executable_name` and also bytecode launching. For this to work, it is necessary for the harness to determine if `caml_executable_name` returns `NULL`. I've done this by tweaking the startup code ever-so-slightly in bytecode to ensure that `caml_executable_name` is always called (it is always called in native startup) and then exposed that fact in a new primitive `caml_has_proc_self_exe`. There are other ways this could be done - I like the fact that this approach has actually tested that `caml_executable_name` works (versus adding more `configure`-logic instead of the `#ifdef`-soup in `runtime/unix.c`'s implementation)
- `Ccomp.call_linker` (I mean really, really, simpler - I tried!). However, in order to do that, the test harness needs to be able to control slightly more precisely the value of `Config.standard_library` as interpreted by `Ccomp.call_linker`. Having tried various approaches, the least invasive to compiler-libs seems to be to generalise `Compmisc.init_path` via a new `Compmisc.reinit_path`. Again, there are other ways of doing this, but this one I think is the simplest that doesn't involve duplicating code from `utils/ccomp.ml` directly in the test harness.

Those first two commits clearly change the compiler, and I expect them to be reviewed as such. The next two alter the testsuite only so, while testsuite/tools/test_in_prefix.ml may be, um, a little long, it is also simply a test, like the other 1600 or so ml files in the testsuite! In an earlier incarnation, it was necessary to compile this test harness using the installed compiler, which is why it started out as a single OCaml script. I'm not averse to having to break it up into smaller files, but it wasn't instantly clear to me that it would bring much clarity, and it's not like there's anything reusable - it is in essence a very long script which happens to be written in OCaml.
As far as possible, the harness is told on its command line what to expect from the installation (shared library support; bytecode-only, etc.). That's permitted to guide the selection of tests. Beyond that, everything is executed - i.e. if a test is known to fail on a given platform or architecture then it is run and that failure is noted. The harness therefore "fails" when these issues are fixed.
The final commit plumbs the tests into CI - it should (!) be passing on GitHub Actions here and has also passed precheck#1009. It's also passing an even wider test matrix including Cygwin and multiple different shebang/executable/static/minimal tests in dra27/ocaml#158.
…
- `Makefile` alterations - this should instead go into `testsuite/Makefile.installed` or some such, and then the guard isn't needed

XXX PR not started
There's some interim work on the Ubuntu VM on Libera merging the C files for the two headers. This should be seen as pre-requisite and done separately. We regressed the compiled header a while ago on Windows (in 4.06, with the Unicode change).
The major fault (work on Thor checking this?) is that the header entry was changed before in order to reduce the file size, and it does considerably. The fault on mingw goes back ages - there is a linker option for gcc (`-Wl,-entry` IIRC) which does the same. Since 4.06 we're also using a stdlib function for no good reason.
Notes October 2024
Branch work on relocatable-base-trunk@033900bf
msvc64 vanilla: runtime-launch-info is 12310 bytes
mingw-w64 vanilla: runtime-launch-info is 17943 bytes
The strategy for this:
/O1 /GS- /link /nodefaultlib
Future todos (Nov 2024):
- `-custom` executable analyses `argv[0]` first, which is a mild security concern. It's possible this can just be done where `-custom` executable always tries this first (although was this done anyway in the rust interop improvement??)

OCaml 4.05 msvc64:
result: 15872 bytes
OCaml 4.05 mingw64:
result: 277610 bytes
OCaml 4.06 msvc64:
result: 20992 bytes!
OCaml 4.06 mingw64:
result: 278581 bytes
OCaml 4.14 msvc64:
Result: 12800 bytes
OCaml 4.14 mingw64:
Result: 134339 bytes
For mingw-w64, remove -municode and add -nostdlib -Wl,-eheaderentry before the link command and then -lkernel32 -lmsvcrt afterwards
134339 -> 13239!
Looks like we only need wcslen for both
All the mucking around the msvc64 side is not getting below 7680 - indeed, -O2 is already inlining wcslen
Stripping the mingw-w64 header gets it to 6144!
XXX PR not started!
On Unix, it's reasonable to assume that `argv[0]` will point to something we can open. On Windows, this isn't true: it's acceptable for the command line not to include the `.exe`, if it was not used when the process was created (i.e. a program can see the difference between being invoked `program` vs `program.exe`).
This has always been the case, but it's been less visible since `PATH` resolution will give the fully resolved filename. For example, assuming OCaml's `bin` directory is in `PATH`, `ocamlc.byte` will be resolved to a path ending `ocamlc.byte.exe`. However, if one runs `C:\ocamlmgw64\bin\ocamlc.byte` then the `.exe` is not appended, and `ocamlrun` will claim that no bytecode file was specified, since it can't find `C:\ocamlmgw64\bin\ocamlc.byte`.
Adding `.exe` is both a nasty smell in `ocamlrun` and also brittle, given that other extensions are available.
The Windows executable launcher already determines its full location using `GetModuleFileNameW` in order to read the `RNTM` section. This PR tweaks the launcher to use this path as `argv[0]` (following the rules in CommandLineToArgvW to escape it). On Windows, `ocamlrun`:
- `argv[0]`, so we use `GetFinalPathNameByHandleW` to canonicalise it
- `GetModuleFileNameW`, so we open it and canonicalise it with `GetFinalPathNameByHandleW`.
- `argv[0]` when launching the bytecode image

The Windows header is totally broken w.r.t. the `.exe` extension and always has been. We need a way to convey to `ocamlrun` the executable path we've already determined.
On Windows, environment variables are deleted by setting them to be empty. The main environment block does not differentiate between an empty environment variable and an unset environment variable.
For portability, it is therefore better to ensure that an empty environment variable is treated as unset on Unix. This is also consistent with most released versions of opam at the moment, as when reverting environment changes, opam leaves empty environment variables, rather than unset ones (i.e. a variable which was unset before calling `opam env` may be empty after a round-trip through `opam env` followed by `opam env --revert`).
This is a minor breaking change in that, for example, an empty `OCAMLLIB` before resulted in a broken compiler. Similarly, an empty `CAML_LD_LIBRARY_PATH` always added the current directory to the search path.
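A minimal sketch of the "empty means unset" rule, assuming a wrapper around the standard `getenv`. The name `getenv_nonempty` is hypothetical, not a function in the runtime.

```c
#include <stdlib.h>

/* Hypothetical wrapper: like getenv, but a variable which is set to the
   empty string is reported as if it were not set at all. */
const char *getenv_nonempty(const char *name)
{
  const char *value = getenv(name);
  return (value != NULL && value[0] != '\0') ? value : NULL;
}
```

With this, `getenv_nonempty("OCAMLLIB")` gives the same answer on Unix whether the variable was never set or was left empty by `opam env --revert`.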
$OCAMLLIB
!CAML_LD_LIBRARY_PATH
XXX PR not started
This PR is about ensuring that backslashes passed to `configure` make it through to the system as backslashes. The aim with this is that `./configure --prefix 'C:\OCaml'` should result in an installed OCaml with no forward slashes. This is different from the original ocaml/ocaml#658 as this is about preserving backslashes rather than forcing them.
This PR is a prerequisite for the "Relocatable Compiler" project, but the changes here are independent of it. It addresses five currently known bugs (see below) in `#!` ("shebang") handling in the bytecode compiler and its implementation considerably simplifies `stdlib/Makefile`, in advance of @shindere merging that into the root `Makefile`.
When not using one of its C-based linking modes (`-custom`, `-output-complete-exe`, etc.), `ocamlc` creates bytecode executables by prepending a launch header to the bytecode image. This header's sole responsibility is to locate the actual OCaml runtime and transfer execution to it. There are three ways in which this can be done:

1. A `#!` (or "shebang") header is used with the full path to the runtime, e.g. `#!/usr/local/bin/ocamlrun`. This is the default on Unix systems (except Cygwin, at least before this PR).
2. A shell script is used to `exec` the runtime. Presently this is only used with `-use-runtime` when the path given is too long or contains a space.
3. A small C program (`stdlib/header{,nt}.c`), which is able both to execute a runtime at an absolute location (i.e. `/usr/local/bin/ocamlrun`) and to search `$PATH` for a runtime (i.e. to search for `ocamlrun` in `$PATH`).

At present, the choice between shebang scripts (mechanisms 1 and 2) and executable (mechanism 3) is made at `configure` time (`$(SHEBANGSCRIPTS)` and `$(LONG_SHEBANGS)`), and the result is written by the build system to the file `camlheader` which is kept in the Standard Library directory. This file is either the compiled executable header or it is the full path to where `ocamlrun` will be installed.
The runtime variants (`-runtime-variant d` and `-runtime-variant i`) are supported by building multiple versions of this file, so there are in fact 3 of them: `camlheader`, `camlheaderd` and `camlheaderi`.
Finally, in order to support the `-use-runtime` option, a different file `camlheader_ur` is created. `ocamlc` copies this file and immediately starts the bytecode TOC recorder. It then writes the name of the runtime followed by a newline and then marks the `RNTM` section.
Now, when shebang headers are supported, `camlheader_ur` is exactly the string `#!`. This means that `ocamlc`'s procedure writes a shebang header, though pointlessly records it in the `RNTM` section. When mechanism 3 (the small C program) is in use, `camlheader_ur` is exactly the same as `camlheader`. The C program, in addition to knowing how to search `$PATH`, is also able to read the `RNTM` section of the bytecode image. It reads this data, cunningly converts the newline character which `ocamlc` wrote into a nul character, in order to make the `RNTM` payload a valid C string, and then proceeds to execute that runtime.
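The newline-to-nul trick can be sketched as follows. This is an illustration rather than the code in `stdlib/header.c`: the name `terminate_rntm` is hypothetical and it assumes the payload has already been read into a writable buffer of known length.

```c
#include <stddef.h>

/* Hypothetical helper: turn a newline-terminated RNTM payload into a C
   string in place by overwriting the trailing '\n' with '\0'.
   Returns 1 on success, 0 if the payload is empty or has no trailing
   newline (in which case the buffer is left untouched). */
int terminate_rntm(char *payload, size_t len)
{
  if (len == 0 || payload[len - 1] != '\n') return 0;
  payload[len - 1] = '\0';
  return 1;
}
```

After a successful call, `payload` can be handed directly to an `exec`-style function as the runtime path.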
For `-use-runtime` only, `ocamlc` performs some validation on the runtime path to check if it's valid to use in a shebang line. If it's not, then it elects to write a mechanism 2 header (using `/bin/sh`).
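A sketch of the kind of validation involved, written in C for illustration (the real check lives in `bytecomp/bytelink.ml`). The 127-byte limit on the whole line is an assumption chosen to allow a 125-character path after `#!`; actual kernel limits vary. The name `shebang_path_ok` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Assumed limit on the whole "#!..." line; real kernel limits differ. */
#define SHEBANG_LINE_MAX 127

/* Hypothetical helper: returns 1 if "#!" followed by runtime_path could be
   used as a shebang line, 0 if a different launch mechanism is needed. */
int shebang_path_ok(const char *runtime_path)
{
  size_t len = strlen(runtime_path);
  /* The two bytes of "#!" count towards the line length. */
  if (len == 0 || len + 2 > SHEBANG_LINE_MAX) return 0;
  /* Spaces, tabs and newlines cannot appear in the interpreter path. */
  return strpbrk(runtime_path, " \t\n") == NULL;
}
```

Keeping the length and whitespace checks in one function means there is a single place to get them right, rather than separate copies which can diverge.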
That gets us to OCaml 4.02.1. In OCaml 4.02.2, in order to assist the iOS and Android cross-compilation projects, an additional set of headers was added: `target_camlheader`, `target_camlheaderd` and `target_camlheaderi`. These are the same as their unprefixed counterparts, except that the directory written to them is `$(TARGET_BINDIR)` (which can be overridden when calling `make`) instead of `$(BINDIR)`. There is no `target_camlheader_ur` because there are no paths embedded in the file (so it never differs).
As announced above, the five following bugs are present in this mechanism in OCaml. In decreasing order of severity, they are:

1. The `target_camlheader*` files are incorrectly generated, and bytecode executables produced by the installed `ocamlc` will have invalid shebang lines. (this is a very real bug, originally identified by the Sandmark project; see also https://github.com/ocaml/ocaml/pull/2309#issuecomment-503582198 and #12709)
2. A path given to `-use-runtime` which is longer than 125 characters causes ocamlc to generate a corrupt executable since it uses a `#!/bin/sh` header.
3. The shebang validity checks are implemented in both `configure.ac` and `bytecomp/bytelink.ml`; these have subsequently diverged and are (still) both incorrect. In particular, `configure.ac` only checks the length (and, even then, in a conservative check which rejects one otherwise valid possible header) and while `bytecomp/bytelink.ml` checks for space characters, both places fail to check for tabs and newline characters, which are also not permitted in a shebang line. (this issue has been separately reported in #10724, and is also one reason that Windows CI actually includes a stronger test of strange characters in `--prefix` than the Linux/macOS one)
4. Both `stdlib/Makefile` (generating the headers) and `bytecomp/bytelink.ml` (processing `-use-runtime`) assume `sh` resides in `/bin/sh`, which is not guaranteed by POSIX (and, indeed, is not the case on some, admittedly obscure, systems)
5. When `camlheader_ur` is just the string `#!`, `bytecomp/bytelink.ml` still (unnecessarily) records the `RNTM` section. This in itself isn't a bug, but when writing a `#!/bin/sh` script version, the `RNTM` section incorrectly contains the entire `/bin/sh` script, rather than just the name of the runtime.

The current implementation of all this goes to some lengths to ensure that it is enough for `bytecomp/bytelink.ml` to copy the header blindly and never actually have to inspect its content. Regardless, the header is a relatively subtle piece of configuration state. The compiler and most of the tools will be compiled with `boot/ocamlc` which is built with a generic `Config` module. With the current setup of the build, therefore, it is not possible for `boot/ocamlc` to use values from `Config` (although `Config.bindir` exists, `boot/ocamlc` will see the value it was built with during the bootstrap, not the value used during configuration and, for this reason, there is no `Config.shebangscripts` to mirror the `$(SHEBANGSCRIPTS)` variable in the build system). Although `ocamlc` doesn't at present actually analyse the content of the header it's copying, the decisions it takes as to which file to read (based on `-use-runtime` and `-runtime-variant`) mean that the header is effectively acting as a series of "ghost" command-line arguments! While this is sort of neat, it's causing a few problems:
- `-use-runtime` when `camlheader_ur` is `#!`, `ocamlc` ends up writing the full path of the runtime twice (even allowing for bug 5)
- `-use-runtime` mode, `ocamlc` only needs to mark the `RNTM` section for the executable header, but it's unnecessarily marking it even when `camlheader_ur` is just `#!` (which `ocamlc` has in fact read!)
- `RNTM` section is needed, the string ends with the wrong terminator in order to keep the format valid for the shebang case (i.e. the `RNTM` section is written unnecessarily in one case and, in order to ensure that the string is correct in that unnecessary case, the string has to be mangled in the necessary case :exploding_head:)
- `%PATH%` for the runtime, and never use an absolute path. This is very subtly encoded in `stdlib/Makefile`.
- `camlheader` et al means that the code for validating shebang lines is at present implemented in m4sh (in `configure.ac`), in OCaml (in `bytecomp/bytelink.ml`) and should be being implemented in GNU make (in `stdlib/Makefile`).

Presently, the processing of the header in ocamlc is simple, because it boils down to copying the correct file. I think it's possible to fix these 5 bugs while maintaining that. However, the code (in GNU make and m4sh) won't be terribly tasteful and the various checks will still be duplicated in several places. It is not possible to do Relocatable's switch cloning this way, where `camlheader` instead of being `#!/usr/local/bin/ocamlrun` wants to be something akin to `#!../../bin/ocamlrun` with the `../` interpreted relative to the header itself.
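To make "interpreted relative to the header itself" concrete, here is a sketch which joins the directory containing a header file with a relative runtime path. It assumes `/`-separated paths, does no `..` normalisation or symlink resolution, and `resolve_relative` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write dirname(header_path) + "/" + rel into out.
   Returns 1 on success, 0 if header_path has no directory component or
   the result does not fit in out_len bytes. */
int resolve_relative(const char *header_path, const char *rel,
                     char *out, size_t out_len)
{
  const char *slash = strrchr(header_path, '/');
  if (slash == NULL) return 0;
  /* Length of the directory part, keeping the trailing '/'. */
  int dir_len = (int)(slash - header_path) + 1;
  int n = snprintf(out, out_len, "%.*s%s", dir_len, header_path, rel);
  return n >= 0 && (size_t)n < out_len;
}
```

For a header at `/opt/sw/lib/ocaml/camlheader` and the relative path `../../bin/ocamlrun`, this produces `/opt/sw/lib/ocaml/../../bin/ocamlrun`, which names `/opt/sw/bin/ocamlrun` wherever the `/opt/sw` prefix ends up.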
So, at last, to the details.
The principle here is to allow ocamlc
to do all of the work, being given only the information which it can't know in advance via the "header". Since the header is now really a data-file, it's called runtime-launch-info
. It contains the following three pieces of information:
sh
in a shebang$prefix/bin
)stdlib/header.c
, or stdlib/headernt.c
on Windows)ocamlc
is responsible for:
-runtime-variant
and the bindir read from runtime-launch-info
)sh
, if runtime-launch-info
doesn't contain an absolute path to itRNTM
section is used only with the executable header (and is now null-terminated). Furthermore, when using executable headers, it is required that the RNTM
section is present in the image (this stops the executable header from ever containing an absolute path).It makes sense while overhauling all this to implement (finally) the --with-target-bindir
option, which was added in the switch to autoconf in 4.08 but never plumbed in. Previously, cross-compilation systems will have specified TARGET_BINDIR
to make
instead. Additionally, a new switch --with-target-sh
has been added to complete the cross-compilation picture, at least in terms of the shell scripting. This allows every aspect of stdlib/target_runtime-launch-info
to be controlled in the build. In particular, it allows an improvement to Cygwin's compilation (see below), also providing an immediate upstream use-case for this change.
While the implementation necessarily adds quite a lot of code to bytecomp/bytelink.ml
, it removes a relatively complex bit of m4sh from configure.ac
and an exceedingly complex mess from stdlib/Makefile
. In passing, there are a couple of related issues which can be trivially fixed:
sh
may not be found. This is "solved" by always compiling the executable launcher, even on Unix. A minor side-effect of this is to reduce bit-rot in this file, which had started to happen (see first commit).
--with-target-sh
allows the use of shebangs on Cygwin to be improved.
The approach I've adopted in this PR is to allow ocamlc
to look at the data it reads from camlheader
and act accordingly. There is more than one way to do this! I think it is possible to achieve this using a mix of -use-runtime
in the build system and carefully ensuring that the Config
module's values are only ever used by a compiler which has been actually installed. Likewise, we have previously discussed being able to dynamically load the complete Config
module into the boot compiler (see #9291). I think the -use-runtime
approach is likely to be a bit too brittle and although I have a possible approach for loading Config
at runtime, it's not a trivial change, and we are also fixing bugs here and now.
one-camlheader
camlheader
to ocamlheader
(or ocamllauncher
- launcher is feeling good). Current verdict is runtime-launch-infoboot/ocamlc
does have access to Sys.win32
. That change gets reverted when unified with camlheader-search.
\t
and \n
into account.bytecomp/symtable.ml
's function for getting the primitives from the runtime should echo the command when -verbose
is used (generalise the mechanism already present in utils/ccomp.ml
) Not sure why this is a TODO against this branch?tools/stripdebug.ml
but not tools/cmpbyt.ml
to ignore the RNTM
section - shouldn't it do both? For 5.0+ this is unnecessary, because cmpbyt is optional, but for the back-ports this might matter?
At present, Windows can correctly read either a CRLF or an LF-formatted ld.conf
, however Unix cannot. This PR adds the appropriate tweaks to Dll
and dynlink.o
to skip \r
characters when parsing ld.conf
. Note that the compiler itself already handles this correctly for source files.
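As a rough illustration of the CRLF tolerance described above, here is a minimal sketch in OCaml. It is not the code in `Dll` or the runtime: `strip_cr` and `parse_ld_conf` are hypothetical names, and the sketch only drops a single `\r` immediately before each newline, which is just one of the options under discussion.

```ocaml
(* Sketch only: split the contents of an ld.conf file into entries,
   accepting both LF and CRLF line endings. *)

(* Remove exactly one trailing '\r', if present. *)
let strip_cr line =
  let n = String.length line in
  if n > 0 && line.[n - 1] = '\r' then String.sub line 0 (n - 1) else line

let parse_ld_conf contents =
  let lines = String.split_on_char '\n' contents in
  (* A trailing newline yields a final empty piece, which is not an entry. *)
  let lines =
    match List.rev lines with
    | "" :: rest -> List.rev rest
    | _ -> lines
  in
  List.map strip_cr lines
```

With this, a file written on Windows and one written on Unix give the same list of entries.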
\r
character at the end. Given the lack of an escape hatch, not sure whether to go with this, reject it completely, or possibly post-process the list and do the ocamltest
-style "remove exactly the last \r
only". However, I think accidentally loading the wrong file is more likely to be a problem than this!
The "harmonious" feature of the relocatable compiler is that the configuration of one compiler should not interfere with the configuration of another. The bytecode runtime has to read ld.conf
from the Standard Library location on startup. This location at present is taken from $OCAMLLIB
or $CAMLLIB
if either of these is set. If neither is set, the location the compiler was configured with is used. However, if OCAMLLIB
has been set for one runtime, then another runtime will always load the "wrong" ld.conf
.
This PR primarily alters the runtime so that ld.conf
is loaded from all the possible locations in order.
This is a "breaking" change inasmuch as programs which would have been expected to fail before might instead work. Programs which worked are unaffected, because they must have been loading libraries based on the first ld.conf which was found.
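To make the "all the possible locations in order" idea concrete, here is a small sketch. It assumes the order is `$OCAMLLIB`, then `$CAMLLIB`, then the configured default, and that an empty variable is treated as unset; both are assumptions for illustration, and `ld_conf_candidates` is a hypothetical name rather than anything in the runtime.

```ocaml
(* Sketch only: list every ld.conf which would be consulted, in order.
   [getenv] is passed in so that the function stays pure and testable. *)
let ld_conf_candidates ~getenv ~default =
  let from_env name =
    match getenv name with
    | Some dir when dir <> "" -> [dir ^ "/ld.conf"]
    | _ -> []  (* unset or empty: contributes nothing (assumption) *)
  in
  from_env "OCAMLLIB" @ from_env "CAMLLIB" @ [default ^ "/ld.conf"]
```

In a real program this could be called as `ld_conf_candidates ~getenv:Sys.getenv_opt ~default:"/usr/local/lib/ocaml"`. The configured default is always the last candidate.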
The runtime no longer uses caml_get_stdlib_location
. I've consequently removed it (and therefore it's no longer displayed in ocamlrun -config
).
This PR includes a simplification of the memory management for `caml_shared_libs_path` - the effect can be seen by looking at the PR commit-by-commit, but it eliminates the need to allocate and return an array of pointers from `caml_parse_ld_conf`. In this PR, that's a simplification - in the subsequent PR introducing relative syntax to `ld.conf`, it's mandatory (since the strings passed to `caml_shared_libs_path` are then computed rather than simply read).
This is the first of the patches allowing the compiler to be relocated/cloned.
At present, the lines in ld.conf
are expected to be absolute paths, but this isn't actually checked. Determining if a path is absolute is mildly complicated (Windows…), however explicit relative paths can be portably identified with ease, since these are paths beginning ./
or ../
(or just .
and ..
).
These entries in ld.conf
are now interpreted relative to the directory containing ld.conf
. The default ld.conf
file can be written:
which can clearly be copied or moved.
Implicit paths (as defined in Filename.is_implicit) retain the old, somewhat bizarre, interpretation. CAML_LD_LIBRARY_PATH
retains the same interpretation as before.
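The rule for explicit relative entries can be sketched as follows. This is an illustration, not the runtime's C implementation: `is_explicit_relative` and `resolve_entry` are hypothetical names, only `/` is treated as a separator (Windows would also need `\`), and no normalisation of the result is attempted.

```ocaml
(* Sketch only: interpret explicit relative ld.conf entries relative to
   the directory containing ld.conf; leave everything else untouched. *)
let has_prefix ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Explicit relative: ".", "..", or anything beginning "./" or "../". *)
let is_explicit_relative entry =
  entry = "." || entry = ".."
  || has_prefix ~prefix:"./" entry
  || has_prefix ~prefix:"../" entry

let resolve_entry ~ld_conf_dir entry =
  if is_explicit_relative entry then ld_conf_dir ^ "/" ^ entry
  else entry  (* absolute and implicit paths keep the old interpretation *)
```

Note that `Filename.is_implicit` is not quite the complement of this test, since it classifies a bare `.` or `..` as implicit.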
CAML_LD_LIBRARY
being blank (rather than unset) seems to have an interpretation. Has that changed with this PR? (it shouldn't change)
enable-relative
, but the generation of stublibs assumed that STUBLIBS
in Makefile.config
hadn't been overridden. Not sure if this is unnecessary pedantry (so whether the original just writing ./stublibs
would do) or whether it should go further and do a relative computation.
Extends the format of ld.conf to recognise implicit paths as being relative to the location of ld.conf (NB in this instance, implicit includes .
and ..
). This is breaking, since previously such paths would have been relative to the build directory. Reasons for picking this scheme:
.
which is nicer than a blank line. The default file becomes .
and ./stublibs
+
for two reasons:
+
elsewhere to refer to the effective standard library location whereas here it means the location of the ld.conf
file being read+
(which is not ridiculous)
When compiling a custom bytecode runtime, a C file is generated containing the combined primitives table of the runtime (camlprim.c
). Presently, this is passed directly to the C compiler and is written in a way which avoids using any of the C headers. This has resulted in increasing amounts of code duplication - the typedef
for intnat
(already in caml/config.h
) has to be inferred and the linker command line in Bytelink
has to handle -fdebug-prefix-map
, duplicating logic already in Ccomp
to handle this. It gets worse with the relocatable compiler patches.
This PR alters `ocamlc`'s link process so that the primitives file is explicitly compiled using `Ccomp.compile_file`. That eliminates the duplicated debug prefix map code and also allows `caml/mlvalues.h` to be used to get all the required definitions.
For the build system, this means that when building with `-custom` we must ensure that the `runtime` directory has been included with `-I` so that the headers are available.
--disable-shared
mode?
This PR adds --enable-relative
to configure
which, when given, specifies that both the bytecode runtime and the compilers should locate the Standard Library using a path given relative to the directory containing the tools themselves. For a default installation on Unix, this changes the default location of the Standard Library from /usr/local/lib/ocaml
to ../lib/ocaml
.
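The relative value in the example above can be derived from the two absolute directories. The following is a minimal sketch of that calculation, assuming Unix-style `/` separators and already-normalised absolute paths; `relative_libdir` is a hypothetical name and this is not how `configure` actually computes it.

```ocaml
(* Sketch only: express [libdir] relative to [bindir]. *)
let split_path p =
  List.filter (fun c -> c <> "") (String.split_on_char '/' p)

(* Drop the components shared by both paths. *)
let rec drop_common a b =
  match a, b with
  | x :: a', y :: b' when x = y -> drop_common a' b'
  | _ -> (a, b)

let relative_libdir ~bindir ~libdir =
  let bin_rest, lib_rest = drop_common (split_path bindir) (split_path libdir) in
  (* One ".." for each remaining component of bindir, then descend. *)
  let ups = List.map (fun _ -> "..") bin_rest in
  match ups @ lib_rest with
  | [] -> "."
  | parts -> String.concat "/" parts
```

For the default Unix prefix, `relative_libdir ~bindir:"/usr/local/bin" ~libdir:"/usr/local/lib/ocaml"` gives `../lib/ocaml`.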
At first glance, implementing this seems straightforward - the relative value for LIBDIR
gets injected into runtime/dynlink.c
via OCAML_STDLIB_DIR
and into utils/config.ml
via %%LIBDIR%%
and a relative calculation can then be added in both places. This seemingly straightforward approach fails in two ways:
-custom
or -output-obj
would search for ld.conf
relative to the compiled program, which is clearly wrong.
Config
module require updating to recognise a relative value being returned by Config.standard_library
The second problem could clearly be solved by using a new value, and continuing to leave Config.standard_library
as it was before (i.e. deprecating Config.standard_library
internally and using, say, Config.effective_standard_library
). However, this breaks the reproducibility of the build, and doesn't solve the first problem.
What is needed, therefore, is one value for the Standard Library location used by ocamlrun
and by the compiler drivers, which can be relative, and an absolute value which is used by programs created by the compilers.
The solution proposed here is to introduce caml_standard_library_default
(a relative path in --enable-relative
or LIBDIR
otherwise). This symbol is not included in either libcamlrun
or libasmrun
but, like prims.o
, is added to ocamlrun
. That works correctly for ocamlc
when outputting executables using the launcher stub (i.e. which are invoked using ocamlrun
). ocamlopt
, and ocamlc
when linking actual executables or objects, then calculate the effective value of caml_standard_library_default
and put this in the startup object. This deals with both problems, except that it means that ocamlc.opt
and ocamlopt.opt
now always have an absolute path for caml_standard_library_default
(since only ocamlrun
had the relative path). To deal with this, a new compiler option -set-global-string name=string
is added to both drivers. This parameter is only valid when ocamlc
is linking C code (i.e. it's not valid for bytecode which is sent to ocamlrun
) and causes the global name
to be added, set to the "string"
. The compiler's build system then uses this flag when linking to specify the relative path to the Standard Library. If caml_standard_library_default
is not set using -set-global-string
, then the compiler automatically sets it to the absolute path it's computed. Now, all the compiler distribution tools compute the Standard Library relatively, but everything produced by those tools uses an absolute path, computed when those tools start.
Note that while libcamlrun
and libasmrun
gain an additional undefined symbol which has to be provided when linking an executable, these libraries are already expected to be used with an object emitted using -output-obj
, which already defines these symbols.
Additionally:
ld.conf
at present is always loaded by the runtime, which is unnecessary for -custom
executables. The runtime is tweaked only to read ld.conf
if it will need to use the search path to load shared libraries.
-set-global-string
option adds the need for OCaml to be able to manipulate UTF-16 strings (so that ocamlopt
can emit the correct assembly listing on Windows) and also to be able to encode an OCaml UTF-8 string as a C string literal, both of which are done using C primitives. The use of C for producing the C string literals allows the use of the Windows API functions for converting between UTF-8 and UTF-16, rather than having to add a decoder to the Standard Library (the code is also already present in runtime/sak.c
).
Config.standard_library
remains the absolute path to the Standard Library. Config.standard_library_default
is the actual value computed by configure
(which may therefore be a relative path). Config.standard_library_effective
is always an absolute path but, unlike Config.standard_library
, it does not read $OCAMLLIB
or $CAMLLIB
. Finally, Config.standard_library_relative
is true when the compiler was built with --enable-relative
(it is effectively Filename.is_relative Config.standard_library_default
)
Cmm_helpers.emit_global_constant
. In particular, compared with the output of gcc
, .section
, .type
and .size
directives may be wanted.
signifies a relative path to find the runtime. Note that when this is changed, that should mean --enable-relative --disable-runtime-search-target
should mean that the re-run of executables fails, but after renaming the prefix, the executables produced by the compiler should work (at the moment, for example, the bytecode dynlink test requires a shim).OCAMLRUNPARAM
below there is an issue with DOTOPT_LINKFLAGS
- when compiling bytecode executables with -custom
, -set-global-string
should also be used. This probably isn't visible, because there aren't many for which this actually matters (perhaps ocamldoc??)… this implies that the -set-global-string
ought to be always applied (both for bytecode and native code) … that would fail for the tendered bytecode programs at the moment, but won't once the below is applied… (could hack the bytecode section together for caml_standard_library_default?)-set-global-string
(which, so far, doesn't have other users):
`OCAMLRUNPARAM` to a struct
`ocamlrun -config`
`Sys`
`caml_standard_library_default` to this new struct
`caml_standard_library_default` in `OCAMLRUNPARAM`
`ocamlc` and `ocamlopt` to override any part of `OCAMLRUNPARAM`
`-set-global-string` which isn't available in this context, all the parameters can then be overridden during bytecode load (i.e. we load the default struct from bytecode). For bytecode, it's almost certainly simplest to actually write the section using `OCAMLRUNPARAM` format - i.e. that gets parsed, and the struct is updated, then the actual value of `OCAMLRUNPARAM` (from the environment) is parsed over that. This is good, because tendered bytecode executables were inadvertently relocatable before - i.e. they would always pick up the relative stdlib for the runtime which managed to execute them, which isn't strictly equivalent.
`ocamlrun` will determine the location of `ld.conf` using its default for `caml_standard_library_default` - the value in the bytecode is then used subsequently for `dynlink` and so forth. That is consistent - `ld.conf` is part of `ocamlrun` … the decision about where to load things from subsequently is part of the bytecode executable.
`-set-global-string` is now a somewhat simpler addition to `OCAMLRUNPARAM`
The first implementation acquired caml_realpath
and caml_dirname
which provided the building blocks to implement caml_locate_standard_library
in a cross-platform way. The problem is that this is all required in C, and implementing the Windows versions of both of those functions properly is non-trivial. However, for Windows, GetFullPathName
combined with GetFinalPathNameByHandle
gives the required result (both a good filename to display, and GetFullPathName
will set a pointer to the basename of the result), so it's actually better to implement caml_locate_standard_library
separately on each platform.
Tests:
Three of the backends are missing .type
and .size
directives for caml_system.frametable
, which causes linking warnings when using libasmrun_shared.so
.
This PR introduces the concept of a RuntimeID to describe a given version and configuration of OCaml and forms the basis of filename mangling used to allow both multiple versions and configurations to co-exist harmoniously.
The RuntimeID itself is documented in runtime/RuntimeID.md
. Since bytecode and native code have different configuration options, a value is calculated for each in configure
and exposed in Config.bytecode_runtime_id
and Config.native_runtime_id
. The choice of a 5-bit encoding means that only lowercase letters are needed, so no two RuntimeID values end up relying on a case-sensitive file system. It is intentional that while the RuntimeID is always written in lowercase, it may be searched case-insensitively (especially on Windows).
This is still WIP
Tests:
This PR allows OCaml runtimes and associated shared libraries to co-exist harmoniously on the same system without having to hide from each other. At present, the following interactions can all fail as a program seeks a runtime:
`ld.conf` may be read, if `OCAMLLIB` or `CAMLLIB` is set to another Standard Library
`libcamlrun_shared.so` or `libasmrun_shared.so` could be loaded; it's necessary to ensure that only the correct one appears in `LD_PATH`.
`dllunix.so`) if `CAML_LD_LIBRARY_PATH` includes a directory for another runtime
The first issue is partially dealt with by ensuring that `ld.conf` is loaded from both `$OCAMLLIB/ld.conf` and the configured default. This change turns the first problem into an instance of the third.
The latter two problems are addressed by this PR. The RuntimeID is used to mangle filenames so that shared libraries do not conflict between different configurations and runtimes because they have different names.
The name mangling is also applied to ocamlrun
. Historically, the Windows launcher searches for ocamlrun
in PATH
(since Windows OCaml was distributed as a precompiled binary), and therefore suffered this same problem. However, ocamlrun
uses a slightly different RuntimeID, based solely on the release number of OCaml. Bytecode is compiled to be portable, so the intent is that the bytecode "declares" (both in its magic number, and in its "Zinc" RuntimeID) that it runs on a specific version. That specific instance of ocamlrun
will use its bytecode RuntimeID to load shared libraries, which must of course exactly match.
For ocamlrun
, libcamlrun_shared
and libasmrun_shared
, everything is handled by the build system. For bytecode C stub libraries, a little more work is required. It is intended that RuntimeID values are not "exposed" to the user - i.e. that a programmer should never need to care about their existence. ocamlmklib
is therefore augmented with a -suffixed
option, which indicates to ocamlmklib
that the name given in -oc
should be automatically suffixed when creating the shared stubs library. Note that this is only done for the shared library (.so
/.dll
). The static library (.a
/.lib
) is left undecorated: the name mangling is used to solve runtime problems, not compile-time problems. ocamlc
then gains -dllib-suffixed
which similarly indicates to ocamlc
to suffix the supplied library name. These two parameters together mean that the only change required in a user's build system to take advantage of the suffixing is to add -suffixed
to the ocamlmklib
invocation.
The existing bytecode implementation embeds relatively portable names for shared libraries into the bytecode image, in that the shared object extension (.so
vs .dll
) is stripped. In order to keep the bytecode images as portable as possible, -dllib-suffixed
causes the un-suffixed name to be written either to the DLLS
section of the bytecode image, or in the .cma
header. An indicator byte in DLLS
tells ocamlrun
whether to apply suffixing to the name when the bytecode executable itself is started. This means, for example, that an application using Str
compiled on Unix and an application compiled using Str
on Windows produce the same bytecode image.
In passing, ocamlopt
now supports compilation against libasmrun_shared
. This can never have worked since the library had the wrong extension (I think that either libasmrun_shared.so
was only ever compiled as the dual of libcamlrun_shared.so
or it was used with -output-obj
and so linking the final executable was done separately). ocamlopt -runtime-variant _shared
will now produce a native executable requiring libasmrun_shared-<suffix>.so
.
Similarly in passing, the `DLLS` and `DLPT` sections were always written even if they were empty. 16 bytes are now saved by omitting them entirely when there's nothing to put in them.
tools/objinfo
needs updating for the new DLLS
formatruntime/Mangling.md
of the Zinc Runtime ID-no-
options in OCaml or is that just gcc?
Bytecode executables are normally a bytecode image with a small launcher prepended. On Unix, this is a shebang, usually directly to the interpreter. On Windows, this is a small executable.
Historically, Windows quietly didn't include the full path to the runtime, allowing the header to search PATH for the runtime.
This PR formalises this behaviour, extending both the shebang launcher and the executable to be capable of performing three searches for the runtime:
At present, Unix does 1 only and Windows does 3 only. A new option --enable-runtime-search
is added to configure
. This option accepts three values:
`no` (equivalent to `--disable-runtime-search`) maintains the current behaviour, with the difference that Windows will also use a preconfigured location only
`yes` (equivalent to `--enable-runtime-search`) prefers the current behaviour, but if the runtime is not found at the preconfigured location, then the same directory as the executable is tried, followed by a search of PATH
`always` first looks in the same directory as the executable for the runtime, and then searches PATH if necessary
The behaviour is designed to allow the compiler distribution to be cloned, and `--enable-runtime-search=always` removes the last traces of a hard-coded path from the compiler distribution when combined with the other PRs in this series.
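The search order implied by the three values can be written down as data. This is only a sketch of the behaviour as described here; the type and constructor names are hypothetical and do not correspond to anything in `ocamlc` or the launcher.

```ocaml
(* Sketch only: which locations are tried, and in what order, for each
   value of --enable-runtime-search. *)
type search_mode = Disable | Enable | Always

type candidate =
  | Configured of string         (* path fixed at configure time *)
  | Beside_executable of string  (* runtime name, same directory as the executable *)
  | In_path of string            (* runtime name, searched for in PATH *)

let search_order mode ~configured ~name =
  match mode with
  | Disable -> [Configured configured]
  | Enable -> [Configured configured; Beside_executable name; In_path name]
  | Always -> [Beside_executable name; In_path name]
```

Written this way, `Always` never yields a `Configured` candidate, which is what lets the hard-coded path disappear.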
The runtime search mode is encoded in `camlheader`. The existing format is extended to convey to `ocamlc` the required runtime search mode.
With --disable-runtime-search
, the shebang header is as before, except that the directory must end with a directory separator. A default camlheader
would be:
The executable header has a similar first line, but with the #!
changed to !!
(e.g. !!/usr/local/bin/
) followed on the next line by the binary data for the header.
With `--enable-runtime-search[=yes]`, the shebang header is replaced with an entire shell script which begins with a shebang for `sh` (usually `#!/usr/bin/sh`). `ocamlc` identifies this case by the lack of a trailing directory separator. In this case, the entire script is copied to the executable, except that the line exactly matching `r=` is replaced with `r='<runtime-name>'`. The executable header is encoded as for `--disable-runtime-search`, but with `!#` instead of `!!`.
Finally, with --enable-runtime-search=always
, the shebang header is processed as for --enable-runtime-search=yes
(the file is generated differently). The executable header simply has !!
on the first line, followed by the executable itself.
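The substitution of the `r=` line in the script header can be sketched as below. The function names are hypothetical and the quoting strategy (single quotes, with embedded single quotes escaped) is an assumption about what a robust implementation would want, not a description of what `ocamlc` does.

```ocaml
(* Sketch only: copy a launcher script, replacing the line which is
   exactly "r=" with an assignment of the runtime name. *)

(* Quote a string for sh using single quotes; ' becomes '\'' *)
let sh_quote s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '\'';
  String.iter
    (fun c ->
       if c = '\'' then Buffer.add_string b "'\\''" else Buffer.add_char b c)
    s;
  Buffer.add_char b '\'';
  Buffer.contents b

let set_runtime_name ~runtime_name script =
  String.split_on_char '\n' script
  |> List.map (fun line ->
       if line = "r=" then "r=" ^ sh_quote runtime_name else line)
  |> String.concat "\n"
```

Only a line consisting of exactly `r=` is touched, so a line such as `r=foo` elsewhere in the script is copied unchanged.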
ocamlc
is therefore able to determine from camlheader
exactly what to write both in terms of header and for the RNTM
section. The same executable header is used regardless of the mode - the format of RNTM
is tweaked, using null characters (which are illegal in filenames on all systems):
If `RNTM` ends with a null, then `RNTM` is the preconfigured location and is the only runtime which should be tried
If `RNTM` begins with a null, then the rest of the `RNTM` data is the name of the runtime to search for and is not null-terminated (used for `--enable-runtime-search=always`)
Otherwise, `RNTM` will contain one null in the middle of the string
`--enable-runtime-search` controls `stdlib/camlheader`, and thus all the bytecode tools which will be built and installed. There is also `--enable-runtime-search-target`, which controls `stdlib/target_camlheader`, and thus everything which will be produced by `ocamlc` after installation.
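The three encodings of the RNTM section described above can be distinguished with a few character tests. The sketch below is illustrative only: the type and function names are hypothetical, and for the "null in the middle" case it simply returns the two halves without asserting what each one means.

```ocaml
(* Sketch only: classify the contents of an RNTM section. *)
type rntm =
  | Fixed of string           (* ends with NUL: the only location to try *)
  | Search_only of string     (* begins with NUL: runtime name to search for *)
  | Split of string * string  (* one NUL in the middle: the two halves *)

let decode_rntm s =
  let n = String.length s in
  if n = 0 then None
  else if s.[n - 1] = '\000' then Some (Fixed (String.sub s 0 (n - 1)))
  else if s.[0] = '\000' then Some (Search_only (String.sub s 1 (n - 1)))
  else
    match String.index_opt s '\000' with
    | Some i ->
        Some (Split (String.sub s 0 i, String.sub s (i + 1) (n - i - 1)))
    | None -> None  (* no NUL at all: not a recognised encoding *)
```

Because null is illegal in filenames on all systems, the position of the null is enough to tell the cases apart without any extra tag byte.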
These settings all have active use-cases:
--disable-runtime-search --disable-runtime-search-target
behaviour (the runtime being required to be in /usr/bin/ocamlrun
).--enable-runtime-search=always --enable-runtime-search-target=yes
, allowing the compiler to be cloned, but producing executables for the user which assume that switches won't move, but which are resilient to that move.--enable-runtime-search-target
^r=
approach to substitutionRNTM
it's much easier - we null terminate RNTM
regardless and then search for the first null)
tools/ocamlsize
is working with the various shebang headers (it needs to be able to parse both the path and the name lines).
) and relative header search (so no need to specify runtime)--enable-runtime-search=always --enable-runtime-search-target=no
- i.e. the compiler is relocatable, but it writes shebang headers which are based on the inferred absolute location of the compiler. That absolutely requires the header to be computed in ocamlc. It also suggests that camlheader-search should probably be earlier in the patch-set - it makes sense that enable-relative alters this branch, rather than the other way around.|––––––-|––––––-|––––––
| | (2 systems) | (2 systems)
|4 runtimes: | 64-bit only | 32 bit only
_ ↓ |
_ ↓ |
|
---|---|---|
int31-only |
0 0 |
1 1 |
static [1] |
0 1 |
0 1 |
↑ |
↑ |
|
shared-only | shared only | |
(1 system) | (1 system) |
Absolute (only one runtime possible)
…
Search (priority + rank others)
…
Always (stable ordering - shared+int63, static+int63/shared+int31, static+int31)
Boot compiler: not shared + int31-only which results in a fixed zinc ID.
Tests:
This is an interest branch for the ape header version. Notes so far:
cosmo/camlheader.ape
on ubuntu.thorMZ\0\0\0
to MZ='\n
'\nread -r r<<"EOF"
tmpheader.exe
using cl with /stub
after /link
to inject this revised stubcamlheader
with !!
on the first line followed by this stub\nEOF\n
to camlheader
followed by the --enable-relative-search=always
scriptocamlc
to still scan for the ^r=$
line when compiling.str.cma
that the file produced is the sameXXX PR not yet started!
The intention here is to allow the compiler to continue towards reproducibility by adding --enable-relative=strict
. In this mode, the default injected for caml_standard_library_default
is ""
- i.e. if a program uses Config
, then it must link with -set-global-string
to set the correct value (which may or may not be relative, depending on the use-case)
Revisiting some of the stuff in enable-relative which embeds caml_standard_library_default
. Issues:
Tests:
This branch is functionally complete, and includes two fixes to the Windows version (`\\?\UNC` is now correctly translated; there's an off-by-one error in a buffer calculation, and the use of `GetFullPathName` is more reliable than using `Filename.dirname` although it's not clear that that's anything further than just fixing the bugs in `Filename.dirname` on Windows)
However, the resulting code is very complicated.
Conclusion: enable-relative will use the syscalls directly. The bug fixes in Unix.realpath
should be transferred to trunk (in OCaml). The error handling stuff is worth doing a PR for, but not terribly urgently. That's possibly more worth looking at as part of the wider error handling stuff.
This PR moves the C parts of realpath
from unix/win32unix to give caml_realpath
. The move simplifies the implementation of Unix.realpath
(this could have been done anyway, but it's now more obvious).
There's also a change to move win32_maperr
from win32unix into the runtime - this makes a small change to the errno used for ERROR_CURRENT_DIRECTORY
which needs investigating, but this feels like a better change - there was a smaller stub function already in win32.c. It's not a necessary part of the change, though - it'd be possible to pick errno
values in caml_realpath
for the error cases instead.
:warning: This branch is no longer required. The fix here is superseded by the fixes in runtime-launch-info
.
This option has been present since 4.08 in configure
but it wasn't propagated to the build system (which allows TARGET_BINDIR
to be specified manually when building cross compilers).
--with-target-bindir
has been broken since 4.08.0 - there's a notional remnant in that TARGET_BINDIR
can be passed directly to make
. Looking on GitHub, no use of -target-bindir
or --with-target-bindir
is actually correct (they're all pointless - it's the same as BINDIR).
This test should return just the prefix (/usr/local/bin/ocamlrun
):
and stdlib/target_camlheader
should be #!/somewhere/ocamlrun
:warning: This branch is no longer required. The fix here is superseded by the fixes in one-camlheader
.
use_shebang is presently in utils/config.fixed.ml which is probably correct, but needs confirming. In particular, that means that boot/ocamlc
must never rely on it (because it will write the wrong thing on Windows). The key thing is that the bootstrap doesn't use -use-runtime
and then attempt to run the resulting binary (which is correct) and that under correct configuration we never emit #!
during the build on Windows!
Eliminates camlheader_ur: it's broken in terms of long shebangs. It's either literally #!
or camlheader (which will be an executable).
Tests: none required
:warning: This branch is no longer required. The fixes here have been folded into one-camlheader
.
-use-runtime
mode with a long shebang, there is a comment in bytecomp/bytelink.ml
which doesn't entirely make sense. It appears that the #!/bin/sh
shebang is written and then the runtime gets written on a stray line following it and also it gets written in RNTM. Confirm that this is three copies and, if so, do we need the extra line? i.e. does tools/ocamlsize
still work?endif
in Makefile
is tagged with comments (n/a)
Fixes the bug @dra27 identified in https://github.com/ocaml/ocaml/pull/8622#issuecomment-503605224
This PR moves the generation of all shebangs back into ocamlc
It also always builds the executable header for Unix.
Tests: none required
+
notation for explicit relative paths in ld.conf
, but this seems unnecessary: the present behaviour (in OCaml) borders on a bug and can only have been done by hand (I think?!) so the alternative is to treat any relative path as relative to where ld.conf
was loaded from.OPAM_SWITCH_PREFIX
. This is not necessary with the relative searching PR.#load "+file.ml"
, although it's not at all clear why this is a necessary fix.OPAM_SWITCH_PREFIX
trick poses a further problem - anything using compiler-libs, but not installed in the switch, immediately fails with another switch:
Original document
ocamlrun
Bytecode executables presently have three mechanisms for invoking the bytecode interpreter:
#!
) header or inserting the location immediately below the camlheader
program. This is the preferred mechanism on Unix.
PATH
searching by prefixing the bytecode image with a small C program which searches PATH
for ocamlrun
. This mechanism is primarily used on Windows.
-custom
which builds an entire runtime (with any other C support required) and embeds the bytecode into this executable.
@@DRA Check that this is true
Note that abs can be implemented both as #!
and also with the camlheader
program.
Each of these has various strengths and weaknesses:
CAML_LD_LIBRARY_PATH
), but is not relocatable (that is to say, the runtime must be in the same place on any system on which the executable is expected to run)sudo
operation) or if the first ocamlrun
found is for the wrong version of OCaml.
Both the runtime and the compilers use the location of the standard library.
The runtime uses it to locate ld.conf
which, in conjunction with CAML_LD_LIBRARY_PATH
, is used to search for dynamically linked C stub libraries.
The compilers use it as the final search directory for object files.
In each case, the value of the standard library location given to configure
is embedded and is the default. Without manual tweaking, this location is presently absolute.
The default may be overridden by the OCAMLLIB
environment variable. For the compilers, it can also be completely ignored by using the -nostdlib
parameter.
@@DRA check interaction with -I
- this will be clearer with a formal list of where the search path can presently come from (same for CAML_LD_LIBRARY_PATH
)
The runtime forms a search path from directories given in CAML_LD_LIBRARY_PATH
and ld.conf
for locating .so
(or .dll
on Windows) files containing C primitives needed by the bytecode image.
Changes to the compiler and runtime should seek to unify the following goals:
PATH
without breaking 1). A corollary of this is that it should be possible to install multiple runtimes.OCAMLLIB
should not cause a compiler to cease working just because the library it points to is for a different version of OCaml.CAML_LD_LIBRARY_PATH
should not cause one version of ocamlrun
to load C primitives intended for a different version of the runtime.
Every build of OCaml will include a new MD5 magic number formed by the checksum of the concatenation of several configuration parameters:
This does not include any paths, so it is intended that the magic numbers would be the same on any system for a given configuration of an identical version of OCaml (indeed, a database of these may be included in OCaml to allow better error reporting, similar to changes proposed for ocamlobjinfo
(@@DRA reference?)).
These magic numbers would be displayed in the output of ocamlc -config
RuntimeMD5
is the lowercase first 8 characters of this checksum.
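The derivation of RuntimeMD5 can be sketched with the standard `Digest` module. The list of parameters in the usage note is purely illustrative (the real list is not reproduced here), and `runtime_md5` is a hypothetical name.

```ocaml
(* Sketch only: first 8 hex characters of the MD5 of the concatenated
   configuration parameters. Digest.to_hex already yields lowercase. *)
let runtime_md5 params =
  let digest = Digest.string (String.concat "" params) in
  String.sub (Digest.to_hex digest) 0 8
```

For example, `runtime_md5 ["5.3.0"; "x86_64"; "flambda=false"]` yields an 8-character lowercase hexadecimal string which depends only on the parameters, not on any installation path.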
OCAMLLIB
If OCAMLLIB-
RuntimeMD5
is defined, then OCAMLLIB
is ignored. Note that the special case of defined and empty allows for ignoring OCAMLLIB
without actually overriding the configured value.
If stdlib.cma exists in OCAMLLIB but it does not match the cma magic number of the compiler then it is ignored (silently by the runtime and with a non-fatal warning by the compilers - ocamlopt obviously checks stdlib.cmxa).
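A sketch of the magic number check, in Python for illustration. It relies on the magic number being the leading bytes of the archive; the function name is hypothetical and the expected magic number is passed in by the caller rather than hard-coded.

```python
import os


def ocamllib_is_ignored(ocamllib, expected_magic):
    """True when stdlib.cma exists in ocamllib but starts with a different magic number."""
    archive = os.path.join(ocamllib, "stdlib.cma")
    if not os.path.isfile(archive):
        return False  # the rule above only covers an existing stdlib.cma
    with open(archive, "rb") as f:
        found = f.read(len(expected_magic))
    return found != expected_magic
```

A compiler would emit its non-fatal warning where this returns True; the runtime would simply carry on with the configured location.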
Note that any system wishing to reliably override the standard library has always been able to add a final -I to any compiler (or ocamlrun invocation), so this ocamlrun mechanism is not really intended for general use, but more for ‘completeness’ in the handling of OCAMLLIB.
CAML_LD_LIBRARY_PATH
If CAML_LD_LIBRARY_PATH-RuntimeMD5 is defined, then it is used before CAML_LD_LIBRARY_PATH.
Note that as for OCAMLLIB, it has always been possible to pass -I to ocamlrun.
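One possible reading of "used before", sketched in Python for illustration: the suffixed variable contributes its directories first and the plain variable is still consulted afterwards. That both are searched (rather than the suffixed one replacing the other, as with OCAMLLIB) is an assumption, and the function name is hypothetical.

```python
import os


def stub_env_dirs(env, runtime_md5):
    """Directories from CAML_LD_LIBRARY_PATH-<id> followed by CAML_LD_LIBRARY_PATH."""
    dirs = []
    for var in ("CAML_LD_LIBRARY_PATH-" + runtime_md5, "CAML_LD_LIBRARY_PATH"):
        dirs += [d for d in env.get(var, "").split(os.pathsep) if d]
    return dirs
```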
ocamlrun
ocamlrun is installed as ocamlrun-RuntimeMD5. For legacy support, ocamlrun would continue to exist as a symlink to the new name.
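The installation step might look something like this (Python for illustration, on a system with symlinks). The function name and the choice of a relative link are assumptions of the sketch, not a description of the actual install rules.

```python
import os
import shutil


def install_runtime(built_ocamlrun, bindir, runtime_md5):
    """Copy the runtime to bindir under its suffixed name and point ocamlrun at it."""
    suffixed = "ocamlrun-" + runtime_md5
    target = os.path.join(bindir, suffixed)
    shutil.copy2(built_ocamlrun, target)
    link = os.path.join(bindir, "ocamlrun")
    if os.path.lexists(link):
        os.remove(link)  # replace any previous ocamlrun (file or link)
    os.symlink(suffixed, link)  # relative link: survives moving bindir as a whole
    return target
```

Installing a second runtime into the same directory would add another suffixed binary and simply repoint the legacy name.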
camlheader
camlheader (and its variants) now take the form:
header.c should be updated with the same logic (i.e. it should initially attempt the absolute path). Note that the header is already a sh-script for shebangs which would be too long.
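The "absolute path first" logic could be modelled as below (Python for illustration). What is tried after the absolute path is not spelled out here, so the fallback to a PATH search for the suffixed name is an assumption of this sketch, as is the function name.

```python
import os
import shutil


def find_runtime(absolute_path, runtime_md5, search_path):
    """Try the embedded absolute path, then look for ocamlrun-<id> in search_path."""
    if os.path.isfile(absolute_path) and os.access(absolute_path, os.X_OK):
        return absolute_path
    # Assumed fallback: a PATH-style search for the suffixed runtime name.
    return shutil.which("ocamlrun-" + runtime_md5, path=search_path)
```

Searching for the suffixed name rather than plain ocamlrun means an incompatible runtime which happens to be earlier in PATH is never picked up.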
ocamlmklib will have a new parameter --suffixed which will cause .so/.dll files to be suffixed with -RuntimeMD5.
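The naming transformation is small enough to sketch (Python for illustration; the function name is hypothetical): the suffix goes between the base name and the extension.

```python
import os


def suffixed_name(filename, runtime_md5):
    """dllfoo.so -> dllfoo-<id>.so (and likewise for .dll)."""
    stem, ext = os.path.splitext(filename)
    return f"{stem}-{runtime_md5}{ext}"
```

A build system which does not call ocamlmklib directly would need to apply the same rule itself when naming stub libraries.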
This approach will instantly fix the location problems with CAML_LD_LIBRARY_PATH, but it requires opt-in, since it involves build system support. ocamlbuild and dune would aim to support this mode of operation from release time. TODO Not sure that either tool actually invokes ocamlmklib directly, so this would involve patching them to follow the same naming standard, which for ocamlbuild may be too hard (Dune being both opinionated, and also generating .install and META files automatically, should be more able to adapt).
The intention would be to make --suffixed the default with the following schedule:
ocamlmklib displays a non-fatal warning if invoked without --suffixed and a dynamic stub library is generated
--suffixed the default behaviour
There should not be a --unsuffixed flag - all build systems should convert to this new standard, given that it’s only a build system alteration, and does not materially affect any code.
--enable-relative-libdir configuration option
In this mode, the relative location from BINDIR to LIBDIR is embedded as the default location of the standard library. Note that while ocamlc -config should display this as a new value, the existing display of the standard library location should remain an absolute path (i.e. ocamlc -where will return a computed absolute path based on the location of ocamlc). If the binary is unable to determine where it was invoked from, then an error will be reported, as if the standard library could not be found.
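A sketch of the two halves of this mode, in Python for illustration with POSIX-style paths: computing the relative location at configure time, and turning it back into an absolute path from wherever the binary finds itself. The function names and the use of None for "could not determine the executable's location" are assumptions of the sketch.

```python
import posixpath


def relative_libdir(bindir, libdir):
    """Relative location from BINDIR to LIBDIR, as would be embedded at build time."""
    return posixpath.relpath(libdir, bindir)


def stdlib_location(executable_path, embedded_relative):
    """Absolute standard library path computed from the running binary's location."""
    if executable_path is None:
        # Mirrors the error case above: report it as if the stdlib were missing.
        raise RuntimeError("standard library could not be found")
    bindir = posixpath.dirname(executable_path)
    return posixpath.normpath(posixpath.join(bindir, embedded_relative))
```

So a compiler configured with a prefix of /usr/local and later moved wholesale to /opt/ocaml would still report /opt/ocaml/lib/ocaml from ocamlc -where.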
Everything from here on is archived (i.e. not useful either for documentation or tasks)
All dealt with 5-Sep-2022
Stubs bitness potentially exposes a problem: the runtime must attempt to load a .so which matches itself
This requires the ability to transform the DLL name
dllunix-bbbp-machine.so
dllunix-bbbp-machine.so
Runtime ID and Machine ID
Bytecode executables do not specify Machine ID: that's the point
So there is ocamlrun at a given version -> don't care when executing that machine, only that everything loaded needs to match
In native mode, this would only matter for the shared runtime
This requires a slight tweak to the .cma format to add a list of suffixed DLLs -> will have to do something in the DLLS section itself to indicate that they're loaded this way.
This is particularly useful on Windows, where it allows mingw and msvc to coexist in completely separate harmony. It's also potentially useful on other systems which are capable of loading in two different modes (macOS?)
Another important aspect, which might affect ocamlmklib -suffixed is whether the runtime ID used for bytecode stub libraries has 64-bit support or not. I think it's probably fine that this doesn't specify, or that it's always set to 31bit. I think it's that - the C library has hard-coded runtime support, so it's more that the machine must match up. Yes, this is definitely it: ocamlmklib should be clearing the 63-bit ID (or using a specific computed stubs runtime ID)
This is worth noting in runtime ID: a runtime ID always has the same set of bits, but depending on the context, some of them are not permitted to be set. Setting the 63/31 specifier allows bytecode to locate a runtime which can run it, but it obviously cannot be used to find the correct machine for loading stub libraries - that must clearly match the given configuration and machine type.
All dealt with 5-Sep-2022
Clarification for Runtime ID and Machine ID
Machine ID is just the host triplet - e.g. x86_64-pc-windows … it's not the shortest, but it's obvious that x86_64-pc-windows-ocamlrun-bbbp is the msvc64 version of ocamlrun-bbbp
In other instances, the Runtime ID describes a target configuration, but the context in which it's being used will then necessarily zero out some other bits:
For ocamlrun, we're selecting the runtime based on the properties of the bytecode in question:
A similar combination applies to native code:
There is then an additional context for stub libraries:
So I think what's missing for the stub libraries is that we should be modifying the .cma format to specify that stub DLLs are being loaded which need transforming - but we'd still write dllunix. What alters is that we write "dllunix" and then rely on the runtime to add the .so and the appropriate host triplet and runtime ID.
Note that when the runtime is working out which DLLs to load, it would be doing this based on the configured runtime ID… i.e. it's not about the basename. That involves embedding the runtime ID in the runtime as well, which I don't think has been done.
All dealt with 5-Sep-2022
Looks like the i386-static needs some work in one of the earlier branches.
Target here:
At this point the target is to set up 4.11.2, 4.12.1, 4.12.1-relative and 4.13.0 and 4.13.0-relative switches to demonstrate the breakages. 4.11.2 will be the actual release - the 4.12 and 4.13 branches should be simulated by changing VERSION on their branches.
These could do with flipping around - i.e. to be 63bit-required and shared