
Commit fa4f789

Add section on interpreter paths
Draft section based on notes from Brett Viren on the use case of scripts with shebang lines. Use, as per notes, Python as the main example.
1 parent 0d88b14 commit fa4f789


RelocatableSoftware/README.md

Lines changed: 59 additions & 5 deletions
@@ -112,8 +112,8 @@ compatibility are helpful for simplifying binary packaging and deployment.
112112
- Ensures software can be used by as wide a range of upstream clients as possible
113113
- Nevertheless, can be tricky for [languages like C++](https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B)
114114
- There [are tools to help check compatibility](https://fedoraproject.org/wiki/How_to_check_for_ABI_changes_in_a_package) at least for ELF, but needs a more thorough survey.
115-
- Program for multiple versions of any dependencies (assumes they have good API/ABI versioning!!)
116-
- Dependecies should provide a versioning header [as per HSF (draft) guidelines](https://github.com/HEP-SF/documents/blob/master/HSF-TN/draft-2016-PROJ/draft-HSF-TN-2016-PROJ.md)
115+
- Program for compatibility with multiple versions of any dependencies
116+
- Dependencies should provide a versioning header [as per HSF (draft) guidelines](https://github.com/HEP-SF/documents/blob/master/HSF-TN/draft-2016-PROJ/draft-HSF-TN-2016-PROJ.md)
117117
- Hide dependencies as implementation details as far as possible.
118118
- Consider versioned symbols and/or inlined namespaces?
119119
- Building binaries
@@ -264,6 +264,49 @@ hard-coding or use of standard environment variables. On UNIX, these could inclu
264264
- `/var`
265265
- `/tmp` or `TMPDIR`
266266
267+
(Re)Locating the Interpreter for Programs
268+
=========================================
269+
Programs implemented using interpreted languages such as Python are usually written as scripts using (on Unix platforms)
270+
a ["shebang"](https://en.wikipedia.org/wiki/Shebang_(Unix)) on the first line to define the interpreter program to pass the remainder of the script to. For example, a Python "hello world" program might be written as
271+
272+
```Python
273+
#!/usr/bin/python
274+
275+
print("hello world")
276+
```
277+
278+
This hard codes the system interpreter into the program and whilst this program is relocatable (assuming a valid system
279+
Python install), it cannot be used with any other interpreter. Typical HEP software stacks install, and require use of,
280+
their own interpreters, whose paths may also end up hard coded into scripts:
281+
282+
```Python
283+
#!/custom/stack/root/python/2.7/bin/python
284+
285+
print("hello world")
286+
```
287+
288+
The resulting stack is then not relocatable as the interpreter path will not exist after relocation.
289+
290+
Rather than hard coding system or custom interpreter paths, script authors should prefer the use of the
291+
[`env`](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/env.html) program as the shebang, e.g.
292+
293+
```Python
294+
#!/usr/bin/env python
295+
296+
print("hello world")
297+
```
298+
299+
Use of `env` makes the program relocatable, but defers location of the interpreter to the `PATH` environment variable,
300+
and consequently the configuration management system for the software stack. Whilst package authors should prefer
301+
usage of the `env` pattern, software stack managers can also consider rewriting the shebang line during install
302+
and on any relocation to the absolute path of the required interpreter. As it is plain text, simple regular expression
303+
replacement can be used, but the chosen packaging system must support this, and care must be taken
304+
if the resultant stack is to be deployed over network file systems (and hence unknown mount points).
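
As a rough illustration of the regular expression replacement described above, the following minimal sketch pins an `env`-style Python shebang to an absolute interpreter path. The function name `pin_interpreter` and the pattern are hypothetical and not part of any packaging tool; it assumes the shebang uses `/usr/bin/env` and that any interpreter arguments after the name should be kept.

```python
import re

# Matches a first-line shebang such as "#!/usr/bin/env python" or
# "#!/usr/bin/env python2.7". Anything after the interpreter name
# (e.g. extra arguments) is left untouched by the lookahead.
ENV_SHEBANG = re.compile(
    r"\A#![ \t]*/usr/bin/env[ \t]+(python[0-9.]*)(?=[ \t\r\n]|\Z)"
)


def pin_interpreter(script_text, interpreter_path):
    """Return script_text with an env-style Python shebang replaced by
    an absolute interpreter path. Other text is returned unchanged."""
    # A function is used as the replacement so that backslashes in
    # interpreter_path are never treated as regex escapes.
    return ENV_SHEBANG.sub(lambda m: "#!" + interpreter_path, script_text, count=1)


if __name__ == "__main__":
    source = '#!/usr/bin/env python\n\nprint("hello world")\n'
    print(pin_interpreter(source, "/custom/stack/root/python/2.7/bin/python"))
```

A packaging system would run something like this at install time and again on any relocation. Scripts whose first line is not an `env`-style Python shebang pass through unmodified.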
305+
306+
**TODO?** Binaries *also* have an interpreter (on Linux, `ld-linux.so`, on macOS, `dyld`). These are also hardcoded,
307+
though can be changed with, e.g., `patchelf` for ELF binaries.
308+
309+
267310
(Re)Locating Dynamic Libraries
268311
==============================
269312
A non-trivial package will usually be partitioned into a main
@@ -315,8 +358,15 @@ dependencies. At install time, rpaths are usually stripped, unless
315358
configured otherwise.
316359

317360

318-
Scripting/Development Support Tools
319-
===================================
361+
(Re)Locating Language Modules
362+
=============================
363+
**TODO** How to handle module lookup, e.g. `PYTHONPATH` for Python (other languages?). Things that package authors can do.
364+
Things that the packaging system should do (inc. any packaging system provided by the language, e.g. `pip`, `virtualenv`).
365+
Things best left to configuration management.
366+
367+
368+
Development Tools
369+
=================
320370
CMake
321371
-----
322372
To support use of a Project by a CMake based client project, scripts for
@@ -382,7 +432,6 @@ Relocatability with External Dependencies
382432
What happens to relocatability when we have two packages with a dependency?
383433
For example `Foo` and `Bar`, with `Foo` linking to `libbar` from `Bar`.
384434

385-
386435
1. Can move `Foo` if its `RPATH` contains absolute path to `libbar`.
387436
2. Cannot move `Bar` without updating `Foo`'s RPATH or using/updating dynamic
388437
loader paths
@@ -432,6 +481,11 @@ paths, so can only really be handled by a package manager system and would
432481
not work for deploying software over network file systems where final
433482
mount points are not guaranteed to be identical.
434483

484+
Interpreter Paths
485+
-----------------
486+
Shebangs are plain text, so are straightforward to patch directly using regular expression
487+
find/replace, or via tooling at build or install time.
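
A possible shape for such install-time tooling is sketched below: it walks an installed tree and rewrites any shebang that points into the old install prefix so that it points into the new one. The function name `relocate_shebangs` and its arguments are hypothetical; the sketch assumes scripts are regular files whose first line starts with `#!` followed directly by an absolute path.

```python
import os


def relocate_shebangs(root, old_prefix, new_prefix):
    """Rewrite shebangs under root that point into old_prefix so they
    point into new_prefix. Returns the sorted list of modified files."""
    # Normalise to a trailing slash so "/old/stack" does not also
    # match an unrelated "/old/stack2".
    old = b"#!" + os.fsencode(old_prefix.rstrip("/")) + b"/"
    new = b"#!" + os.fsencode(new_prefix.rstrip("/")) + b"/"
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # leave symlinks alone; their targets are visited separately
            with open(path, "rb") as handle:
                data = handle.read()
            if not data.startswith(old):
                continue  # no shebang, or one outside the old prefix
            first, sep, rest = data.partition(b"\n")
            # Rewriting in place keeps the file's existing permissions.
            with open(path, "wb") as handle:
                handle.write(new + first[len(old):] + sep + rest)
            changed.append(path)
    return sorted(changed)
```

For example, `relocate_shebangs("/new/stack", "/custom/stack/root", "/new/stack")` would be run once after moving the tree. This only works when the final location is known, so it does not help when the mount point of a network file system is not fixed.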
488+
435489
Library RPATHs
436490
--------------
437491
1. RPATHs can be changed at install time by the packaging system/tools (`patchelf`, `install_name_tool` etc; `otool` only inspects them)
