The Rexx Highlighter


36th International Rexx Language Symposium

The Wirtschaftsuniversität Vienna, Austria, May 4-7 2025

Josep Maria Blasco

EPBCN – ESPACIO PSICOANALITICO DE BARCELONA

Balmes, 32, 2º 1ª — 08007 Barcelona, Spain

May the 6th, 2025

Notice

This whole document is an experiment in CSS printing. It comfortably mixes normal text and programs beautified by the heavy prettyprinting produced by the Rexx Highlighter, and it can be viewed both as a standard web page and as a slide show.

If you are viewing this file as a PDF, chances are good that you are looking at it as a slide show. If you are viewing it as a web page, the suggested print settings to produce a PDF containing a slide show are: no headers or footers, and background images active. This is true for the Chrome browser at the time of this writing (Jan-May 2025).

The default style for Rexx fenced code blocks is dark. You can choose the light style by adding a style=light query string to the URL of this document.

The Rexx Highlighter

  • Introduction
  • Architecture
  • Features
  • The three highlighters
  • Tools
  • Further work
  • Acknowledgements

Introduction

The Rexx Highlighter is a child project of the Rexx Parser, and was patiently and lovingly written by Josep Maria Blasco in the second half of 2024 and the first half of 2025. A working knowledge of the functionality of the Rexx Parser is required to be able to follow this presentation.

Developed around a common code base, the Highlighter currently includes output drivers for three modes: HTML, ANSI terminal emulators, and (Lua)LaTeX.

~~~rexx
Say "Done!" -- Inform the user
~~~
Say "Done!" -- Inform the user

The figure above shows the HTML highlighter in action, or, to be more precise, the effect of a Rexx fenced code block in a Markdown file.

Architecture

  • The HTML class assignment system
  • Coarse- vs. fine-grained class assignments
  • Specifying a style with CSS
  • The three highlighters
  • Tools
  • Further work
  • Acknowledgements

The HTML class assignment system

Irrespective of whether we are going to produce output for ANSI terminals, HTML, or LaTeX, the Highlighter assigns one or more HTML class names to every element category and subcategory. This is done in the HTMLClasses routine.

In many cases, an element category is mapped directly to a single HTML class. For example, .EL.SHEBANG, the category that identifies shebang lines, is assigned the class "shb".

In many other cases, an element category is mapped to two HTML classes: the first is a generic one, which is the same for a whole set of elements, and the second one is a more specialized one, which uniquely identifies the category. For example, .EL.OP.MULTIPLICATION, which identifies the "*" operator, is assigned a generic class of "op" (for "operator"), and also a specialized class of "mul".

These class names are then prefixed with a customizable prefix (by default, "rx-" is used) to avoid conflicts with classes defined by other programs or frameworks. In our example, the "*" operator would be assigned classes "rx-op" and "rx-mul".
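The mapping described above can be pictured with a short Python sketch. It is only an illustration: the table `CLASSES` and the function `html_classes` are hypothetical names, and only the two entries quoted in the text come from the presentation. The real work is done by the HTMLClasses routine.

```python
# Hypothetical excerpt of a category-to-class table. Only these two entries
# are quoted in the presentation; the real table lives in HTMLClasses.
CLASSES = {
    "EL.SHEBANG": ("shb",),                  # a single class
    "EL.OP.MULTIPLICATION": ("op", "mul"),   # generic class + specialized class
}

def html_classes(category, prefix="rx-"):
    """Return the prefixed class names for an element category."""
    return [prefix + name for name in CLASSES[category]]

print(html_classes("EL.OP.MULTIPLICATION"))   # ['rx-op', 'rx-mul']
print(html_classes("EL.SHEBANG"))             # ['rx-shb']
```

Keeping the prefix as a parameter mirrors the customizable prefix mentioned in the text.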

Coarse- vs. fine-grained class assignments

The element categories that may be assigned two classes instead of one belong to one of the following sets: assignments ("asg" + a specific class), operators ("op" + a specific class), special characters and special character sequences ("spe" + a specific class), and taken constants ("const" + a specific class).

Depending on the supplied program options, each run of the Highlighter may assign one or both of the classes to the elements that have a category belonging to each of these sets.

For example, when using rexx fenced code blocks, the operator attribute may have a value of "group" (assign the first class only), "detail" (assign the second class only), or "full" (assign both classes).

This mechanism lets you choose, for each of the above sets of elements, between fine-grained and coarse-grained class assignments on a run-by-run basis.

~~~rexx {assignment=detail special=group operator=full constant=detail classprefix="P"}
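As a rough illustration of the three values, the following Python sketch picks the classes for an element that has a generic and a specialized class. The function name `select_classes` is hypothetical; only the values "group", "detail" and "full" and their meaning are taken from the text.

```python
def select_classes(generic, specific, mode="full"):
    """Pick class names for an element that has a (generic, specific) pair.

    "group"  -> generic class only
    "detail" -> specialized class only
    "full"   -> both classes
    """
    if mode == "group":
        return [generic]
    if mode == "detail":
        return [specific]
    if mode == "full":
        return [generic, specific]
    raise ValueError(f"unknown mode: {mode!r}")

# The "*" operator under each of the three settings of the operator attribute.
for mode in ("group", "detail", "full"):
    print(mode, select_classes("op", "mul", mode))
```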

Specifying a style with CSS

Once the HTMLClasses routine has mapped the element categories and subcategories to their respective HTML classes, a CSS style file is used to transform these classes into effective highlighting attributes.

The Highlighter is distributed with two sample style files, rexx-light.css and rexx-dark.css, although you can of course create your own.

Say "Done!" -- Inform the user
Say "Done!" -- Inform the user

The choice of style file is determined by the style= attribute in Rexx fenced code blocks, by the --style= option of the highlight utility, and by similar mechanisms when using other tools. Specifying style=name will select the rexx-name.css style file.

CSS files are fetched by the browser when using the HTML highlighter, and are parsed by the Highlighter when other output formats are requested. In the latter case, only a very limited subset of the CSS specification is recognized.

Style patches

The style patch system allows one-time, simple and easy patching of a CSS style.

Say "Done!" -- Inform the user

Standard highlighting

Say "Done!" -- Inform the user

With patch="all comments yellow"

Inline style patches can be specified by using the patch= attribute of a Rexx fenced code block, or the --patch option of the highlight utility. This mechanism is suitable for very small patches, which fit on a single line. For example, the patched code block above has been created with

~~~rexx {patch="all comments yellow"}

Larger patches can be stored in a separate file (see the patchfile and --patchfile= options). The parse method of the Highlighter class also accepts an optional style patch argument.

Features

Variable symbol highlighting (1/2)

Depending on the context, an element with the morphology of a variable symbol may play the role of a local variable, of an instance variable, of an instruction or directive keyword, or be taken as a constant, as in the case of labels or method names.

The Rexx Parser assigns different element categories to each symbol, depending on its role (and, in the case of taken constants, it also assigns different subcategories), and these differences are passed to the various highlighter versions, so that they can highlight each case appropriately.

Variable symbol highlighting (2/2)

::Method then                           -- A method name
  Expose then expose                    -- Instance variables
If:                                     -- "If" is a label here
  If (then = 2)                         -- "then" is an instance variable
    Then else = 3                       -- "Then" keyword, "else" variable
    Else Do Label 4                     -- Three keywords
      end = 4                           -- A variable called "end"
      Signal Then                       -- Signaling the "then" label
    End 4                               -- A keyword
  self~then                             -- "then" as a method name
Then:                                   -- "Then" is a label
  value = Else()                        -- A ::ROUTINE call

::Routine Else                          -- A ::ROUTINE name
  Use Arg while                         -- Assign to the "while" variable
  Loop Label Forever forever = 1 By 1 - -- "forever", label and control variable
       While (while > forever)          -- Two uses of "while"
    If forever > 16 Then Return while   -- "forever" and "while" are variables
  End Forever                           -- "Forever" as a label
  Return forever                        -- "forever" as a variable

Compound variable highlighting (1/2)

Compound variables are special, in the sense that they have two simultaneous aspects: they are, at the same time, variables, and indexed stem references. The Rexx Highlighter honours this duality by returning compound variables as single elements that include a number of sub-parts; you can decide which of the two aspects of a compound variable will determine the highlighting mode.

  -- As an indexed reference
  Say Matrix.1.2A.j..
  -- As a whole
  Say Matrix.1.2A.j..

You can select the compound variable highlighting mode using the compound=true|false attribute on the rexx fenced code block marker. The default is to highlight all the components individually.

```rexx {compound=true}
(rexx code goes here)
```
```rexx {compound=false}
(rexx code goes here)
```

Compound variable highlighting (2/2)

When highlighted as a single element, a compound variable will have an element category of .EL.COMPOUND_VARIABLE or, if the variable is exposed (i.e., it is an instance variable), of .EL.EXPOSED_COMPOUND_VARIABLE.

::Method myMethod
  Expose var stem.
  local = var + 1
  Say stem.12..2E.var.local

When taking sub-parts into account (which is the default), different highlighting attributes will be used for the stem name (a .EL.STEM_VARIABLE or a .EL.EXPOSED_STEM_VARIABLE) and for all the components of its tail. The first dot in a compound variable is part of the stem name. The rest of the symbol, the tail, is an arbitrary sequence of: variables (either local, .EL.SIMPLE_VARIABLE, or exposed, .EL.EXPOSED_SIMPLE_VARIABLE); signless integers (.EL.INTEGER_NUMBER); pure dotless constant symbols (.EL.SYMBOL_LITERAL); and tail separator dots (.EL.TAIL_SEPARATOR).
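The decomposition can be sketched in a few lines of Python. This is a simplified illustration, not the Rexx Parser's own code: the function `split_compound` and its `exposed` argument (a set of uppercased exposed names) are hypothetical, and the input is assumed to contain at least one dot.

```python
import re

def split_compound(symbol, exposed=frozenset()):
    """Split a compound variable into (text, category) sub-parts."""
    stem, tail = symbol.split(".", 1)
    stem += "."                         # the first dot belongs to the stem name
    if stem.upper() in exposed:
        parts = [(stem, "EL.EXPOSED_STEM_VARIABLE")]
    else:
        parts = [(stem, "EL.STEM_VARIABLE")]
    for piece in re.findall(r"\.|[^.]+", tail):
        if piece == ".":
            parts.append((piece, "EL.TAIL_SEPARATOR"))
        elif piece.isdigit():           # signless integer
            parts.append((piece, "EL.INTEGER_NUMBER"))
        elif piece[0].isdigit():        # starts with a digit: a constant symbol
            parts.append((piece, "EL.SYMBOL_LITERAL"))
        elif piece.upper() in exposed:
            parts.append((piece, "EL.EXPOSED_SIMPLE_VARIABLE"))
        else:
            parts.append((piece, "EL.SIMPLE_VARIABLE"))
    return parts

# The compound variable from the method above, where "var" and "stem." are exposed.
for text, category in split_compound("stem.12..2E.var.local", {"VAR", "STEM."}):
    print(f"{text:6} {category}")
```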

Function and subroutine calls

The Rexx Highlighter is able to differentiate between internal, built-in, local ::ROUTINE, namespaced ::ROUTINE and external function and subroutine calls.

  len =  Length(  var )                 -- Call a BIF as a function
  Call   Length   var                   -- Call a BIF as a subroutine
  b   =  Verify(  var )                 -- Internal routine, as a function
  Call   Verify   var                   -- Internal routine, as a subroutine
  Call  "VERIFY"  var                   -- BIF subroutine call
  Call  "Verify"  var                   -- External subroutine
  Call   meaningOfLife                  -- Package-local ::ROUTINE, subroutine
  Call   Name:myRoutine                 -- External ::ROUTINE as a subroutine
  Call   External                       -- External subroutine
  Signal On Syntax Name  Verify         -- A label
  Call   On Error  Name  Length         -- A BIF
  Call   On Error  Name "Length"        -- External

"VERIFY": Return .True                  -- An internal routine

::Routine meaningOfLife; Return 42      -- A locally defined ::ROUTINE

Taken constant highlighting

In many places in the Rexx syntax, a "taken constant" is required. Generally speaking, a "taken constant" is a string or a symbol which is taken as a constant, although a few contexts may impose additional restrictions on the acceptable elements.

::Class    myClass  Public              -- A class name
::Method   myMethod Class               -- A method name
  Return   myRoutine(12)                -- Calling a ::Routine
::Routine  myRoutine                    -- A routine name
  Return  .Resources[myResource][2]|| -
   .myClass~myMethod                    -- A message term
::Resource myResource End "The end"     -- A resource
A resource line
Another resource line
This is line number 3
The end is near (additionally, "is near" and what follows it are ignored)

The Highlighter assigns different highlighting classes to every subcategory of taken constants. This allows you to specify different highlighting choices for labels, method, routine and resource names, etc.

Documentation comments

Documentation comments or doc-comments are a special form of comment, similar to JavaDoc comments.

Doc-comments can be placed before a directive, or before a callable label.

Standard doc-comments start with "/**" and end with "*/".

/******************************************************************************/
/* This is a set of three classic comments, forming a box.                    */
/******************************************************************************/

/**                                                                             
 *  This is a doc-comment. It starts with "/**" and it ends with "*/", and it   
 *  is placed immediately before a directive. A style patch has been applied    
 *  to highlight the doc-comment in reverse fuchsia, and a pad of 80 has been   
 *  specified as a fenced code block attribute, to embellish the display.       
 */                                                                             
::Routine R

Markdown doc-comments

Markdown doc-comments are contiguous sequences of line comments starting with exactly three dashes (that is, with "---" but not with "----").

--------------------------------------------------------------------------------
-- This is a set of three line comments, forming a box.                       --
--------------------------------------------------------------------------------

---                                                                             
--- This is a markdown doc-comment. It starts with "---", and it is placed      
--- immediately before a callable label. A style patch has been applied         
--- to highlight the doc-comment in yellow over dark blue, and a pad            
--- of 80 has been specified as a fenced code block attribute,                  
--- to embellish the display.                                                   
---                                                                             
Proc: Procedure Expose a b c
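The "exactly three dashes" rule is easy to express as a regular expression. The Python sketch below is an illustration only: `markdown_doc_comments` is a hypothetical helper that finds contiguous runs of such lines, and it does not check that a run is placed before a directive or a callable label, as the Rexx Parser does.

```python
import re

# A line comment that starts with exactly three dashes: "---" but not "----".
DOC_LINE = re.compile(r"^\s*---(?!-)")

def markdown_doc_comments(lines):
    """Return (first, last) index pairs of contiguous runs of '---' lines."""
    runs, start = [], None
    for i, line in enumerate(lines):
        if DOC_LINE.match(line):
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(lines) - 1))
    return runs

source = [
    "-" * 80,                               # part of a box, not a doc-comment
    "-- A classic line comment",
    "-" * 80,
    "",
    "---",
    "--- This is a markdown doc-comment.",
    "---",
    "Proc: Procedure Expose a b c",
]
print(markdown_doc_comments(source))        # [(4, 6)]
```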

Returned doc-comment elements

Documentation comments are always returned as a single element.

  • In the case of standard doc-comments, the Rexx Parser adds to the element all the whitespace which can be found, if any, in the first comment line, before the first "/" character, and in the last line, after the last "/" character. The returned element has an element category of .EL.DOC_COMMENT.
  • In the case of Markdown doc-comments, the Rexx Parser combines all the line comments, and preceding whitespace, if present, into a single element. The returned element has an element category of .EL.DOC_COMMENT_MARKDOWN.

The three highlighters

The HTML highlighter (1/2)

The HTML highlighter wraps every non-inserted element in a <span> tag that will contain its corresponding HTML class or classes. For example, when operator=full is in effect, a multiplication operator "*" might be transformed into:

<span class="rx-op rx-mul">*</span>

These <span> tags are in turn collected in lines, which are enclosed in <code> tags. If the user has requested line numbering, the value of the lineno attribute will be added to the printed line by using a ::before pseudo-element. Indentation has been added to the following figure to improve legibility.

<code lineno="n">
  <span tag 1>
  ...
  <span tag n>
</code>
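A minimal Python sketch of this wrapping step follows. The helper names `span` and `code_line` are hypothetical, and the real highlighter emits its markup from Rexx; the sketch only reproduces the shapes shown above.

```python
from html import escape

def span(text, classes):
    """Wrap one element in a <span> carrying its class names."""
    return f'<span class="{" ".join(classes)}">{escape(text)}</span>'

def code_line(spans, lineno=None):
    """Collect the spans of one source line inside a <code> tag."""
    attr = f' lineno="{lineno}"' if lineno is not None else ""
    return f"<code{attr}>{''.join(spans)}</code>"

star = span("*", ["rx-op", "rx-mul"])
print(star)                       # <span class="rx-op rx-mul">*</span>
print(code_line([star], lineno=1))
```

Escaping the element text matters for Rexx operators such as "<" and "&&", which would otherwise be read as markup.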

The HTML highlighter (2/2)

Lines are then wrapped inside a <pre> tag, and the <pre> block is itself enclosed in a <div>:

<div class="highlight-rexx-dark"><pre>
  (code lines)
</pre></div>

The <div> tag has a class of highlight-rexx-style, where style is the style specified in the corresponding program options. Style patches are handled by adding a random id attribute to the <div> tag, which is matched by a corresponding inline <style>:

<div id="rx63e218035994" class="highlight-rexx-dark">
  <style>
    #rx63e218035994 .rx-var {font-weight:bold; color:#000000; background-color:#cccc00; }
  </style>
  <pre>
    ...
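The scoping trick can be sketched in Python as follows. This is an assumption-laden illustration: `wrap_block` and its `patch_css` argument (a mapping from selector to declarations) are hypothetical, and the id format is simply modelled on the example above.

```python
import secrets

def wrap_block(code_lines, style="dark", patch_css=None):
    """Enclose highlighted lines in <pre> and <div>, scoping an optional patch."""
    body = "<pre>" + "\n".join(code_lines) + "</pre>"
    css_class = f"highlight-rexx-{style}"
    if patch_css is None:
        return f'<div class="{css_class}">{body}</div>'
    block_id = "rx" + secrets.token_hex(6)      # "rx" + 12 random hex digits
    # Prefix every rule with the id, so that the patch only affects this block.
    rules = "\n".join(f"#{block_id} {selector} {{{declarations}}}"
                      for selector, declarations in patch_css.items())
    return (f'<div id="{block_id}" class="{css_class}">'
            f"<style>\n{rules}\n</style>{body}</div>")

print(wrap_block(["<code>...</code>"],
                 patch_css={".rx-var": "font-weight:bold; color:#000000;"}))
```

Because each patched block gets its own id, two blocks on the same page can carry different patches without interfering with each other.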

The ANSI highlighter (1/2)

The ANSI Highlighter uses ANSI SGR (Select Graphic Rendition) codes to highlight a Rexx program. Only a subset of all the ANSI SGR codes is used, namely (ESC denotes character "1B"X):

Code               Meaning
ESC[0m             All attributes off
ESC[1m             Bold
ESC[3m             Italic
ESC[4m             Underline
ESC[38;2;r;g;bm    Set foreground colour to RGB(r,g,b)
ESC[48;2;r;g;bm    Set background colour to RGB(r,g,b)


Different SGR codes can be combined in a single sequence by separating them using semicolons. For example, ESC[1;3;38;2;255;0;0m means "choose a bold font, choose an italic font, and set the foreground colour to pure red".
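The combination rule can be tried out with a few lines of Python. The helper names `sgr` and `rgb` are hypothetical; the codes themselves are the ones listed in the table above.

```python
ESC = "\x1b"
RESET, BOLD, ITALIC, UNDERLINE = 0, 1, 3, 4

def sgr(*codes):
    """Combine SGR codes into one escape sequence, separated by semicolons."""
    return f"{ESC}[{';'.join(str(code) for code in codes)}m"

def rgb(r, g, b, background=False):
    """Codes for a 24-bit foreground (38) or background (48) colour."""
    return (48 if background else 38, 2, r, g, b)

bold_italic_red = sgr(BOLD, ITALIC, *rgb(255, 0, 0))
print(repr(bold_italic_red))      # '\x1b[1;3;38;2;255;0;0m'
print(bold_italic_red + 'Say "Done!"' + sgr(RESET))
```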

The ANSI highlighter (2/2)

Terminal applications and emulators differ wildly in their support of ANSI SGR codes. The default configuration of the Windows terminal, for example, implements boldface by substituting colours with a lighter version of themselves, and that, only for basic colours (i.e., codes 30-37, 40-47, 90-97 and 100-107); the same is true of the default Ubuntu terminal under WSL.

Additionally, support for basic colours is highly erratic: see, as an example, the colour chart that appears in Wikipedia. In an attempt to minimize this problem, all the colours generated by the ANSI highlighter are emitted using the 38 and 48 codes, i.e., they are pure RGB colours. CSS named colours are also translated to their RGB definitions: you can be sure that foreground blue will be generated as #0000ff, that is, as ESC[38;2;0;0;255m. How this displays in your emulator may vary.
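The translation from a CSS colour to an RGB escape sequence might look like the following Python sketch. The names `NAMED`, `to_rgb` and `colour_sgr` are hypothetical, and the table holds only a handful of the CSS named colours for illustration.

```python
# A tiny, illustrative subset of the CSS named colours.
NAMED = {"black": "#000000", "white": "#ffffff", "blue": "#0000ff",
         "yellow": "#ffff00", "fuchsia": "#ff00ff"}

def to_rgb(colour):
    """Translate '#rgb', '#rrggbb' or a known CSS colour name to (r, g, b)."""
    value = NAMED.get(colour.lower(), colour).lstrip("#")
    if len(value) == 3:                     # '#rgb' is shorthand for '#rrggbb'
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError(f"unsupported colour: {colour!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

def colour_sgr(colour, background=False):
    """Emit the colour as a pure RGB code (38 or 48), never as a basic colour."""
    r, g, b = to_rgb(colour)
    return f"\x1b[{48 if background else 38};2;{r};{g};{b}m"

print(repr(colour_sgr("blue")))             # '\x1b[38;2;0;0;255m'
```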

The LaTeX highlighter (1/2)

The current version of the LuaLaTeX highlighter is based on the following packages:

  • xcolor, for basic colour support.
  • lua-ul, for proper background highlighting.
  • listings, to hold the listings themselves.
  • tcolorbox, to fix an annoying problem with extra space appearing between listing lines in some circumstances.

The default mono font does not support boldface. If you need boldface in your highlighting schemas (the default ones supplied with the Rexx Parser use boldface), you should use a different mono font. The Hack font, for example, seems to produce decent results.

The LuaLaTeX highlighting framework defines \textexclamup as !: the exclamation mark is used as an escape character, and the command is needed when an exclamation mark appears in a Rexx program.

The LaTeX highlighter (2/2)

You will most probably need to adjust the size of your font, depending on the page size and the number of characters you want displayed on the longest line. The following style definition, for example, allows for exactly 80 characters per line when using the Hack font and an A4 paper size:

\lstdefinestyle{rexx}
{
    basicstyle=\fontsize{8.0pt}{11.0pt}\selectfont\color{white}\ttfamily
}

Notice

The LaTeX highlighter should be considered experimental. It would need some love from a LaTeX expert. Any volunteers?

Tools

Style patch syntax (1/3)

Style patches follow a very simple syntax:

  • A style patch is an ordered set of lines, separated by semicolons, or line-end separators, or both.
  • Leading and trailing blanks are removed.
  • Null lines and comments (i.e., lines starting with "--") are ignored.
  • Once comments are discarded, dashes "-" are replaced by blanks " " (this may come in handy when specifying patches in a Linux shell).

Highlighting patches for element categories, category sets, and taken constant names:

Element category     highlighting
All     set          highlighting
Name    constantName highlighting

Style patch syntax (2/3)

Element categories may omit the "EL." prefix; sets may omit the "ALL." prefix; and constant names may omit the ".NAME" suffix.

Highlighting is a blank-separated sequence of case-insensitive elements, chosen from

  • Foreground colours, in the format #rgb, #rrggbb, or one of the 147 standard CSS named colours.
  • Foreground/background colour combinations, in the format fg/bg (with no blanks), where fg and bg are either #rgb, #rrggbb, or one of the 147 CSS named colours.
  • One of the single words bold, italic or underline, optionally preceded by the single word no.

Style patch syntax (3/3)

A simple patch:

-- Patch simple variable elements to display as bold black over 75% yellow
element EL.SIMPLE_VARIABLE #000/#cc0 bold
-- Patch method names to display as black over 75% magenta
name    METHOD.NAME        #000/#c0c

The same patch, abbreviated:

E SIMPLE_VARIABLE #000/#cc0 bold; N METHOD #000/#c0c

The patch in action:

::Method methodName
  len = Length("String")
  n   = Pos("x", value)
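To make the syntax rules concrete, here is a small Python sketch of a patch parser (it needs Python 3.9 or later for `removeprefix`). It is not the Highlighter's own parser: `parse_patch` is a hypothetical name, and accepting any leading abbreviation of the keywords Element, All and Name is an assumption generalized from the "E" and "N" forms shown above.

```python
KINDS = ("ELEMENT", "ALL", "NAME")

def parse_patch(text):
    """Parse a style patch into (kind, target, highlighting) tuples."""
    result = []
    for line in text.replace(";", "\n").splitlines():
        line = line.strip()                        # drop leading/trailing blanks
        if not line or line.startswith("--"):      # null lines and comments
            continue
        words = line.replace("-", " ").split()     # dashes become blanks
        keyword, target = words[0].upper(), words[1].upper()
        # Assumption: any leading abbreviation of the keyword is accepted.
        kind = next(k for k in KINDS if k.startswith(keyword))
        if kind == "ELEMENT":
            target = target.removeprefix("EL.")    # optional "EL." prefix
        elif kind == "ALL":
            target = target.removeprefix("ALL.")   # optional "ALL." prefix
        else:
            target = target.removesuffix(".NAME")  # optional ".NAME" suffix
        result.append((kind, target, [word.lower() for word in words[2:]]))
    return result

print(parse_patch("E SIMPLE_VARIABLE #000/#cc0 bold; N METHOD #000/#c0c"))
print(parse_patch("all-comments-yellow"))
```

With these rules, the long and the abbreviated forms of the patch above parse to the same list of tuples.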

The command line: highlight.rex

The highlight utility processes a file and highlights it according to a set of options. If the file has a .md or a .html extension, all Rexx fenced code blocks are processed; in all other cases, the whole file is highlighted.

Except for .html files, the default highlighting mode is ANSI when called from the command line, and HTML otherwise.

Options:

  • choose the highlighting schema;
  • specify an optional patch, or a patch file;
  • determine whether lines are numbered, and, if yes, what is the number of the first line printed;
  • enable support for the TUTOR-flavoured Unicode dialect;
  • specify that all lines will have a minimum line width (ANSI only).

Fenced code blocks: FencedCode.cls

The FencedCode routine is a Rexx fenced code block preprocessor. It processes code blocks enclosed between ~~~rexx and ~~~ markers. Both markers must start in column 1, and the start marker may carry optional attributes enclosed in braces.

It is language-agnostic, that is, it happily processes HTML files, but also Markdown files, etc., as long as they contain the required markers.

It is highly customizable, using an extensive set of options.

It can be integrated into any application by calling it directly. The highlight utility invokes it automatically when the file extension is html or md, and it can also be easily integrated into your CGI infrastructure.
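The marker convention can be illustrated with a short Python scanner. This is a sketch under stated assumptions, not the FencedCode routine itself: `rexx_blocks` is a hypothetical name, attributes are assumed to be blank-separated key=value pairs with optional double quotes, and unterminated blocks are silently dropped.

```python
import shlex

def rexx_blocks(lines):
    """Yield (attributes, code_lines) for every ~~~rexx ... ~~~ block."""
    attrs, code = None, []
    for line in lines:
        if attrs is None:
            if line.startswith("~~~rexx"):         # marker must start in column 1
                spec = line[len("~~~rexx"):].strip()
                if spec.startswith("{") and spec.endswith("}"):
                    spec = spec[1:-1]
                # shlex keeps quoted values such as patch="all comments yellow" whole
                attrs = dict(item.split("=", 1) for item in shlex.split(spec))
                code = []
        elif line.rstrip() == "~~~":               # closing marker
            yield attrs, code
            attrs = None
        else:
            code.append(line)

document = [
    "Some text",
    '~~~rexx {patch="all comments yellow" style=light}',
    'Say "Done!" -- Inform the user',
    "~~~",
    "More text",
]
for attributes, block in rexx_blocks(document):
    print(attributes, block)
```

Because the scanner only looks for the markers, it works equally well on Markdown and HTML input, which is the sense in which the preprocessor is language-agnostic.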

Apache httpd CGI integration

The Rexx Parser package includes a sample Markdown CGI processor which automatically invokes the FencedCode routine, and then runs Pandoc to dynamically produce HTML code. The preprocessor (which should be customized and adapted to fit your needs, should you want to use it) creates Bootstrap 3.0 code, and includes optional exits for headers and footers, sidebars, etc.

The document you are reading has been served by this experimental CGI processor.

The processor installs as an Apache httpd action handler, which is invoked by the httpd daemon when certain files (normally, files with a .md extension) are served:

Action RexxCGIMarkdown /some/path/CGI.markdown.rex
# ...
<Files *.md>
  SetHandler RexxCGIMarkdown
</Files>

Further work

  • Based on experimentation and user feedback, collect a set of "things that work" and document them. For example,
    • which emulators display ANSI highlighting correctly, or
    • how to configure certain emulators so that they work correctly;
    • which fonts are suitable for certain forms of highlighting (if possible, for all forms of highlighting);
    • etc.
  • Improve the LaTeX highlighter.
    • Possible integrations with packages other than listings (maybe minted?).
  • ...

Acknowledgements

Jean Louis Faucher has integrated TUTOR into ooRexxShell, and Rony Flatscher has included TUTOR and the Rexx Parser in the net-oo-rexx distribution.

Some preliminary versions of the Rexx Parser and its accompanying subproject, the Rexx Highlighter, have been tested by different members of the RexxLA community, including, but not limited to, Gilbert Barmwater, Jean Louis Faucher, Rony Flatscher, Ruurd-Jan Idenburg, René Jansen and Till Winkler; I want to thank them all for their observations and enhancement proposals.

I also want to thank my colleagues at EPBCN, Laura Blanco, Silvina Fernández, Mar Martín, David Palau, Olga Palomino and Amalia Prat, who have read several drafts of this presentation and helped to improve it with their comments and suggestions.

Questions?

References

Concepts explained elsewhere, in other documents:

The highlight utility: https://rexx.epbcn.com/rexx-parser/doc/utilities/highlight/

Documentation about the Rexx Highlighter can be found at: