A ThML to PBB converter
THM2PBB.EXE is a simple command-line program that converts a ThML file into a collection of PBB-ready HTML files for the Libronix Personal Book Builder (PBB).
ThML (Theological Markup Language) is a markup language for theological and biblical texts, developed for the Christian Classics Ethereal Library (CCEL) at Calvin College in Grand Rapids, Michigan. A description of ThML is available on the CCEL website.
The program writes its output to a new directory named 'bookdir', which it creates inside the working directory. The output consists of two CSS files (one generic, and one containing any style information from the ThML file) and a batch of PBB-ready HTML files. In most cases the user will need to make some adjustments after processing (e.g. fixing some headers, combining or dividing one or two files, designing a front page), but for the most part the presentation can be adjusted simply by editing the CSS files. How much adjustment is needed depends largely on the CCEL design (outside the user's control) and on the options chosen for the conversion (very much within the user's control).
Scriptural references, scriptural milestone tags, footnotes and page numbers are all translated into PBB tags if the corresponding ThML tags exist in the ThML file. Footnotes become Libronix text fields (hover popups) wrapped around red superscript footnote numbers. If a ThML note contains a Bible reference, that reference is also attached to the superscript footnote number. Greek and Hebrew text is tagged as such, but phrases are not split into individual words. A single Greek or Hebrew word tagged this way will keylink normally in Libronix as long as it appears in one of the user's lexicons; phrases, of course, will not keylink. A book designer who wants to make full use of this feature would need to split Greek and Hebrew phrases manually into separate Libronix tags.
The program also lets the user convert selected CCEL paragraph tag classes into HTML headers, which helps produce a good distribution of headers throughout the PBB documents (and therefore a useful contents pane tree).
Please Note: This tool inserts a credit to the tool and to CCEL at the very end of the precompiled document (the bottom of the last HTML file it produces). If you are making books only for yourself, do what you like. However, if you are using this tool to create books for public distribution, please make sure these credits are included in the finished resource. If for some reason you remove the last HTML file from your final precompiled version, please copy the credit notes somewhere into the finished document; the end of the last page is good enough.
Executing the command 'thm2pbb.exe' without any parameters will produce the following output:
Usage: thm2pbb <options> [<THML filename> or <URL of a network THML file>]
Converts a CCEL THML file to PBB ready files
Options are:
-d0, d1, d2 .. d4   Set the HTML file division level (default = d1)
-h <header file>    Header tag file (h1, h2, etc.)
-ka                 Include THML file table and list attributes.
-kc                 Include THML file class values.
-kf                 Keep THML font tags and attributes.
-pn<number>         Insert page numbers (THM page numbers are ignored)
-r                  Replace THML header tags with p tags and header classes
-s<number>          Prefix number of the first HTML file (default = 0)
-t <title string>   Document title (defaults to THM file name)
-vc                 Display THML classes in Internet Explorer
The ThML file must be the last argument in the list. It may be a local file on disk or a remote file accessed by its network URL (e.g. http://www.ccel.org/ccel/macdonald/elginbrod.thm).
Put 'thm2pbb.exe' in a folder on your system command path, or copy it to the folder where you will do the conversion. Then open a command prompt window and navigate to your working folder. Either copy the ThML file into this folder or run the command on a remote file on the CCEL site.
Sample commands:
> thm2pbb -ka -kc -kf -pn -t "Robinson Crusoe" http://www.ccel.org/ccel/defoe/crusoe.thm
Keep THML Table and List attributes.
Keep THML paragraph classes.
Keep THML font tags.
Include page numbers beginning at page 1.
The HTML title string is "Robinson Crusoe".
XML document loaded successfully
> thm2pbb -wb calcom36.thm
Show THML classes. All other options will be ignored.
XML document loaded successfully
> thm2pbb -h headers -d2 -r -s14 -pn304 calcom37.thm
Class/Header substitutions are in "h".
The file division level is set at 2.
Replace all THML headers with paragraph tags.
The first numbered HTML file prefix is 014.
Include page numbers beginning at page 304.
XML document loaded successfully
Option descriptions:
-d0, -d1, -d2, etc.: File division level
ThML divides books with hierarchical 'div' tags. In a particular resource, for instance, 'div1' tags might define books, 'div2' tags chapters within books, 'div3' tags sections within chapters, and so on. The division level the user chooses determines the size and number of the PBB HTML files produced. In this example, -d1 would produce one file per book, -d2 one file per chapter, and so on. Level -d0 generates one large HTML file, which is handy for smaller books. Level -d1 is the default. Any level higher than 4 is ignored.
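To make the effect of the division level concrete, here is a rough Python sketch that counts the div elements at a chosen level in a tiny, made-up ThML fragment. It assumes one output file per div element at that level (and a single file for -d0); the real thm2pbb may also emit files for front matter or text outside those divisions, so treat the count as an estimate. The function name estimate_file_count is hypothetical.

```python
import xml.etree.ElementTree as ET


def estimate_file_count(thml_text: str, level: int) -> int:
    """Estimate how many HTML files a -d<level> conversion would produce.

    Assumption: one file per div<level> element, and one file for -d0.
    thm2pbb itself ignores levels above 4; this sketch rejects them instead.
    """
    if level == 0:
        return 1
    if not 1 <= level <= 4:
        raise ValueError("division level must be between 0 and 4")
    root = ET.fromstring(thml_text)
    # iter() searches the whole tree, so nested divisions are found at any depth.
    return sum(1 for _ in root.iter(f"div{level}"))


# A made-up fragment: two books (div1) containing three chapters (div2) in total.
sample = """<ThML><ThML.body>
  <div1 title="Book I"><div2 title="Chapter 1"/><div2 title="Chapter 2"/></div1>
  <div1 title="Book II"><div2 title="Chapter 1"/></div1>
</ThML.body></ThML>"""

for lvl in range(3):
    print(f"-d{lvl}: about {estimate_file_count(sample, lvl)} file(s)")
```

For this fragment the sketch reports one file for -d0, two for -d1 (one per book) and three for -d2 (one per chapter).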
-h <header file>: Add headers
thm2pbb translates HTML header tags directly. However, not all ThML files use HTML header tags. Some use only paragraph tags with CSS classes for formatting; others use a mix of both but don't have enough header tags (or don't have them in the optimal places for a PBB contents pane tree). The -h option lets the user change ThML CSS paragraph classes into header tags. See the header file description below.
-ka: Keep ThML attributes
Any table and list tag attributes are copied to the corresponding table and list tags in the HTML output.
-kc: Keep ThML classes
The paragraph tag CSS classes are copied into the HTML output. If the ThML file contains style sheet information, it is copied into a CSS file called 'ccel.css' in the output folder. All HTML output files contain a link tag linking the ccel.css file, regardless of whether the -kc flag is set. If the -kc, -ka and -kf flags are all set, the output should resemble the CCEL web pages almost exactly (except for the headers).
-kf: Keep ThML font tags
All ThML font tags are copied to the HTML output files. Please note that fonts are more easily manipulated with the style sheets.
-pn<number>: Add page numbers
Adds Libronix page number tags to the HTML output files, starting with the number appended to the -pn flag (e.g. -pn200). If no number is appended, the first page number is 1. There must be no space between the -pn flag and the starting page number. If the -pn flag is specified, ThML page number tags are ignored.
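The no-space rule for -pn can be modeled with a small helper. parse_pn is hypothetical and is not part of thm2pbb; it only mirrors the rule described above. It shows why '-pn 200' does not set the starting page: the command line is split on the space into two separate arguments, so under this model the flag has no attached number and falls back to page 1.

```python
import re


def parse_pn(arg: str):
    """Return the starting page for a -pn argument, or None if arg is not one.

    Hypothetical model of the rule described above; not thm2pbb's code.
    """
    match = re.fullmatch(r"-pn(\d*)", arg)
    if match is None:
        return None
    digits = match.group(1)
    # A bare "-pn" means numbering starts at page 1.
    return int(digits) if digits else 1


print(parse_pn("-pn"))     # 1
print(parse_pn("-pn200"))  # 200

# With a space, the command line is split into two separate arguments,
# so the number is not attached to the flag at all.
for token in "-pn 200".split():
    print(token, "->", parse_pn(token))
```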
-r: Remove ThML header tags
Sometimes the CCEL headers do not line up in a way that produces a good contents pane tree hierarchy. In these cases it may be better to remove them altogether and add header tags manually or through the header file parameter. The '-r' flag removes all of the ThML header tags and replaces them with paragraph tags carrying the corresponding header class (hhead1, hhead2, etc.). The styles can then be adjusted easily in the style sheets.
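A minimal Python sketch of the header replacement that -r performs, using the standard xml.etree.ElementTree module. It only illustrates the idea described above (h1 to h6 become <p> tags with the matching hheadN class); it is not thm2pbb's code, and it assumes that any existing class attribute on a header can simply be overwritten. The name replace_headers is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

HEADER_TAG = re.compile(r"h([1-6])")


def replace_headers(root: ET.Element) -> int:
    """Rename h1..h6 to <p class="hheadN"> in place; return how many changed.

    Sketch of the idea behind -r only. Any existing class is overwritten.
    """
    changed = 0
    for elem in root.iter():
        # Skip comments or processing instructions, whose tag is not a string.
        if not isinstance(elem.tag, str):
            continue
        match = HEADER_TAG.fullmatch(elem.tag)
        if match:
            elem.tag = "p"
            elem.set("class", f"hhead{match.group(1)}")
            changed += 1
    return changed


doc = ET.fromstring('<div1><h2>Chapter I</h2><p class="Normal01">Text</p></div1>')
n = replace_headers(doc)
print(n, ET.tostring(doc, encoding="unicode"))
# 1 <div1><p class="hhead2">Chapter I</p><p class="Normal01">Text</p></div1>
```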
-s<number>: Starting file prefix number
The HTML output files are prefixed with 3-digit numbers (beginning at '000') so that PBB discovers the files in the proper order. Sometimes it is desirable to combine two or more ThML volumes into one PBB book. The '-s' flag starts the file numbers at <number>, so that more than one batch of thm2pbb output files can be combined in the same book without any file name ambiguity. For example, if you combine the thm2pbb output from the ThML files calcom36.thm and calcom37.thm (the book of Acts, parts I and II) into one Personal Book without renumbering, both batches start at 000, so PBB would discover the files of the two volumes interleaved in collated order (000 from each volume, then 001 from each, and so on). You can fix the numbering by setting the start number for the calcom37.thm conversion higher than the last file name prefix of the calcom36.thm output (e.g. -s15).
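The interleaving problem, and how to pick a -s value for a second volume, can be shown with plain string sorting. This sketch assumes that PBB's collated order behaves like an ordinary sort on file names; the file name stems (acts1-a and so on) and the helper names prefixed_names and next_start are made up for illustration, and thm2pbb's real file names differ.

```python
def prefixed_names(stems, start=0):
    """Attach zero-padded 3-digit prefixes (000, 001, ...) to file stems.

    The "-stem.htm" part is made up; thm2pbb's real names differ.
    """
    return [f"{start + i:03d}-{stem}.htm" for i, stem in enumerate(stems)]


def next_start(names):
    """One more than the highest prefix already used by an earlier batch."""
    return max(int(name[:3]) for name in names) + 1


vol36 = prefixed_names(["acts1-a", "acts1-b", "acts1-c"])
vol37_stems = ["acts2-a", "acts2-b"]

# Both batches start at 000, so sorting interleaves the two volumes.
clash = sorted(vol36 + prefixed_names(vol37_stems))
print(clash)

# Starting the second batch after the first one's last prefix (like -s3 here)
# keeps each volume's files together and in order.
start = next_start(vol36)
fixed = sorted(vol36 + prefixed_names(vol37_stems, start))
print(start, fixed)
```

The zero padding is what makes a plain name sort work: without it, a file numbered 10 would sort before one numbered 9.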
-t <title string>: Set the HTML title
This flag sets the HTML document titles to the string that follows it. There must be a space between the -t flag and the title string, and a title string containing spaces must be enclosed in quotes. If no title follows the -t flag, the conversion will fail or behave unexpectedly (e.g. the title could be set to the string "-ka", in which case the 'keep attributes' flag won't take effect). Setting the HTML document titles won't affect your PBB output unless your document contains no header tags; this option is mainly a cosmetic adjunct for the HTML files themselves. If the '-t' flag is not specified, the HTML file titles are set to the ThML file name.
-vc: View ThML classes
If the '-vc' flag is selected, samples of the ThML headers and paragraph classes are displayed in an Internet Explorer window. If the ThML file contains CSS information, the class samples are displayed the same way they appear in the CCEL documents. Use this option to help decide which CCEL classes to change into headers by way of the header file. Each sample is the content of the first occurrence of that class or header in the ThML file. For headers this isn't very useful, because CCEL title pages usually contain a lot of header content, but it is included anyway so the user can compare and see what sort of CCEL header content is translated to PBB header content. ThML page number information is also output. The '-vc' output is saved in an HTML file called 'classes.html' in the working directory for future reference. See the sample output below.
The following screenshots show the output of the thm2pbb command when the '-vc' flag is specified.
The first line is simply a notification indicating whether or not the ThML file contains any page number information and, if so, what the starting page number is (this can be a roman numeral as well as a decimal value).
The second section contains header tag information. It includes the number of each header type that is contained in the ThML file, a sample of each header type from the ThML file and the total number of headers throughout the document.
The third section contains the ThML paragraph tag class information. Each class is presented in a table with the class name contained in the top right cell and some sample output in the bottom row.
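To get a feel for what the -vc report summarizes without opening Internet Explorer, the following Python sketch approximates its header and class sections: it counts h1 to h6 tags and records the text of the first <p> element for each class. The function survey and the sample document are hypothetical; the real report is an HTML page (classes.html) rendered with the file's CSS, and it also reports page number information, which this sketch ignores.

```python
import xml.etree.ElementTree as ET
from collections import Counter

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def survey(thml_text: str):
    """Count header tags and keep the first text sample for each <p> class."""
    root = ET.fromstring(thml_text)
    header_counts = Counter()
    class_samples = {}
    for elem in root.iter():
        if elem.tag in HEADER_TAGS:
            header_counts[elem.tag] += 1
        elif elem.tag == "p":
            cls = elem.get("class")
            # Only the first occurrence of each class is kept as the sample.
            if cls and cls not in class_samples:
                class_samples[cls] = "".join(elem.itertext()).strip()
    return header_counts, class_samples


sample = """<ThML><ThML.body>
<h1>The Acts of the Apostles</h1>
<p class="BOOK-CHAP-BLU-24-CEN08">CHAPTER 1</p>
<p class="Normal01">First paragraph.</p>
<p class="Normal01">Second paragraph.</p>
<h2>Argument</h2>
</ThML.body></ThML>"""

headers, samples = survey(sample)
for tag, count in sorted(headers.items()):
    print(f"{tag}: {count}")
print("total headers:", sum(headers.values()))
for cls, text in samples.items():
    print(f"{cls}: {text}")
```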
Image 1: -vc browser output - Headers and Classes

Image 2: -vc browser output - ThML class names and samples

The header file is a plain ASCII text file with one translation per line. Each line contains a label that tells the program which header to produce (h1, h2, h3, etc.), followed by a class name from the collection of classes in the ThML file. The user may also mark classes whose text should be centered by using the 'center' label.
For example, consider the screenshot above. In this ThML file it might be appropriate to change every <p> tag that has the class 'BOOK-CHAP-BLU-24-CEN08' into a level 2 header (<h2>). In that case the line:
h2 BOOK-CHAP-BLU-24-CEN08
would be entered as one line in the header file. Likewise, to center all text tagged with a particular class name, add a line such as:
center HDG-C-Gr-15-Pref05
Up to 6 HTML header levels are supported. You can add as many class translations as you like, and multiple classes may be translated to the same header level.
The header file name can be anything you like. An example of a header file follows.
h1 3rdLevelAllCap12Bld09
h2 BOOK-CHAP-BLU-24-CEN07
h2 BOOK-CHAP-BLU-24-CEN08
h3 HDG-C-Gr-16-Pref05
h4 Normal01
h4 Normal02
h4 Normal03
h5 QUOTE10
center QUOTE11
center TableCaption13
Most header files will be much shorter than this one, which was made long simply to demonstrate the possibilities. If the ThML file already contains headers, a header file will probably not be needed. Where the ThML file contains no header tags, expect in most cases to create a header file of no more than three or four lines.
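A header file is simple enough to parse in a few lines. The Python sketch below reads 'label class' pairs, rejects labels other than h1 to h6 and center, and then renames matching <p> elements to the requested header tag. It is an illustration under assumptions (whitespace-separated pairs, a later line for the same class overriding an earlier one, the class attribute left in place), not thm2pbb's implementation, and the function names are hypothetical. Because the text does not say how centered text is marked up in the output, 'center' entries are parsed but not applied.

```python
import xml.etree.ElementTree as ET

VALID_LABELS = {"h1", "h2", "h3", "h4", "h5", "h6", "center"}


def parse_header_file(text: str) -> dict:
    """Return {class name: label} from 'label class' lines."""
    mapping = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        if len(parts) != 2 or parts[0] not in VALID_LABELS:
            raise ValueError(f"line {lineno}: expected '<h1..h6 or center> <class>'")
        label, cls = parts
        mapping[cls] = label  # a later line for the same class wins
    return mapping


def apply_header_map(root: ET.Element, mapping: dict) -> None:
    """Rename <p> elements whose class maps to h1..h6 (center is not applied)."""
    for p in list(root.iter("p")):  # copy first, since tags change below
        label = mapping.get(p.get("class"))
        if label and label != "center":
            p.tag = label  # the class attribute is left in place


header_file = """h2 BOOK-CHAP-BLU-24-CEN08
center HDG-C-Gr-15-Pref05
"""
mapping = parse_header_file(header_file)
doc = ET.fromstring(
    '<div1><p class="BOOK-CHAP-BLU-24-CEN08">CHAPTER 1</p>'
    '<p class="Normal01">Text</p></div1>'
)
apply_header_map(doc, mapping)
print(ET.tostring(doc, encoding="unicode"))
```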
Important Note: In some cases, where the CCEL page design is unsatisfactory (see, for example, some of the tables in Calvin's NT commentaries) or the file divisions are simply too convoluted, there will be a temptation to modify the ThML file before running this utility. This can be done, but with caution. First, ThML files are all well-formed XML documents. If you remove a tag without removing its matching end tag, or even inadvertently delete a quote in the wrong place, the document will no longer be well-formed and thm2pbb will fail to load it. Second, ThML files are all encoded as UTF-8 and must be saved with the same encoding after editing. If you edit them with an editor that doesn't handle Unicode well (or, for instance, change them with a Perl script that doesn't use the proper I/O syntax), then at best your Greek and Hebrew (etc.) text will turn into garbage, and at worst your entire PBB book will become indecipherable gibberish.
Microsoft Word will edit these files and save them properly (as long as you answer the prompts correctly).
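Before running thm2pbb on a hand-edited ThML file, it can save time to confirm that the file is still valid UTF-8 and well-formed XML. The following Python sketch does both checks with the standard library. check_bytes and check_file are hypothetical names, and passing these checks does not guarantee that thm2pbb will accept the file; it only rules out the two problems described above.

```python
import sys
import xml.etree.ElementTree as ET


def check_bytes(data: bytes) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 (first bad byte at offset {err.start})")
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        problems.append(f"not well-formed XML (line {line}, column {column})")
    return problems


def check_file(path: str) -> list:
    with open(path, "rb") as f:  # read raw bytes so decoding is checked here
        return check_bytes(f.read())


if __name__ == "__main__":
    for path in sys.argv[1:]:
        issues = check_file(path)
        print(path, "OK" if not issues else "; ".join(issues))
```

Saved as, say, check_thml.py (a hypothetical name), it could be run as "python check_thml.py calcom37.thm" just before calling thm2pbb.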
Be Blessed
John
This program was designed and implemented by John McComb