LAF-Fabric has a successor: Text-Fabric. LAF-Fabric stays around in order to run legacy notebooks. It is recommended to use Text-Fabric for new work.
The T.text() function gets a new optional parameter. Instead of returning its results as plain or HTML text, it can also return its results as a list of object nodes of the type you specify in that parameter. The chapter and verse parameters may also be given as string numerals now. See ETCBC Reference under T.text().
Bug fix in the generic connection features x and y: see API Reference under C, Ci (Connectivity) and then B. C.laf__y had stopped working properly, because I pushed master before merging the branch I was working on into it.
Basic support for sources in the Greek language, such as the New Testament. Thanks to Jonathan Robie and Cody Kingham for providing Greek sources to play with.
The T API needed modifications, as did the translation tables for Bible book names. LAF-Fabric now makes it easier to use several sources in one API, e.g. one for the Hebrew Bible and one for the Greek New Testament.
Bug fix in multiple extra annotation packages: something with XML identifiers was broken.
Bug fixes in multiple extra annotation packages.
When loading data, LAF-Fabric accepts multiple extra annotation packages. Before, it only accepted zero or one extra annotation package (annox).
New output formats for plain text: you can now produce texts with inflection stripped: only the lexemes. See T.text().
The functions in the ETCBC API for outputting text have been streamlined. See T.text().
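A minimal sketch of the idea behind the lexeme-only output format (this is illustrative Python, not the actual T.text() implementation; the word records and field names are invented):

```python
# Illustrative sketch only: produce either the full word forms or,
# with inflection stripped, only the lexemes.

def plain_text(words, lexemes_only=False):
    """Concatenate word forms, or only their lexemes."""
    field = 'lexeme' if lexemes_only else 'form'
    return ' '.join(w[field] for w in words)

# invented word records for demonstration
words = [
    {'form': 'In-the-beginning', 'lexeme': 'beginning'},
    {'form': 'created', 'lexeme': 'create'},
]
print(plain_text(words))                     # In-the-beginning created
print(plain_text(words, lexemes_only=True))  # beginning create
```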
Removed the optional parameter error=True from the msg() function. The API now supplies msg() as before, but also inf(), which does the same but writes to standard output instead of standard error.
Added an optional parameter error=True to the msg() function. Normally messages go to standard error, but with error=False they go to standard output. When run in a Jupyter notebook, messages sent to standard error get a coloured background, as opposed to messages on standard output.
Small fix: la (= Latin) did not work. Now it does.
Better signature of the method T.node_of(book, chapter, verse, lang='en'): book is now a book name in language lang (it was a book node before).
More languages for bible book names, now also Peshitta Syriac.
Small fixes in transcriptions (nun hafukha, setumah, petuhah, paseq).
New languages for book names added: Russian, Turkish, Korean, Spanish, Swahili.
There are now multilingual book names: English, Latin, Greek, Hebrew, German, French, Dutch.
The ETCBC API T is enriched with a function books(lang=...) that delivers the names of the books in English and Latin.
The text forms generated by the ETCBC API T for consonantal representations suppress the special final forms of those Hebrew consonants that have final forms.
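The suppression can be pictured with a small self-contained sketch (illustrative Python, not the etcbc code itself): map each Hebrew final consonant to its ordinary counterpart.

```python
# Sketch: the five Hebrew consonants with final forms, mapped back
# to their ordinary forms.
FINALS = {
    'ך': 'כ',  # final kaf   -> kaf
    'ם': 'מ',  # final mem   -> mem
    'ן': 'נ',  # final nun   -> nun
    'ף': 'פ',  # final pe    -> pe
    'ץ': 'צ',  # final tsadi -> tsadi
}

def suppress_finals(text):
    """Replace every final-form consonant by its ordinary counterpart."""
    return ''.join(FINALS.get(c, c) for c in text)

print(suppress_finals('שלום'))  # final mem becomes ordinary mem: שלומ
```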
Small fixes: the new T API needs to load the annox lexicon. Now this is done without the user having to bother about it.
Higher-level text-producing functions in a new ETCBC API element. See ETCBC Reference.
Better logic in the transcription methods in etcbc.lib. See the notebook plain for methods to represent Hebrew text in various ways.
Skipped (for clumsy reasons)
The MQL API in the etcbc package now uses the ETCBC data plus the enrichments (x_etcbc4b).
Improvements in the documentation.
Slightly better error messages if configuration files cannot be found or contain wrong values.
etcbc.extra.deliver_annots() has been changed.
It is now easy to generate annotation packages that consist of various sets of data.
The new method accepts a list of set specs to generate those annotation sets.
The API element L has a new method L.p, which enables you to drill down quickly to a book, chapter, verse, sentence, clause or phrase of your choice.
Under the hood: the L API element was coded in the laf package, although it used ETCBC-specific concepts. It has now been moved entirely to the etcbc package. To find the documentation of L, consult the ETCBC reference.
Fixes: preparation of data still failed in some cases.
Fix: preparation of data failed in some cases.
Fix: prepared data is only loaded when needed, like all other data.
New API element L (with methods such as L.u), based on new preprocessed data. These methods take you from a node up to container nodes or down to contained nodes. This is a big improvement in the interplay between MQL queries and LAF-Fabric. The better practice is now to write a clean MQL query to get the targeted patterns, and use L to retrieve information from the context of the hits.
Warning: when your LAF-Fabric needs the data for
L for the first time, it will compute it
and store it as binary data on disk. This computation takes several minutes.
In subsequent cases, LAF-Fabric can load the data from disk in a matter of seconds.
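The drill-down idea can be pictured with a tiny self-contained mock (the containment table and the helper are invented for illustration; the real L API is computed from preprocessed ETCBC data):

```python
# Mock containment data: node -> {object_type: containing node}.
# All node numbers and type names are invented.
EMBEDDERS = {
    101: {'phrase': 11, 'clause': 5, 'verse': 2, 'book': 1},  # a word node
    11:  {'clause': 5, 'verse': 2, 'book': 1},                # a phrase node
}

def up(node, otype):
    """Return the node of the given type that contains the given node."""
    return EMBEDDERS.get(node, {}).get(otype)

# e.g. from a word found by an MQL query, fetch its embedding clause:
print(up(101, 'clause'))  # 5
```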
Bug fixes and documentation.
The etcbc.px module has been replaced by etcbc.extra. This is a generalized module to transform extra data to annotations. It can be used to process data from px files, but also data from lexicon files. New lexicon data is underway.
The etcbc.px module has been generalized to etcbc.extra. It is a module to turn extra data into a valid annotation set.
The welcome string now contains a reference to the feature documentation.
etcbc.featuredoc now produces sphinx output that can be put on a readthedocs website.
Documentation update. Links to the original data as archived in DANS-EASY.
Adaptation to the new ETCBC4 version of the data: in the documentation and in the etcbc and emdros2laf packages. Bug fixes.
Documentation update. The data source BHS4 has been rebaptized to ETCBC4, and the documentation, which was geared towards the BHS3 data source, is now adjusted to ETCBC4.
Fine-tuning of the Hebrew transliteration. The new plain text looks exceedingly good now. All changes w.r.t. the previous version of the ETCBC database have been reviewed, which has resulted in new code to generate the fine points of Hebrew text and type, e.g. multiple accents and vowel pointings, and inverted nuns.
The transliteration in etcbc.lib, which converts between Hebrew characters and transliterated Latin characters, has been extended to deal with vowel pointings and accents too.
The module etcbc.px retrieves one more field, called instruction, from the px files.
Changes in the annotation space, and a new etcbc.px which can read certain types of px data and transform them into an extra LAF annotation package.
Due to the new names for edge features, the data for BHS3 and BHS4 has been recompiled, and all tasks that use the old names have to be updated.
A few changes in etcbc.emdros2laf: edge annotations are no longer empty annotations, but have a feature structure.
A few changes in etcbc.emdros2laf, which facilitates generating feature declaration documents.
In the API you can ask for the locations of the data directory and the output directory.
LAF-Fabric reports the date and time when it has loaded data for a task. So in every notebook you can see the version of LAF-Fabric, the datetime when the loaded data has been compiled, and the datetime when this data has been loaded for this task. This is handy when you share tasks via nbviewer.
New API element EE, which yields all edges in an unspecified order. The module featuredoc can now document all features, including edge features.
Separated the data directory laf-fabric-data into an input directory (laf-fabric-data) and an output directory (laf-fabric-output). In this way, it is easier to download new versions of the data without overwriting your own task results.
Minor improvements in the emdros2laf conversion, discovered when converting the new BHS4 version of the Hebrew Text database. If you want to use the BHS4 data (beta), download the data again.
Minor improvements in the laf-api.
Added NK, which can be passed as a sort key for node sets. It corresponds with the “natural order” on nodes. If an additional module, such as etcbc.preprocess has modified the natural order, this sort key will reflect the modified order. If you let NN() yield nodes, they appear in this same order.
Also added MK, which can be passed as a sort key for sets of anchors. It corresponds with the “natural order” on anchor sets.
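The idea of such a natural-order sort key can be pictured with a self-contained sketch (the data and the key function are invented; it assumes, for illustration, that nodes are anchored to stretches of primary data and that an embedder starts no later and ends no earlier than what it embeds):

```python
# Sketch of a "natural order" key: earlier start first; at the same
# start, the wider (embedding) node first. A key like NK could work
# this way; the real NK also honours orderings customized by
# etcbc.preprocess.

nodes = {            # node -> (start anchor, end anchor); invented data
    'clause': (0, 10),
    'word1':  (0, 3),
    'word2':  (4, 10),
}

def natural_key(n):
    start, end = nodes[n]
    return (start, -end)   # embedder (larger end) sorts before embedded

print(sorted(nodes, key=natural_key))  # ['clause', 'word1', 'word2']
```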
Improvements in etcbc.trees, the module that generates trees from the ETCBC database.
Developed the etcbc.trees module further. Trees based on the implicit embedding relationship do not exhibit all embedding structure: clauses can be further embedded by means of an explicit mother relationship. The rules are a bit intricate, but it has been implemented (BHS3 only, no CALAP). See the updated trees notebook.
Added tree defining functionality to the etcbc package: etcbc.trees. You can make the implicit embedding relationship between objects explicit by means of parent and children relationships.
Adapted the node order as customized by etcbc.preprocess: the order is now a total ordering. Main idea: try to order monad sets by the subset relation, where embedder comes before embedded. If the sets are equal, use the object type to force a decision. If two monad sets cannot be ordered by the subset relation, look at the elements that they do not share. The monad set that contains the smallest of these elements, is considered to come before the other.
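The ordering rule above can be sketched as a comparator in plain Python (the object types, type ranks and monad sets here are invented for illustration; the real implementation lives in etcbc.preprocess):

```python
# Sketch of the total ordering on monad sets described above:
# order by the subset relation (embedder first); if the sets are equal,
# let the object type decide; if incomparable, the set containing the
# smallest non-shared monad comes first.
from functools import cmp_to_key

TYPE_RANK = {'clause': 0, 'phrase': 1, 'word': 2}  # assumed ranking

def compare(a, b):
    """a, b: (object_type, monad_set). Negative means a comes first."""
    ta, ma = a
    tb, mb = b
    if ma == mb:                 # equal sets: object type forces a decision
        return TYPE_RANK[ta] - TYPE_RANK[tb]
    if ma > mb:                  # a embeds b: a first
        return -1
    if ma < mb:                  # b embeds a: b first
        return 1
    # incomparable: look at the monads they do not share
    return -1 if min(ma ^ mb) in ma else 1

objects = [
    ('word',   frozenset({3})),
    ('clause', frozenset({1, 2, 3})),
    ('phrase', frozenset({1, 2, 3})),
    ('word',   frozenset({1})),
]
ordered = sorted(objects, key=cmp_to_key(compare))
print([t for t, _ in ordered])  # ['clause', 'phrase', 'word', 'word']
```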
Added Syriac transcription conversions.
In emdros2laf every source can now have its own metadata. In etcbc there is a workable mapping between consonantal Hebrew characters and their ETCBC Latin transcriptions.
More fixes in emdros2laf; a new source, the CALAP, has been converted to LAF. LAF-Fabric has compiled it, and it is ready for exploration. See the example notebook plain-calap. The CALAP is included in the data download (see Getting Started).
Small fixes in emdros2laf.
The conversion program from EMDROS to LAF (now the package emdros2laf) has been integrated in LAF-Fabric. Because of this a small reorganization of subdirectories was necessary (again). The EMDROS source of the LAF has a place in laf-fabric-data as well. So: again: a new download of the data is required.
Small reorganization of subdirectories. The structure is now better adapted to work with completely different data sources. Update your configuration files. The trailing directory names must be removed. So:
work_dir = ~/laf-fabric-data/etcbc-bhs
should change into:
work_dir = ~/laf-fabric-data
Because of this reorganization you have to download the data again.
After loading, LAF-Fabric displays the compilation date and time of the data used.
In specifying what features to load, you may omit namespaces and labels. You can specify the features to load in a much less verbose way.
load_again() has a new optional parameter add, which instructs LAF-Fabric to do incremental loading, without discarding anything that has already been loaded.
The order defined by etcbc.preprocess has been refined, so that it can also deal with empty words.
Under the hood
More unit tests, especially w.r.t. node order and empty words. The example data on which the unit tests act has been enlarged: it now also contains Isaiah 41:19, in which two empty words occur.
Better error handling, especially when the load dictionary does not conform to the specs of the API reference.
Under the hood
More unit tests, especially w.r.t. error checking, node order, and the BF API element.
The special edge features for all annotated edges and unannotated edges are now called laf:.x, because otherwise their names would become private method names in Python.
Under the hood
More unit tests.
Because of the renaming of special edge features, a new copy of the data is needed. Download the latest version.
The methods of the connectivity objects (except e()) now all yield iterators and have an optional parameter.
The API elements can now be added very easily to your local namespace.
For connectivity there is a new API method: C.feature.e(n). This returns True if and only if n is connected to a node by means of an edge annotated with that feature. The same information can also be obtained by using C.feature.v(n), but the direct e(n) is much more efficient.
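The design point can be mocked in a few lines (names and data invented; the real methods are C.feature.e(n) and C.feature.v(n)): an existence test can stop early, while the general route enumerates all connected nodes.

```python
# Mock connectivity: node -> nodes reachable via edges annotated with
# some feature. Data is invented for illustration.
edges = {1: [2, 3], 4: []}

def v(n):
    """Yield all nodes connected to n (the general, more expensive route)."""
    yield from edges.get(n, [])

def e(n):
    """Just test whether n has such an edge at all."""
    return bool(edges.get(n))

print(e(1), e(4))  # True False
```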
When calling up features, as in F_shebanq_ft_part_of_speech, you may now leave out the namespace and also the label: F.part_of_speech also works.
Small bug fixes.
The API has changed for initializing the processor and for working with connectivity. Please consult the API Reference.
- There is an example dataset included: Genesis 1:1 according to the ETCBC database.
- Configuration is easier: a global config file in your home directory.
- There is a laf-fabric-test.py script for a basic test.
More data has been precompiled. This reduces the load time when working with LAF-Fabric. The data organization has changed. Please download a new version of the data.
Configuration is easier now. A single config file in your home directory is sufficient. There are also other ways, including a config file next to your notebook.
Changes under the hood
- The mechanism to store and load LAF data now has a hook by which auxiliary modules can register new data with LAF-Fabric. Currently, this mechanism is used by the etcbc module to inject a better ordering of the nodes than LAF-Fabric can generate on its own. In future versions we will use this mechanism to compute and load extra indices needed for working with the EMDROS database.
- Unit tests. The file lf-unittest.py now contains several unit tests. If they pass, most things in LAF-Fabric are working as expected. However, the set needs to be enlarged before new changes are undertaken.
- You can now make additional sorting persistent, so that it becomes part of the compiled data. See the prep function in the API reference.
- It is possible to set a verbosity level for messages.
- There were chunks of time-consuming data that were either completely unnecessary or only occasionally needed. This data has been removed, or made loadable on demand, respectively. Overall load-time performance is a bit better now.
The etcbc module has a method to compute a better ordering on the nodes. This module works together with the new API method to store computed results.
There is a significant addition for dealing with the order of nodes:
- New function BF(nodea, nodeb) for node comparison. Handy to find the nodes that cannot be ordered because they have the same start and end points in the primary data.
- New argument to NN() for additionally sorting those enumerated nodes that have the same start and end points in the primary data.
- The representation of node anchors has changed. Existing LAF resources should be recompiled.
When LAF-Fabric starts it shows a banner indicating its version.
Opening and closing of files was done without explicitly specifying the encoding. Python then takes the result of locale.getpreferredencoding(), which may not be utf-8 on some systems, notably Windows ones. Every open() call for a text file is now passed an explicit encoding. open() calls for binary files do not get an encoding parameter, of course.
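In miniature, the fix amounts to this (standard-library Python only):

```python
# Always pass an explicit encoding for text files, so behaviour does not
# depend on locale.getpreferredencoding() (not utf-8 on some Windows setups).
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w', encoding='utf-8') as f:   # explicit on write ...
    f.write('בראשית')

with open(path, encoding='utf-8') as f:        # ... and on read
    text = f.read()

with open(path, 'rb') as f:                    # binary mode: no encoding
    raw = f.read()

print(text)  # בראשית
```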
Code supporting ETCBC notebooks has moved into separate package etcbc, included in the laf distribution.
When loading data in a notebook, the progress messages are far less verbose.
Added an introspection facility: you can ask the F object which features are loadable.
Changes in the way you refer to input and output files. You had to call them as methods on the processor object; now they are given in a different way.
Under some conditions XML identifiers got mistakenly unloaded. Fixed by modifying the big table with conditions.
Configuration fix: the LAF source directory can be anywhere on the system, specified by an optional config setting. If this setting is not specified, LAF-Fabric works with a binary source only.
A download link to the data is provided: a Dropbox link to a zipped file with a password. You can ask me for the password.
Focus on working with notebooks. Command line usage only supported for testing and debugging, not on Windows.
Thoroughly reorganized and adapted to latest changes.
The configuration file, laf-fabric.cfg will no longer be distributed. Instead, a file laf-fabric-sample.cfg will be distributed. You have to copy it to laf-fabric.cfg which you can adapt to your local situation. Subsequent updates will not affect your local settings.
Notebook additions only.
The notebook clause_constituent_relation is an example how you can investigate a LAF data source and document your findings.
We intend to create a separate GitHub repository dedicated to notebooks that specifically analyse the Hebrew Text Database.
- New API elements C and Ci. There is a new object C by which you can traverse from nodes via annotated edges to other nodes; Ci uses the edges in the opposite direction. See C, Ci (Connectivity).
Bugfix. The order of node events turned out wrong in the case of nodes that are linked to point regions, i.e. regions with zero width (e.g. (n, n), the point between two characters). This caused weird behaviour in the tree generating notebook trees (rough path). Yet it is impossible to guarantee natural behaviour in all cases. If there are nodes linked to empty regions in your LAF resource, you should sort the node events per anchor yourself, in your custom task. Existing LAF resources should be recompiled.
Bugfix. Thanks to Grietje Commelin for spotting the bug so quickly.
My apologies for any tension it might have created in the meantime.
Better code under the hood: the identifiers for nodes, edges and regions now start at 0 instead of 1. This reduces the need for many +1 and -1 operations, including the need to figure out which one is appropriate.
- Node events are added to the API; see NE (Next Event). With NE() you traverse the anchor positions in the primary data, and at each anchor position there is a list of which nodes start, end, resume or suspend there. This helps greatly if your task needs the embedding structure of nodes. There are facilities to suppress certain sets of node events.
- Node events make use of new data structures that are created when the LAF resource is being compiled. Existing LAF resources should be recompiled.
- API elements are now returned as named entries in a dictionary, instead of a list.
- In this way, the task code that calls the API and gives names to the elements remains more stable when elements are added to the API.
- Documentation: added release notes.
- New Example Notebook: participle.
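The node-events mechanism described above can be pictured with a toy version (invented anchor data; the real NE() also emits resume/suspend events for discontinuous nodes and offers filtering facilities):

```python
# Toy node events: walk the anchor positions and report which nodes
# start and which end there.
from collections import defaultdict

anchors = {'book': (0, 9), 'word1': (0, 4), 'word2': (5, 9)}  # invented

events = defaultdict(list)
for node, (start, end) in sorted(anchors.items()):
    events[start].append(('start', node))
    events[end].append(('end', node))

for pos in sorted(events):
    print(pos, events[pos])
```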
laf.task now returns a keyed dictionary instead of a 6-tuple. The statement where you define the API is now

    API = processor.API()
    F = API['F']
    NN = API['NN']
    ...

instead of

    (msg, NN, F, C, X, P) = processor.API()
- Connectivity added to the API, see C, Ci (Connectivity).
- There is an object C by which you can traverse from nodes via annotated edges to other nodes.
- Documentation organization:
- separate section for API reference.
laf.task now returns a 6-tuple instead of a 5-tuple:
- C has been added.
- Nodes or edges annotated by an empty annotation will get a feature based on the annotation label.
- This feature yields the value '' (empty string) for all nodes or edges for which it is defined (it was 1 before). Existing LAF resources should be recompiled.