Unicode square brackets
Posted: Sat Jan 31, 2026 9:04 pm
This is mainly for Tim, but others may be interested . . .
I have the NASB 1995 module with Strong's numbering from the Bible Analyzer Store, but it has display issues at Mark 16:9,20; John 5:3-4; and other places where left and right square brackets are inserted into the text. For example, the first square bracket in Mark 16:9 causes the parser to miss "[G1161]" and its closing square bracket, which is the Strong's code for the Greek word for "now" (Gk dé).
The problem is caused by the fact that the same character, "[" or "]", is used in the Bible for two different purposes: as a symbol to be printed literally and also to wrap a Strong's number. The parsing engine gets confused about what to do, so the display is munged.
As I see it, there are two solutions. One is to rewrite the parser to better distinguish the two. For example, Strong's numbers match a Python regular expression like \[[GH]\d+\], so anything matching that pattern would take priority, and all other square brackets would be displayed literally, not interpreted for the display. I don't know how much effort it would be to code, but that is a solution.
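To make the first idea concrete, here is a minimal Python sketch. It is not Bible Analyzer's actual parser, which I have not seen; the function name split_strongs and the sample verse string are made up for illustration, and the real module markup may differ. It only shows that a pattern like \[[GH]\d+\] can pull out the Strong's tags while leaving every other bracket as plain text.

```python
import re

# Matches a Strong's tag such as [G1161] or [H430]; the capture group holds the code.
STRONGS_TAG = re.compile(r"\[([GH]\d+)\]")

def split_strongs(text):
    """Split verse text into ("text", ...) and ("strongs", ...) tokens.

    Hypothetical helper: any square bracket that is not part of a
    Strong's tag stays inside a "text" token and is shown literally.
    """
    tokens = []
    pos = 0
    for match in STRONGS_TAG.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        tokens.append(("strongs", match.group(1)))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens

# Made-up sample line, not copied from the module.
sample = "[Now[G1161] after He had risen]"
tokens = split_strongs(sample)
```

With that sample, the opening "[" stays attached to "Now" as ordinary text, "G1161" comes out as its own token, and the final "]" stays in the trailing text.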
The other solution, simpler IMO, is to use other Unicode bracket characters (encoded as UTF-8) for the literal square brackets in the text, so the parser does not confuse them with the regular square brackets used for Strong's numbers. I have made a few experiments using several different fonts: Verdana, Bookman Old Style, Georgia, Galaxie Unicode Greek, Segoe Print, and Trebuchet MS. In all of them, the Unicode brackets displayed below work normally. I have inserted the literal Unicode hex values if you want to investigate further.
I prefer the white square brackets 〚like this〛 but they insert a little extra whitespace on the left and right, as do the fullwidth square brackets, which look ［like this］.
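If it helps, here is a rough Python sketch of how the module text could be converted in one pass. It assumes the literal brackets should become the white square brackets (U+301A and U+301B) and that Strong's tags always look like [G123] or [H123]; the function name protect_literal_brackets and the sample string are my own inventions, not anything from Bible Analyzer.

```python
import re

# Group 1 matches a whole Strong's tag; group 2 matches any other single bracket.
# The tag alternative is listed first, so it wins whenever both could match.
TAG_OR_BRACKET = re.compile(r"(\[[GH]\d+\])|([\[\]])")

# Assumed replacement characters: left and right white square brackets.
WHITE_BRACKETS = {"[": "\u301a", "]": "\u301b"}

def protect_literal_brackets(text):
    """Replace literal square brackets, leaving Strong's tags untouched."""
    def replace(match):
        if match.group(1):
            return match.group(1)  # keep the Strong's tag exactly as it is
        return WHITE_BRACKETS[match.group(2)]
    return TAG_OR_BRACKET.sub(replace, text)

# Made-up sample line, not copied from the module.
converted = protect_literal_brackets("[Now[G1161] after He had risen]")
```

To try the fullwidth brackets instead, swap the two values in WHITE_BRACKETS for "\uff3b" and "\uff3d".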
What do you think?