On 22 Apr 2020, calcmandan said the following...
I'm so glad you jumped into the thread because this is the most
complete explanation I've yet to see.
Thank you; that's very kind of you to say. I wrote
more extensively about this on The Fat Dragon:
http://fat-dragon.org/post/terminals/
All I want is to have my new game render properly on as many
systems as possible. Up until now, I was expecting to release it
as a dos door game to allow full CP437 support but now you've
given me a different perspective.
I've used many terminals across Windows and Linux and none of them
render ANSI graphics properly, at all.
Well, again I think it's important to draw a distinction
between "ANSI terminal handling" and "the CP437 character
encoding" here. It's likely that most of those terminal
programs are correctly interpreting ANSI terminal codes;
but if your software is generating CP437 character data,
then it's unlikely to be interpreting _that_ correctly.
To recap for those who aren't up on the terminology,
character data is represented by an integer. However,
we rely on the use of some well-known "character encoding"
to assign meaning to the values of those integers;
familiar examples include 7-bit ASCII, 5-bit Baudot,
or 8-bit EBCDIC. In ASCII, the capital letter 'A' is
represented as decimal value 65; 'B' is 66, and so on.
Values less than 32 are various control symbols ("SOH"
for "Start of Heading", etc). For the IBM PC, IBM
chose to define a new 8-bit encoding they called CP437
(supposedly it was on page 437 of their character
code book). CP437 coincides with ASCII in the low
seven bits, but if the high bit is set, this gives
access to an extended set of glyphs, like the
pseudo-graphical characters. For example, the double
vertical box-drawing character is decimal 186. Unfortunately,
CP437 was always pretty limited, and it was never
heavily used outside of the PC world.
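To make that concrete, here's a minimal sketch in Python
(my choice of language for illustration; it assumes the
standard "cp437" codec that ships with CPython). It
decodes the same integers under ASCII and under CP437:

```python
# The same integer value means whatever the encoding says it means.
a = bytes([65])

ascii_a = a.decode("ascii")   # decimal 65 is 'A' in ASCII
cp437_a = a.decode("cp437")   # CP437 agrees with ASCII in the low seven bits

# Decimal 186 has the high bit set, so it is outside 7-bit ASCII.
# Under CP437 it is the double vertical box-drawing glyph.
box = bytes([186]).decode("cp437")
box_codepoint = ord(box)      # the Unicode code point Python maps it to

print(ascii_a, cp437_a, box, hex(box_codepoint))
```

The byte 186 means nothing in 7-bit ASCII; it only becomes
a box-drawing glyph once both ends agree to read it as
CP437.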
Most modern terminals either use something like UTF-8
natively (really, there's very little remaining reason
not to do this these days) or they're using something
like the ISO/IEC 8859-1 encoding for the US/Western European
Latin alphabet. In places where English and/or the
Latin alphabet are not regularly used, there are other
encodings in common use. Eventually someone realized
that having a ton of such encodings that were often in
conflict due to sharing a small encoding space (say,
8 bits) wasn't going to scale globally, and the Unicode
consortium was formed to come up with a single encoding
that would encompass all of the world's languages.
Obviously, there are too many symbols in such a set
to represent them all in 8-bit bytes, so this led to a
number of competing "encoding" standards; Ken Thompson
(author of Unix) and Rob Pike (a co-creator of
the Go programming language) and others at Bell Labs
had built a new operating system intended as a successor
to Unix that they called Plan 9 that would use Unicode
natively. They ran headlong into the encoding problem
because Plan 9 ran on machines of differing endianness;
while the world basically agrees on how to move a byte
between systems (e.g., which bit is most significant
and which is not), the same is not true of multi-byte
data. Ken and Rob and a couple of others were at an
all-night diner in New Jersey discussing this problem
when they designed a new, variable-length encoding
that could be used to encode any Unicode character
into a unique byte sequence; they called this UTF-8.
UTF-8 has swept all that came before it and is now
the standard Internet character encoding.
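A small illustration of the byte-order problem, as a
Python sketch of my own (not anything Plan 9 actually
ran): the same character written as a 16-bit unit comes
out in a different byte order depending on endianness,
while the UTF-8 form is one unambiguous byte sequence.
The variable names are just for this example.

```python
ch = "\u2551"  # BOX DRAWINGS DOUBLE VERTICAL, the CP437 glyph at decimal 186

# A fixed-width 16-bit encoding depends on the byte order of the machine.
utf16_be = ch.encode("utf-16-be")  # most significant byte first
utf16_le = ch.encode("utf-16-le")  # least significant byte first

# UTF-8 is defined as a sequence of bytes, so there is no byte order to pick.
utf8 = ch.encode("utf-8")

# Plain ASCII text is unchanged when encoded as UTF-8.
ascii_utf8 = "A".encode("utf-8")

print(utf16_be.hex(), utf16_le.hex(), utf8.hex(), ascii_utf8)
```

Note that the length varies: one byte for 'A', three for
the box-drawing character.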
So.... This brings us to your problem of things not
rendering properly in modern terminals. Syncterm
is special because it understands CP437 natively for
compatibility with legacy BBS software. That's cool,
but the same is just not true of PuTTY, xterm, etc.
Those are expecting either UTF-8 or something like
ISO-8859-1, but you're sending them CP437. But the
terminal software doesn't really care; it gets some
byte with some value and displays some glyph; if
the glyph it displays isn't what you intended, it
more or less doesn't even know.
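Here's roughly what that mismatch looks like, sketched
in Python. The variable names are only for illustration;
the codecs are the standard ones.

```python
# One byte, as a CP437-generating program would send it.
raw = bytes([186])

intended = raw.decode("cp437")     # what the author meant: a double vertical bar
as_latin1 = raw.decode("latin-1")  # what an ISO 8859-1 terminal sees instead

# A lone byte 186 is not a valid UTF-8 sequence, so a UTF-8 terminal
# has nothing sensible to show; here it becomes the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(intended), repr(as_latin1), repr(as_utf8))
```

In the Latin-1 case nothing even looks like an error: the
terminal happily draws a masculine ordinal indicator where
you wanted a box edge.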
What to do? Well, as I mentioned earlier, one _can_
find a CP437 *font* that will render in e.g., xterm
or a similar terminal program (and I'm sure similar
fonts exist for analogous Windows programs, but I'm
not a Windows user so I don't really know). You lie
to it and tell it it's using ISO 8859-1 (aka Latin-1)
and it will draw the right things. That all works
fine, but if you then switch to running a program
that expects to generate Latin-1, then you have the
inverse problem that THAT will look strange.
A more robust solution is to always use UTF-8 in your
terminal: since the CP437 glyphs all appear in the
Unicode character set, they can all be represented as
UTF-8 byte sequences and displayed in a UTF-8-aware
terminal. Now we're back to a font problem, however:
the Unicode font in question has to actually display
the glyphs. That's where fonts like Unscii come in;
they support all the old BBS-style stuff (and Commodore
and Amiga fonts, too!).
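Since the ANSI escape sequences themselves are plain
ASCII, converting a CP437 stream to UTF-8 leaves them
alone and only rewrites the high-bit glyphs. A minimal
sketch in Python, assuming the standard "cp437" codec;
the helper name cp437_to_utf8 is made up for this example:

```python
def cp437_to_utf8(data: bytes) -> bytes:
    """Re-encode CP437 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("cp437").encode("utf-8")

ESC = b"\x1b"

# Bold yellow, then a little box top (CP437 201, 205, 187), then reset.
ansi = ESC + b"[1;33m" + bytes([201, 205, 187]) + ESC + b"[0m"

out = cp437_to_utf8(ansi)
print(out)
```

One caveat: Python's codec treats bytes below 32 as control
characters, so the CP437 glyphs some BBS art uses in that
range (smiley faces and so on) would need their own mapping.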
So how do we get UTF-8?
There are a couple of solutions one can usefully
employ here: one is to write your software so that
it snoops the character set used by the terminal
software somehow and generates either CP437 or
e.g. UTF-8 depending. Another is to always generate
one or the other and then translate on the receiving
side; I believe that's what the software that Ryan
pointed you at earlier does (it translates from UTF-8
to CP437 and vice-versa interactively). A thing
that's a bit of a bummer here is that I don't think
that `syncterm` supports UTF-8, but perhaps Digital
Man can be persuaded to add support?
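The first approach might look something like the sketch
below (Python; the function name and the idea of passing
in a charset string are assumptions of mine, and how you
actually discover the terminal's character set is left
open, e.g. you could simply ask the user at login):

```python
def encode_for_terminal(text: str, charset: str) -> bytes:
    """Encode game output for the caller's terminal (hypothetical helper).

    `charset` is whatever the game learned about the terminal; anything
    other than UTF-8 is assumed here to be a CP437 terminal.
    """
    codec = "utf-8" if charset.lower() in ("utf-8", "utf8") else "cp437"
    # Characters the target encoding cannot represent become '?'.
    return text.encode(codec, errors="replace")

box = "\u2551"  # double vertical box-drawing character
utf8_bytes = encode_for_terminal(box, "utf-8")
cp437_bytes = encode_for_terminal(box, "cp437")
print(utf8_bytes, cp437_bytes)
```

Keeping the game's text as Unicode internally and only
picking the byte encoding at the last moment means the
same screens can go out to either kind of terminal.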
Personally, I'd always generate UTF-8 and assert
that people should use a UTF-8 aware terminal to
connect. But a lot of people in the BBS world use
syncterm or similar programs to connect, so you've
got to make sure those people know to use something
else for your game.
I load syncterm and it works beautifully, and it supports CP437.
I'll talk to my collaborator and we'll play with UTF-8 with our
existing ansi designs and see how it'll translate.
In the end, I suspect you may find using CP437 is
the easiest route, and you get compatibility with
other terminals using something like the `cp437`
program Ryan pointed you at earlier.
Thanks again for your great advice.
Sure thing! I don't know how great it's going to
be, though. :-P
--- Mystic BBS/QWK Gate v1.12 A46 2020/04/13 (Linux/64)
* Origin: % disksh0p!bbs % bbs.diskshop.ca % SciNet ftn hq % (77:1/100)