Published 08 June 2021 • 3415 words
Most programming languages need to deal with text in some way or
another, and programming languages for writing interactive fiction
need to deal with text a lot. Modern languages usually handle this with
some sort of String type, which generally supports text encoded in one
of the Unicode encodings.
But text is deceptively simple. Even if we don’t get into all of the
complexities of Unicode and internationalisation (“I just want to count
the characters in this text, how hard could that be?”), the requirements
on how you store and operate on text vary wildly depending on which
operations you need and which constraints you work under. For example,
a contiguous array of bytes is good for displaying text but terrible for
editing it, which matters if you are writing a text editor. A rope makes
the opposite trade-off. Storing text as UTF-16 makes JavaScript string
operations simple to implement, because they are defined over UTF-16
code units, but it uses twice the memory of a one-byte encoding for
ASCII-heavy text, which hurts on small devices like mobile phones.
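
To make the “how hard could that be?” question concrete, here is a minimal Python sketch. Python is used only for illustration, and the sample strings are made up. It measures the same word as code points, as UTF-8 bytes, and as UTF-16 code units, and then compares the size of an ASCII-only string in UTF-8 and UTF-16.

```python
import unicodedata

# "café" written with a combining acute accent (U+0301) after a plain "e".
decomposed = "cafe\u0301"
# The same text after NFC normalisation, which merges "e" + U+0301 into U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def sizes(text: str) -> dict:
    """Return three different answers to "how long is this text?"."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


decomposed_sizes = sizes(decomposed)
composed_sizes = sizes(composed)

# ASCII-only text: one byte per character in UTF-8, two in UTF-16.
ascii_text = "hello"
ascii_utf8_bytes = len(ascii_text.encode("utf-8"))
ascii_utf16_bytes = len(ascii_text.encode("utf-16-le"))

print(decomposed_sizes)
print(composed_sizes)
print(ascii_utf8_bytes, ascii_utf16_bytes)
```

Both strings render as the same word, yet every count differs between the two forms, and only some of them match the “4” a reader would expect.
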
Because of this, even though we generally talk about “String” as a single
type, modern languages tend to have several string types that embody
different trade-offs. These may be exposed to the user (Haskell
programmers routinely choose between at least five, counting String and
the strict and lazy variants of Text and ByteString, and you’re supposed
to pick the trade-off that fits your use case), or they may be just a
runtime detail (JavaScript implementations have one “String” type, but
multiple representations covering interning, ropes, slices, and
ASCII-only special cases for saving memory).
Crochet has many types of strings as well, and it forces you to pick one
of them. The difference here is that Crochet’s types are not about
storage, they’re about security.
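
As a rough illustration of types that share a storage representation but differ in meaning, here is a small Python sketch. The names `UntrustedText`, `TrustedHtml`, `escape`, and `render` are hypothetical and are not Crochet’s actual types or API. `typing.NewType` creates types that a static checker treats as distinct, while the values remain plain `str` objects at runtime.

```python
import html
from typing import NewType

# Hypothetical names for illustration only; not Crochet's actual types.
UntrustedText = NewType("UntrustedText", str)
TrustedHtml = NewType("TrustedHtml", str)


def escape(text: UntrustedText) -> TrustedHtml:
    """Turn untrusted text into HTML that is safe to embed in a page."""
    return TrustedHtml(html.escape(text))


def render(fragment: TrustedHtml) -> str:
    """Accept only trusted HTML; a static checker rejects UntrustedText here."""
    return "<p>" + fragment + "</p>"


user_input = UntrustedText("<script>alert(1)</script>")
page = render(escape(user_input))
print(page)
```

Python does not enforce the distinction at runtime, so the protection in this sketch comes only from a static checker such as mypy, which would flag `render(user_input)` as a type error.
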
So, why would you want to differentiate strings for security, even if
ultimately they have the exact same storage representation?