When is a reversed string, not a reversed string?


The .NET framework has a lot of classes, with a lot of methods. So many, in fact, that when certain obvious ones are missing, it’s a puzzle as to why. One such missing feature is a Reverse method for strings. Taking a collection of characters and reversing them is easy, so why is it missing?

As I wanted such a method, and didn’t want to burden the library I was writing with a disconnected feature, I created the ReversedString NuGet package, which supplies it. Job done. Still, why isn’t it included in the standard framework? Experimenting with tests led me to a possible reason.

To test it, I figured I’d use words from other alphabets. One of the things I picked was a piece of Sanskrit: शक्नोम्यत्तुम्. Now I know nothing about Sanskrit, beyond discovering that it’s “the primary sacred language of Hinduism”. So apologies to those who can read it if the next bit shows stupid ignorance on my part, or if what I put here is accidentally offensive or inappropriate. I started copying and pasting the individual characters in reverse, and noticed that they looked different: म्तुत्यम्नोक्श. Also, as I typed the characters, the existing ones changed too. Further, "शक्नोम्यत्तुम्".Reverse() didn’t equal “म्तुत्यम्नोक्श”. So I assume that the shape of the characters, and how they are encoded in UTF-8, must be contextual. I then put शक्नोम्यत्तुम् through codebeautify.org’s string reversal tool and that gave me ्मुत्तय्मोन्कश. The first character looks odd, but when I used that in the test, it passed. So that reversal tool and my code:

are using the same method of reversing the string. Not sure what that means though. Are both wrong?
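A naive reversal of the kind described above might look something like the sketch below. `NaiveReverse` is a hypothetical name, and this is an assumption about the general approach, not the ReversedString package's actual source: it simply reverses the string's `char` values (UTF-16 code units) one by one.

```csharp
using System;

public static class NaiveReversal
{
    // Hypothetical helper (not the package's real code): copies the
    // string's chars into an array, reverses the array, and rebuilds
    // a string from it. No attention is paid to which chars belong together.
    public static string NaiveReverse(string input)
    {
        char[] chars = input.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    public static void Main()
    {
        Console.WriteLine(NaiveReverse("hello"));    // prints "olleh"
        Console.WriteLine(NaiveReverse("stressed")); // prints "desserts"
    }
}
```

For plain ASCII input this does what most people would expect from a reverse method.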

English too has some odd characters, known as typographic ligatures. So the obvious next question is, how are those ligatures handled when reversed? So I experimented with the “æ” ligature, which is an “a” and “e” joined together. It’s unusual to use it these days, but “æsthetic” is a valid way of writing “aesthetic” in English. What happens if I reverse that? According to the previously mentioned reversal tool and my code, it’s “citehtsæ”. The “ae” hasn’t been reversed. Now, according to http://www.vanhamel.nl/codecs/Ea_(ligature), there exists an “ea” ligature in Irish, but as far as I can tell, this doesn’t exist in UTF-8. And “æsthetic” is an English word, not an Irish one. So maybe the reverse of “æsthetic” should be “citehtsea”, but then it has more characters. Further, "æsthetic".Reverse().Reverse() wouldn’t get you back to the original string, i.e. there’d be information loss.
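The reason the “ae” stays intact can be checked directly: “æ” is a single character (U+00E6) as far as the string is concerned, so a char-by-char reversal has nothing to split. A minimal sketch, with the class name `LigatureCheck` made up for illustration:

```csharp
using System;

public static class LigatureCheck
{
    public static void Main()
    {
        // \u00E6 is the "æ" ligature, written as an escape so the
        // example does not depend on the source file's encoding.
        string word = "\u00E6sthetic";

        Console.WriteLine(word.Length);   // prints 8: the ligature counts as one char
        Console.WriteLine((int)word[0]);  // prints 230, i.e. 0xE6

        char[] chars = word.ToCharArray();
        Array.Reverse(chars);
        string reversed = new string(chars);

        // The ligature simply moves to the end as a single unit.
        Console.WriteLine(reversed == "citehts\u00E6"); // prints "True"
    }
}
```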

And so this has led me to a possible reason why there is no built-in string.Reverse() function: it works fine for simple ASCII characters, but doesn’t work for more “exotic” (from a simple-minded, English-speaking person’s perspective) characters. I’ve no idea if that is the reason, but it seems to fit.
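Two small cases show the kind of breakage involved. This is a sketch using the same hypothetical `NaiveReverse` helper (an assumed char-by-char reversal, not necessarily what any particular library does), applied to a combining accent and to a character stored as a surrogate pair:

```csharp
using System;

public static class NaiveReversalPitfalls
{
    // Hypothetical char-by-char reversal, as assumed throughout.
    static string NaiveReverse(string input)
    {
        char[] chars = input.ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    public static void Main()
    {
        // "x", then "e" followed by U+0301 (combining acute accent).
        // The accent belongs to the "e".
        string accented = "xe\u0301";

        // After reversal the accent comes first, detached from its "e",
        // much like the odd-looking first character in the Sanskrit result.
        Console.WriteLine(NaiveReverse(accented) == "\u0301ex"); // prints "True"

        // U+1F600 is stored as two chars: a high then a low surrogate.
        string emoji = "\U0001F600";
        string reversedEmoji = NaiveReverse(emoji);

        // Reversal swaps the pair, leaving an invalid sequence
        // that starts with a low surrogate.
        Console.WriteLine(char.IsLowSurrogate(reversedEmoji[0])); // prints "True"
    }
}
```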

If you need a string reversing method, feel free to copy my code or use my NuGet package. But be aware of its naivety over how it reverses some strings.

2 thoughts on “When is a reversed string, not a reversed string?”

  1. Yeah, both are wrong. Like so wrong that you can’t really call it reversing the string, it’s more like reversing its codepoints. It breaks for combining characters (which includes diacritics), for surrogate pairs (which includes emoji), for variation selectors, practically for pretty much everything but basic ASCII characters. Have a look at StringInfo if you need reasonable (but probably still far from ideal) reverse.
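The StringInfo suggestion in the comment above could be sketched roughly as follows. `ReverseTextElements` is a hypothetical helper name; it uses `StringInfo.GetTextElementEnumerator` from `System.Globalization` to walk the string one text element at a time, then reverses the elements rather than the individual chars. Exactly what counts as a text element depends on the .NET version, so results for scripts such as Devanagari may still not match what a reader of the language would expect.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementReversal
{
    // Hypothetical helper: reverses a string by text element, so that
    // combining marks and surrogate pairs stay attached to their base.
    public static string ReverseTextElements(string input)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(input);
        while (enumerator.MoveNext())
        {
            elements.Add(enumerator.GetTextElement());
        }

        elements.Reverse(); // in-place List<T>.Reverse
        return string.Concat(elements);
    }

    public static void Main()
    {
        // "e" + U+0301 (combining acute accent) is kept together.
        string accented = "xe\u0301";
        Console.WriteLine(ReverseTextElements(accented) == "e\u0301x"); // prints "True"

        // The surrogate pair for U+1F600 is kept in its original order.
        string emoji = "a\U0001F600";
        Console.WriteLine(ReverseTextElements(emoji) == "\U0001F600a"); // prints "True"
    }
}
```

This is still only an approximation of “reversing what the reader sees”, which fits the comment’s caveat that it is reasonable but probably far from ideal.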
