- All strings used only contain ASCII characters.
- A base-128 integer type has been defined. 128 comes from the number of possible characters we will be expecting.
- Although most basic operations are not supported, there is implementation to compare one of these base-128 integers to another (<, >, =).
- Our data is stored in an array of base-128 integers that each represents a string.
The sorting process would work like this:
- Convert all given strings into base-128 integers. This can be done by having each ASCII character represent a different value at each place. Store these in the array.
- Utilize a general sorting algorithm using the comparisons defined in the base-128 integer class.
The searching process would work like this:
- Convert the "search" string to a base-128 integer.
- Implement a basic binary search method on the array of base-128 integers and return the index of a matching value if found.
This seems to work for the most part, but I’d like to know if there’s a more reasonable way to approach the problem.
There is no point to convert strings to numbers. (Coincidentally, they appear to have the exact same memory layout anyway – the conversion is a no-op).
You say that
there is implementation to compare one of these base-128 integers to another (<, >, =).
but that’s meaningless. We can equally state
there is implementation to compare one of these strings to another (<, >, =).
(which in fact does exist) and all you have described is to "use a general sorting algorithm on an array of strings" and "Implement a basic binary search method on the array of strings". The description of your method becomes empty – yes you can always do X by doing X.
That said, no this is a method with bad time complexity. Lexical string comparison (just like base-128 integer comparison) is linear in the length of the string/integer. It is more efficient to actually build a trie from the array of strings and search in that.