What is text on a computer?
"A brief introduction to code pages and Unicode" is a good overview of how computers encode text.
The point is that on a computer there are only sequences of binary digits. The data doesn’t include the information needed to decode those sequences. (Even the sequencing is coded.) To get recognizable text, one needs a lookup table that maps code points to abstract characters, and from there to glyphs that look right. To get text you also need a system with enough variety to handle the characters you want. It is the difference between the codes that makes text, something the poststructuralists realized in a different register.
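The claim that bytes carry no decoding information can be checked directly. The Python sketch below decodes the same stored bytes with several code pages. The codec names (`latin-1`, `cp437`, `cp1251`, `utf-8`) are standard Python codecs chosen for illustration, and `read_with` is a hypothetical helper, not something from the introduction cited above.

```python
def read_with(raw: bytes, codecs: list[str]) -> dict[str, str]:
    """Decode the same bytes once per codec and collect the resulting text.

    Each codec is a different lookup table from codes to characters; the
    bytes themselves do not say which table is the right one.
    """
    return {codec: raw.decode(codec) for codec in codecs}


if __name__ == "__main__":
    # A single stored byte, 0xE9, read through three different code pages.
    for codec, text in read_with(bytes([0xE9]), ["latin-1", "cp437", "cp1251"]).items():
        print(f"{codec:8} -> {text}")

    # Bytes written as UTF-8 but read back as Latin-1 ("mojibake").
    print(read_with("café".encode("utf-8"), ["utf-8", "latin-1"]))
```

On a terminal that can display them, the single byte should come out as `é`, `Θ` and `й` respectively, and the UTF-8 word `café` read as Latin-1 becomes `cafÃ©`. Nothing in the bytes changed, only the table.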
Here is a first pass at what you need:
1. You need rules for sequencing the flow. When does a sequence of digits start? What direction does it go in? How do you get a string (sequence) of digits from a physical medium like a hard drive?
2. You need rules for chunking or interrupting the flow. What constitutes a primitive unit in the flow? Is each character a byte (8 bits)? See Deleuze and Guattari on machines interrupting the flow.
3. Once you have interrupted the flow and transformed it into a series of codes, you need a lookup table to assign a letter or other character to each code. The letters assigned are really abstract differences.
4. Finally, you need to assign a glyph, or visible character, to each abstract character.
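The four steps can be sketched as a small Python pipeline. This is a toy under stated assumptions. The "physical medium" is simulated by a string of `0` and `1` characters. The flow is assumed to start at the first digit and run left to right. Units are assumed to be 8-bit bytes, and the lookup table is a single-byte code page (ASCII by default). The function names (`sequence`, `chunk`, `look_up`, `name_characters`) are hypothetical. Python does not draw glyphs itself, so the last function stops at each abstract character's Unicode name and leaves step 4 to whatever font displays the result.

```python
import unicodedata

# Hypothetical stand-in for a physical medium: digits as they might sit on disk.
MEDIUM = "01001000 01101001"


def sequence(medium: str) -> str:
    """Step 1: produce one ordered run of binary digits.

    Assumes the flow starts at the first digit and runs left to right;
    anything that is not a 0 or 1 (such as spaces) is ignored.
    """
    return "".join(ch for ch in medium if ch in "01")


def chunk(bits: str, size: int = 8) -> list[int]:
    """Step 2: interrupt the flow every `size` digits, turning each chunk into a code."""
    if len(bits) % size:
        raise ValueError("bit stream does not divide evenly into chunks")
    return [int(bits[i:i + size], 2) for i in range(0, len(bits), size)]


def look_up(codes: list[int], codec: str = "ascii") -> str:
    """Step 3: map each code to an abstract character via a single-byte code page."""
    return bytes(codes).decode(codec)


def name_characters(text: str) -> list[str]:
    """Toward step 4: identify each abstract character by its Unicode name.

    Characters without a name (e.g. control codes) fall back to their U+ number.
    Drawing the actual glyph is left to the font of whatever displays the text.
    """
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in text]


if __name__ == "__main__":
    codes = chunk(sequence(MEDIUM))
    text = look_up(codes)
    print(codes)
    print(text)
    print(name_characters(text))
```

Run as a script, this should print the codes `[72, 105]`, the text `Hi`, and the names `LATIN CAPITAL LETTER H` and `LATIN SMALL LETTER I`. Keeping the steps as separate functions makes it easy to swap in a different table at step 3 (for example `codec="cp437"`) while the sequencing and chunking rules stay fixed.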
This process is the foundation of text analysis.
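One way to see why decoding is foundational: even the simplest measurements, such as length or letter frequency, change with the lookup table. Here is a minimal Python sketch, assuming the stored bytes were written as UTF-8. `letter_counts` is a hypothetical helper.

```python
from collections import Counter


def letter_counts(raw: bytes, codec: str) -> dict[str, int]:
    """Decode the bytes with `codec`, then count the alphabetic characters."""
    text = raw.decode(codec)
    return dict(Counter(ch for ch in text if ch.isalpha()))


if __name__ == "__main__":
    raw = "café".encode("utf-8")
    print(len(raw))                    # bytes on the medium
    print(len(raw.decode("utf-8")))    # characters under the intended table
    print(len(raw.decode("latin-1")))  # characters under the wrong table
    print(letter_counts(raw, "utf-8"))
    print(letter_counts(raw, "latin-1"))
```

Read as UTF-8, the five stored bytes are four characters ending in `é`. Read as Latin-1, the same bytes become five characters, `cafÃ©`, and the letter count includes an `Ã` that the writer never typed.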