How Do They Do That – Editors

Authors

Publication

Pub Details

Date

Pages

See all articles from QL Hacker's Journal 15

Here is an interesting response to a posting on editing files larger than memory. The algorithm listed below has a few good points, but also a few bad ones. I guess the key problem is finding the last line of a file quickly without scanning through the entire file each time. – ED

arisz@csri.toronto.edu (Aris Zakinthinos) writes:

I was working on a document in a wordprocessor a couple
of days ago and one thing struck me as cool. Even though
the document is around 500k I could insert text at the
begining of the document with ease. The characters were
appearing as fast as I could type. So I started thinking
how do they do it. I don’t believe they move ~500k of
memory around for every new keystroke. Anybody have any
ideas on how this is done?

A relatively simple scheme that allows editing of files as large as disk space permits goes as follows:

Keep the current line in memory. Keep all lines above current line in one temporary file (file 1) and all lines beyond current line in another temporary file (file 2). The trick is to keep the lines in file 2 in reverse order.

Suppose that I’m editing the following file:

     1
2
3
4
5
6
7
8
9

And that the curser is placed at line 4:

The edit files will show:

   File 1:     File 2:

1 10
2 9
3 8
7
6
5

To insert a line above current line: Append to file 1:
To insert a line beyond current line: Append to file 2:
To delete a line: Read last line from file 2 into buffer.
To move one line up: Append buffer to file 2 and read last line from file 1 into buffer.
To move one line down: Append buffer to file 1 and read last line from file 2 into buffer.

Initially file 1 will be empty and file 2 will contain the original file in reverse line order. To create file 2 in the first place at least two different approaches may be used:

1) If no processing is to be done on the original file (no CR LF expansion etc), that is if file 2 will be exactly the same size as the original file the following scheme may be used:

Obtain the size of the original file. Call it SIZE

Create file 2 and seek to position SIZE.

Read original file one line at a time each time seeking backwards in file 2 the size of the line read and write the line to file 2. After writing seek backwards again.

2) If there is some need for processing on the original file:

Open original file, seek to the end, and read backwards one line at a time and append the lines read to file 2.

The advantage of 1) above is that the original file is read from the beginning and that file 2 is created from the end.. This makes it possible in a multitasking system (Unix) to do the creation of temporary files as a separate task (process). Using proper synchronization between editing and copying it is possible to allow the user to start edit the file even before file 2 is finished. Quite impressing to start editing a 5 MB file and see the first page show up immediately.

Actually we are using exactly this scheme on a Unix based wordprocessor system.

Of course this scheme is not as fast as a strictly memory based one, but using proper buffer sizes and disk caching the performance may be quite good. The main disadvantage is, that moving around requires moving data between the two temporary files.

Per H. Nielsen

Products

 

Downloadable Media

 

Image Gallery

Scroll to Top