TCL File Formats
by: Koen Van Damme

Abstract: This column shows some simple but powerful techniques to make TCL data persistent in a text file and to parse it again from the file at a later point. We make use of TCL's flexible syntax to make the data files easily readable and even editable.

This column is also available as a paper including runnable examples.

Many thanks to everybody on comp.lang.tcl who scrutinized the paper and suggested improvements, in particular to Andreas Kupries and Morten Jensen.



Introduction

A typical TCL script stores its internal data in lists and arrays (the two primary data structures in TCL). Suppose you want to write a TCL application that can save its data on disk and read it back again. For example, this allows your users to save a project and load it back later. You need a way to write the data from the place where it is stored internally (lists and arrays) to a file. You also need a way to read the data back into a running script.

You can choose to store the data in a binary form or in a text file. This paper is limited to textual data file formats. We will look at a number of possible formats and how to parse them in TCL. In particular, we will show some simple techniques that make text file parsing a lot easier.

This paper assumes that you are familiar with TCL, and that you have written at least a few simple scripts in TCL.

A simple example

Suppose you have a simple drawing tool that places text and rectangle items on a canvas. To save the resulting pictures, you want a textual file format that must be easy to read. The first and simplest file format that comes to mind looks something like this:

   example1/datafile.dat
   rectangle 10 10 150 50 2 blue
   rectangle 7 7 153 53 2 blue
   text 80 30 "Simple Drawing Tool" c red

The first two lines of this file represent the data for two blue, horizontally stretched rectangles with a line thickness of 2. The final line places a piece of red text, anchored at the center (hence the "c"), in the middle of the two rectangles.

Saving your data in a text file makes it easier to debug the application, because you can inspect the output to see if everything is correct. It also allows users to manually tinker with the saved data (which may be good or bad depending on your purposes).
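
As a minimal sketch of the writing side, assume the drawing tool keeps its items in a TCL list with one sublist per item. The variable items and the procedure save_items are hypothetical names chosen for this illustration, and the sketch assumes that the text of an item contains no double quotes.

```tcl
# Hypothetical in-memory data: one sublist per canvas item.
set ::items [list \
   [list rectangle 10 10 150 50 2 blue] \
   [list rectangle 7 7 153 53 2 blue] \
   [list text 80 30 "Simple Drawing Tool" c red]]

# Write one item per line in the format shown above.
proc save_items {path items} {
   set fid [open $path w]
   foreach item $items {
      if {[lindex $item 0] eq "text"} {
         # Put the text between double quotes so it stays one field.
         lassign $item kind x y txt anchor color
         puts $fid "text $x $y \"$txt\" $anchor $color"
      } else {
         puts $fid [join $item " "]
      }
   }
   close $fid
}

save_items "datafile.dat" $::items
```

Writing the file is the easy half; the rest of this column is about reading it back.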

When reading a data file in this format, you somehow need to parse the file and create data structures from it. To parse the file, you may be tempted to step through the file line by line, and use something like regexp to analyse the different pieces of the text. This is one possible implementation:

   example1/parser.tcl
   canvas .c
   pack .c

   set fid [open "datafile.dat" r]
   # Read the file line by line; gets returns -1 at end of file.
   while { [gets $fid line] >= 0 } {
      if { [regexp \
         {^rectangle +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +([0-9]+) +(.*)$} \
            $line dummy x1 y1 x2 y2 thickness color] } {
         .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color

      } elseif { [regexp \
         {^text +([0-9]+) +([0-9]+) +"([^"]*)" +([^ ]+) +(.*)$} \
         $line dummy x y txt anchor color] } {
         # The quotes sit outside the group, so txt excludes them.
         .c create text $x $y -text $txt -anchor $anchor -fill $color

      } elseif { [regexp {^ *$} $line] } {
         # Ignore blank lines

      } else {
         puts "error: unknown keyword in line: $line"
      }
   }
   close $fid

We read one line at a time, and use regular expressions to find out what kind of data the line represents. By looking at the first word, we can distinguish between data for rectangles and data for text. The first word, therefore, serves as a keyword: it tells us exactly what kind of data we are dealing with. We also parse the coordinates, color and other attributes of each item. Grouping parts of the regular expression between parentheses allows us to retrieve the parsed results in the variables x1, y1, x2, y2, and so on.
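
The capturing behaviour can be tried in isolation, without a canvas. The following sketch applies one regular expression to a single hard-coded line; the variable name whole is arbitrary. Placing the double quotes outside the parentheses keeps them out of the captured text.

```tcl
set line {text 80 30 "Simple Drawing Tool" c red}

# regexp returns 1 on a match. The first variable receives the whole
# match, and each following variable receives one parenthesized group.
if {[regexp {^text +([0-9]+) +([0-9]+) +"([^"]*)" +([^ ]+) +(.*)$} \
        $line whole x y txt anchor color]} {
   puts "x=$x y=$y txt=$txt anchor=$anchor color=$color"
}
```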

This looks like a simple enough implementation, provided you are comfortable with regular expressions. But I find it hard to read and hard to maintain: every new item type or attribute means writing or adjusting another regular expression.

There is a more elegant solution, known as an 'active file'. It is captured in a design pattern, originally written by Nat Pryce. It is based on a very simple suggestion: Instead of writing your own parser in TCL (using regexp or other means), why not let the TCL parser do all the work for you?

The Active File design pattern

To explain this design pattern, we continue the example of the simple drawing tool from the previous section. First, we write two procedures in TCL: one that draws a rectangle and one that draws text.

   example2/parser.tcl
   canvas .c
   pack .c

   proc d_rect {x1 y1 x2 y2 thickness color} {
      .c create rectangle $x1 $y1 $x2 $y2 -width $thickness -outline $color
   }

   proc d_text {x y text anchor color} {
      .c create text $x $y -text $text -anchor $anchor -fill $color
   }

To make a picture on the canvas, we can now call these two procs several times, once for each item we want to draw. To make the same picture as above, we need the following three calls:

   example2/datafile.dat
   d_rect 10 10 150 50 2 blue
   d_rect 7 7 153 53 2 blue
   d_text 80 30 "Simple Drawing Tool" c red

Does this look familiar? The code for calling our two procs looks almost exactly like the data file we parsed earlier. The only difference is that the keywords have changed from 'rectangle' and 'text' to 'd_rect' and 'd_text'.

Now we come to the insight that makes this design pattern tick: to parse the data file, we treat it like a TCL script. We just put the calls to our two procedures in a file, and we use that as the data file. The fact that the data file actually contains calls to TCL procedures is the heart of this design pattern.

Parsing the data file is now extremely easy:

   source "datafile.dat"

The built-in TCL command source reads the file, parses it, and executes the commands in the file. Since we have implemented the procedures d_rect and d_text, the source command will automatically invoke the two procedures with the correct parameters. We will call d_rect and d_text the parsing procedures.

We do not need to do any further parsing. No regular expressions, no line-by-line loop, no opening and closing of files. Just one call to source does the trick.
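
The pattern can also be tried in a plain tclsh, without Tk. In the sketch below, the parsing procedures record each item in a list instead of drawing it. The global variable items and the file name active.dat are assumptions made for this illustration only.

```tcl
# Parsing procedures: they record items instead of drawing them.
set ::items {}

proc d_rect {x1 y1 x2 y2 thickness color} {
   lappend ::items [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
}

proc d_text {x y text anchor color} {
   lappend ::items [list text $x $y $text $anchor $color]
}

# Create a small data file so that the example is self-contained.
set fid [open "active.dat" w]
puts $fid {d_rect 10 10 150 50 2 blue}
puts $fid {d_rect 7 7 153 53 2 blue}
puts $fid {d_text 80 30 "Simple Drawing Tool" c red}
close $fid

# One call to source parses and executes the whole file.
source "active.dat"

foreach item $::items {
   puts $item
}
```

Because the TCL parser handles the quoting, the text arrives in d_text as a single argument without any extra work.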

The data file has become a TCL script that can be executed. This is called an Active File because it contains executable commands, not just passive data. The Active File design pattern works in most scripting languages, and is excellently described by Nat Pryce on his website.

Advantages of using the Active File pattern:

  • No more need to write a parser. source invokes the TCL parser which does the job.
  • Easy to read data file format.

Disadvantages of using the Active File pattern:

  • If the data file contains dangerous commands such as exec rm *, they get executed and can cause serious damage. You can solve this by executing the active file in a 'safe interpreter' (created with interp create -safe), which hides the dangerous commands. See 'safe interpreter' in your TCL manual.
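
A rough sketch of the safe interpreter approach follows. It assumes the data has already been read into a string by the main interpreter (here it is hard-coded in the variable data), and it records items in a hypothetical global list instead of drawing them. Only the parsing procedure is made visible inside the safe interpreter, through interp alias.

```tcl
set ::items {}

proc d_rect {x1 y1 x2 y2 thickness color} {
   lappend ::items [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
}

# Data with a dangerous command smuggled in between two items.
set data {
d_rect 10 10 150 50 2 blue
exec rm some_file
d_rect 7 7 153 53 2 blue
}

# A safe interpreter hides dangerous commands such as exec and open.
set safe [interp create -safe]

# Expose only the parsing procedure; it runs in the main interpreter.
interp alias $safe d_rect {} d_rect

# Evaluation stops with an error at the hidden exec command.
if {[catch {interp eval $safe $data} msg]} {
   puts "data file rejected: $msg"
}
interp delete $safe
```

Note that the first d_rect line has already been processed when the error occurs, so an application may want to discard partial results when loading fails.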

Limitations of the Active File pattern:

  • This pattern does not work for all possible data formats. The format must be line-based, and every line must begin with a keyword. You write a TCL procedure with the same name as the keyword, turning the passive keyword into an active command. This also implies that you should not use keywords such as if or while: TCL would let you define procedures with those names, but they would replace the built-in commands and break the rest of your script. In fact, the reason why I changed the keyword text into the command d_text in our example is that Tk already has a command named text for creating text widgets.
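
One possible workaround for such name clashes, which is not part of the pattern as described above, is to define the parsing procedures in a namespace of their own and to source the data file inside that namespace. The namespace name drawing, the variable items and the file name plain.dat are hypothetical, and the sketch records items in a list so that it runs without Tk.

```tcl
namespace eval ::drawing {
   variable items {}

   proc rectangle {x1 y1 x2 y2 thickness color} {
      variable items
      lappend items [list rectangle $x1 $y1 $x2 $y2 $thickness $color]
   }

   # Defined as ::drawing::text, so a global command named text
   # (such as Tk's text widget command) is left untouched.
   proc text {x y txt anchor color} {
      variable items
      lappend items [list text $x $y $txt $anchor $color]
   }
}

# Create a small data file that uses the plain keywords.
set fid [open "plain.dat" w]
puts $fid {rectangle 10 10 150 50 2 blue}
puts $fid {text 80 30 "Simple Drawing Tool" c red}
close $fid

# Commands in the file are looked up in ::drawing before the
# global namespace.
namespace eval ::drawing [list source "plain.dat"]

puts $::drawing::items
```

This keeps the original keywords in the data file, at the cost of one extra layer that readers of the script need to understand.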



Next time

Next time, in part two, we will look at some easy tricks to make the persistent data easier to read by humans, without making it any harder to parse. And we will see some more advanced examples of how you can flex TCL into handling more complex data files for you.


Copyright to all articles belongs to their respective authors.
Everything else © 2024 LinuxMonth.com