Newing off floaters: an analog in Perl

Background

In InterSystems Caché (when you’re not using procedure blocks), all variables are of global scope: if Function A sets a variable and then calls Function B, Function B can see all the parameters Function A passed it plus any other variables Function A set (or that Function A’s caller set, or that Function A’s caller’s caller set…). In Caché parlance, the term “global” has a different meaning, so we refer to these variables as floaters. Once a floater has been set, it floats throughout the system, as you’d expect of a global variable.

Mitigating the Floatitude

Having many such variables floating through the system de-modularizes it and makes it easy for a change in one area to cause unintended side effects in another. This effect is somewhat mitigated by Caché’s NEW command.

The NEW command creates a new “stack level” at the point it is invoked, such that the floater appears undefined from that point on, until the end of the block where the NEW appears. If the floater is set to a different value after NEW has been called, that value persists only while the NEW is in effect. Once processing falls off the end of the block where the NEW command was called, code accessing the floater will see the value it had before NEW was called.
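As a minimal sketch of that behavior, consider the following routine fragment. The labels NEWDEMO, INNER, and SHOW and the floater x are hypothetical names made up for illustration, and the noted results are what the description above predicts rather than captured output.

```objectscript
 ;Sketch: NEW hides a floater until the level that issued the NEW exits.
NEWDEMO
  set x="outer"
  do INNER
  write "After INNER: x=",x,!              ;expected: outer (old value restored)
  quit

INNER
  new x
  write "After NEW: $data(x)=",$data(x),!  ;expected: 0 (x appears undefined)
  set x="inner"
  do SHOW                                  ;the temporary value floats into callees
  quit

SHOW
  write "SHOW sees x=",x,!                 ;expected: inner
  quit
```

Note that SHOW still sees the temporary value: NEW limits how long a value lives, not which code can see it while it is in effect.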

Perl

All this to say, I just read about a function in Perl that looks just like Caché’s NEW command: the local function.

From Temporary Values via local() in the Perl subroutines doc:

A local() modifies its listed variables to be “local” to the enclosing block…and to any subroutine called from within that block. A local() just gives temporary values to global (meaning package) variables. It does not create a local variable. This is known as dynamic scoping. Lexical scoping is done with “my”, which works more like C’s auto declarations.

…All listed elements must be legal lvalues. This operator works by saving the current values of those variables in its argument list on a hidden stack and restoring them upon exiting the block, subroutine, or eval. This means that called subroutines can also reference the local variable, but not the global one.

And from the top of that section:

NOTE: In general, you should be using “my” instead of “local”, because it’s faster and safer. Exceptions to this include the global punctuation variables, filehandles and formats, and direct manipulation of the Perl symbol table itself. Format variables often use “local” though, as do other variables whose current value must be visible to called subroutines.

This sounds similar to InterSystems’ advice to use procedure blocks when writing new code.

My guess is that people tend not to write enterprise applications in Perl and that because of this the maintainability issue isn’t as pronounced there.

Procedure Blocks: A Low-Level Look

I wrote up the following about InterSystems Caché Procedure Blocks as a twiki post when I was trying to drive our internal adoption of their use.  The first revision was created in December 2005, but I continued to revise it as I learned more about procedure blocks (particularly the error trap issue) through 2006 and 2007.

The examples focus on procedure blocks in routines, but the concepts are the same when you’re writing classes, since Caché classes (and individual methods thereon) can also be marked ProcedureBlock or not.


0. Introduction

Marking your class as [ProcedureBlock] can add goodness to your code. Without procedure blocks, you have to be ever vigilant about newing off all variables, or you’ll float variables out into the rest of the system (default to danger). Procedure blocks change the rules around so that all variables are private to the function unless explicitly specified otherwise (default to safety). If it weren’t for the killer error trap issue (see section 2), procedure blocks would be a win all the way around.

0.1 Why Haven’t We Used Procedure Blocks More?

There are several reasons we used to think we couldn’t use procedure blocks.

  1. We didn’t think procedure block functions worked in the presence of xecute or the indirection operator (@)
  2. We didn’t think procedure block functions worked when you needed to new variables off by indirection (e.g., @$$VariablesToNewOff^MyRoutine)
  3. Procedure blocks don’t show up properly in the stack in the error trap (^%ER)

Procedure block functions do support #1 and #2 – see section 1.5 for details. Number 3 is the big concern – this issue makes the error trap of more limited use when debugging procedure block code. See section 2 for details.

1. How Procedure Block Functions Work

This section introduces procedure blocks and demonstrates some interesting points about how they work. For a more complete discussion of the syntax, see the User-Defined Code chapter of Using Caché ObjectScript, in the Caché documentation.

The basic idea of procedure blocks is that instead of all variables floating in and out by default, only variables whose names start with the ‘%’ character do. If your method needs to see or use a non-‘%’ floating variable, you can list it in the method’s public list attribute, [ PublicList = (a,b,c) ], and then new a,b,c in your code as usual. Procedure blocks offer no protection from floating %-variables, so if you were newing off a %-variable before, you still need to with procedure blocks. An individual method in a class can be marked as not procedure block if necessary by setting [ ProcedureBlock = 0 ].

The following examples are written using Caché routines rather than classes. Methods and ClassMethods in classes work the same way (they compile down to .INT code that looks like these examples).

1.1 Basic Syntax

A Caché function is turned into a procedure block by enclosing the body of the function in curly braces, like the SAYHELLO function in the following example:

 //Example 1: Demonstrates the declaration and calling of a simple procedure block function
EXAMPLE1
  do SAYHELLO()
  quit

SAYHELLO() public
{
  write "Hello",!
}

Notice the two minor syntactic differences between procedure block and non-procedure-block functions:

  • Procedure block functions are always called with open and close parentheses, even if there are no parameters.
  • Procedure block functions default to private, meaning they can only be called within the same routine. To make a procedure block function available to other routines, use the public keyword (as shown above).

1.2 Explicit Specification of Floating Variables

When a function is a procedure block, variables do not float in unless explicitly specified (variables whose names start with ‘%’ being the exception to this). Consider the following example.

 //Example 2: Demonstrates that floating variables not explicitly specified
 //by a procedure block function are not accessible from within that procedure block.
EXAMPLE2A() public
{
  D DOSOMETHINGRANDOM
  I MyError D  QUIT     ;will get <UNDEFINED> error here
  .  W "EXAMPLE2: Couldn't do something random: "_MyErrorMessage,!
}

DOSOMETHINGRANDOM
  S MyError=0,MyErrorMessage=""

  ; Try to do something random
  ; ...

  I $$randoverflow() D  QUIT
  .  S MyError=15,MyErrorMessage="Randomness overflow"

  ;... do other things...
  Quit

In this code, EXAMPLE2A cannot see the MyError and MyErrorMessage variables that float out of the DOSOMETHINGRANDOM label, and an <UNDEFINED> error results. To allow a procedure block function to see certain floating variables, specify them within square brackets. See Example 2b:

 //Example 2b: Demonstrates how to specify variables that should be allowed to float in.
EXAMPLE2B() [MyError,MyErrorMessage] public
{
  D DOSOMETHINGRANDOM
  I MyError D  QUIT
  .  W "EXAMPLE2: Couldn't do something random: "_MyErrorMessage,!
}
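
The ‘%’ exception mentioned at the top of this section can be sketched the same way. The names PERCENTDEMO, ShowVars, %Ctx, and Ctx below are hypothetical, and the results in the comments are what the rules described here predict, not captured output.

```objectscript
 //Sketch: a %-variable floats into a procedure block function without being
 //listed in square brackets; an ordinary floater does not.
PERCENTDEMO
  new %Ctx,Ctx
  set %Ctx="percent",Ctx="plain"
  do ShowVars()
  quit

ShowVars()
{
  write "$data(%Ctx)=",$data(%Ctx),!  //expected: 1 (%-variables are always visible)
  write "$data(Ctx)=",$data(Ctx),!    //expected: 0 (Ctx is not in the public list)
}
```

Using $data here avoids an <UNDEFINED> error on the variable that did not float in.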

1.3 When Procedure Block Functions Call Functions that Float Data Out

This set of examples demonstrates that though a procedure block function can only see floating variables explicitly specified in the square brackets, it can still cause variables to float out.

 //Example 3a: Shows that a variable does not float into a procedure block function after
 //a function is called to set it.
EXAMPLE3A() [MyVar] public
{
   new MyVar,@$$VariablesToNewOff^MyRoutine
   set MyVar = "MyFieldName"
   do SetVarsAFloatin^MyRoutine
   write "EXAMPLE3: MyVarDescription="_MyVarDescription,!  //will get <UNDEFINED> error here
   do USEFIELDDESC
   quit
}

USEFIELDDESC
   write "USEFIELDDESC: MyVarDescription="_MyVarDescription,!
   quit

Removing the erroring write statement shows that, though the MyVarDescription variable is not visible from EXAMPLE3B, it did get set and floats into the USEFIELDDESC function.

 //Example 3b: Shows that though a procedure block cannot see a floating variable created by a
 //function called by the procedure block, the variable does exist and floats into non-procedure block
 //functions
EXAMPLE3B() [MyVar] public
{
   new MyVar,@$$VariablesToNewOff^MyRoutine
   set MyVar = "MyFieldName"
   do SetVarsAFloatin^MyRoutine
   do USEFIELDDESC
   quit
}

If the variable does need to be accessed by the calling procedure block function, it should be added to the list of public variables.

 //Example 3c: Shows that if a procedure block function needs to access a floating variable created by a
 //function the procedure block calls, the variable should be added to the list within square brackets
 //at the top of the procedure block.
EXAMPLE3C() [MyVar,MyVarDescription] public
{
   new MyVar,@$$VariablesToNewOff^MyRoutine
   set MyVar = "MyFieldName"
   do SetVarsAFloatin^MyRoutine
   write "EXAMPLE3: MyVarDescription="_MyVarDescription,! //Now no <UNDEFINED> error.
   quit
}

1.4 Procedure Block in the Middle

 //Example 4: Shows that variables float into a non-procedure block function
 //even when there is a procedure block function in the middle.
EXAMPLE4
 new MyVar,@$$VariablesToNewOff^MyRoutine
 set MyVar = "MyFieldName"
 do SetVarsAFloatin^MyRoutine
 do IntermediatePB()
 quit

IntermediatePB()
{
 //Show that MyVarDescription is not visible in here
 write "Is MyVarDescription visible in the procedure block function? ", $data(MyVarDescription),!

 do USEFIELDDESC
}

USEFIELDDESC
 //Though not visible in the procedure block function (since it is not in
 //the public list), a non-procedure block function that the procedure block
 //function calls can see the variable.

 write "USEFIELDDESC: MyVarDescription="_MyVarDescription,!
 quit

1.5 Xecute and Indirection in Procedure Blocks

1.5.1 Xecute

The Xecute command (“X”) can be used within procedure block functions. If the xecuted string accesses a variable, the value is that of the floating variable of that name, not any private variable of that name that might have been set by the procedure block. The following example will write 5, even though that is a floating value and A is not specified in the list of public variables.

 //Example 5: Shows that the xecute command accesses floating variables, even when used within a procedure block.
EXAMPLE5
 new A
 set A=5
 do Xecuter()
 quit

Xecuter()
{
 set A = "BUBBA"
 set B = "write A,!"
 xecute B //Writes 5 (the value of the floating A), not BUBBA (the value of the private A).
}

1.5.2 Indirection Operator

The indirection operator (“@”) operates similarly to the Xecute command.

 //Example 6: Shows that the indirection operator ("@") accesses floating variables, even when used within a procedure block.
EXAMPLE6
 new A
 set A=7
 do Indirection()
 quit

Indirection()
{
   set A = "HUBBA"
   set B = "A"
   write @B //Writes 7 (the value of the floating A), not HUBBA (the value of the private A).
}

Analysis: These constructs bypass the protections gained by using procedure blocks. Care must be taken with code that uses Xecute or the indirection operator to avoid surprises.
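
One way to avoid the surprise, sketched here under the rules from section 1.2, is to put the variable in the public list so that the procedure block and the xecuted string refer to the same floating variable. The labels EXAMPLE5FIX and XecuterPublic are hypothetical variations on Example 5, and the noted results are expectations rather than captured output.

```objectscript
 //Sketch: with A in the public list, the set inside the procedure block
 //changes the floating A, which is also what the xecuted string reads.
EXAMPLE5FIX
 new A
 set A=5
 do XecuterPublic()
 write "Back in EXAMPLE5FIX: A=",A,!  //expected: BUBBA
 quit

XecuterPublic() [A]
{
 set A = "BUBBA"      //sets the floating A, since A is in the public list
 set B = "write A,!"
 xecute B             //expected to write BUBBA rather than 5
}
```

The trade-off is that A is no longer protected: the change floats back to the caller, which is exactly the behavior procedure blocks are otherwise meant to prevent.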

1.5.3 Newing Variables by Indirection

Indirection can also be used in the argument to new within a procedure block function.

 //Example 7: Shows what using a new handler would look like with a procedure block function.
 //This example will compile.
EXAMPLE7() [MyVar] public
{
 new MyVar,@$$VariablesToNewOff^MyRoutine
 set MyVar = "MyFieldName"
 do SetVarsAFloatin^MyRoutine
}

2. Difficulties

2.1 Private Procedure Blocks and Error Handling

When a routine errors off in a private procedure block function (recall that procedure blocks not marked public are private by default), the error trap does not show the name of the procedure block function. Instead, it displays an offset from the last non-private label. This also applies when ZBreaking through a private procedure block function.

 //Example 8a: When EXAMPLE8A is executed, the <DIVIDE> error will be reported as
 //occurring at RANDOMLABEL+7 rather than at PRIVATEFUNC+2.
EXAMPLE8A
 do PRIVATEFUNC()
 quit

RANDOMLABEL
 //Do something random
 //...
 quit

PRIVATEFUNC()
{
 write "Here in PRIVATEFUNC; about to try something..."
 write 1/0
 write "done.",!
}

2.2 Procedure Blocks and the Error Trap

Even when procedure block functions are marked public, the frame stack looks odd in the error trap when procedure block functions are involved. Variables used in procedure block functions don’t show up in the stack, making it harder to debug problems using the error trap. The error trap is such an important part of our debugging process that this issue alone makes it difficult to want to switch over to procedure blocks (this issue is rumored to be fixed in Caché 2007.2).

3. Myths about Procedure Block Functions/Methods

  1. Myth: If my method is procedure block, it can’t cause variables to float out into the rest of the system (unless they’re listed in the method’s PublicList).
    • Fact: If a function that the procedure block function calls does not new off a variable, the variable floats out of that function. The procedure block function will not see it (unless the variable is in the procedure block function’s public list), but if the procedure block function’s caller is non-procedure block, the caller will be able to see the floating variable once the procedure block function returns. See section 1.3. (Also, procedure block functions offer no protection against variables starting with the ‘%’ character; these float out of even procedure block functions and should be newed off if you want to avoid that.)
  2. Myth: Procedure Block functions implicitly new off all the variables they use (or implicitly do an argumentless new)
    • Fact: Procedure block functions do not perform an implicit new. (This is shown by the example in section 1.3, where if the procedure block function calls a function that floats a variable out, it floats back to the procedure block function’s caller. This would not occur if the procedure block function performed an implicit new.) Instead, procedure block functions use different visibility rules that determine what variables they can see, without the use of new.
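
Both facts can be seen in one small sketch. The labels MYTHDEMO, MiddlePB, and LEAKY and the variable Leaked are hypothetical, and the results in the comments are what the rules in section 1 predict rather than captured output.

```objectscript
 //Sketch: a variable set by a non-procedure block function floats past the
 //procedure block function in the middle and back to the original caller.
MYTHDEMO
  new Leaked
  do MiddlePB()
  write "Back in MYTHDEMO: $data(Leaked)=",$data(Leaked),!  //expected: 1
  quit

MiddlePB()
{
  do LEAKY
  write "In MiddlePB: $data(Leaked)=",$data(Leaked),!       //expected: 0
}

LEAKY
  set Leaked="set by LEAKY"   //not newed off, so it floats out
  quit
```

If MiddlePB performed an implicit new, Leaked would be undefined again by the time control returned to MYTHDEMO.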

— DanielMeyer – 26 Dec 2007