Tuesday, April 01, 2008

Cola Interface Support
Finished basic support for interfaces today. Interfaces are limited to single inheritance for now; I will have to fix that to fully expose the .NET built-in libraries.

Interfaces are key for running on .NET and Java due to their heavy use in the standard libraries, as well as the integration into the languages. The foreach() statement was originally implemented in C# 1.0 to require the collection being iterated to implement the IEnumerable interface, but recently I noticed that C# in VS 2005 only checks for the required method signatures, not the actual interface. I'll probably follow suit with Cola.
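The difference between checking for a declared interface and checking only for the required method signatures can be shown with a small Python sketch (Python, not Cola or C#). The class Countdown and the helper has_foreach_pattern are hypothetical names made up for this illustration. Python's own for loop already works this way: it only needs an __iter__ method.

```python
class Countdown:
    """Iterable by shape only: it declares no interface or base class."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


def has_foreach_pattern(obj):
    # Structural check: look for the required method, not a declared interface.
    return callable(getattr(obj, "__iter__", None))


if __name__ == "__main__":
    for value in Countdown(3):
        print(value)
```

A compiler doing the same check statically would look up the method by name and signature on the collection's type and ignore its list of implemented interfaces.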
interface IEmpty {}

interface IHasMethods {
   void test();
}

class Class {}

class ClassChild : Class
{
}

class ClassWithInterface : IHasMethods
{
}

This is typical syntax, but I took a different approach from Mono C#, or at least from the old grammar I have looked at. Where Mono implements interfaces explicitly in the parser grammar, I am sharing the grammar with the class rules and checking interfaces during the semantic phase. This just means colac will parse some illegal syntax but emit errors when it finds the illegal constructs, such as declaring a field inside an interface or declaring a method with a body.

Time will tell whether this approach works better. I favor it because it should allow better error reporting than enforcing the rules in the parser alone, which often results in a cryptic "syntax error". It does require a bit more special-case code in the semantic checker, but at least the errors have more context.
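As a rough illustration of checking interfaces in the semantic phase, here is a minimal Python sketch. It is not colac's actual code: the Member and TypeDecl records and the check_interface_rules function are hypothetical stand-ins for whatever the parser hands to the semantic checker. The parser accepts the same member syntax for classes and interfaces, and a later pass rejects what an interface may not contain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Member:
    kind: str               # "field" or "method"
    name: str
    line: int               # source line, kept so errors carry context
    has_body: bool = False  # only meaningful for methods


@dataclass
class TypeDecl:
    kind: str               # "class" or "interface"
    name: str
    members: List[Member] = field(default_factory=list)


def check_interface_rules(decl):
    """Return error messages for members that are illegal inside an interface."""
    errors = []
    if decl.kind != "interface":
        return errors  # classes may have fields and method bodies
    for m in decl.members:
        if m.kind == "field":
            errors.append(
                f"line {m.line}: interface '{decl.name}' cannot declare field '{m.name}'"
            )
        elif m.kind == "method" and m.has_body:
            errors.append(
                f"line {m.line}: interface method '{decl.name}.{m.name}' cannot have a body"
            )
    return errors
```

Because the checker sees the whole declaration, each message can name the interface, the member and the line, which a bare grammar failure cannot do.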

Monday, March 31, 2008

Summer Internship
I have a really cool summer internship available at itech to develop Cola and Cola related technology.
  • Work on a real compiler for pay
  • Visual Studio integration package (the basics are there, but lacks debugging, ASP.NET designer, etc.)
  • Start the JVM code backend
  • Start Parrot PIR backend
  • Start on the Eclipse plugin

itech is located in Byron, GA, south of Atlanta.

If you are interested contact jobs001@itechdata.com WITH your resume. Bright people only, preferably with a CS background in compiler development.

Cola Array Range Notation and Range Lists
One of the common things I use Perl or UNIX command line tools to do is process either CSV or fixed width data files. Even though Perl has some nice regex notation, I've often found that the UNIX 'cut' utility is more direct and clear than a Perl regex, substr or pack/unpack. Say I'm given a fixed width ASCII file with first name, last name, zipcode and a lot of other fields, and I'm told to create a new file with only the 3 fields listed above. The spec says each field is 10 characters wide and the first, last and zipcode fields are the 1st, 2nd and 7th fields respectively. In UNIX, to extract, it's as simple as:
cut -c1-10,11-20,61-70 myfile.dat > newfile.dat
I don't know of any briefer way to write it than with 'cut'. With Perl I'd usually use a regex or substr for this.
# Perl
 while(<>) {
  print substr($_, 0, 10) . substr($_, 10, 10) . substr($_, 60, 10) . "\n";
 }
Not as clear. Where substr usually becomes less clear is that it doesn't match up with the START:STOP ranges given in the specs or in something like an Oracle SQL*Loader file definition, so I have to remind myself that Perl substr() uses START:LENGTH notation, not START:END notation. If the problem is a bit more complex, like splicing in new fields or substituting field values, then Perl is usually the way to go. I'm sure some shell gurus will argue, but by the time I have to use three tools (cut + sed + awk) I just prefer one (Perl).

For Cola I'm playing with the idea of range notation and range lists for both arrays and strings (which I treat as arrays).
 // Cola
 string s;
 while(s = readln()) {
   print( s[0..9, 10..19, 60..69] );
 }
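To make the rvalue meaning of a range list concrete, here is a minimal Python sketch (Python, not Cola). The function name slice_ranges is hypothetical, and the choice to raise an error on out-of-bounds ranges is an assumption; the post does not say what Cola does in that case. Ranges are inclusive START..END pairs, matching the notation above rather than START:LENGTH.

```python
def slice_ranges(seq, *ranges):
    """Concatenate inclusive (start, end) ranges of seq, in the order given."""
    out = seq[0:0]  # empty value of the same type as seq (str, list, ...)
    for start, end in ranges:
        # Assumption: out-of-bounds ranges are an error rather than being clipped.
        if start < 0 or end < start or end >= len(seq):
            raise IndexError(f"range {start}..{end} is out of bounds")
        out += seq[start:end + 1]  # +1 because the end index is inclusive
    return out


if __name__ == "__main__":
    print(slice_ranges([10, 20, 30, 40, 50], (0, 1), (3, 4)))  # [10, 20, 40, 50]
```

The result has the same type as the input: a string yields a string and a list yields a list.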
It's easy to implement this as an rvalue, but eventually people want to assign to it (lvalue), so the question becomes: what is the type of the expression?
   foo[0..9, 10..19]
As an rvalue, I expect the type to be whatever the type of foo is, so if foo is an array of int, the expression should be a new array of int containing only those ranges. The question becomes: can I assign to that in any meaningful way? The tangent to this is immutable (or mutable) strings, a topic that offers hours of reading from fanatics in either camp (see my post on mutable strings; link to be added). C# and Java don't have mutable strings, but Perl has a smart string. I'm leaning towards smart strings. For the sake of this discussion, forget immutable strings, or forget we are talking about strings, and assume arrays. Is it useful to be able to assign to an array with range notation?
   foo[0..2] = {1, 3, 7};   // sure
   foo[0..2, 5..7] = {1, 3, 7, 11, 13, 17};  // makes sense to me
   foo[0..2, 5..7] = { {1, 3, 7},  {11, 13, 17} };   // nested arrays ?
The problem with the third example is that it lacks orthogonality. The principle of orthogonality would dictate that if the expression evaluates to type X, then whatever I assign to it must also be of type X, not X[]. Other points to ponder: if the target range is not the same size as the source, what happens?
   foo[0..2] = {1};   // leave range 1..2 untouched or collapse and discard them
   s[0..2] = "ABCDE"; // replaces outside the range, illegal or squeeze it into the array?
I think collapse or expand has more practical utility in this notation; however, it's probably more bug-prone. One option is to generate a compiler warning for this sort of expression; the other is to just make it illegal. Perl's lvalue substr() grows or shrinks the string when the source and target sizes differ, and is a fatal error only when the range lies entirely outside the string. The question is where Cola fits. I've added some Perly features, but I still stick closer to C#/Java syntax, and Cola is in fact a statically typed language. Then again, C#/Java do not do compile-time bounds checking on arrays, and this range feature falls right in line with that.
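The two candidate behaviors for range assignment can be compared in a short Python sketch (Python, not Cola). The names assign_strict and assign_splice are hypothetical and neither is Cola's decided behavior. The strict version treats a size mismatch as an error; the splice version collapses or expands the array, which is what Python's own slice assignment does.

```python
def assign_strict(target, ranges, values):
    """Assign values across inclusive ranges; the sizes must match exactly."""
    positions = [i for start, end in ranges for i in range(start, end + 1)]
    if any(i < 0 or i >= len(target) for i in positions):
        raise IndexError("range is out of bounds")
    if len(positions) != len(values):
        raise ValueError(f"need {len(positions)} values, got {len(values)}")
    for i, v in zip(positions, values):
        target[i] = v


def assign_splice(target, start, end, values):
    """Replace one inclusive range with values; the array may shrink or grow."""
    target[start:end + 1] = values


if __name__ == "__main__":
    foo = list(range(10))
    assign_strict(foo, [(0, 2), (5, 7)], [1, 3, 7, 11, 13, 17])
    print(foo)  # [1, 3, 7, 3, 4, 11, 13, 17, 8, 9]

    bar = [0, 1, 2, 3, 4]
    assign_splice(bar, 0, 2, [1])  # range 0..2 collapses to a single element
    print(bar)  # [1, 3, 4]
```

The strict form keeps the flat "assign type X to type X" rule and never changes the array length, at the cost of rejecting the collapse and expand cases outright.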

Sunday, March 30, 2008

Cola Visual Studio Integration

Typically the first question I am asked about Cola is whether it integrates with Visual Studio. "Integrates" is subjective, since Cola both generates valid .NET CLR assemblies and can reference any other CLR assembly, but what folks really mean is "can I write Cola inside Visual Studio and compile, run and debug it." Up until today the answer was no...

Tonight I finished the initial plugin for VS 2005 using the VS SDK. (Boy do I hope the Eclipse integration is easier, but since the Java target for Cola is likely months away, I'll burn that bridge...).

If you don't know, VS SDK allows you to build extensions or plugins for Visual Studio that can do anything the builtin languages can do, since Microsoft uses VS SDK internally for the production languages. Several things we can do in Visual Studio as a plugin include:

  1. Recognize .cola source files and open an editor with syntax coloring, codesense or whatever else we want in the editor
  2. Provide and recognize a "Visual Cola" project with various project templates
  3. Compile the project from inside Visual Studio
  4. Run and debug (breakpoints, tracing, watches) Cola programs within Visual Studio

I accomplished (1) by instantiating the builtin C# editor since Cola is at least 90% source compatible with C# (and Java). We get basic keywords and constants and formatting but we don't get codesense. This requires a custom Cola language service which will come later.

(2) and (3) are done as well, but (4) will require a bit more work. Cola will generate a .pdb file (program database) but the debugging support still needs to be implemented in the VS plugin.

Just getting the basic framework for the plugin was a relief as I've been avoiding it for a long time. I am hoping to get a summer intern to work on fleshing out the Visual Studio integration for actual release. As of 0.102.0 it's too rough to realistically use, as is colac, due to its limitation of only accepting one .cola file at a time (no multi-file assemblies yet).

I'll post an update when it is done.