Perl Tutorial Too

Perl Best Practices Continued

Require a flag in front of every command-line argument except filenames. Use -- to indicate the end of the options. Standardize meta options: --usage, --help, --version. If possible, allow the same file to be used for input and output with IO::InSitu. Use Getopt::Long for processing command lines; it emulates the behavior of C's getopt_long. Make documentation consistent with behavior. Factor out command-line processing that is shared between scripts.
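A minimal sketch of the option handling described above, using Getopt::Long from core. The option names --output and --verbose and the sub name parse_args are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse an argument list; returns (\%opt, \@filenames).
# parse_args and the option names are illustrative assumptions.
sub parse_args {
    my @args = @_;
    my %opt  = (verbose => 0);
    GetOptionsFromArray(
        \@args, \%opt,
        'output=s',    # --output FILE
        'verbose!',    # --verbose or --noverbose
        'usage',       # standard meta options
        'help',
        'version',
    ) or die "Bad options; try --usage\n";
    return (\%opt, \@args);    # whatever is left over is a filename
}

# Everything after -- is treated as a filename, even if it starts with a dash.
my ($opt, $files) = parse_args(qw(--output out.txt --verbose -- -odd-name.txt));
print "output=$opt->{output} verbose=$opt->{verbose} files=@{$files}\n";
```

Passing the argument list in, rather than reading @ARGV directly, is one way to factor the parsing out so several scripts can share and test it.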

Don't introduce needless objects or overly complex frameworks. Don't use pseudohashes or restricted hashes. Inside out objects handle the same problem as restricted hashes. Getters and setters isolate an object's callers from changes to the object. Call all constructors new. Don't clone in your constructor; put that in a separate method called clone. Inside out objects need destructors.

Traditional objects are hash references. They can be modified outside a class. Inside out objects use a separate hash for each attribute:

my (%height, %width);
sub new {
  my $self = bless {}, shift;
  my %args = @_;
  # one hash per attribute, keyed by the object itself
  $width{$self} = $args{width};
  $height{$self} = $args{height};
  return $self;
}

This restricts access to package scope and solves the typo problem. But you need a destructor:

sub DESTROY {
    my $dead = shift;
    delete $height{$dead};
    delete $width{$dead};
    for (our @ISA) {
        my $can = $_->can("DESTROY");
        $dead->$can if $can;
    }
}

The modules Class::Std and Object::InsideOut support inside out objects. Provide separate read and write accessors. Don't use lvalue accessors; they require direct access to a variable. Don't use indirect object syntax; it can be parsed in a way you don't expect. Only overload operators that make sense. Provide sane coercions to Boolean, Numeric, and String, if possible. Do not manipulate @ISA directly; use "use base" or Class::Std. Never use the one argument bless. Pass constructor args as a hashref. This catches an odd number of args at the caller. All classes should separate creation from initialization to simplify multiple inheritance. Call the initializers BUILD. Object::InsideOut handles a lot of concerns automatically. Specify coercions as attributed methods. This is provided with the above mentioned frameworks. Don't use AUTOLOAD.
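Putting those rules together, here is a hand-rolled sketch of an inside out class with a hashref constructor, two argument bless, separate read and write accessors, and a destructor. The class name Rectangle and its accessor names are hypothetical, and keying the attribute hashes on refaddr is my assumption about the usual way to do it; Class::Std and Object::InsideOut do this bookkeeping for you.

```perl
package Rectangle;    # hypothetical example class
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my (%height, %width);    # one hash per attribute, keyed by object address

sub new {
    my ($class, $arg_ref) = @_;                   # constructor args as a hashref
    my $self = bless \do { my $anon }, $class;    # two argument bless
    $height{ refaddr($self) } = $arg_ref->{height};
    $width{ refaddr($self) }  = $arg_ref->{width};
    return $self;
}

# Separate read and write accessors.
sub get_height { my ($self) = @_; return $height{ refaddr($self) }; }
sub get_width  { my ($self) = @_; return $width{ refaddr($self) }; }
sub set_height { my ($self, $new) = @_; $height{ refaddr($self) } = $new; return; }
sub set_width  { my ($self, $new) = @_; $width{ refaddr($self) }  = $new; return; }

# Without a destructor the attribute hashes would grow forever.
sub DESTROY {
    my ($self) = @_;
    delete $height{ refaddr($self) };
    delete $width{ refaddr($self) };
    return;
}

package main;

my $rect = Rectangle->new({ height => 2, width => 3 });
$rect->set_height(5);
print 'area: ', $rect->get_height() * $rect->get_width(), "\n";
```

Because the object is only a blessed empty scalar, code outside the package has no way to reach the attributes except through the accessors.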

Design a module's interface first and name things from the user's perspective. Don't use vstrings. Use the version module from CPAN. Enforce version requirements programmatically with use. Once set, a default export list can't change. If possible only export on request. Have a look at Perl6::Export::Attrs. Export subroutines, not data. You can build module frameworks with ExtUtils::ModuleMaker or Module::Starter. Use core modules when possible. See "perldoc perlmodlib". Module::CoreList has info on when modules were added to Perl. Use CPAN modules where feasible. It allows communities of interest to form.

Write tests first. Turn your examples into tests. Create tests before coding. They should fail. Code to pass the test. Use Test::More. Read Test::Tutorial. Write test cases to make sure failures actually fail. Pair with others to write tests. Test minimum and maximum values. Test empty strings, multiline strings, non-numeric data for numeric parameters, undef. Test every bug you've ever encountered. Add test cases before debugging. Use strict. Use warnings; it catches many user errors. In the debugger, m followed by a class or instance name prints the methods on an object. Reserve warn for debugging code. Or use Log::Log4perl. I have an article on trapping CGI errors and sending them to email.
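As a small illustration of those testing rules, here is a sketch with Test::More. The clamp function is a hypothetical function under test, not something from the talk; the tests cover boundaries and check that a failure really fails.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test: force a number into [min, max].
sub clamp {
    my ($value, $min, $max) = @_;
    die "clamp: value must be defined\n" if !defined $value;
    return $value < $min ? $min
         : $value > $max ? $max
         :                 $value;
}

is(clamp(5,  0, 10), 5,  'value inside the range is unchanged');
is(clamp(-1, 0, 10), 0,  'below the minimum clamps to the minimum');
is(clamp(11, 0, 10), 10, 'above the maximum clamps to the maximum');
is(clamp(0,  0, 10), 0,  'minimum boundary');
is(clamp(10, 0, 10), 10, 'maximum boundary');

# Make sure the failure case actually fails.
my $lived = eval { clamp(undef, 0, 10); 1 };
my $error = $@;
ok(!$lived, 'undef is rejected');
like($error, qr/must be defined/, 'error message names the problem');

done_testing();
```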

Use revision control. It is essential for team programming. It is great for sharing updates. It is helpful for seeing when something broke. It prevents potential loss from rm or disk failures. My favorite revision control system is git. I recommend it highly. There are wrappers to git to support cvs and svn commands. There are problems with Windows, but it is portable to Unix like systems. Integrate non-Perl code with Inline:: modules. Consider Config::Std for configuration files. Don't use formats, consider Perl6::Form instead. Don't use ties. Don't write clever code. If you must be clever, hide it well. Don't optimize until you benchmark. Use the Benchmark module or Devel::DProf. Devel::SmallProf profiles at a statement level. Devel::Size tells you the overhead of a Perl object. Look for opportunities to cache, for example subroutines that always return the same results for the same arguments. You can do this automatically with the Memoize module. But benchmark any caches. Use Perl::Critic to enforce guidelines.
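A sketch of the cache-then-benchmark advice, using the core Memoize and Benchmark modules. The Fibonacci subs are a stock illustration of my own choosing; the iteration count is arbitrary.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use Memoize qw(memoize);

# Deliberately naive recursion, so there is something to cache.
sub slow_fib {
    my ($n) = @_;
    return $n < 2 ? $n : slow_fib($n - 1) + slow_fib($n - 2);
}

# Same code, but wrapped by Memoize so repeated arguments hit a cache.
sub fast_fib {
    my ($n) = @_;
    return $n < 2 ? $n : fast_fib($n - 1) + fast_fib($n - 2);
}
memoize('fast_fib');

# Benchmark both before trusting the cache.
timethese(200, {
    plain    => sub { slow_fib(15) },
    memoized => sub { fast_fib(15) },
});
```

Note that the memoized cache survives between iterations, so the second timing mostly measures cache lookups. Benchmark may also warn that there are too few iterations for a reliable count; raise the count for real measurements.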

Persistent Perl Data

Arrays can be persisted in a flat file, one item per line. But then the data cannot contain newlines or undef values, so this is too simple for most uses. Data::Dumper converts a data structure to Perl code. It's smart enough to handle cycles. Reading the file in with do will recreate it. But this is dangerous, unless the data is secured. YAML is a serialization format that works with most languages. Use the YAML subs DumpFile and LoadFile. Storable allows us to read and write arbitrary data, including some coderefs. But it is a binary format. It's in core as of 5.8. Tie::Persistent marks a variable as persistent.

use Tie::Persistent;
tie my %storage, 'Tie::Persistent', 'file', 'rw';  # 'rw' so changes are written back
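For comparison, here is a sketch of the Data::Dumper and Storable round trips mentioned above, using only core modules. The data and file names are made up, and the files live in a temporary directory.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

# Made-up data with nesting and an undef, which a flat file could not hold.
my %config = (name => 'demo', sizes => [1, 2, 3], nested => { flag => undef });
my $dir    = tempdir(CLEANUP => 1);

# Storable: compact binary file.
store(\%config, "$dir/config.sto");
my $from_storable = retrieve("$dir/config.sto");

# Data::Dumper: readable Perl source, recreated with do.
# Only do this with files you trust, since do runs whatever code is in them.
open my $out, '>', "$dir/config.pl" or die "Cannot write $dir/config.pl: $!";
print {$out} Dumper(\%config);
close $out or die "Cannot close $dir/config.pl: $!";
my $from_dumper = do "$dir/config.pl";
die "Cannot load $dir/config.pl: ", ($@ || $!) if !defined $from_dumper;

print "Storable: $from_storable->{name}, sizes @{ $from_storable->{sizes} }\n";
print "Dumper:   $from_dumper->{name}, sizes @{ $from_dumper->{sizes} }\n";
```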

DBM allows you to update part of a hash. dbmopen is the old calling interface. But the keys must be strings and the values must be simple scalars. MLDBM is like DBM, but the values can be complex data.

tie my %data, 'MLDBM', 'file';

But it only updates the top level data. DBM::Deep creates a hash that lives on disk. It's pure Perl!

use DBM::Deep;
my $data = DBM::Deep->new("file");

DBI is Perl's SQL database interface. You create connection and statement handles.

use DBI;
my $dbh = DBI->connect($data_source, $user, $auth, \%attr);

Typical attributes are PrintError, RaiseError, AutoCommit. If you replace connect with connect_cached, DBI maintains a cache of connections. If you set RaiseError to 1, DBI calls will die when an error occurs.

my $sth = $dbh->prepare($sql_statement);

Question marks in the statement are placeholders for data values. Then you call

$sth->execute(@placeholder_values);

Placeholders protect against SQL injection attacks: they are filled in from the values passed to the execute method, so values travel as data rather than being spliced into the SQL by hand. To specify whether a value is numeric or a string, use the bind_param method; usually DBI guesses right from the value passed. The simplest result method is fetch, also known as fetchrow_arrayref, which returns the values of one row as an array ref. There is also fetchrow_hashref, which makes the field names the hash keys. You can also use fetchall_arrayref, which returns an arrayref of arrayrefs, and fetchall_hashref, which returns a hashref of hashrefs keyed by a column you name. The selectrow, selectcol, and selectall methods combine prepare, execute, and fetch into a single call. The do method returns the number of rows affected by the operation. begin_work and commit bracket transactions.

$dbh->begin_work;
eval {
  ...
};  # the eval block is a statement, so it needs a semicolon
if ($@) {
  $dbh->rollback;
  die $@;
} else {
  $dbh->commit;
}

Sub available_drivers gets the available DBD drivers. Sub tables gets the tables in a database.

Class::DBI makes each table in your database a class. Each row is an instance of the class and the columns are attributes fetched and stored with named accessors. Foreign key accessors fetch objects from the named table. So for 95% of your coding you don't need to write SQL code. But it doesn't stop you from using hand written SQL. Column data is loaded lazily, but you can override that. It assumes a single column primary key. Foreign keys are only partially supported. You define methods table and columns to define a table and has_a to define foreign keys. Class::DBI::Loader will define tables automatically. You must call save or update to flush changes to the db. The subs search and search_like query on column contents.

The connection method defines how to connect to the database. The table method defines a table in the database. The sequence method defines a sequence to fetch the next primary key. The create method creates a new row, find_or_create does as expected, and delete deletes a row and cascades through related records. To find a row by key use the retrieve method. There is also retrieve_all. The search and search_like methods retrieve by non-key columns. There are triggers that can be called: before_create, after_create, before_set_$column, after_set_$column, before_update, after_update, before_delete, after_delete. The add_constraint method can restrict updates to values that pass a test. There are methods normalize_column_values and add_column_values. The accessor_name method rewrites a field name into a database column name. The update method flushes column changes to the database. The methods dbi_commit and dbi_rollback perform transactions. The has_a method defines a foreign key relation. The has_many method defines the reverse relation. The might_have method defines a one or zero relation. The add_constructor method allows you to add accessor methods defined by SQL queries. The retrieve_from_sql method allows you to do raw SQL.

Columns can be defined as Primary, Essential, or Other. Essential columns are always fetched; Other columns are fetched with a separate select as needed. Class::DBI::AbstractSearch is a wrapper around SQL::Abstract, which allows complex queries.

Rose::DB::Object is another ORM, like Class::DBI. I prefer it to DBIx::Class because it is faster, newer, and actually supported. It is well designed. Its loader generates Perl code that can be saved and edited. It contains an abstract database object that maps devel, test, and production databases through a single interface. It handles dates through the DateTime package. Create a subclass of Rose::DB, e.g. Application::DB. Application::RDBO subclasses Rose::DB::Object, Application::RDBO::Table maps a table, and Application::RDBO::Table::Manager maps its manager. Mappings have conventions, but they can be overridden. The make_modules method in the loader creates Perl code. You need to call save to add a new row to the database. The load method fetches from the database by primary key. Queries can have complex structure. The manager object defines iterator methods. It will handle inner and left joins where needed.

Google "site:stonehenge.com DBI" or Rose::DB or DBM::Deep for more info on the topics of this talk.

Posted on Fri, 29 Jun 2007 Tags:

Perl Tutorial

Perl Best Practices

Indent like K & R: opening brace at end of first line, closing brace on a line by itself. Put a space between a keyword and its paren. No space between a sub name and its arguments or an array and its index. Don't use parens for built-ins if possible, to distinguish them from subs. Separate complex expressions from enclosing brackets. Add whitespace around binary operators, but not after a prefix unary operator. Put a semicolon after every statement, even at the end of a block. Exception: if then else on a single line. Include a comma after every list element, including the last. Damian says 78 char lines, but I say keep them shorter. This makes cut and paste easier. Indent with spaces, not tabs. If you use tabs, put them all at the start of the line. Never put two statements on a single line. Code in commented paragraphs. Put a one line description before each paragraph. Align corresponding elements vertically. Break long expressions before an operator. If it simplifies things, break up expressions. It helps in debugging. Break an expression at the lowest precedence operator. Parenthesize long lists.

Related things should have similar names. Multiword variable names should start with adjectives. Sub names should start with verbs. Booleans should be named after their test, often starting with is or has. Give arrays plural names, but hashes singular names. Use underscores to separate words in identifiers. Capitalize package names and make constants all caps. If you abbreviate, use a prefix, unless there is a convention. Don't abbreviate if it makes the meaning ambiguous. Don't use ambiguous names like last, set, left, or right. Prefix internal variables with an underscore.

Use q{} for the empty string, because it stands out. Use Readonly instead of use constant. Use underscores in long numbers; they are ignored. Use here docs for multiline strings. Always use single or double quotes around the heredoc terminator. Use => for key value pairs. Instead of using commas for a sequence, use a do block. Reserve and, or, not for outer control flow. Parenthesize lists. Use any() from List::MoreUtils. Use

my %ACTIONS = map {$_ => 1} @ACTIONS;

for testing membership.
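A short sketch of that membership idiom next to a list search with first from List::Util. The action names and the sub name is_action are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @ACTIONS   = qw(start stop restart);       # made-up action names
my %IS_ACTION = map { $_ => 1 } @ACTIONS;     # build the lookup table once

# Hash lookup: constant time per test, good for repeated membership checks.
sub is_action {
    my ($name) = @_;
    return exists $IS_ACTION{$name} ? 1 : 0;
}

# first stops at the first match, good for a one-off search.
my $found = first { $_ =~ m{\A re}xms } @ACTIONS;

print 'stop is ',  (is_action('stop')  ? q{} : 'not '), "an action\n";
print 'dance is ', (is_action('dance') ? q{} : 'not '), "an action\n";
print "first action starting with re: $found\n";
```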

Use lexicals, not package variables. Always localize any temporarily modified package variable. Initialize any localized var, to avoid getting undef. Always localize changes to special variables.

my $slurp = do {local $/; <HANDLE>};

If you "use English", "use English qw(-no_match_vars)". Localize $_ if you use it in a sub. Otherwise you can break calling code. Use negative indices to count from the end of the array. Use slices when getting related hash or array elements.

@frames[-1,-2,-3]  = @active{'top', 'prev', 'backup'};

Only use postfix if for flow control. Avoid C style for loops except for complex expressions. Avoid subscripting arrays and hashes in loops. Use Data::Alias or Lexical::Alias for aliasing subscripted items in a list or hash. Always use my with foreach variables. Use map instead of foreach to apply an expression or function to a list. Use grep and first to search a list. First is in List::Util. But transform a list in place with foreach.

$_ = make_bigger($_) for @items;

If a map block is complex, make it a sub. Modifying $_ in map or grep modifies the input, which is usually a bad idea. Instead make a copy of $_. Use a table lookup instead of cascaded elsif tests. Don't use do-while loops. Instead use a redo in a naked block. Reject as many cases as early as possible in a loop with next. Single exit loops are a canard. Use redo if you want to reprocess the same item in a list. Label loops used with last, next, or redo. The label should be the name of the thing being processed.
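Here is a sketch of a table lookup replacing a cascaded elsif, inside a labeled loop that rejects bad cases early with next. The command names and input strings are made up for illustration.

```perl
use strict;
use warnings;

# Table lookup instead of if/elsif/elsif on the command name.
my %HANDLER_FOR = (
    add      => sub { my ($x, $y) = @_; return $x + $y; },
    subtract => sub { my ($x, $y) = @_; return $x - $y; },
    multiply => sub { my ($x, $y) = @_; return $x * $y; },
);

my @results;

COMMAND:
for my $command ('add 2 3', q{}, 'bogus 1 1', 'multiply 4 5') {
    # Reject as many cases as early as possible.
    next COMMAND if $command eq q{};

    my ($name, @operands) = split q{ }, $command;
    my $handler = $HANDLER_FOR{$name}
        or next COMMAND;    # unknown command name

    push @results, "$name=" . $handler->(@operands);
}

print "@results\n";
```

Adding a new command means adding one entry to the table; the loop body stays the same.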

User docs should go into POD. Maintainer docs can go into comments or a separate POD. Consider using Module::Starter. Place POD at the end of a file after an __END__. Use end of line comments for the weird stuff. Comment anything that has puzzled or tricked you. Have someone proofread your docs.

Don't write a special func to sort in reverse order; reverse the sort result. Use scalar reverse to reverse the chars in a string. Use unpack to parse fixed width fields. Use split for simple variable width fields. For CSV files, there's Text::CSV_XS. Avoid string eval. It's harder to debug, slower, and can create security holes. Sort::Maker is your friend. Values can be used as an lvalue.

$_ *= 2 for values %somehash;

Use Time::HiRes for subsecond sleeping: "use Time::HiRes qw(sleep);". But "select undef, undef, undef, $interval;" also works. To disambiguate a code block from a hash ref, put a semicolon as the first char in the block. Scalar::Util and List::Util are your friends.
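A few of the built-in tips above, sketched with made-up data: unpack for fixed width fields, split for simple delimited fields, reversing a sort result, and scalar reverse on a string.

```perl
use strict;
use warnings;

# Fixed width record: two 10-char text fields and a 3-digit id (made-up layout).
my $record = sprintf '%-10s%-10s%03d', 'Smith', 'John', 42;
my ($last_name, $first_name, $id) = unpack 'A10 A10 A3', $record;

# Simple variable width fields.
my @colors = split /,/, 'red,green,blue';

# Reverse the sort result rather than writing a reversed comparison.
my @descending = reverse sort { $a <=> $b } (3, 10, 1);

# scalar reverse flips the characters of a string.
my $backwards = scalar reverse 'stressed';

print "$first_name $last_name ($id)\n";
print "@colors | @descending | $backwards\n";
```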

Subs are good. Always use parens on subroutine calls. Don't have a subroutine with the same name as a built in. Unpack @_ into named arguments. Use named parameters if a sub takes more than three args. Use a hashref.

my %options = %{+shift};

Use defined, or a test of @_, to check whether an argument was passed.

if(defined(my $directory = shift)) 

or

if (@_) {my $directory = shift;}

Set default arguments early. Indicate a scalar return with "return scalar". If a sub returns a list, make it act intuitively in scalar context. Don't use subroutine prototypes. Always use explicit return. Use "return;" to return an undefined value.
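A sketch of a sub that follows those rules: named parameters passed as a hashref, defaults set first, explicit returns, and a bare return for failure. The sub name padded and its parameter names are hypothetical.

```perl
use strict;
use warnings;

# Pad text on the right to a given width (hypothetical example sub).
sub padded {
    my ($arg_ref) = @_;

    # Set defaults early; caller-supplied values override them.
    my %arg = (width => 10, filler => q{ }, %{$arg_ref});

    # Bare return gives undef in scalar context and an empty list in list context.
    return if !defined $arg{text};

    my $pad = $arg{width} - length $arg{text};
    return $arg{text} if $pad <= 0;
    return $arg{text} . ($arg{filler} x $pad);
}

print padded({ text => 'abc', width => 5, filler => q{.} }), "\n";
my @nothing = padded({});
print 'failure returned ', scalar(@nothing), " values in list context\n";
```

The bare return matters in list context: "return undef;" would hand back a one element list, which is true when tested.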

Don't use bareword filehandles, use indirect filehandles. In modern Perl:

open my $foo, "<", $someinput or die;

To print to a filehandle in an aggregate, enclose it in braces. Use IO::File to open files. Use the three arg form, for safety's sake. Add error checking or die! Include the $! as well.

die "Cannot open $filename: $!";

Use a while loop to walk through a file, not a foreach loop. Process files a line at a time, if possible. Slurp using a do block:

my $entire_file = do {local $/; <$in>};

Avoid reading from STDIN; read from ARGV instead. Prompt for interactive input. But don't prompt if it's part of a pipeline. You test for interactivity with -t. Use IO::Prompt. Use progress indicators for long running programs. Estimate percentage of progress if possible. To set autoflush on a file use the autoflush method in FileHandle.
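The file handling advice above, pulled into one sketch: lexical filehandles, three argument open, $! in the error message, a while loop for line-at-a-time processing, a do block slurp, and a -t test. The sample file is created in a temp file so the sketch stands alone.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample file to read back (contents are made up).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\ngamma\n";
close $tmp_fh or die "Cannot close $filename: $!";

# Line at a time with while, not foreach (foreach would read the whole file first).
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $line_count = 0;
while (my $line = <$in>) {
    chomp $line;
    $line_count++;
}
close $in or die "Cannot close $filename: $!";

# Slurp the whole file with a localized $/.
open $in, '<', $filename or die "Cannot open $filename: $!";
my $entire_file = do { local $/; <$in> };
close $in or die "Cannot close $filename: $!";

# Only prompt when both ends are a terminal, not in a pipeline.
my $is_interactive = -t STDIN && -t STDOUT;

print "$line_count lines, ", length($entire_file), " characters\n";
print "interactive session\n" if $is_interactive;
```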

Use weaken to handle cyclic structures. Weaken the references that point back up the structure. weaken is in Scalar::Util.
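A minimal sketch of weakening the upward reference in a parent and child cycle. The hash layout is made up; isweak is used only to show the effect.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

# A made-up two node tree with a cycle: parent -> child -> parent.
my $parent = { name => 'root', children => [] };
my $child  = { name => 'leaf', parent   => $parent };
push @{ $parent->{children} }, $child;

# Weaken the reference that points back up, so it does not keep the parent alive.
weaken($child->{parent});

print 'up reference is ', (isweak($child->{parent}) ? 'weak' : 'strong'), "\n";
print "child's parent is $child->{parent}{name}\n";
```

Once the last strong reference to the parent goes away, the weak reference in the child becomes undef instead of leaking the pair.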

Always use the x modifier on regexps. Use the m modifier for multiline strings. \z always matches only at the very end of the string; unlike $, it does not match before a trailing newline. Prefer m{} for multiline regexps. Don't use other match delimiters. Use square brackets to escape single characters. Use capturing parens only when capturing; otherwise use "(?:...)". Always test a match before using match variables. Prefer named captures. Tokenize input by using "inchworming":

/\G.../gc

Build complex regexps from simpler pieces. Regexp::Common contains common regexps. Prefer character classes to single character alternations. Refactor your regexps for efficiency.
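Here is a sketch of an inchworming tokenizer that builds its patterns from smaller pieces, uses the x modifier and named captures, and only reads a capture after a successful match. The token grammar (numbers and four arithmetic operators) is a made-up example.

```perl
use 5.010;    # named captures need Perl 5.10 or later
use strict;
use warnings;

# Small pieces, combined below.
my $NUMBER   = qr{ \d+ (?: [.] \d+ )? }xms;
my $OPERATOR = qr{ [-+*/] }xms;

sub tokenize {
    my ($text) = @_;
    my @tokens;

    TOKEN:
    while (1) {
        # \G anchors at the end of the previous match; /c keeps pos on failure.
        last TOKEN if $text =~ m{ \G \s* \z }xmsgc;

        if ($text =~ m{ \G \s* (?<number> $NUMBER ) }xmsgc) {
            push @tokens, "NUM:$+{number}";
        }
        elsif ($text =~ m{ \G \s* (?<operator> $OPERATOR ) }xmsgc) {
            push @tokens, "OP:$+{operator}";
        }
        else {
            die 'Unexpected input at offset ' . (pos($text) // 0) . "\n";
        }
    }
    return @tokens;
}

print join(q{ }, tokenize('3 + 4.5*2')), "\n";
```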

Don't return funny values, die instead. Use Fatal to turn built in errors into dies. Remember system() returns zero if everything works, so negate it before testing the return value. Use croak when input checking fails. Make your error messages meaningful to the user. It is all you are likely to get in a bug report. Throw an object exception instead of a string in order to pass info to the catcher. Related exceptions can be grouped by inheritance. Use Exception::Class for object exceptions. Capture $@ in your error handler, because your code can modify $@.
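A sketch of croak for input checking, copying $@ right after the eval, and testing system() against zero. The sub set_percentage is a hypothetical example, and the child command simply runs the current perl with an exit status of 0.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical validator: die (from the caller's point of view) on bad input.
sub set_percentage {
    my ($value) = @_;
    croak 'Percentage must be a whole number between 0 and 100'
        if !defined($value) || $value !~ m{\A \d+ \z}xms || $value > 100;
    return $value;
}

my $ok    = eval { set_percentage(250); 1 };
my $error = $@;    # copy immediately, before other code can change $@
print "Rejected: $error" if !$ok;

# system returns 0 on success, so compare against 0 rather than testing truth.
system($^X, '-e', 'exit 0') == 0
    or die "Child command failed with status $?\n";
print "Child command succeeded\n";
```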

Posted on Thu, 28 Jun 2007 Tags:

Last Day Talks

Web Services 101

Check out the modules SOAP::Lite and SOAP::WSDL. The WSDL module is flaky. But I think SOAP is yesterday's technology. REST uses the verbs GET, PUT, DELETE, HEAD. XML::Simple can transform the return structure into a Perl data structure. XML::XPath is more efficient, but more complicated. S3 is Amazon's object store. It hosts data which can be publicly available. EC2 sells computer cycles for batch processing and makes the results available via S3. Check out Yahoo's JavaScript tools.

Dynamic Web Translation

We use Petal, a Perl version of TAL. It comes with Petal::I18N, which uses GNU gettext, GNU's internationalization code. I18NFool is a utility that scans marked up code and creates a gettext database. The drawback is the extra computation needed. To solve this problem, you can run Petal::I18N ahead of time. There is a problem with data stored in databases, error messages, etc. Perl Unicode is still a problem because not all modules handle it correctly.

Perl Date Time Project

The motivation for the DateTime package was rationalizing the different packages, none of which handled time zones properly. DateTime::TimeZone has the Olson database. It is maintained by Arthur David Olson. There are canonical time zone names consisting of continent or ocean and city. OS epoch times differ and have a limited range. Times should always be stored in UTC. For presentation use the viewer's local time zone. DateTime handles arithmetic, conversion between calendars, and recurrences. Recurrences support a next method. Recurrences support set operations, such as union, intersection, and complement.

Bag of Tricks

Use Ajax for server side validation before submit.

Lightning Talks

String::EscapeCage is a package for caging strings: it escapes strings input to the program and dies if unescaped strings are used.

Continuity is a continuation based web framework. Each user runs in a separate thread. It resembles Seaside.

File::Copy does not preserve permissions. It cannot do recursive or secure copies. The optional third argument, which is the buffer size, can bite you. So use cp instead.

Devel::Cover reports on code coverage of test cases.

The game development industry is moving to C#.

Posted on Wed, 27 Jun 2007 Tags:

More YAPC Talks

Workflow Automation

We have no control over the processes our workflow talks to, so we must handle changes to services, services that are down or returning incorrect results, etc. Intermediate data is written to files. Files are organized by folder per step, with error subfolders.

Test Harness

A discussion of Object::Execute. The idea is to abstract out the boilerplate test harness code. O::E is a little language for constructing tests. "$DB::single = 1;" sets a breakpoint in the debugger. We create a closure from the object method and its arguments so it can be rerun.

How not to do an interview

The purpose of a resume is to get an interview. See if it looks good as a whole and have someone review it. There is a "European model" resume.

Gantry and Bigtop

Bigtop generates a default user interface from the database schema. Gantry is easy to configure and makes it easy to create an application. The Gantry Book is available from Lulu.

Catalyst

Uses DBIx::Class for the ORM. It supports many databases and templating languages. Perlbal is a Perl based load balancer used by LiveJournal and Fox. Catalyst blogs are aggregated by the planet blog aggregator.

Managing Large Web Applications

Always release full builds, not individual files. We release .tgz files. The CPAN shell will grab the latest version, which is a problem. We wrote our own installer. Specify the version of everything. Perltidy made all the indentation consistent. We started out with Config::ApacheFormat. Things you rarely change shouldn't be in a config file. We infer default values from config values. This requires a config dumper. Most projects require two branches. The main branch is the development branch. But it must be stable and tested. When releasing, tag the main branch. Maintenance work for production is a branch from the release. When doing a maintenance release, merge the release branch back into the main branch. Tag 2.1 on the 2.x branch and merge 2.0 --> 2.1 on the main branch. We wound up using Test::Class, Test::Harness, Test::WWW::Mechanize, and Selenium.

MochiKit

MochiKit has a functional programming interface. listMin gets the minimum value in a list. The formatting module has numberFormatter, which returns a function that applies formatting. It creates iterators. MochiKit has great documentation. It doesn't clobber other JavaScript toolkits. Mootools has a very small footprint and may be better for small JavaScript applications.

Posted on Tue, 26 Jun 2007 Tags:

YAPC Talks

Programming is Hard Let's Go Scripting!

Memes say that scripting is defined various ways, but I think it means easy access. It is a social not a technical concept. Basic was one of the first scripting languages and was the first language I learned. The first scripting language I wrote was JAM. It was an inside out language like PHP, only before HTML. In college I used lisp, my own personal McCarthy era. The frustrations of Unix shell scripting led to Perl. After Perl came Tcl. Tcl stayed with the Unix mind set of sets of tools and lacked an extension mechanism. PHP is making the same mistake Perl did, only slower.

Languages differ not so much in what you can say, but in what you have to say. If a language forces you to say something, you can't say it concisely. Binding is when you decide which function will be called. Most scripting languages are late binding. C++ makes you say a function is virtual to get late binding. In Perl all methods are virtual; in Perl 6 they will be optimized. Perl 6 will support multiple dispatch. The routines themselves decide which will be called, which is messy, but probably for the best. In Perl 6 scalar evaluation is eager and list evaluation is lazy. Perl 6 supports strong typing in order to support multiple dispatching. Perl 6 delays the decision of whether an expression is scalar or list until run time, which fixes some bugs in Perl 6. Perl 6 introduces "twigils" which control scoping, like sigils in Ruby.

Perl Critic

Perl::Critic does static analysis of Perl source code. It does not compile the code. It's like lint. It runs as a utility when the module is installed. It prints the page number from Perl Best Practices. It has different severity levels, from 1 to 5, with 5 the most severe. It is configurable from the command line with the --exclude option. Or you can create a .perlcriticrc file in your home directory. "exclude=" uses pattern matching; [-Policy] must be explicit. Policies also accept arguments for fine grained control. You can set the severity level of policies. Themes allow you to group policies together. Themes act as filters and can be used as boolean expressions. You can turn off checking with "## no critic" and "## use critic" pragmas. There is also the "use criticism;" pragma. For test suites there is "use Test::Perl::Critic".

Exception Handling, Logging, and Parameter Validation

Failures should be predictable and happen early. Never assume your code has good input. Your users will not check return values. Perl exception handling is done with die and eval blocks. $@ contains the die string. It is cleared at the start of an eval block. It is always defined but has zero length if there was no error. Since 5.005 $@ can be a reference, which is stringified when used as a string. But some dies are strings, so you must check whether $@ is a blessed reference with Scalar::Util::blessed. Don't use $SIG{__DIE__} because some modules will break. Copy $@ before doing error handling, because it can be modified by code called during error handling. Use Exception::Class. You declare classes, which can have arbitrary fields.
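A sketch of catching both object and string exceptions with eval, a copy of $@, and Scalar::Util::blessed. The My::Exception classes are a hand-rolled stand-in of my own so the example needs only core modules; Exception::Class would declare such classes for you.

```perl
use strict;
use warnings;

package My::Exception;    # hypothetical base exception class

sub new     { my ($class, %field) = @_; return bless {%field}, $class; }
sub throw   { my ($class, %field) = @_; die $class->new(%field); }
sub message { my ($self) = @_; return $self->{message}; }

package My::Exception::NotFound;    # related exceptions grouped by inheritance
our @ISA = ('My::Exception');

package main;
use Scalar::Util qw(blessed);

# Run a code ref and describe how it ended.
sub describe_failure {
    my ($code_ref) = @_;
    my $ok    = eval { $code_ref->(); 1 };
    my $error = $@;    # copy before anything else can touch $@
    return 'ok' if $ok;

    # Some dies are objects and some are plain strings, so check first.
    if (blessed($error) && $error->isa('My::Exception')) {
        return 'exception ' . ref($error) . ': ' . $error->message();
    }
    chomp $error;
    return "string: $error";
}

print describe_failure(sub { My::Exception::NotFound->throw(message => 'no such row') }), "\n";
print describe_failure(sub { die "plain failure\n" }), "\n";
print describe_failure(sub { return 1 }), "\n";
```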

Log::Dispatch is a logging module that supports multiple types of output and error levels. Log::Log4perl is like log4j.

Params::Validate validates sub parameters. It checks type, required, class, and value against regex. You can set a callback for checking values. You can set defaults for values. There is also Data::FormValidator for checking cgi-bin parameters. MooseX::Method allows parameter checking for Moose class methods.

Posted on Mon, 25 Jun 2007 Tags:

Back to PAR

I spent the morning hacking on the who is a diva problem, so we can set their supervisor differently, and the afternoon trying to arrange my conference travel and the par deployment meeting.

Posted on Thu, 21 Jun 2007 Tags:

Training

I've been in Python training yesterday and today. When I wasn't in class, I wrote two new ldap reports for Sheryl Bruff and Lisa Morris.

Posted on Wed, 20 Jun 2007 Tags:

Testing and Coding

I put in the fix for goalfreeze date calculation. Then I started working on moving the reviewer update into the Java code. I finished up by testing Greg's code modifications.

Posted on Mon, 18 Jun 2007 Tags:

Finishing Testing

I spent yesterday and today on the PAR testing. There have been no serious problems. I've made a few small changes to make sure the ldap synch doesn't hang. I investigated a problem with subversion permissions that turned out to be a false alarm.

Posted on Thu, 14 Jun 2007 Tags:

Hanging Code

I spent the day looking for the reason why the ldap synch hangs. So far it has proved elusive.

Posted on Mon, 11 Jun 2007 Tags:
