Saturday, November 16, 2019

Good Programming: Error Handling

Error handling is one of the holy grails of software development, and it's easy to see why when you consider that bad error handling can bring down an entire system.
Think I'm exaggerating? I work on an eCommerce solution which fails completely if a product page has no valid image. That means the customer gets a 500 error page instead of the opportunity to buy a product because someone forgot to upload an image.

I've even had discussions with developers who neglect error handling and say: "Well, if this happens, then it's *supposed* to blow up". What "this happens" means varies from context to context but, in my example above, it means "a product has no image" so let's crash.

Now, I'd like to believe that this is oftentimes an oversight rather than a conscious decision. The programmer focuses on the Ideal Path and simply neglects to consider (or is never told) what should happen when the Ideal Path gets bumpy.

Modern coding practices such as chaining feed into this. It looks absolutely fabulous to chain methods together in code... until a single method unexpectedly returns null or undefined and our favourite error, the NullPointerException, brings down the system.

    database()
      .read('something')
      .call('something', user.property()) // Boom!
      .thenDoThis('never happens');

Of course you can use chaining responsibly, but you can't always stop code you call from behaving badly. Even if you wrote the code you are calling, it may misbehave or someone may change it; you never know when something UNEXPECTED will occur and tear down your work of art.

The solution? Expect the unexpected! Be pessimistic. Like any good engineer.
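
Applied to the chain above, pessimism might look something like this sketch (log and showMessage are hypothetical helpers, not part of any particular framework):

    const record = database().read('something');
    // Don't assume the read worked or that user.property exists
    if (!record || !user || typeof user.property !== 'function') {
      log.error('READFAIL01: could not read something');   // for the operator
      showMessage('Sorry, we could not load your data.');  // for the user
    } else {
      record.call('something', user.property()).thenDoThis('now it happens');
    }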

In my first job out of Uni I worked for a small bio-startup in Roslin, Scotland. Yes, Dolly the Sheep was pretty much sharing office space with us. My boss, a no-nonsense Yorkshireman, kept asking me why so much of my code was error handling. I replied that you never know what might happen and he replied: "I pay you to code business logic and what I'm getting is 80% error handling and paranoid disaster mitigation." Well, not in those terms exactly: he probably told me to "get yer finger ot and staaat deliverin' t'goods afoor A thump you".

What is the goal of error or exception handling? IMHO it's to be transparent to the user and the operator of the system about what just happened and, ideally, what can be done.

In simple terms it means telling the user that what they tried to do did not work and how to proceed or work around the problem. An example is to show the message "Your file could not be saved, please try again!". This is obviously not very helpful if the problem persists, but if the problem is intermittent (say, a network outage to Google Drive) then it may just work the second or third time around.
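
For intermittent problems a simple bounded retry often does the trick. A rough sketch (saveFile and showMessage are hypothetical helpers):

    async function saveWithRetry(doc, attempts = 3) {
      for (let i = 1; i <= attempts; i++) {
        try {
          return await saveFile(doc);                      // hypothetical save call
        } catch (err) {
          if (i === attempts) {
            showMessage('Your file could not be saved, please try again!');
            throw err;                                     // give up after the last attempt
          }
          await new Promise(r => setTimeout(r, 1000 * i)); // back off a little before retrying
        }
      }
    }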

Now that we've shown the user what's going on we need a way of recording this incident so that the operator or owner of the software product is also aware of the outage and can take appropriate action. We may log error NET307 to record a network save issue.

In the example above there may be no action the operator or owner can take: Google Drive cannot help it if their user's WiFi is dodgy, can they? Or can they? If the product owner is aware that 50% of users run into NET307 each week, they may work towards mitigating the pain associated with not being able to save a document as the user's train enters a tunnel and network access disappears.

Thus sound logging, monitoring and reporting of errors feeds back into the product development cycle to produce new features such as offline saving: the document is saved to a local storage mechanism if the network is not available and synced up to the online service later, when the network becomes available again.
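
A sketch of what such a feature might look like, assuming hypothetical saveOnline, saveLocally, queueForSync, showMessage and log helpers:

    async function saveDocument(doc) {
      try {
        await saveOnline(doc);                             // the normal, online path
      } catch (err) {
        log.error('NET307', 'Network save failed', err);   // operator and owner see the incident
        await saveLocally(doc);                            // offline fallback to local storage
        queueForSync(doc);                                 // sync up once the network is back
        showMessage('Saved locally - we will sync when you are back online.');
      }
    }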

Examples of bad exception handling abound: users seeing stack traces or misleading messages about what just happened; operators seeing the wrong stack trace or line number (yes, I'm looking at you, PowerShell). Proper error handling is about the developer understanding what just happened in their code and translating this into a user-friendly message, an operator-friendly log and, ideally, an owner-friendly metric. In this way each target audience is best served.

I am tired of hearing that "it works" when a crappy piece of software runs under some (ideal) conditions. It's like saying that half a barrel floats: indeed it does but you wouldn't want to cross the Atlantic in it. Your software may run in ideal conditions but the world is not ideal.

This becomes more apparent the more connected the world becomes and the more distributed our systems are. If your software fails completely (e.g. the Node.js process crashes and your service is dead) because of an HTTP timeout due to a slow network, then you need to think about better error handling.

Some people say: "But my module/component is only responsible for X and not for the network. I assume the network is working or my component is useless."

This is a valid argument: it does not make sense for each software component to try to manage the failure of the dependencies of all its dependencies. I do think, however, that each component should manage the failure of its own dependencies. This can mean catching the exceptions of components you use and either passing the exception through or enriching it with additional meaning for your context.
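
In code, enriching an exception for your own context might look like this sketch (readConfig is a hypothetical dependency):

    function loadSettings(path) {
      try {
        return readConfig(path);                 // the dependency we rely on
      } catch (err) {
        // Pass the original error on, enriched with context from our own layer
        throw new Error(`SETTINGS01: could not load settings from ${path}`, { cause: err });
      }
    }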

Throwing exceptions was once state of the art, but I have come to question the approach in recent years. Working for many years with ABAP, the programming language used within SAP, I and others built up layers of classes which all used exceptions harmoniously. The neat thing was that it pushed error handling to the bottom (the CATCH block) of our code. Our code could proceed along the happy Ideal Path and readability was improved.

The challenges were that we sometimes had to wrap legacy, old-fashioned functions and methods which did not use exceptions. On the other hand, legacy code which didn't work with exceptions needed to TRY...CATCH our fancy-schmancy code, and it was ugly.

When I moved to a new company I decided to drop exceptions completely and take a different approach: logging and simple return codes. In this approach all code should log errors and warnings consistently (using a global Singleton log object) and then exit if it encounters an error. When it executes successfully it returns true ('X' in ABAP) - in all other cases it returns nothing ('' == false).

ABAP has a really neat keyword called CHECK which will exit the current block or function if it encounters anything that is not true. So code looks like this:

    CHECK obj->method1( ).
    CHECK function_call.
    CHECK some=>static_method( 'foo' ).

The net result is that we completely avoid any error handling in intermediate classes and only need to consider two approaches:

  • Low-level functions should log errors and exit
  • Application-level programs should read the error log and inform the user
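
The same two-level pattern translates easily to other languages. A rough JavaScript analogue, assuming a hypothetical global log singleton (with error() and lastError() methods) and a showMessage helper:

    const fs = require('fs');

    // Low level: log the error and return false instead of throwing
    function readVoters(path) {
      if (!fs.existsSync(path)) {
        log.error('FILE404', `Voters file not found: ${path}`);
        return false;
      }
      return fs.readFileSync(path, 'utf8');
    }

    // Application level: check the result, read the log and inform the user
    const voters = readVoters('./data/voters.txt');
    if (!voters) showMessage(log.lastError());
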
There are some downsides (like managing a global log object) but, in general, I'm quite happy with the results. We save all messages which occur in all our programs (even SUCCESS or INFO messages) and can even enable tracing by saving DEBUG messages. These messages are sent to ElasticSearch and can be analysed in Kibana meaning we get consolidated logging and metrics "for free".

In summary, I highly recommend considering all stakeholders (users, operators and owners) when developing, and being pessimistic about whether our dependencies will behave or not. Log as much as possible at the correct level and show the user what happened and how to proceed.

A final word on error codes: a unique number is a good thing to have but it can be difficult to manage across many applications. I prefer a kind of hybrid code which is somewhat human-readable: NETFAIL707 is more useful than 0x003030301.
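
For example, a tiny, greppable catalogue of readable codes (the names here are made up) is easy to keep consistent:

    const ERROR_CODES = {
      NETFAIL707: 'Network save failed',
      FILE404:    'Input file not found',
      AUTH401:    'User session has expired',
    };
    console.error('NETFAIL707', ERROR_CODES.NETFAIL707);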

Monday, December 5, 2016

View and Log Output with Tee

Ever wanted to view and log the output of a command at the same time? I'm sure it's available natively in Linux but not, AFAIK, in Windows. So we need a little tool.

Create a file called tee.cmd and save it somewhere in your PATH. The contents are as follows:
@ECHO OFF
POWERSHELL -c "$input | Tee-Object -FilePath %1"
Now just pipe your command to "tee" and pass the name of the file you want to log to.

Example:

DIR | tee op.txt
You will get a directory listing and it will also be logged to op.txt.

Sunday, November 20, 2016

Visual PowerShell with Visual PowowShell v0.0.1

I'm excited to showcase the UI for PowowShell. Check it out here or feel free to fork or browse the code on GitHub. It's just a mockup to get an idea of the look and feel of designing a pipeline as a sequence of steps in a matrix.

Tuesday, November 15, 2016

PowowShell: A Vision for a Visual Programming Language for PowerShell

I use PowerShell a lot and, though I'm a programmer at heart, it would sometimes be nice to have a library of re-usable script components which you can just wire together in new ways.

PowowShell (GitHub) is my attempt at making such a system. Indeed I want to take it a step further and make it into a Visual Programming Language (like Node-RED) where you just drag and drop components onto a workspace, wire them up and press play.

PowowShell lets you take any command-line utility and wrap it up as a Powow Component. This component can then be used as a Step in a Powow Pipeline. The entire Pipeline is itself a PowerShell script and also a Component, and thus you can use Pipelines as Components within other Pipelines.

A Common Interface

Components have 3 aspects to their interface:
  • PARAMETERS: Basic settings or values for your component. You define these when you add a Component to your Pipeline though they can be dynamic values determined at runtime.
  • INPUT: The type of data that your component accepts (piped in). This must be a string* but it can be any type of serialized data (e.g. JSON, CSV, XML, whatever).
  • OUTPUT: The type of data that your component produces (pipes out). Again, only strings*!
* All INPUT and OUTPUT is a string in PowowShell. This may sound like a limitation but it ultimately ensures components can talk to each other - something PowerShell cmdlets are weak at today.

Pipelines

The pipeline is defined by a pipeline.json file (this will be built visually like a flowchart in later versions) which could look something like this:
{
 "id": "pipeline1",
 "name": "Send mail to young voters",
 "description": "Read voters.txt, get all voters under 21yrs of age and output the name, age and email as a JSON array",
 "parameters": [],
 "input": {},
 "output": {},
 "steps": [
  {
   "id":"A",
   "name":"Read Voters File",
   "reference":"../components/ReadFile.ps1",
   "input":"",
   "parameters": {
    "Path": "./data/voters.txt"
   }
  },
  {
   "id":"B",
   "name":"Convert2JSON",
   "reference":"../components/CSV2JSON.ps1",
   "input": "A",
   "parameters": {
    "Delimiter": "|",
    "Header": "{\"name\", \"age\", \"email\", \"source\"}"
   }
  },
  {
   "id":"C",
   "name":"Select Name and Email",
   "reference":"../components/SelectFields.ps1",
   "input": "B",
   "parameters": {
    "Fields": "{\"name\", \"age\", \"email\"}"
   }
  }
 ]
 
}

Basically the pipeline has some meta data (name, description) but the real meat is the list of steps. These steps are executed in the order they occur and each gets its input from the step named in its "input" property. In this case:
A -> B -> C
But it would also be possible to pass A's output to another component B1 as in
A -> B -> C
  -> B1

The key thing to notice is that each step references a component. These components are your re-usable pieces of code and should each do something useful in a configurable, reusable way.

Components

As an example consider a FileList.ps1 component. It takes two PARAMETERS, "Path" and "Filter", and lists the files in that directory. It does not take any INPUT and its OUTPUT is an array of File data. Such a component is an Extractor in ETL language as it is a source node for the data flow.

There is a built-in cmdlet in PowerShell called Get-ChildItem which lists files. It's also known as "dir" or "ls". We will be using this cmdlet but wrapping it up nicely so that we can use it in a pipeline. In particular we will be making it output a JSON Array. Here's the code:

param(
  [Parameter(Mandatory=$true,HelpMessage="The path to the files")][string]$Path,
  [string]$Filter,
  [switch]$Recurse
)
$files = @()
Get-ChildItem -Path $Path -Filter $Filter -Recurse:$Recurse |
  ForEach-Object {
    # Keep only the fields we want to expose downstream
    $f = @{
      name     = $_.Name
      fullName = $_.FullName
      size     = $_.Length
    }
    $files += New-Object -TypeName PSObject -Property $f
  }
$files | ConvertTo-JSON

This component, saved as .\components\FileList.ps1, can be run as follows from PowerShell:
.\components\FileList.ps1 -Path C:\temp -Filter *.txt
It will produce an output similar to the following:
[
    {
        "fullName":  "C:\\temp\\test.txt",
        "name":  "test.txt",
        "size":  491899
    },
    {
        "fullName":  "C:\\temp\\op.txt",
        "name":  "op.txt",
        "size":  413431
    }
]

Enough details for now. The idea is that components downstream can accept this array of objects and process them in some way. One might want to delete the files, search them for text, email them to somebody. The possibilities are endless.

Of course the whole system stands or falls on reusability. If our downstream component does not understand what that JSON is, or expects a different format, then we need to do some work. This is where Transform Components come in. We may well need to write many such transform components but the system will come with some basic ones like CSV2JSON and JSON2XML. As soon as we need specialized JSON or XML we will have to write our own components.

Looking ahead I could imagine 2 types of generic transform components:

  1. A JSON2JSON adaptor which allows you to simply remap fields (visually).
  2. A generic JavaScript component which allows you to write a javascript function which transforms your data.
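
For the second kind, the transform itself could be as small as this sketch (the exact contract PowowShell will expose to such a component is still open):

// Hypothetical transform: keep only voters under 21 and drop the "source" field
function transform(input) {
  const voters = JSON.parse(input);                       // INPUT is always a string
  const young = voters
    .filter(v => Number(v.age) < 21)
    .map(({ name, age, email }) => ({ name, age, email }));
  return JSON.stringify(young);                           // OUTPUT is a string again
}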

Tuesday, April 16, 2013

NGINX Awesomeness: A Simple, Powerful Web Server and More

A 2MB exe which runs a web server to rival Apache? Well, maybe not, but if you Like Simple then you will Like NGINX. In terms of power it kicks Mongoose's ass, I have to say, because that critter, of whom I am fond, was pretty limited. So, if you want to run multiple sites, reverse proxy, finely control everything HTTP and more, then NGINX is your new best friend.

Download and run and you have a webserver on http://localhost serving up whatever is in your HTML folder. I've stripped down the conf/nginx.conf file in order to learn and explain it here:


http {
    server {
        listen       80;
        server_name  localhost;
        location / {
            root   html;
            index  index.html index.htm;
        }
    }
}
events {
    worker_connections  1024;
}

This obviously sets up an HTTP server on your machine listening on port 80. It's actually just the explicit version of the default settings and so equivalent to:


http {
    server {}
}
events {}

If you like being Spartan!

Next: want a second server (host name)? Just add a new server section:
server {
    server_name  mydomain.com;
    location / {
        root   /somewhere/else/;
        autoindex on;
    }
}
The autoindex setting provides an automatic directory listing (great for local test servers).

Of course you can configure logging (even per server), HTTP settings (e.g. keep-alive stuff), mime types, SSL, hostname wildcards, virtual directories, error pages, gzip (nice), deny access (for .htaccess you need a converter) and all the usual stuff you need. But what gets me excited is the reverse proxy stuff.

Our company uses an expensive, high tech appliance for reverse proxy (and load balancing) functionality. This helps keep our web servers out of the DMZ and provides a neat front so that we can have multiple different web servers behind one application on one domain. The downside is you need a degree in it to configure it. NGINX is Simple - remember! Watch this:

Let's say you have another web server running PHP and you want NGINX to reverse proxy it. Simply pass all URLs ending in .php to that server with the location command's RegEx functionality:


location ~ \.php$ {
    proxy_pass   http://myphp.mydomain.com:8080;
}
That's awesome IMO. You need some serious expertise and clicking to get that done in our enterprise-level appliance whose name begins with "F" and ends with "5" but shall otherwise remain unnamed.

So sharpen up your RegEx skills and run your whole organisation's web infrastructure through one domain if you like.

Wednesday, April 10, 2013

OptiBrowser: Managing Links in a Multi-Browser World

As a web developer I have several browsers installed on my PC. This is not only for testing apps on different browsers but allows you to run the same app as a different user at the same time - something many web apps don't offer because they store session state in a cookie which is shared among windows of the same browser.

That's all fine but what about the default browser? Each OS allows you one and only one default browser. This means that when you click a link in an email it will always open in your default browser. If you're like me, and your default browser is Internet Explorer (don't ask), you find yourself copying the link and pasting it into Chrome.

Enter OptiBrowser!



OptiBrowser is your new default browser and yet it's not a browser at all. It's simply a menu which lets you choose which browser you want to open the link you just clicked in. Press <Enter> and it will start in your default browser. Of course this only applies to links you click outside of the browser - links clicked in a browser stay in that browser.

Check it out on GitHub (cawoodm/optibrowser) and, if you have any issues post them there. It's written in C# (.NET 4) and compiled and tested under Windows 7 so far. Feel free to fork and improve.

Thursday, February 7, 2013

Crafty Tennis: A Component-Entity Game in JavaScript

Been dreaming of making a game for some time now and have been very interested in Component-Entity Systems which are a great way of keeping complexity down in a game.

I won't go through all the details here but suffice it to say you can keep complexity linear by either adding new entities to your game (more "things" in the game) or by adding new components (more "functionality"). You then simply tell each new entity you add which components it has.


// Player Left (with AI)
Crafty.sprite(32, "img/padleft.run.png", {
  padleft: [0, 0]
});
Crafty.e("Paddle, 2D, DOM, Color, Multiway, Bound, AI, padleft, SpriteAnimation")
.color('rgb(255,0,0)')
.attr({ x: 20, y: H/2, w: 32, h: 32, player: 1 })
.bound({minX: 0, minY: 0, maxX: W/2, maxY: H})
.multiway(4, { W: -90, S: 90, D: 0, A: 180 })
.difficulty(3)
.bind('NewDirection', runner)
.animate('run', 0, 0, 5) // From x=0, y=0 to x=5 (6 frames)
;


Check it out on GitHub (cawoodm/tennis) or download it - it's actually quite fun to play!


Of course all this development was made possible by the excellent (and free) in-browser editor called Scripted - it's a poor man's Sublime Text.