Sunday, November 20, 2016

Visual PowerShell with Visual PowowShell v0.0.1

I'm excited to showcase the UI for PowowShell. Check it out here or feel free to fork or browse the code on GitHub. It's just a mockup to get an idea of the look and feel of designing a pipeline as a sequence of steps in a matrix.

Tuesday, November 15, 2016

PowowShell: A Vision for a Visual Programming Language for PowerShell

I use PowerShell a lot and, though I'm a programmer at heart, it would sometimes be nice to have a library of re-usable script components that you can simply wire together in new ways.

PowowShell (GitHub) is my attempt at making such a system. Indeed I want to take it a step further and make it into a Visual Programming Language (like Node-RED) where you just drag and drop components onto a workspace, wire them up and press play.

PowowShell lets you take any command-line utility and wrap it up as a Powow Component. This component can then be used as a Step in a Powow Pipeline. The entire Pipeline is itself a PowerShell script and also a Component, and thus you can use Pipelines as Components within other Pipelines.
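To give a flavour of what "wrapping" means, here is a hypothetical sketch (the file name and parameters are made up for illustration, not part of PowowShell) of turning the classic ping utility into a component:

# Hypothetical components\Ping.ps1: wraps ping.exe as a Powow Component
param(
    [Parameter(Mandatory=$true)][string]$HostName,  # PARAMETER: host to ping
    [int]$Count = 4                                 # PARAMETER: number of echo requests
)
# No INPUT needed; OUTPUT is the utility's text captured as a single string
(ping.exe -n $Count $HostName) -join "`n"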

A Common Interface

Components have 3 aspects to their interface:
  • PARAMETERS: Basic settings or values for your component. You define these when you add a Component to your Pipeline, though they can be dynamic values determined at runtime.
  • INPUT: The type of data that your component accepts (piped in). This must be a string*, but it can be any type of serialized data (e.g. JSON, CSV, XML, whatever).
  • OUTPUT: The type of data that your component produces (pipes out). Again, only strings*!
* All INPUT and OUTPUT is a string in PowowShell. This may sound like a limitation, but it ultimately ensures that components can always talk to each other, something today's PowerShell cmdlets are not good at.
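To make the interface concrete, here is a minimal, hypothetical component skeleton (the parameter name and the greeting are purely illustrative):

# Hypothetical minimal component skeleton showing the three interface aspects
param(
    [Parameter(Mandatory=$true)][string]$Greeting   # PARAMETER: set when the step is configured
)
# INPUT: whatever the previous step piped in, always treated as a string
$in = @($input) -join "`n"
# OUTPUT: always a string again, here a trivial one
Write-Output "$Greeting! Received $($in.Length) characters of input."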

Pipelines

The pipeline is defined by a pipeline.json file (this will be built visually like a flowchart in later versions) which could look something like this:
{
 "id": "pipeline1",
 "name": "Send mail to young voters",
 "description": "Read voters.txt, get all voters under 21yrs of age and output the name, age and email as a JSON array",
 "parameters": [],
 "input": {},
 "output": {},
 "steps": [
  {
   "id":"A",
   "name":"Read Voters File",
   "reference":"../components/ReadFile.ps1",
   "input":"",
   "parameters": {
    "Path": "./data/voters.txt"
   }
  },
  {
   "id":"B",
   "name":"Convert2JSON",
   "reference":"../components/CSV2JSON.ps1",
   "input": "A",
   "parameters": {
    "Delimiter": "|",
    "Header": "{\"name\", \"age\", \"email\", \"source\"}"
   }
  },
  {
   "id":"C",
   "name":"Select Name and Email",
   "reference":"../components/SelectFields.ps1",
   "input": "B",
   "parameters": {
    "Fields": "{\"name\", \"age\", \"email\"}"
   }
  }
 ]
 
}

Basically the pipeline has some metadata (name, description) but the real meat is the list of steps. These steps are executed in the order they occur, and each step gets its input from the step it names in its "input" property. In this case:
A -> B -> C
But it would also be possible to pass A's output to another component B1 as well:
A -> B -> C
A -> B1

The key thing to notice is that each step references a component. These components are your re-usable pieces of code: each should do something useful and be configurable enough to reuse across pipelines.
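To make the wiring tangible, here is a rough sketch of how an engine could walk such a pipeline.json. This is not the actual PowowShell engine, just an illustration of the idea:

# Sketch only: run each step in order, feeding it the output of the step named in "input"
$pipeline = Get-Content .\pipeline.json -Raw | ConvertFrom-Json
$outputs = @{}
foreach ($step in $pipeline.steps) {
    # Turn the step's parameters object into a hashtable for splatting
    $params = @{}
    foreach ($p in $step.parameters.PSObject.Properties) { $params[$p.Name] = $p.Value }
    if ($step.input) {
        # Pipe the referenced step's (string) output into this component
        $outputs[$step.id] = $outputs[$step.input] | & $step.reference @params
    } else {
        $outputs[$step.id] = & $step.reference @params
    }
}
$outputs["C"]   # the output of the final step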

Components

As an example consider a FileList.ps1 component. It takes two PARAMETERS, "Path" and "Filter", and lists the files in that directory. It does not take any INPUT and its OUTPUT is an array of file data. Such a component is an Extractor in ETL terms, as it is a source node for the data flow.

There is a built-in cmdlet in PowerShell called Get-ChildItem which lists files. It's also known as "dir" or "ls". We will be using this cmdlet but wrapping it up nicely so that we can use it in a pipeline. In particular we will be making it output a JSON Array. Here's the code:

param(
  [Parameter(Mandatory=$true,HelpMessage="The path to the files")][string]$Path,
  [string]$Filter,
  [switch]$Recurse
)
$files = @()
# Wrap Get-ChildItem and collect just the fields we want to expose
Get-ChildItem -Path $Path -Filter $Filter -Recurse:$Recurse |
  ForEach-Object {
    $f = @{
      name     = $_.Name
      fullName = $_.FullName
      size     = $_.Length
    }
    $files += New-Object -TypeName PSObject -Property $f
  }
# OUTPUT: a JSON array (as a string) for downstream components
$files | ConvertTo-Json

This component, saved as .\components\FileList.ps1, can be run as follows from PowerShell:
.\components\FileList.ps1 -Path C:\temp -Filter *.txt
It will produce an output similar to the following:
[
    {
        "fullName":  "C:\\temp\\test.txt",
        "name":  "test.txt",
        "size":  491899
    },
    {
        "fullName":  "C:\\temp\\op.txt",
        "name":  "op.txt",
        "size":  413431
    }
]

Enough details for now. The idea is that components downstream can accept this array of objects and process them in some way. One might want to delete the files, search them for text, email them to somebody. The possibilities are endless.
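For instance, a hypothetical downstream component (the file name and parameter are invented here for illustration) could filter that file list by size and pass the result on as JSON again:

# Hypothetical components\FilterBySize.ps1: keeps only files of at least $MinSize bytes
param(
    [int]$MinSize = 0   # PARAMETER: minimum file size in bytes
)
# INPUT: the JSON array produced by FileList.ps1
$files = @($input) -join "`n" | ConvertFrom-Json
# OUTPUT: again a JSON string, so the next step can keep going
$files | Where-Object { $_.size -ge $MinSize } | ConvertTo-Json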

Of course the whole system stands or falls on reusability. If our downstream component does not understand that JSON, or expects a different format, then we need to do some work. This is where Transform Components come in. We may well need to write many such transform components, but the system will come with some basic ones like CSV2JSON and JSON2XML. As soon as we need specialized JSON or XML we will have to write our own components.

Looking ahead I could imagine 2 types of generic transform components:

  1. A JSON2JSON adaptor which allows you to simply remap fields (visually); a rough sketch follows below.
  2. A generic JavaScript component which allows you to write a JavaScript function which transforms your data.
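A minimal sketch of the first idea might look like this (the Mapping parameter and its format are just an assumption of how field remapping could be configured, not a finished design):

# Hypothetical JSON2JSON.ps1: renames fields according to an {"old":"new"} mapping
param(
    [Parameter(Mandatory=$true)][string]$Mapping   # e.g. '{"fullName":"path","size":"bytes"}'
)
$map  = $Mapping | ConvertFrom-Json
$data = @($input) -join "`n" | ConvertFrom-Json
$data | ForEach-Object {
    $out = @{}
    foreach ($m in $map.PSObject.Properties) { $out[$m.Value] = $_.($m.Name) }
    [PSCustomObject]$out
} | ConvertTo-Json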