pdi bag of tricks...

After using PDI for a while, you start to encounter some common problems. PDI crashes, databases die, connections get reset: all sorts of
interesting things can happen in complex systems.

As a general rule, when building PDI jobs that should behave monotonically, I always strive to find a way to make a job re-playable and
idempotent. This can be tricky given an unbounded input set over time.

Probabilistic data structures to the rescue!

To do this, at work we created a PDI bloom filter step (thanks Fabio!). This article will go over how it works and its use cases.
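To make the idea concrete, here is a minimal sketch of how a bloom filter can make a replayed input idempotent. It is plain Python rather than the actual PDI step, and the names `BloomFilter` and `replay_safe`, the filter size, and the hashing scheme are all assumptions chosen for illustration. A bloom filter never reports a key it has seen as unseen, but it can occasionally report a new key as already seen, so a small fraction of genuinely new rows may be dropped.

```python
import hashlib


class BloomFilter:
    """A tiny bloom filter (hypothetical, not the PDI step's implementation)."""

    def __init__(self, size_bits=8192, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def replay_safe(rows, seen):
    """Yield only the rows whose key the filter has not seen yet."""
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


seen = BloomFilter()
first_run = list(replay_safe(["a", "b", "c"], seen))
# Replaying the same input plus one new row only lets the new row through.
second_run = list(replay_safe(["a", "b", "c", "d"], seen))
```

Because the filter uses a fixed amount of memory regardless of how many keys pass through it, it suits an input set that keeps growing over time, at the cost of the false-positive rate rising as it fills up.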

Read more

unserializing php from pdi

Here's a quick post that explains how to do something which may not be obvious.

The scenario: You've got some serialized data stored in a not-so-portable data interchange format (serialized PHP),
and would like the data to be made available as part of a PDI transformation.

Read more

building a datawarehouse for testing

Overview

A common problem when starting a new project is getting fixtures in place to facilitate testing of reporting functionality and refining data models. To ease this, I've created a PDI job that creates the dimension tables, and populates a fact table.

Read more

tcpdump tip viewing a packet stream data payload

Here is a set of aliases that I've often used to view packet payloads with tcpdump. The filter drops all the overhead packets (those with no data payload, such as bare ACKs), so only packets that carry a payload are shown. I usually stick the following lines into my .bashrc on all the servers I install.

    alias tcpdump_http="tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s0"
    alias tcpdump_http_inbound="tcpdump 'tcp dst port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s0"
    alias tcpdump_http_outbound="tcpdump 'tcp src port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) !
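The arithmetic inside the filter expression is easier to follow when written out. The sketch below redoes the same calculation in Python on a raw IPv4 packet: total length minus IP header length minus TCP header length gives the payload size, and the filter keeps packets where that is non-zero. The function names and the synthetic packets are assumptions for illustration only.

```python
def tcp_payload_length(ip_packet: bytes) -> int:
    """Payload size of an IPv4/TCP packet, mirroring the tcpdump expression."""
    total_length = int.from_bytes(ip_packet[2:4], "big")  # ip[2:2]
    ip_header_len = (ip_packet[0] & 0x0F) << 2            # (ip[0]&0xf)<<2
    tcp = ip_packet[ip_header_len:]
    tcp_header_len = (tcp[12] & 0xF0) >> 2                # (tcp[12]&0xf0)>>2
    return total_length - ip_header_len - tcp_header_len


def make_packet(payload: bytes) -> bytes:
    """Build a synthetic packet with minimal 20-byte IP and TCP headers."""
    ip_header = bytearray(20)
    ip_header[0] = 0x45  # version 4, header length 5 words
    ip_header[2:4] = (40 + len(payload)).to_bytes(2, "big")
    tcp_header = bytearray(20)
    tcp_header[12] = 0x50  # data offset 5 words
    return bytes(ip_header) + bytes(tcp_header) + payload


ack_only = tcp_payload_length(make_packet(b""))       # would be filtered out
with_data = tcp_payload_length(make_packet(b"GET /"))  # would be displayed
```

The low nibble of the first IP byte and the high nibble of TCP byte 12 both count 32-bit words, which is why one is shifted left by 2 and the other right by 2: both end up as a length in bytes.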

Read more

i see packets...

While studying for the GCIA certification, I put together the following reference to be able to eyeball packets and see at a glance what's inside a hex packet dump.
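As a small illustration of the kind of fields worth spotting in a hex dump (this is not the reference itself), the sketch below decodes the fixed part of an IPv4 header in Python. The function name `parse_ipv4_header` and the sample header bytes are assumptions made for the example.

```python
import ipaddress


def parse_ipv4_header(hex_dump: str) -> dict:
    """Decode the first 20 bytes of an IPv4 header given as hex text."""
    raw = bytes.fromhex(hex_dump)  # spaces between hex groups are ignored
    return {
        "version": raw[0] >> 4,
        "header_bytes": (raw[0] & 0x0F) * 4,
        "total_length": int.from_bytes(raw[2:4], "big"),
        "ttl": raw[8],
        "protocol": raw[9],  # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": str(ipaddress.IPv4Address(raw[12:16])),
        "dst": str(ipaddress.IPv4Address(raw[16:20])),
    }


# A made-up sample header: a UDP packet between two private addresses.
sample = "4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7"
fields = parse_ipv4_header(sample)
```

Reading the dump by eye follows the same offsets: the first byte gives version and header length, bytes 8 and 9 give TTL and protocol, and the last eight bytes are the source and destination addresses.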

Read more

complex event processing to detect click fraud

Here's another use-case for CEP: Detecting uniqueness over time. A use-case for this type of pattern is identifying click fraud.

Once more, to see how to get everything up and running, see my previous posts.

In our fictitious scenario, we're going to assume we want to filter a stream of incoming data so that only events that are unique on a chosen subset of their fields, within a 24-hour period, are output.
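Outside of a CEP engine, the same idea can be sketched in a few lines of Python. This is not the EPL statement from the post: the class name `UniqueWithinWindow`, the field names `ip` and `ad`, and the choice to start the 24-hour clock at the first sighting of a key are all assumptions for illustration. It also assumes events arrive in timestamp order.

```python
from collections import OrderedDict

WINDOW_SECONDS = 24 * 60 * 60


class UniqueWithinWindow:
    """Pass an event only if its key was not seen in the last window."""

    def __init__(self, key_fields, window_seconds=WINDOW_SECONDS):
        self.key_fields = key_fields
        self.window_seconds = window_seconds
        self._first_seen = OrderedDict()  # key -> timestamp, oldest first

    def accept(self, event, timestamp):
        # Drop keys whose first sighting has aged out of the window.
        while self._first_seen:
            _, oldest_ts = next(iter(self._first_seen.items()))
            if timestamp - oldest_ts < self.window_seconds:
                break
            self._first_seen.popitem(last=False)

        key = tuple(event[field] for field in self.key_fields)
        if key in self._first_seen:
            return False  # duplicate within the window, e.g. a repeated click
        self._first_seen[key] = timestamp
        return True


dedupe = UniqueWithinWindow(key_fields=("ip", "ad"))
click = {"ip": "10.0.0.1", "ad": "banner-7", "referrer": "example.com"}
other = {"ip": "10.0.0.2", "ad": "banner-7", "referrer": "example.com"}

results = [
    dedupe.accept(click, 0),               # first sighting
    dedupe.accept(click, 100),             # repeat inside 24 hours
    dedupe.accept(other, 100),             # different ip, so a different key
    dedupe.accept(click, WINDOW_SECONDS),  # first sighting has expired
]
```

Only the fields named in `key_fields` decide uniqueness, so two clicks that differ in `referrer` alone still count as duplicates.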

Read more

complex event processing for fun and profit

As an exercise to keep my mind nimble, here's a write-up on how to use the power of computers to take over the world: out-fox those slow-moving meatbags who do stock trading, and compete with Skynet on making the most possible profit.

The pieces of this puzzle are:

  • A messaging backbone (we'll use AMQP with the RabbitMQ broker)
  • A complex event processing engine (Esper)
  • A way to express our greed (EPL statements)
  • Software called new-hope that ties this all together (partially written by yours truly)
  • A feed of stock prices
  • An app to view the actions we must take.

Let's get everything installed.

On CentOS with the EPEL repo available:

Read more