Cogito Interruptus Vulgaris

Building a Datawarehouse for Testing

- - posted in Data warehouse, ETL, PDI

Overview

A common problem when starting a new project is getting fixtures in place to test reporting functionality and refine data models. To ease this, I’ve created a PDI job that creates the dimension tables and populates a fact table.
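
For illustration, here is a minimal sketch of the kind of star schema such fixtures target. The database, table, and column names below are hypothetical examples, not the schema the PDI job actually builds.

# Hypothetical star-schema fixture: one dimension table and one fact table.
# Assumes a local MySQL server and a pre-created "testdw" database.
mysql -u root testdw <<'SQL'
CREATE TABLE dim_date (
  date_key  INT PRIMARY KEY,   -- e.g. 20130115
  full_date DATE NOT NULL,
  month     TINYINT NOT NULL,
  year      SMALLINT NOT NULL
);
CREATE TABLE fact_sales (
  date_key    INT NOT NULL,
  product_key INT NOT NULL,
  amount      DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (date_key) REFERENCES dim_date (date_key)
);
INSERT INTO dim_date   VALUES (20130115, '2013-01-15', 1, 2013);
INSERT INTO fact_sales VALUES (20130115, 42, 19.99);
SQL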

Tcpdump Tip: Viewing a Packet Stream Data Payload

- - posted in code, ops, techtips

Here are a few aliases I’ve often used to view packet payloads with tcpdump; the capture filter drops packets that carry no data, so only payloads are shown.

I usually stick the following lines into my .bashrc on all the servers I install.

alias tcpdump_http="tcpdump 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s0" 
alias tcpdump_http_inbound="tcpdump 'tcp dst port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s0" 
alias tcpdump_http_outbound="tcpdump 'tcp src port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -A -s0" 

You can pass the interface you want to listen on as an argument (it defaults to eth0), for example ‘-i eth0:1’. It snarfs up the whole payload (-s0), so it’s easy to follow what’s going on.
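
For example, once the aliases have been sourced from .bashrc:

# Watch HTTP payloads on a specific interface instead of the default
tcpdump_http -i eth0:1
# Only traffic destined to local port 80
tcpdump_http_inbound -i eth0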

An equally viable alternative is to install tcpflow.
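
If you go the tcpflow route, a roughly equivalent invocation looks like this (interface name is just an example):

# Print reassembled HTTP flows to the console instead of writing per-flow files
tcpflow -c -i eth0 port 80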

I See Packets…

- - posted in security, techtips

While studying for the GCIA certification, I put together the following reference to be able to eyeball packets and see at a glance what’s inside a hex packet dump.
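
To practice, you can generate your own dumps with tcpdump. The offsets noted below are the standard layout, assuming a plain 14-byte Ethernet header and a 20-byte IPv4 header with no options:

# Capture one HTTP packet and print it as hex + ASCII, including the Ethernet header
tcpdump -nn -XX -c 1 'tcp port 80'
# Reading the hex dump (byte offsets from the start of the frame):
#   0-13   Ethernet: dst MAC (6), src MAC (6), ethertype (2, 0x0800 = IPv4)
#   14     IP version (high nibble) and header length in 32-bit words (low nibble)
#   23     IP protocol (0x06 = TCP)
#   26-29  source IP, 30-33 destination IP
#   34-35  TCP source port, 36-37 TCP destination port
#   47     TCP flags (e.g. 0x18 = PSH,ACK)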

Detecting Unique Patterns Over Time With Complex Event Processing

- - posted in CEP, ESP, Esper, new-hope

Here’s another use-case for CEP: Detecting uniqueness over time. A use-case for this type of pattern is identifying click fraud.

As before, see my previous posts for how to get everything up and running.

In our fictitious scenario, we want to filter an incoming stream of data so that it only emits events that are unique on a given subset of fields within a 24-hour period.

Complex Event Processing for Fun and Profit

- - posted in CEP, ESP, Esper, code

As an exercise to keep my mind nimble, here’s a write-up on how to use the power of computers to take over the world by out-foxing those slow-moving meatbags who do stock trading, and compete with Skynet on making the most possible profit.

The pieces of this puzzle are:

  • A messaging backbone (we’ll use AMQP with the RabbitMQ broker)
  • A complex event processing engine (Esper)
  • A way to express our greed (EPL statements)
  • Software that ties this all together, called new-hope (partially written by yours truly)
  • A feed of stock prices
  • An app to view the actions we must take.

Creating Forensic Images

- - posted in ops, security, techtips

Reading big disks is often a time-consuming endeavor. To minimize the number of times you need to read the data, here’s a tip for reading an image with dd, compressing it, and checksumming it all in a single pass.

# One pass over /dev/sda: write the raw image and a gzipped copy,
# plus MD5 and SHA-1 checksums of each.
dd if=/dev/sda | pv | tee >( md5sum > box.dd.md5 ) | \
tee >( sha1sum > box.dd.sha1 ) | tee box.dd | gzip | \
tee box.dd.gz | tee >( md5sum > box.dd.gz.md5 ) | \
sha1sum > box.dd.gz.sha1

This is going to be a pretty CPU-hungry process. If you’ve got lots of cores, you can speed things up further by using ‘pigz’ (parallel gzip) instead of gzip.
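
For example, the same pipeline with pigz (assuming pigz is installed; it produces gzip-compatible output):

# Identical to the above, but compression uses all available cores
dd if=/dev/sda | pv | tee >( md5sum > box.dd.md5 ) | \
tee >( sha1sum > box.dd.sha1 ) | tee box.dd | pigz | \
tee box.dd.gz | tee >( md5sum > box.dd.gz.md5 ) | \
sha1sum > box.dd.gz.sha1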

As a side note, this is a generic approach when you need to pipe the output of one program to many others simultaneously.
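
In its simplest form, the pattern looks like this (the input file and consumers here are just examples):

# Fan one stream out: each >( ... ) gets a copy, and the final command reads stdin as usual
cat /var/log/syslog | tee >( gzip > syslog.gz ) >( wc -l > syslog.lines ) | md5sum > syslog.md5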

Compressing MySQL Binary Logs

- - posted in code, ops, techtips

Under normal circumstances, master servers in a replication topology can be set up to automatically rotate (expire) binary logs using the expire_logs_days setting in my.cnf.
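
For reference, a minimal sketch of the relevant my.cnf section (the path and the 7-day retention are arbitrary examples):

[mysqld]
log_bin          = /var/log/mysql/mysql-bin.log
expire_logs_days = 7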

However, when the slaves are known to be in sync, it can be beneficial to proactively reduce the on-disk footprint with compression. This can be especially useful in high-churn environments where binary logs grow quickly.

Grab the script:

git clone git://github.com/marksteele/mysql-dba-tools.git
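
For a rough sense of the idea only (this is a simplified sketch, not what the script in the repository actually does): find the binary log currently in use, then compress everything older, after you have confirmed the slaves are past those files. Paths and file naming below are assumptions.

# Simplified illustration only: verify slave positions before doing this for real
BINLOG_DIR=/var/log/mysql
CURRENT=$(mysql -N -e 'SHOW MASTER STATUS' | awk '{print $1}')
for f in "$BINLOG_DIR"/mysql-bin.[0-9]*; do
  case "$f" in *.gz) continue ;; esac               # already compressed
  [ "$(basename "$f")" = "$CURRENT" ] && continue   # skip the active binlog
  gzip "$f"
done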