Getting started

The ActiveWarehouse ETL component extracts data from multiple data sources, transforms it, and loads it into your data warehouse.

Here’s how to get rolling:

  1. Install the Gem

    From a command line, run sudo gem install activewarehouse-etl on Linux or OS X, or gem install activewarehouse-etl on Windows.

    ActiveWarehouse ETL depends on ActiveSupport, ActiveRecord, adapter_extensions and FasterCSV. If any of these dependencies are not already installed, you may be asked to approve their installation.

    You can also download the packages in Zip, Gzip, or Gem format from the ActiveWarehouse files section on RubyForge. For the brave, the latest ETL code is available from the GitHub repository; to clone it, use: git clone git://

  2. Create Control Files

    Create the ETL control files, which define the source, transformation and destination rules for the ETL process. See the .ctl files in the test directory for examples.

  3. Execute the etl command

    Run the etl command, passing the control file name as its argument. For example: etl source1.ctl
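
Step 2 describes a control file as a set of source, transformation and destination rules, and the feature list describes control files as a DSL. Assuming the control file is Ruby code evaluated in the context of a DSL object, the plain-Ruby sketch below shows that general idea using instance_eval. MiniControl and its source, transform and destination methods are hypothetical names chosen for illustration; this is not the activewarehouse-etl implementation, and its argument shapes are not the real DSL's.

```ruby
# Hypothetical stand-in for a control-file DSL, NOT the activewarehouse-etl
# implementation: MiniControl and its source/transform/destination methods
# are illustrative assumptions.
class MiniControl
  attr_reader :output

  def initialize
    @rows = []
    @transforms = []
    @order = []
    @output = []
  end

  # Evaluates control-file text in the context of a new MiniControl, so bare
  # calls such as `source ...` in the text resolve to the methods below.
  def self.run(text)
    ctl = new
    ctl.instance_eval(text)
    ctl.process
    ctl
  end

  # Source rule: comma-delimited text plus the ordered field names.
  # (Naive split; a real parser would also handle quoting and escapes.)
  def source(text, fields)
    @rows = text.each_line.map { |line| fields.zip(line.chomp.split(',')).to_h }
  end

  # Transformation rule: replace one field's value with the block's result.
  def transform(field, &block)
    @transforms << [field, block]
  end

  # Destination rule: the fields to emit, in order.
  def destination(order)
    @order = order
  end

  # Applies every transform to every row, then projects the destination order.
  def process
    @output = @rows.map do |row|
      @transforms.each { |field, fn| row[field] = fn.call(row[field]) }
      @order.map { |field| row[field] }
    end
  end
end

# Contents of a hypothetical control file, e.g. people.ctl.
CONTROL_TEXT = <<~'CTL'
  source "Ada,Lovelace\nAlan,Turing\n", [:first_name, :last_name]
  transform(:last_name) { |v| v.upcase }
  destination [:last_name, :first_name]
CTL

result = MiniControl.run(CONTROL_TEXT)
p result.output # => [["LOVELACE", "Ada"], ["TURING", "Alan"]]
```

Because the control text is evaluated as Ruby, a transformation rule can be any block. For the real option names and argument layout, take the .ctl files in the test directory as the reference.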

What's There Now?

Right now the ETL component has the following functionality:

  • Fixed-width and delimited file parsing
  • File and database sources
  • File and database destinations
  • Virtual source fields, which can be populated via output from Ruby code
  • Support for pre- and post-processing code
  • Multiple-input file parsing
  • Transform pipeline
  • Transform with a block
  • Included transformations: SHA1, Decode, Date to String, String to Date, Type Transform
  • ETL Domain Specific Language (DSL) control files
  • Bulk loading (currently implemented for MySQL)
  • Foreign key lookup
  • Error reporting
  • Recovery from errors
  • Error threshold setting
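
Several of the listed features (the transform pipeline, transforms written as blocks, the SHA1, Decode and String to Date transformations, error reporting and the error threshold) fit together roughly as in the plain-Ruby sketch below. It is not the gem's code: DECODE_TABLE, PIPELINE and run_pipeline are hypothetical names, and the rule that a run aborts once the number of failed rows exceeds the threshold is an assumption about how such a setting might behave.

```ruby
require 'digest/sha1'
require 'date'

# Hypothetical decode table mapping source codes to readable labels.
DECODE_TABLE = { 'M' => 'Male', 'F' => 'Female' }.freeze

# Hypothetical pipeline: each step is [field, callable], and the steps run
# in order on every row, like a chain of transforms written as blocks.
PIPELINE = [
  [:ssn,  ->(v) { Digest::SHA1.hexdigest(v) }],   # SHA1 transform
  [:sex,  ->(v) { DECODE_TABLE.fetch(v) }],       # decode; KeyError if code unknown
  [:born, ->(v) { Date.strptime(v, '%Y-%m-%d') }] # string to date; ArgumentError if malformed
].freeze

# Runs every row through PIPELINE. Failed rows are reported instead of
# loaded; once the failure count exceeds error_threshold, the run aborts.
def run_pipeline(rows, error_threshold)
  loaded = []
  errors = []
  rows.each_with_index do |row, i|
    begin
      out = row.dup
      PIPELINE.each { |field, fn| out[field] = fn.call(out[field]) }
      loaded << out
    rescue KeyError, ArgumentError => e
      errors << "row #{i}: #{e.message}"
      if errors.size > error_threshold
        raise "error threshold exceeded (#{errors.size} errors): #{errors.join('; ')}"
      end
    end
  end
  [loaded, errors]
end

rows = [
  { ssn: '123456789', sex: 'F', born: '1815-12-10' },
  { ssn: '987654321', sex: 'X', born: '1912-06-23' } # 'X' is not in DECODE_TABLE
]

loaded, errors = run_pipeline(rows, 1)
puts loaded.size  # => 1
puts errors.first # reports the failed decode on row 1
```

With a threshold of 0, the same input aborts on the second row with a RuntimeError instead of returning a partial load.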

What's Coming?

No idea.

Copyright 2006-2009, All Rights Reserved