Pure Ruby ETL
The ActiveWarehouse ETL component provides a means of getting data from multiple data sources into your data warehouse. The links in the sidebar provide additional information on ETL.
Here’s how to get rolling:
Install the Gem
Get to your command line and type
sudo gem install activewarehouse-etl
on Linux or OS X, or type
gem install activewarehouse-etl
on Windows.
ActiveWarehouse ETL depends on ActiveSupport, ActiveRecord, adapter_extensions and FasterCSV. If these dependencies are not already installed, you may be asked to approve their installation.
You can also download the packages in Zip, Gzip, or Gem format from the ActiveWarehouse files section on RubyForge. If you are feeling brave, you can get the latest ETL code from the GitHub repository with the following command:
git clone git://github.com/aeden/activewarehouse-etl.git
Create Control Files
Create the ETL control files. The control files define the source, transformation and destination rules for the ETL process. See the .ctl files in the test directory for examples.
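To make the source, transformation and destination split concrete, here is a minimal sketch in plain Ruby. It is not the ActiveWarehouse ETL control file syntax; the names run_pipeline, SOURCE_LINES, FIELDS and TRANSFORMS are hypothetical and exist only for this illustration. For the real DSL, refer to the .ctl files in the test directory.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for the three parts a control file describes.
# This is NOT the ActiveWarehouse ETL API, only an illustration of the flow.
def run_pipeline(lines, fields, transforms)
  lines.map do |line|
    # Source: parse one comma-delimited line into a row keyed by field name.
    row = fields.zip(line.chomp.split(',')).to_h

    # Transformation: apply each field's transforms in the order given.
    transforms.each do |field, steps|
      steps.each { |step| row[field] = step.call(row[field]) }
    end

    # Destination: this sketch simply returns the finished row.
    row
  end
end

# Made-up sample data, standing in for a delimited source file.
SOURCE_LINES = ["alice,secret1", "bob,secret2"].freeze
FIELDS = [:name, :password].freeze

# Each field maps to a list of callables, forming a small transform pipeline.
TRANSFORMS = {
  name: [->(value) { value.capitalize }],
  password: [->(value) { Digest::SHA1.hexdigest(value) }]
}.freeze

ROWS = run_pipeline(SOURCE_LINES, FIELDS, TRANSFORMS)

# Write the rows out tab-separated, as a file destination might.
ROWS.each { |row| puts row.values.join("\t") }
```

Keeping the transforms as an ordered list per field mirrors the idea of a transform pipeline: each step receives the output of the previous one.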
Execute the etl command
Execute the etl command, passing the control file name as the argument: type etl followed by the name of your .ctl file.
What's There Now?
Right now the ETL component has the following functionality:
- Fixed-width and delimited file parsing
- File and database sources
- File and database destinations
- Virtual source fields, which can be populated via output from Ruby code
- Support for pre- and post-processing code
- Multiple-input file parsing
- Transform pipeline
- Transform with a block
- Included transformations: SHA1, Decode, Date to String, String to Date, Type Transform
- ETL Domain Specific Language (DSL) control files
- Bulk loading (currently implemented for MySQL)
- Foreign key lookup
- Error reporting
- Recovery from errors
- Error threshold setting
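
The last three items in the list above (error reporting, recovery from errors, and an error threshold) can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the component's own implementation: process_rows and error_threshold are hypothetical names, and the rule that processing aborts once the error count exceeds the threshold is an assumption about how such a setting might behave.

```ruby
# Hypothetical helper, not part of ActiveWarehouse ETL.
# Processes rows one at a time, records failures, and aborts only when
# the number of failures exceeds error_threshold (an assumed rule).
def process_rows(rows, error_threshold:)
  loaded = []
  errors = []

  rows.each_with_index do |row, index|
    begin
      loaded << yield(row)
    rescue StandardError => e
      # Error reporting: remember which row failed and why.
      errors << "row #{index + 1}: #{e.message}"

      # Error threshold: give up once too many rows have failed.
      if errors.size > error_threshold
        raise "error threshold of #{error_threshold} exceeded"
      end
      # Recovery: otherwise carry on with the next row.
    end
  end

  [loaded, errors]
end

# Made-up input: the third value cannot be converted to an integer.
LOADED, ERRORS = process_rows(%w[1 2 x 4], error_threshold: 1) do |value|
  Integer(value)
end

puts "loaded #{LOADED.size} rows, #{ERRORS.size} error(s)"
ERRORS.each { |message| puts message }
```

With a threshold of 1, the single bad row is reported and skipped; calling the same helper with a threshold of 0 would raise on the first failure instead.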