There might be plenty of free ETL tools out there but they come with either
- A challenging installation process (e.g. SpagoBI ETL), or
- A demanding footprint in terms of disk space, and memory (e.g. Oracle Data Integrator) on your regular desktop/laptop specs, or
- A restricted open-source/community licensing (e.g. RapidMiner)
that would prevent you from owning a decent one to help with your own data analysis, or mashups.
It looks like the quest for a decent and capable ETL tool is now finally answered by the latest offering from Talend. Their Talend Open Studio for Data Integration comes as a Free Open Source capable of connecting to tons of different kinds of data sources (including the popular fantastic four of Oracle, MySQL, SQLServer, and DB2). It also connects to your Amazon Aurora, and Redshift databases so you have everything covered more or less!
For images used in this post, right click on them to view full size if you wish.
The installation of the Talend Open Studio tool is painless. It requires only 3GB RAM, and 3GB disk space to get it up and running. So, as a BI/DW consultant, you can put all of your BI dashboard solutions built on cleansed data, and showcase to your clients on-the-go!
In the rest of this post, I would like to share with you some of aspects of the tool that make it in the list of my favorites!
- Installation is very simple. Just download the software, unzip it, and the Studio executable is ready for you to click on to do your ETL things immediately!
- Click on TOS_DI-win-x86_64.exe and start it.
- When you are in the tool, you will see the wide range of databases that it can connect to as shown on the right most panel.
- If you are new to ETL coding with Talend, go here for refresher tutorials https://www.talendforge.org/tutorials/menu.php
- As the simplest and minimal ETL code example, you just start as follows
- Create your database connections for your required source, and target
- Retrieve metadata from your database connections
- Design a job to get data from source to target
For the following screenshots, my demo scenario is to extract data from a table in the Microsoft SQL Server 2012 database to its target in a pluggable database in Oracle 12c.
- Right click on Metadata/Db Connection/Create connection to set up the source database connection SQLServer2012
- Give it a name such as SQLServer2012, complete the connection details, and click on ‘Check‘ to validate it.
- Repeat the above steps to define your target database connection.
Here is something that sets it apart from all other vendors, and is an awesome feature that I have been looking for (see screenshot shown below) – it can identify the missing drivers, then go and get the appropriate drivers for you, download and install them ready for use in your ETL code immediately. For other vendors, you have to figure out where to get the jar files, and then where to insert them in the convoluted directory structure created by the tool, and start/restart certain things to make the jar files finally visible in your code! Kudos to the Talend software engieers!
- So you now also have the Oracle connection configured.
- For each configured connection, pull in its metadata for the tool.
- Repeat the above steps for the target database connection
You are now ready to build a simple workflow to move data from a table in SQLServer2012 to the target Oracle12 database
- Expand the source connection SQLServer2012, and drag the required source table (e.g. D_AGENT) to the design canvas (i.e. the top middle panel). Click OK to confirm the recommend action of tMSSqlInput (i.e. you are inputting data from D_AGENT).
- Then expand the target Oracle connection and drag the required target table (e.g. D_AGENT) to the design canvas. Click OK to confirm the recommended action tOracleOuput (you might need to select it if it is not correctly assumed by the tool).
- Connect the source and target tables together to complete the data flow.
then view and confirm the column mappings
then tweak the ‘Action on table‘ to truncate the target table before loading it with new data
- You are now ready to run your first ETL code.
- From the above results, you have copied 734 rows in 1.11 s from the SQL Server table to the target table in Oracle 12c. Nothing could be any simpler!
Have fun exploring the tool further with different complexities.