The ETL tools are evaluated on the following categories:
√ | Infrastructure | √ | Functionality | √ | Usability |
√ | Platforms supported | √ | Debugging facilities | √ | Data Quality / profiling |
√ | Performance | √ | Future prospects | √ | Reusability |
√ | Scalability | √ | Batch vs Real-time | √ | Native connectivity |
Pentaho Kettle vs Talend
Pentaho
- Pentaho is a commercial open-source BI suite that has a product called Kettle for data integration.
- It uses an innovative metadata-driven approach and has a strong and very easy-to-use GUI.
- The company started around 2001 (2002 was when Kettle was integrated into it).
- It has a strong community of 13,500 registered users.
- It has a stand-alone Java engine that processes the jobs and tasks for moving data between many different databases and files.
- It can run tasks on a schedule, but it needs an external scheduler such as cron for that.
- It can run remote jobs on "slave servers" on other machines.
- It has data quality features: checks built in its own GUI, custom SQL queries, JavaScript and regular expressions.
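The regular-expression checks mentioned in the last point can be illustrated outside of any tool. The following is a minimal Python sketch, not Kettle's API: the field names, the patterns and the `validate_row` helper are all hypothetical, chosen only to show the kind of rule a data quality step applies to each row.

```python
import re

# Hypothetical validation rules: one regular expression per field.
RULES = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "postcode": re.compile(r"^\d{5}$"),
}

def validate_row(row):
    """Return the names of the fields in `row` that fail their rule."""
    failed = []
    for field, pattern in RULES.items():
        value = row.get(field) or ""  # treat a missing field as an empty string
        if not pattern.fullmatch(value):
            failed.append(field)
    return failed

rows = [
    {"email": "ann@example.com", "postcode": "12345"},
    {"email": "not-an-email", "postcode": "1234"},
]

# Keep each failing row together with the list of fields that failed.
rejected = [(row, validate_row(row)) for row in rows if validate_row(row)]
```

In a real job the rejected rows would typically be routed to an error table or log rather than kept in a list.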
Talend
- Talend is an open-source data integration tool (not a full BI suite).
- It uses a code-generating approach. It has a GUI, but one that runs within Eclipse RCP.
- It started around October 2006.
- It has a much smaller community than Pentaho but has two finance companies backing it.
- It generates Java or Perl code which you later run on your server.
- It can run tasks on a schedule (also by using an external scheduler such as cron).
- It has data quality features: checks built in its own GUI, custom SQL queries and Java.
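The difference between a metadata-driven engine and a code-generating tool can be sketched in a few lines. This is only an illustration of the two ideas, written in Python for brevity; it is not how Kettle or Talend are implemented, and the `JOB` format and both functions are hypothetical.

```python
# A hypothetical job description: the "metadata" both approaches start from.
JOB = [
    {"step": "uppercase", "field": "name"},
    {"step": "strip", "field": "city"},
]

def run_with_engine(job, row):
    """Metadata-driven: a generic engine interprets the job at run time."""
    row = dict(row)
    for step in job:
        field = step["field"]
        if step["step"] == "uppercase":
            row[field] = row[field].upper()
        elif step["step"] == "strip":
            row[field] = row[field].strip()
    return row

def generate_code(job):
    """Code-generating: the job is turned into source code ahead of time."""
    lines = ["def transform(row):", "    row = dict(row)"]
    for step in job:
        field = step["field"]
        method = {"uppercase": "upper", "strip": "strip"}[step["step"]]
        lines.append(f"    row[{field!r}] = row[{field!r}].{method}()")
    lines.append("    return row")
    return "\n".join(lines)

source = generate_code(JOB)
namespace = {}
exec(source, namespace)  # stands in for compiling and deploying the generated program
transform = namespace["transform"]

row = {"name": "ann", "city": " Oslo "}
# Both approaches produce the same result for the same job description.
assert run_with_engine(JOB, row) == transform(row)
```

With the engine you ship the job description and the engine; with code generation you ship the generated program, which can then be read, edited or embedded in a larger application.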
Comparison - (from my understanding)
- Pentaho is faster (maybe twice as fast) than Talend.
- Pentaho's GUI is easier to use than Talend's GUI and takes less time to learn.
My impression
Pentaho is easier to use because of its GUI.
Talend is more a tool for people who are already writing a Java program and want to save a lot of time by having a tool generate code for them.
Assuming Pentaho made it to the next round....
Pentaho Kettle vs Informatica
Informatica
- Informatica is a very good commercial data integration suite.
- It was founded in 1993.
- It is the market share leader in data integration (Gartner Dataquest).
- It has 2,600 customers, including Fortune 100 companies, companies on the Dow Jones and government organizations.
- The company's sole focus is data integration.
- It has quite a big package for enterprises: it can integrate their systems, cleanse their data and connect to a vast number of current and legacy systems.
- It's very expensive, will require training some of your staff to use it and will probably require hiring consultants as well. (I hear Informatica consultants are well paid.)
- It's very fast and can scale for large systems. It has "Pushdown Optimization", an ELT approach that uses the source database to do the transforming, like Oracle Warehouse Builder.
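The idea behind pushdown can be shown with a small, tool-independent sketch. The Python below uses the standard `sqlite3` module as a stand-in for the source database; the `orders` table and both functions are hypothetical, and this is not Informatica's implementation, only the general contrast between transforming in the tool and transforming in the database.

```python
import sqlite3

# An in-memory database standing in for the source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.0), ("ann", 2.5)],
)

def totals_etl(conn):
    """ETL style: pull every row out, then aggregate inside the tool."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals

def totals_pushdown(conn):
    """ELT / pushdown style: the database aggregates, only results travel."""
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    return dict(conn.execute(query))

# Same answer either way; the difference is where the work happens.
assert totals_etl(conn) == totals_pushdown(conn)
```

On three rows the difference is invisible, but on a large table the pushdown version moves one row per customer over the network instead of one row per order.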
Comparison
- Pentaho's JavaScript is very powerful when writing transformation tasks.
- Informatica has many more enterprise features, for example, load balancing between database servers.
- Pentaho's GUI requires less training than Informatica's.
- Pentaho doesn't require huge upfront costs as Informatica does. (That part you saw coming, I'm sure.)
- (edited) Informatica is faster than Pentaho. Informatica has Pushdown Optimization, but with some tweaking of Pentaho and some knowledge of the source database, you can improve the speed of Pentaho. (See also the next point.)
- (new) You can place Pentaho Kettle on many different servers (as many as you like, it's free) and use it as a cluster.
- Informatica has much better monitoring tools than Pentaho.
PDI also has database load balancing. Can you elaborate on how well Informatica handles big data vs Pentaho? i.e. HDFS, HBase, Hive, MongoDB, Cassandra etc?