Pentaho vs Microsoft BI Stack [closed]

By : Tomas

My company is heavily invested in the MS BI Stack (SQL Server Reporting Services, -Analysis Services and -Integration Services), but I want to have a look at what the seemingly most talked about open-source alternative Pentaho is like.

I've installed a version, and I got it up and running quite painlessly. So that's good. But I haven't really the time to start using it for actual work to get a thorough understanding of the package.

Have any of you got any insights into what are the pros and cons of Pentaho vs MS BI, or any links to such comparisons?

Much appreciated!

By : Tomas


I recently tried pentaho open source BI. I found it to be extremely clumsy. It was not very intuitive and development time took much longer.

It is quite different from either Oracle or ms BI solutions. Maybe the enterprise edition is better.

By : q Tran

a couple of points to add

  • Although there is a window version of all Pentaho tools the setup in windows is onerous. Pentaho (especially the server start and stop which is separate from the GUI tool) is typically used in Linux, not windows shop, and there is steep learning curve going from Windows to Linux.
  • any tool has a learning curve when you shift to it. when you get used to always clicking OK and refreshing metadata when you have problems, SSIS isn't that bad. Pentaho can be flaky, too.

Tool questions need to be addressed in terms of larger cultural questions - what kind of shops use open source tools? in my experience i've found that althsough Microsoft shops seem more rigid, when you have trouble with a connection string in a Microsoft shop you can get help.. in Pentaho and Linux shops its more DYI.

BTW, watch out for Pentaho sales guys doing demos - all the things they show are a lot harder to get working than it seems! :)

Great information here? I have not tried Pentaho but and planning on checking it out. I am a seasoned MS BI consultant, using it since 1998. SSIS is very fast and very powerful but the criticisms are spot on. I found the following issues with SSIS:

(1) It is hard to debug, you get cryptic errors that may not give you any hint about what and where the problem really is.

(2) Per a prior comment, it is the shittiest development environment ever! I have no clue what they are thinking.

(a) Create a table with a 100 or more columns and put a merge join on it. Now go back in and try to make an update to the merge join (like pull a new column through). It can take several minutes, even on the fastest machine after you click ok on the merge join to save your change. I have a huge dataflow with lots of wide records and many merge joins. Adding one column to the dataflow takes more than half a day. I update a merge join and then have to go do something else and check back 5-10 minutes later to see if it is done. Microsoft's response to this is to break up your package into multiple packages, place the data in a table or binary between them. Well if you are going to disk between all the steps, you may was well do the whole thing in SQL! One of the main purposes of an ETL tool is to all this stuff in memory and avoid disk I/O.

(b) The designer outright crashes sometimes, losing all your work since last save (I do ctrl-S in my sleep now because of this)

(c) I had to figure out a hack and generate SSIS package XML in Excel for wide records. I have a Healthcare client where 600+ column records are commonplace. If you try to define a file format with 600 columns in SSIS, you have to type every single column in one at a time!!! Even MS access allows you to cut and paste a layout from a spreadsheet into a file layout, but not SSIS. So I had to generate the XML from the layout and paste the XML code into the right place in the package. Ugly way to do it but it saved entire days of work and lots of errors.

(d) Similar to (c), if you need to trim all your columns and you have say 600+ of them, guess what? In the derived column component, you have to type trim(column1) 600+ times! I now do all simple transforms like this in the SQL query to get the data, since that can easily be generated from an Excel sheet.

(e) There are many quirky things, components that turn invisible, sometimes you open the package and all the components are completely re-arranged incoherently.

(f) The FTP feature, possibly one of the most common things you need in ETL, is weak and only supports plain vanilla FTP which nobody uses. Everyone these days uses SFTP, FTPS, https, etc... So almost every implementation requires using a 3rd party commend line driven file transfer app the package has to call.

(g) Trying to CYA, similar to the ridiculous security in Windows Vista, Microsoft has made it exceedingly difficult to actually promote an SSIS package from one environment to another. It defaults to this stupid thing of "encrypting sensitive information with user key" security which means it must run under the same account in the environment you are moving it to as the environment you developed it, something that is rarely the case. There are better ways to configure but it always try to revert to this completely useless security protection.

(h) Lastly most of these problems are now in there 3rd version, clearly indicating Microsoft has no plan to fix them.

(i) Debugging is not nearly as easy as other languages.

SSIS still has a great many benefits, but not without some serious pain.

By : Lee

This video can help you solving your question :)
By: admin