I've been using the Cascalog query language for Hadoop map/reduce jobs for a while. Much of the learning curve is getting up to speed on its rich set of operation-creation macros, which are essentially different techniques for defining user-defined functions. To help with that, I've put together the following chart describing the different operation types in Cascalog. The idea is to provide some guidance on which type is appropriate for a particular job, along with examples and notes on usage and performance.
Cascalog jobs/queries are written in Clojure (a Lisp-like language that runs on the Java VM), so all examples below are Clojure code.
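To give a flavor of what these macros look like, here is a minimal sketch that assumes the classic Cascalog 1.x API (`cascalog.api` and `cascalog.ops` on the classpath). The operation names (`split-words`, `upper`, `long-word?`) and the in-memory `sentences` data are made up for illustration and are not taken from the chart.

```clojure
(ns example.ops
  (:use cascalog.api)
  (:require [cascalog.ops :as c]
            [clojure.string :as str]))

;; Hypothetical in-memory source: a seq of 1-tuples can stand in for a tap.
(def sentences [["hello cascalog world"]
                ["hello hadoop"]])

;; defmapcatop: one input tuple -> zero or more output tuples.
(defmapcatop split-words [sentence]
  (seq (.split sentence "\\s+")))

;; defmapop: one input tuple -> exactly one output tuple.
(defmapop upper [word]
  (str/upper-case word))

;; deffilterop: keeps the tuple only when the body returns truthy.
(deffilterop long-word? [word]
  (> (count word) 4))

;; Count occurrences of each upper-cased word longer than four characters
;; and print the results to stdout.
(defn run-example []
  (?<- (stdout) [?upper ?count]
       (sentences ?s)
       (split-words ?s :> ?word)
       (long-word? ?word)
       (upper ?word :> ?upper)
       (c/count ?count)))
```

Calling `(run-example)` from a REPL should run the query in local mode. Which macro to reach for depends mostly on the shape of the transformation: one-to-one, one-to-many, or keep-or-drop.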
Let me know if you see any mistakes, or if you have suggestions for further details that could be added to make the chart more useful.
For reference, this thread also has a description of the def macros from Nathan, and more examples.
2 comments:
thanks for this! very useful.
Thanks for this, very useful!