Learning Spark 2.0 Knowledge Dump


This post is a running knowledge dump on the “Learning Spark 2.0” book, where I’ll collect quotes and notes that I find relevant (and hopefully you will too :]!)

In Spark’s supported languages, columns are objects with public methods (represented by the Column type).

Code example that uses expr(), withColumn, and col():

blogsDF.withColumn("Big Hitters", expr("Hits > 10000")).show()
The above adds a new column, Big Hitters, based on the conditional expression. The expr(...) part can be replaced with: col("Hits") > 10000
