Spark 05: List Action Movies with Spark flatMap

Welcome to the fifth article in the series of Apache Spark tutorials. In this article, you will learn how to apply the flatMap transformation. After an introduction to the flatMap operation, a sample Spark application is developed to list all the action movies in the MovieLens dataset.
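To give a rough preview of the end result, here is a minimal sketch (not the article's exact code) that lists action movie titles with flatMap, assuming the standard MovieLens movies.csv layout of movieId,title,genres where genres are pipe-separated:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("ActionMovies").setMaster("local[*]"))
  val actionTitles = sc.textFile("movies.csv")
    .filter(!_.startsWith("movieId"))                           // drop the header row
    .map(_.split(","))                                          // naive CSV split; quoted titles containing commas are not handled
    .filter(_.length >= 3)
    .flatMap(f => f(2).split('|').map(genre => (f(1), genre)))  // one (title, genre) pair per genre
    .filter(_._2 == "Action")                                   // keep only the Action pairs
    .map(_._1)                                                  // back to plain titles
  actionTitles.collect().foreach(println)

Notice that a single input record can produce zero, one, or many output records here, which is exactly what separates flatMap from map.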

In the previous articles, we used the map transformation, which converts one entity into another in a one-to-one manner. For example, if you have a String RDD named lines, applying lines.map(x => x.toUpperCase) creates a new String RDD with the same number of records but with uppercase strings, as shown below:
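Here is a minimal, self-contained illustration of that one-to-one behaviour (the variable names are only examples):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("MapExample").setMaster("local[*]"))
  val lines = sc.parallelize(Seq("toy story", "jumanji", "heat"))   // String RDD with 3 records
  val upperLines = lines.map(x => x.toUpperCase)                    // still 3 records, now uppercase
  upperLines.collect().foreach(println)                             // TOY STORY, JUMANJI, HEAT

Because map emits exactly one output record per input record, the record count never changes; flatMap, which this article introduces, lifts that restriction.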

Install Ballerina on Linux


Ballerina is a new open-source, JVM-based language designed specifically for integration purposes by WSO2, the world's #1 open source integration vendor. In this article, you will see how to manually install Ballerina on Linux systems. Visit the official website and download the installer for your system. Installers are available for Windows, Mac, Debian-based Linux, and Fedora-based Linux. I prefer to install Ballerina manually because the manual method works the same way on every Linux distribution out there.

Spark 04: Key Value RDD and Average Movie Ratings

In the first article of this series, Spark 01: Movie Rating Counter, we created three RDDs (data, filteredData and ratingData), each of which holds a single datatype. For example, data and filteredData were String RDDs and ratingData was a Float RDD. However, depending on the requirement, it is common to use RDDs that store more complex datatypes, especially Key-Value pairs. In this article, we will use a Key-Value RDD to calculate the average rating of each movie in our MovieLens dataset. If you don't have the MovieLens dataset, please visit the Spark 01: Movie Rating Counter article to set up your environment.

As you already know, the ratings.csv file has the fields movieId and rating. A given movie may get different ratings from different users. To get the average rating of each movie, we need to add up all the ratings of each movie individually and divide the sum by the number of ratings.
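As a rough sketch of that sum-and-divide approach (an illustration, not necessarily the article's exact code), assuming the usual MovieLens ratings.csv layout of userId,movieId,rating,timestamp:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("AverageRatings").setMaster("local[*]"))
  val averageRatings = sc.textFile("ratings.csv")
    .filter(!_.startsWith("userId"))                        // drop the header row
    .map(_.split(","))
    .map(f => (f(1), (f(2).toFloat, 1)))                    // Key-Value pair: (movieId, (rating, count))
    .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))      // add up ratings and counts per movie
    .mapValues { case (sum, count) => sum / count }         // divide the sum by the number of ratings
  averageRatings.collect().foreach(println)

The Key-Value shape is what makes reduceByKey possible: Spark groups the values by movieId and combines them pairwise.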

Spark 03: Understanding Resilient Distributed Dataset

You are not qualified as an Apache Spark developer until you know what a Resilient Distributed Dataset (RDD) is. It is the fundamental way to represent data in Spark memory. There are more advanced data representations, such as DataFrame, built on top of RDD. However, it is always better to start with the most basic dataset: the RDD. An RDD is nothing more than a data structure with some special properties or features.

We all know that Apache Spark is a distributed general-purpose cluster-computing framework. There are some common problems faced in a distributed environment including but not limited to:
  1. Remote access of data is expensive
  2. High chance of failure
  3. Runtime errors are expensive and hard to track
  4. Wasting computing power is way too expensive
RDD is designed to address the problems mentioned above. In the following section, you will see the properties of RDD and how it solves these problems.
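Before going further, here is a minimal illustration (the file and variable names are just examples) of how an RDD is created in the first place:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("RDDBasics").setMaster("local[*]"))
  val numbers = sc.parallelize(1 to 10)       // RDD built from an in-memory collection
  val lines = sc.textFile("ratings.csv")      // RDD built from an external file
  // Transformations such as map are lazy; nothing runs until an action like count() is called.
  println(numbers.map(_ * 2).count())

Keep this picture in mind while reading about the properties of RDD.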

Spark 02: Scala Cheat Sheet for Java Developers

This article introduces Scala to Java developers who don't know it. I assume that you already know Java (preferably Java 8) so that you can compare Scala's features with Java's. Please note that this is not an end-to-end Scala tutorial; I cover only the Scala fundamentals that are used in my Apache Spark tutorials.

First of all, remember that Scala is a JVM-based language that runs on top of your regular Java Virtual Machine. The whole purpose of Scala is to provide a convenient functional programming language (Java 8 wasn't available at the time). However, since Scala is built on top of the JVM, you can access Java libraries and APIs from your Scala code.
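A small example of that interoperability (just an illustration, not code from the article): plain Java classes such as java.util.ArrayList and the Java 8 time API can be used directly from Scala.

  import java.time.LocalDate
  import java.util.ArrayList

  object JavaInterop {
    def main(args: Array[String]): Unit = {
      val names = new ArrayList[String]()                       // a regular Java collection
      names.add("Spark")
      names.add("Scala")
      println(s"${names.size()} items on ${LocalDate.now()}")   // Java APIs called from Scala code
    }
  }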

To play with Scala, please set up Scala on IntelliJ IDEA or install a command-line Scala distribution on your system.

Spark 01: Movie Rating Counter

Apache Spark is a must-know framework for big data developers. This is an attempt to write a series of articles on Apache Spark to take you from zero to hero. In this series, I will use the latest Apache Spark release, which is 2.4.0 as of January 2019. In the first few articles, we will code and test Apache Spark on IntelliJ IDEA. As you may already know, Apache Spark is developed in Scala and, of course, there are APIs available for other languages including Java and Python. Still, Scala is preferred over the other languages for its performance and compact code. Therefore, you need to prepare the environment first.

My articles are based on Frank Kane's course on Udemy: Apache Spark 2 with Scala - Hands On with Big Data! I highly recommend his course if you prefer a video tutorial.

Setup the Environment

Apache Spark 2.4.0 depends on Scala 2.12, which in turn depends on Java 1.8. Unlike Java, Scala is not very forgiving about version mismatches. Therefore, please take special care when choosing versions.
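As an example of pinning those versions, a build.sbt along these lines could be used, assuming an sbt-based project (the article itself wires everything up through IntelliJ IDEA instead):

  // build.sbt (sketch)
  name := "spark-tutorial"
  version := "0.1"
  scalaVersion := "2.12.8"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"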

Step 1:
Install Oracle Java Development Kit 1.8 on your system. Linux users can follow this article: Install Oracle JDK 8 on Linux

Step 2:
Install the latest IntelliJ IDEA. Again, Linux users can follow my article: Install IntelliJ IDEA on Linux

Step 3:
Install the Scala plugin in IntelliJ IDEA. Regardless of your operating system, you can follow the article Setup Scala on IntelliJ IDEA.

Android: Custom Font in Kotlin

Sometimes you may want to use a custom font in your Android application for aesthetic reasons or to show a message in a different language. I wrote an article on how to use a Custom Font in Android four years ago. Since it has been a long time and a reader wondered whether that code still works, I am writing this new article using Kotlin. However, the underlying technique hasn't changed over the years, and you can still use my previous article.

Step 1:
Create an Android application "Custom Font" with Kotlin support.