Smelly instanceof Operator


The instanceof operator in Java is used to check whether a given reference is an instance of (i.e., an object of) a given class. Though it is useful in some situations, using the instanceof operator is generally a bad practice. Whenever I see instanceof in my students' projects or in a code review, I raise the alarm. This article explains why the instanceof operator is considered a bad practice and how to avoid it.
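As a quick taste of the smell, here is a minimal, hypothetical Shape hierarchy (not the article's own example): the first method chains instanceof checks and casts, while the second relies on polymorphism and never needs to know the concrete type.

// Hypothetical Shape hierarchy; the article's own examples may differ.
abstract class Shape {
    abstract double area(); // each subclass answers for itself
}

class Circle extends Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    @Override double area() { return Math.PI * radius * radius; }
}

class Square extends Shape {
    final double side;
    Square(double side) { this.side = side; }
    @Override double area() { return side * side; }
}

public class AreaCalculator {
    // Smelly: every new Shape subclass forces another instanceof branch here.
    static double smellyArea(Shape shape) {
        if (shape instanceof Circle) {
            return Math.PI * ((Circle) shape).radius * ((Circle) shape).radius;
        } else if (shape instanceof Square) {
            return ((Square) shape).side * ((Square) shape).side;
        }
        throw new IllegalArgumentException("Unknown shape: " + shape);
    }

    // Clean: polymorphism removes the type checks entirely.
    static double cleanArea(Shape shape) {
        return shape.area();
    }

    public static void main(String[] args) {
        Shape shape = new Circle(2.0);
        System.out.println(smellyArea(shape)); // 12.566...
        System.out.println(cleanArea(shape));  // same result, no instanceof
    }
}

Every new Shape subclass forces a change in the instanceof version, whereas the polymorphic version stays untouched.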
Read More

Resume Tips for Software Engineers

Years ago, as an international student preparing my resume to get my first job in Canada, I had a lot of questions and searched a lot on how to make a resume that fits Canadian employers' requirements and style. Things have changed. Eventually, I got offers from some high-tech companies and landed a good job, and now I am interviewing candidates who are where I was a couple of years ago. After looking at more and more resumes, I decided to share my experience here in the hope that it will make someone's life better.


Who is this article for? Well, anyone looking for a new job in Canada. Especially if you are a new immigrant who has no idea what Canadian employers are looking for, this article is tailored to your needs. Though my experience is limited to Canada, the companies I applied to are mostly US-based, so I hope it applies anywhere in North America. This article targets only those who are in the software industry; I don't know how much it will overlap with other industries. The rest of the article is divided into two topics: 1. Resume Sections, 2. Resume Format. The first topic covers what to include and not to include in your resume, and the second provides some formatting tips to make your resume get you a call from the recruiter.
Read More

Presto SQL: Join Algorithms

Presto is a distributed big data SQL engine initially developed by Facebook, later open-sourced, and now led by the community. The previous article, Presto SQL: Types of Joins, covers the fundamentals of the join operators available in Presto and how they can be used in SQL queries. With that knowledge, you can now dig into the internals of Presto. This article presents how Presto executes join operations and the algorithms it uses to join tables.


Read More

Presto SQL: Types of Joins

SQL join is one of the most important and most expensive SQL operations, and it requires a deep understanding from database engineers to write efficient SQL queries. From a database engineer's perspective, understanding how a join operation works helps them optimize it for efficient execution. This article explains the join operations supported in the open-source distributed computing engine Presto SQL. It is based on the now-archived prestodb.rocks blog, which I used to learn the join algorithms of Presto.



Read More

Read Carbondata Table from Apache Hive

Apache Carbondata, an indexed columnar data store, heavily depends on Apache Spark but also supports other big data frameworks like Apache Hive and Presto. This article explains how to read a Carbondata table created in Apache Spark from Apache Hive, in two sections: 1. How to create a table in HDFS using Apache Spark, 2. How to read the Carbondata table from Apache Hive.

Requirements:
  • Oracle JDK 1.8
  • Apache Spark
  • Apache Hadoop (in this article, Apache Hadoop 2.7.7 is used)
  • Apache Hive (Carbondata officially supports Hive 2.x, so it is better to stick to a 2.x version; in this article, Apache Hive 2.3.6 is used to demonstrate the integration)
  • Carbondata libraries
Please follow the Integrate Carbondata with Apache Spark Shell article to compile Carbondata from source and integrate it with Apache Spark. This article assumes that you have already followed all the steps in that article.

Read More

Integrate Carbondata with Apache Spark Shell

Apache Carbondata is an indexed columnar data store for fast analytics on big data platforms such as Apache Hadoop and Apache Spark. This article provides a quick-start guide on how to integrate Carbondata with the Apache Spark shell. Why another article when there is a quick-start guide on the official website? Things are not always as smooth as expected: in my experience, integrating Carbondata with Apache Spark using the pre-built binaries didn't work as expected. So here is the quick-start tutorial.

Requirements:
Carbondata requires Java 1.7 or 1.8 to run and Apache Maven to build from source. Please make sure that you have Oracle JDK 1.8, a supported version of Apache Maven, and Git installed to set up Carbondata. If you don't have Oracle JDK or Apache Maven installed on your system, please follow the links given below to install them first.

Read More

Install the latest Oracle JDK on Linux


Even though OpenJDK is available in Linux repositories, some applications strictly require the Oracle Java Development Kit. This article shows you how to manually install Oracle JDK 13 on your Linux system. JDK 13 is used to demonstrate the installation; in the provided commands, replace the version-specific paths and file names according to your downloaded version.
Oracle provides deb and rpm installers
If your Linux distribution uses the DEB package format, like Debian, you can download and install the jdk-13.x.x_linux-x64_bin.deb file using the following command:
sudo dpkg -i jdk-13.x.x_linux-x64_bin.deb
If your Linux distribution uses the RPM package format, like CentOS, you can download and install the jdk-13.x.x_linux-x64_bin.rpm file using the following command:
sudo rpm -ivh jdk-13.x.x_linux-x64_bin.rpm

However, this article explains the manual installation method, which is applicable to all Linux distributions out there. Personally, I prefer the manual installation because I have more control over the changes made to the system.


Read More

Install MySQL 8 on Ubuntu/Linux Mint

Ubuntu's official software repository provides MySQL 5.x, which can be installed by following the article Install MySQL with phpMyAdmin on Ubuntu. However, the latest release of MySQL, 8.x, requires you to manually add the MySQL software repository to your system, which makes the installation process a little tricky. This article walks you through the end-to-end installation of MySQL 8.


Read More

Android: List External Storage Files

This article explains how to list files from the external storage (SD Card) in Android. Though you can list files recursively using a simple method, the new Runtime Permission Model introduced in Android 6 makes it a little difficult. Let's dive into the code and see how we can list all the files recursively.


As I mentioned earlier, I am using Kotlin for Android development since it is the future of Android. If you are using Java, just copy and paste the code into your class, method by method; Android Studio will translate the methods into Java for you.
Read More

ANTLR Hello World! - Arithmetic Expression Parser


Ever wondered how all these programming languages understand what you write? This article reveals the truth: language parsing. It is often referred to as parsing, syntax analysis, or syntactic analysis. Regardless of the term, it is the process of analyzing a string of symbols, whether in a natural language, a computer language, or a data structure, conforming to the rules of a formal grammar. The following diagram depicts the language parsing process:

Language Parser

As you can see, the language parser (which is part of the compiler) takes an input (the source code), validates it against the language grammar, and produces an Abstract Syntax Tree (commonly known as an AST), which represents the source code in a tree structure.

ANTLR (ANother Tool for Language Recognition) is a tool to define such a grammar and to automatically build a parser from it. It also provides two high-level design patterns to analyze the AST: Visitor and Listener. ANTLR is used by several languages and frameworks, including Ballerina, Siddhi, and Presto SQL. This article introduces ANTLR using a hello-world application that evaluates basic mathematical expressions given as a string.
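To give a feel for what driving an ANTLR-generated parser looks like, here is a rough Java sketch. The class names ArithmeticLexer and ArithmeticParser and the expr rule are assumptions standing in for whatever ANTLR generates from the grammar the article builds; only the org.antlr.v4.runtime calls are the real ANTLR API.

// Sketch only: ArithmeticLexer, ArithmeticParser and the expr rule are assumed to be
// generated by ANTLR from a hypothetical Arithmetic.g4 grammar.
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

public class ParserDemo {
    public static void main(String[] args) {
        CharStream input = CharStreams.fromString("2 + 3 * 5");
        ArithmeticLexer lexer = new ArithmeticLexer(input);      // breaks the string into tokens
        CommonTokenStream tokens = new CommonTokenStream(lexer); // buffers tokens for the parser
        ArithmeticParser parser = new ArithmeticParser(tokens);
        ParseTree tree = parser.expr();                          // parse from the top-level expr rule
        System.out.println(tree.toStringTree(parser));           // print the tree in LISP-like form
    }
}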

Read More

Install the latest Eclipse on Linux

This article shows you how to install the latest version of Eclipse on Linux. There are other ways to install Eclipse using scripts that automate the installation. However, I prefer the manual installation method explained in this article so that you know where your files go. Later, if you want to remove Eclipse, it takes just two commands, as explained at the end of the article.

If you do not have Java on your system, follow this link and install Java first.


Step 1:
Download the desired version of Eclipse from the official site:



Step 2:
Open the Terminal (Ctrl + Alt + T) and enter the following command to change the directory.
cd /opt

Step 3:
Enter the command given below to extract Eclipse from the ~/Downloads directory. If your downloaded file is in another directory, replace the last parameter with the actual file path.
sudo tar -xvzf ~/Downloads/eclipse-jee-2019-03-R-linux-gtk-x86_64.tar.gz

Step 4:
Open another Terminal (Ctrl + Alt + T) and enter the following command to create a shortcut file for eclipse.
gedit eclipse.desktop

Step 5:
In the opened gedit, copy and paste the following text.
[Desktop Entry]
Name=Eclipse
Type=Application
Exec=/opt/eclipse/eclipse
Terminal=false
Icon=/opt/eclipse/icon.xpm
Comment=Integrated Development Environment
NoDisplay=false
Categories=Development;IDE;
Name[en]=Eclipse
Name[en_US]=Eclipse


Step 6:
Save and close the gedit.

Step 7:
Enter the following command in the terminal to install the shortcut.
sudo desktop-file-install eclipse.desktop

Now search for Eclipse in the dashboard and open it.



Upgrade Eclipse

If you have already installed Eclipse using the above method and would like to upgrade it to the latest version, just remove Eclipse from the /opt directory and follow Steps 1 to 3 of the installation process.
sudo rm -rf /opt/eclipse



Remove Eclipse

Removing Eclipse installed as described in this article takes just two commands.

Step 1:
First, remove the menu entry you created in Step 7.
sudo rm /usr/share/applications/eclipse.desktop

Step 2:
Delete the /opt/eclipse folder.
sudo rm -rf /opt/eclipse

Read More

Javalin: A Tiny but Mighty Framework

Two years ago, I wrote the article Microservices in a minute using the open source framework MSF4J. Today I came across Javalin: another lightweight framework to develop web applications with little or no effort. We already have plenty of web frameworks, including the shining star Spring. What makes Javalin different is its simplicity. In addition, it can be used as a microservice framework or as a tiny web framework to serve a web application with static files. In the Javalin developers' words:

Javalin’s main goals are simplicity, a great developer experience, and first-class interoperability between Kotlin and Java.

Comparing Javalin with Spring is like comparing a shaving blade with a Wenger 16999 Swiss Army Knife Giant, but it does what it is supposed to do. If you want to quickly add a REST endpoint for a demo, or if you just need a simple web framework without additional gimmicks like dependency injection or object-relational mapping, consider Javalin. It is easy to learn and light to run.


In this article, you will see how to use Javalin as a web framework to serve a contact-us page and how to build a CRUD micro-service using Javalin.
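As a rough sketch of how little code Javalin needs, a single GET endpoint looks like this. It assumes a Javalin 3.x dependency on the classpath; older releases configure the port slightly differently, and the port 7000 and path /hello are arbitrary choices for this sketch.

import io.javalin.Javalin;

public class HelloJavalin {
    public static void main(String[] args) {
        // Start the embedded server; 7000 is an arbitrary port for this sketch.
        Javalin app = Javalin.create().start(7000);

        // One GET endpoint is enough for a quick demo.
        app.get("/hello", ctx -> ctx.result("Hello, Javalin!"));
    }
}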

Requirements:

Read More

Serve TensorFlow Models in Java

TensorFlow is a famous machine learning framework from Google and a must-know tool for machine learning engineers. Even though Python is the recommended language for building TensorFlow models, Google offers a Java API to use TensorFlow in Java. Still, Python is the easiest language for building TensorFlow models, even for Java developers (learn Python, my friend). However, enterprise applications developed in Java may require the artificial intelligence offered by a trained TensorFlow model. In this article, you will learn how to load and use a simple TensorFlow model exported from Python.
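As a rough sketch of the approach (not the article's exact code), the TensorFlow Java API can load a SavedModel exported from Python and run a prediction. The model directory "model_dir", the "serve" tag, the tensor names "input" and "output", and the shapes below are assumptions; they depend entirely on how the model was exported.

import java.util.List;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Session;
import org.tensorflow.Tensor;

public class ModelServer {
    public static void main(String[] args) {
        // "model_dir", the "serve" tag, the tensor names and the shapes are assumptions;
        // they depend entirely on how the model was exported from Python.
        try (SavedModelBundle bundle = SavedModelBundle.load("model_dir", "serve")) {
            Session session = bundle.session();
            float[][] features = {{1.0f, 2.0f, 3.0f}};
            try (Tensor<?> input = Tensor.create(features)) {
                List<Tensor<?>> outputs = session.runner()
                        .feed("input", input)   // feed the placeholder by its exported name
                        .fetch("output")        // fetch the prediction tensor by its exported name
                        .run();
                float[][] prediction = new float[1][1];
                outputs.get(0).copyTo(prediction);
                System.out.println("Prediction: " + prediction[0][0]);
            }
        }
    }
}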

Read More

Spark 06: Broadcast Variables

If you read the Spark 04: Key-Value RDD and Average Movie Ratings article, you might wonder what to do with the popular movie IDs printed at the end. A data analyst cannot ask users to manually look up those IDs in a CSV file to find the movie names. In this article, you will learn how to map those movie IDs to movie names using Apache Spark's variable broadcasting.


Suppose you want to share read-only data that fits in memory with every worker in your Spark cluster: broadcast that data. A broadcast variable is distributed only once and cached on every worker node so that it can be reused any number of times. More about broadcasting is covered later in this article, after the code example.
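As a minimal sketch of the idea using Spark's Java API (the article series itself uses Scala, and the two hard-coded movie names below are just sample values standing in for a lookup table loaded from movies.csv):

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class BroadcastDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("BroadcastDemo").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Tiny movieId -> name lookup table; the article builds this from movies.csv instead.
        Map<Integer, String> movieNames = new HashMap<>();
        movieNames.put(1, "Toy Story");
        movieNames.put(2, "Jumanji");

        // Ship the lookup table to every worker exactly once and cache it there.
        Broadcast<Map<Integer, String>> broadcastNames = sc.broadcast(movieNames);

        // Each task reads the cached copy via value() instead of shipping the map with every closure.
        sc.parallelize(Arrays.asList(1, 2, 1))
          .map(id -> broadcastNames.value().getOrDefault(id, "Unknown"))
          .collect()
          .forEach(System.out::println);

        sc.close();
    }
}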
Read More

Apache Maven for Beginners

Apache Maven is a build tool widely used by Java developers to manage project dependencies, control the build process, and automate tests. Apache Maven makes our lives easier, especially when building complex Java projects. However, beginners stay away from Apache Maven, as I did years ago, just because they find it complex to learn and use. This article simplifies the concepts of Apache Maven and introduces it gently to beginners. In this article, you will see how you can use Apache Maven to manage your project dependencies, using a simple Java project as an example. The article is structured into two main topics: Apache Maven in Eclipse and Apache Maven in IntelliJ IDEA. Of course, you can use Apache Maven without any IDE. However, I stick with IDEs to make it simple for beginners. Other applications of Apache Maven, like build management and test automation, will be covered in another article.
Let's begin with manual dependency management using a simple calculator application. Suppose you want to develop a calculator that receives a simple arithmetic expression like "2 + 3 * 5" as input and prints the output in the console. It is a complex task to evaluate such a string input and calculate the result ourselves. Fortunately, there is a library, exp4j, which can evaluate a string expression and return the result.
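Once exp4j is on the classpath (pulling it in is exactly the dependency-management problem Maven solves), the calculator body shrinks to a few lines. A minimal sketch:

import net.objecthunter.exp4j.Expression;
import net.objecthunter.exp4j.ExpressionBuilder;

public class Calculator {
    public static void main(String[] args) {
        // exp4j parses the string and applies the usual operator precedence for us.
        Expression expression = new ExpressionBuilder("2 + 3 * 5").build();
        double result = expression.evaluate(); // 17.0
        System.out.println("2 + 3 * 5 = " + result);
    }
}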
Read More

Spark 05: List Action Movies with Spark flatMap

Welcome to the fifth article in this series of Apache Spark tutorials. In this article, you will learn the application of the flatMap transformation. After the introduction to the flatMap operation, a sample Spark application is developed to list all action movies from the MovieLens dataset.


In the previous articles, we used the map transformation, which transforms an entity into another entity in a one-to-one manner. For example, if you have a String RDD named lines, applying the lines.map(x => x.toUpperCase) operation creates a new String RDD with the same number of records but with uppercase string literals, as shown below:
Read More

Install Ballerina on Linux


Ballerina is a new open source JVM-based language specially designed for integration purposes by WSO2, the world's #1 open source integration vendor. In this article, you will see how to manually install Ballerina on Linux systems. Visit the official website and download the installer for your system; there are installers for Windows, Mac, Debian-based Linux, and Fedora-based Linux. I prefer to install Ballerina manually because that method is universal across all Linux operating systems out there.


Read More

Spark 04: Key-Value RDD and Average Movie Ratings

In the first article of this series, Spark 01: Movie Rating Counter, we created three RDDs (data, filteredData, and ratingData), each containing a single data type. For example, data and filteredData were String RDDs and ratingData was a Float RDD. However, it is common to use an RDD that stores complex data types, especially key-value pairs, depending on the requirement. In this article, we will use a key-value RDD to calculate the average rating of each movie in our MovieLens dataset. If you don't have the MovieLens dataset, please visit the Spark 01: Movie Rating Counter article to set up your environment.


As you already know, the ratings.csv file has the fields movieId and rating. A given movie may get different ratings from different users. To get the average rating of each movie, we need to add up all the ratings of each movie individually and divide the sum by the number of ratings.
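As a rough sketch of that sum-and-count approach using Spark's Java API (the article series itself uses Scala, and the hard-coded pairs below stand in for the parsed ratings.csv rows):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class AverageRatings {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("AverageRatings").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hard-coded (movieId, rating) pairs standing in for the parsed ratings.csv rows.
        List<Tuple2<Integer, Double>> ratings = Arrays.asList(
                new Tuple2<>(1, 4.0), new Tuple2<>(1, 5.0), new Tuple2<>(2, 3.0));
        JavaPairRDD<Integer, Double> ratingPairs = sc.parallelizePairs(ratings);

        // Turn each rating into (sum, count), add them per movie, then divide.
        JavaPairRDD<Integer, Tuple2<Double, Integer>> sumCounts = ratingPairs
                .mapValues(rating -> new Tuple2<>(rating, 1))
                .reduceByKey((a, b) -> new Tuple2<>(a._1 + b._1, a._2 + b._2));
        JavaPairRDD<Integer, Double> averages = sumCounts.mapValues(sc2 -> sc2._1 / sc2._2);

        averages.collect().forEach(System.out::println); // e.g. (1,4.5) and (2,3.0)
        sc.close();
    }
}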

Read More

Spark 03: Understanding Resilient Distributed Dataset

You are not qualified as an Apache Spark developer until you know what a Resilient Distributed Dataset (RDD) is. It is the fundamental technique for representing data in Spark memory. There are advanced data representation techniques, like DataFrame, built on top of RDD. However, it is always better to start with the most basic dataset: the RDD. An RDD is nothing other than a data structure with some special properties or features.


We all know that Apache Spark is a distributed general-purpose cluster-computing framework. There are some common problems faced in a distributed environment including but not limited to:
  1. Remote access of data is expensive
  2. High chance of failure
  3. Runtime errors are expensive and hard to track
  4. Wasting computing power is way too expensive
RDD is designed to address the abovementioned problems. In the following section, you will see the properties of RDD and how it solves these problems.
Read More

Spark 02: Scala Cheat Sheet for Java Developers

This article introduces Scala to Java developers who don't know Scala. I assume that you already know Java (preferably Java 8) so that you can compare Scala's features with Java's. Please note that this article is not an end-to-end Scala tutorial; I am covering only the fundamentals of Scala that are used in my Apache Spark tutorials.


First of all, remember that Scala is a JVM-based language that runs on top of your regular Java Virtual Machine. The whole purpose of Scala is to provide a convenient functional programming language (at that time, Java 8 wasn't there). Since Scala is built on top of Java, you can access Java libraries and APIs from your Scala code.

To play with Scala, please set up Scala on IntelliJ IDEA or install a command-line Scala distribution on your system.
Read More

Spark 01: Movie Rating Counter

Apache Spark is a must-know framework for big data developers. This is an attempt to write a series of articles on Apache Spark that trains you from zero to hero. In this series, I will use the latest Apache Spark release, which is 2.4.0 as of January 2019. In the first few articles, we will code and test Apache Spark on IntelliJ IDEA. As you may already know, Apache Spark is developed in Scala and, of course, there are APIs available for other languages, including Java and Python. Still, Scala is preferred over the other languages for its performance and compact code. Therefore, you need to prepare the environment first.


My articles will be based on Frank Kane's course on Udemy: Apache Spark 2 with Scala - Hands On with Big Data! I highly recommend his course if you prefer a video tutorial.

Setup the Environment

Apache Spark 2.4.0 depends on Scala 2.12, which in turn depends on Java 1.8. Unlike Java, Scala is not a very version-compatible language. Therefore, please take special care when choosing versions.

Step 1:
Install Oracle Java Development Kit 1.8 on your system. Linux users can follow this article: Install Oracle JDK 8 on Linux

Step 2:
Install the latest IntelliJ IDEA. Again, Linux users can follow my article: Install IntelliJ IDEA on Linux

Step 3:
Install Scala plugin in IntelliJ IDEA. Regardless of your operating system, you can follow the article Setup Scala on IntelliJ IDEA.

Read More

Android: Custom Font in Kotlin

Sometimes you may want to use a custom font in your Android application, whether for aesthetic reasons or to show a message in a different language. I already wrote an article on how to use a Custom Font in Android four years ago. Since it has been a long time and a reader wondered whether that code still works, I am writing this new article using Kotlin. However, the underlying technique hasn't changed over the years, and you can still use my previous article.

Step 1:
Create an Android application "Custom Font" with Kotlin support.
Read More
