Presto SQL: Join Algorithms

Presto is a distributed big data SQL engine that was initially developed by Facebook, later open-sourced, and is now led by the community. The previous article, Presto SQL: Types of Joins, covers the fundamentals of the join operators available in Presto and how they can be used in SQL queries. With that knowledge, you can now look at the internals of Presto. This article presents how Presto executes join operations and the algorithms it uses to join tables.


Read More

Presto SQL: Types of Joins

SQL Join is one of the most important and expensive SQL operations, and writing efficient SQL queries requires a deep understanding of it. From a database engineer's perspective, understanding how join operations work helps to optimize them for efficient execution. This article explains the join operations supported in the open source distributed computing engine Presto SQL. It is based on the now archived prestodb.rocks blog, which I referred to when learning the join algorithms of Presto.



Read More

Read Carbondata Table from Apache Hive

Apache Carbondata, an indexed columnar data store, depends heavily on Apache Spark but also supports other big data frameworks like Apache Hive and Presto. This article explains how to read a Carbondata table created in Apache Spark from Apache Hive in two sections: 1. How to create a table in HDFS using Apache Spark, 2. How to read the Carbondata table from Apache Hive.

Requirements:
  • Oracle JDK 1.8
  • Apache Spark
  • Apache Hadoop (Apache Hadoop 2.7.7 is used in this article)
  • Apache Hive (Carbondata officially supports Hive 2.x, so it is better to stick to a 2.x version. Apache Hive 2.3.6 is used in this article to demonstrate the integration)
  • Carbondata libraries
Please follow the Integrate Carbondata with Apache Spark Shell article to compile Carbondata from source and integrate it with Apache Spark. This article is written based on the assumption that you have already followed all the steps from the above-mentioned article.

Read More

Integrate Carbondata with Apache Spark Shell

Apache Carbondata is an indexed columnar data store for fast analytics on big data platforms such as Apache Hadoop and Apache Spark. This article is a quick start guide on how to integrate Carbondata with Apache Spark Shell. Why another article when there is a quick start guide on the official website? Things are not always as smooth as expected. In my experience, integrating Carbondata with Apache Spark using pre-built binaries didn't work as expected. So here is the quick start tutorial.

Requirements:
Carbondata requires Java 1.7 or 1.8 to run and Apache Maven to build from source. Please make sure that you have Oracle JDK 1.8, a compatible version of Apache Maven, and Git to set up Carbondata. If you don't have Oracle JDK or Apache Maven installed on your system, please follow the links given below to install them first.

Read More

Install the latest Oracle JDK on Linux


Even though OpenJDK is available in Linux repositories, some applications strictly require the Oracle Java Development Kit. This article shows you how to manually install Oracle JDK 13 on your Linux system. In the provided commands, replace the version-specific paths and file names according to your downloaded version.
Oracle provides deb and rpm installers
If your Linux distribution uses the DEB package format, like Debian, you can download the $java_version$java_update_no_linux-x64_bin.deb file and install it using the following command:
sudo dpkg -i $java_version$java_update_no_linux-x64_bin.deb
If your Linux distribution uses the RPM package format, like CentOS, you can download the $java_version$java_update_no_linux-x64_bin.rpm file and install it using the following command:
sudo rpm -ivh $java_version$java_update_no_linux-x64_bin.rpm

However, this article explains the manual installation method, which is applicable to all Linux distributions. Personally, I prefer the manual installation because it gives me more control over the changes made to the system.

Install Oracle JDK 13 on Linux

Read More

Install MySQL 8 on Ubuntu/Linux Mint

The official Ubuntu software repository provides MySQL 5.x, which can be installed by following the article Install MySQL with phpMyAdmin on Ubuntu. However, the latest release of MySQL, 8.x, requires you to manually add the software repository to your system, which makes the installation process a little tricky. This article walks you through the end-to-end installation process of MySQL 8.


Read More

Android: List External Storage Files

This article explains how to list files from the external storage (SD Card) in Android. Though you can list files recursively using a simple method, the new Runtime Permission Model introduced in Android 6 makes it a little difficult. Let's dive into the code and see how we can list all the files recursively.


As I mentioned earlier, I am using Kotlin for Android development since it is the future of Android. If you are using Java, just copy and paste the code into your class method by method. Android Studio will translate each method into Java for you.
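
The recursive listing itself can be sketched in plain Java using only java.io.File. This is a minimal sketch and not the article's code: the class name FileLister and the method listFiles are hypothetical, and the Android-specific parts (obtaining the external storage root and requesting the runtime permission) are deliberately left out. It relies on the fact that File.listFiles() returns null when the path is not a directory or cannot be read, which is treated here as an empty result.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not taken from the article.
class FileLister {

    /** Collects every regular file under root, descending into subdirectories. */
    static List<File> listFiles(File root) {
        List<File> result = new ArrayList<>();
        // listFiles() returns null if root is not a directory or cannot be read,
        // for example when the read permission has not been granted.
        File[] children = root.listFiles();
        if (children == null) {
            return result;
        }
        for (File child : children) {
            if (child.isDirectory()) {
                result.addAll(listFiles(child)); // recurse into the subdirectory
            } else {
                result.add(child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: the directory to scan is passed as the first argument.
        File root = new File(args.length > 0 ? args[0] : ".");
        for (File file : listFiles(root)) {
            System.out.println(file.getPath());
        }
    }
}
```

On Android, the same method could be called with the external storage directory once the permission has been granted; without it, the sketch simply returns an empty list.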
Read More

ANTLR Hello World! - Arithmetic Expression Parser

Ever wondered how all these programming languages understand what you write? This article reveals the truth: Language Parsing. It is often referred to as parsing, syntax analysis, or syntactic analysis. Regardless of the term, it is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The following diagram depicts the language parsing process:

Language Parser

As you can see, the Language Parser (which is part of the compiler) takes an input (the source code), validates it against the Language Grammar, and produces an Abstract Syntax Tree (commonly known as an AST), which represents the source code in a tree structure.

ANTLR (ANother Tool for Language Recognition) is a tool to define such a grammar and to build a parser automatically from it. It also provides two high-level design patterns to analyze the AST: Visitor and Listener. ANTLR is used by several languages and frameworks including Ballerina, Siddhi, and Presto SQL. This article introduces ANTLR using a hello world application that evaluates basic mathematical expressions given as a string.
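
To make the parsing process described above concrete, here is a minimal hand-written sketch in plain Java. It does not use ANTLR and is not the code from the article: the class ArithmeticSketch, its nested node types, and the three-rule grammar in the comments are assumptions made for illustration. It validates the input against the grammar, builds a small AST, and then evaluates that tree. ANTLR generates this kind of parser for you from a grammar file.

```java
// Hypothetical hand-written parser for illustration; ANTLR would generate the equivalent.
class ArithmeticSketch {

    /** AST node: either a number literal or a binary operation. */
    interface Node {
        double eval();
    }

    static final class Num implements Node {
        final double value;
        Num(double value) { this.value = value; }
        public double eval() { return value; }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left;
        final Node right;
        BinOp(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public double eval() {
            double l = left.eval();
            double r = right.eval();
            switch (op) {
                case '+': return l + r;
                case '-': return l - r;
                case '*': return l * r;
                default:  return l / r; // only '/' remains
            }
        }
    }

    private final String src;
    private int pos = 0;

    ArithmeticSketch(String src) { this.src = src; }

    /** Skips whitespace and returns the next character, or '\0' at the end of input. */
    private char peek() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expression := term (('+' | '-') term)*
    Node expression() {
        Node node = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            node = new BinOp(op, node, term());
        }
        return node;
    }

    // term := factor (('*' | '/') factor)*
    Node term() {
        Node node = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            node = new BinOp(op, node, factor());
        }
        return node;
    }

    // factor := NUMBER | '(' expression ')'
    Node factor() {
        if (peek() == '(') {
            pos++;
            Node inner = expression();
            if (peek() != ')') throw new IllegalArgumentException("')' expected at position " + pos);
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("Number expected at position " + pos);
        return new Num(Double.parseDouble(src.substring(start, pos)));
    }

    /** Parses the whole input into an AST and evaluates it. */
    static double evaluate(String text) {
        ArithmeticSketch parser = new ArithmeticSketch(text);
        Node ast = parser.expression();
        if (parser.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return ast.eval();
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * 5")); // prints 17.0
    }
}
```

Because term sits below expression in the grammar, multiplication binds tighter than addition, so the tree for "2 + 3 * 5" has the addition at its root. Input that does not match the grammar, such as "2 +", is rejected with an IllegalArgumentException.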

Read More

Install the latest Eclipse on Linux

This article shows you how to install the latest version of Eclipse on Linux. There are other ways to install Eclipse, including scripts that automate the installation. However, I prefer the manual installation method explained in this article so that you know where your files go. If you later want to remove Eclipse, it takes just two commands, as explained at the end of the article.

If you do not have Java on your system, follow this link and install Java first.


Step 1:
Download the desired version of Eclipse from the official site:



Step 2:
Open the Terminal (Ctrl + Alt + T) and enter the following command to change the directory.
cd /opt

Step 3:
Enter the command given below to extract Eclipse from the ~/Downloads directory. If your downloaded file is in another directory, replace the last parameter with the actual file path.
sudo tar -xvzf ~/Downloads/eclipse-jee-2019-03-R-linux-gtk-x86_64.tar.gz

Step 4:
Open another Terminal (Ctrl + Alt + T) and enter the following command to create a shortcut file for Eclipse.
gedit eclipse.desktop

Step 5:
In the opened gedit, copy and paste the following text.
[Desktop Entry]
Name=Eclipse
Type=Application
Exec=/opt/eclipse/eclipse
Terminal=false
Icon=/opt/eclipse/icon.xpm
Comment=Integrated Development Environment
NoDisplay=false
Categories=Development;IDE;
Name[en]=Eclipse
Name[en_US]=Eclipse


Step 6:
Save the file and close gedit.

Step 7:
Enter the following command in the terminal to install the shortcut.
sudo desktop-file-install eclipse.desktop

Now search for Eclipse in the dashboard and open it.



Upgrade Eclipse

If you have already installed Eclipse using the above method and would like to upgrade it to the latest version, remove Eclipse from the /opt directory and then follow Steps 1 to 3 of the installation process.
sudo rm -rf /opt/eclipse



Remove Eclipse

Removing an Eclipse installation set up as described in this article takes just two commands.

Step 1:
First, remove the menu entry you created in Step 7.
sudo rm /usr/share/applications/

Step 2:
Delete the /opt/eclipse folder.
sudo rm -rf /opt/eclipse

Read More

Javalin: A Tiny but Mighty Framework

Two years ago, I wrote an article, Microservices in a minute, using the open source framework MSF4J. Today I came across Javalin, another lightweight framework for developing web applications with little or no effort. We already have plenty of web frameworks, including the shining star Spring. What makes Javalin different is its simplicity. In addition, it can be used as a microservice framework or as a tiny web framework to serve a web application with static files. In the Javalin developers' words:

Javalin’s main goals are simplicity, a great developer experience, and first-class interoperability between Kotlin and Java.

Comparing Javalin with Spring is like comparing a shaving blade with a Wenger 16999 Swiss Army Knife Giant, but it does what it is supposed to do. If you want to quickly add a REST endpoint for a quick demo or if you just need a simple web framework without any additional gimmicks like Dependency Injection or Object Relational Mapping, consider Javalin. It is easy to learn and lighter to run.


In this article, you will see how to use Javalin as a web framework to serve a contact-us page and how to build a CRUD micro-service using Javalin.

Requirements:

Read More

Serve TensorFlow Models in Java

TensorFlow is a famous machine learning framework from Google and a must-know tool for machine learning engineers. Even though Python is recommended for building TensorFlow models, Google offers a Java API to use TensorFlow in Java. Still, Python is the easiest language for building TensorFlow models, even for Java developers (learn Python, my friend). However, enterprise applications developed in Java may require the artificial intelligence offered by a trained TensorFlow model. In this article, you will learn how to load and use a simple TensorFlow model exported from Python.

Read More

Spark 06: Broadcast Variables

If you read the Spark 04: Key-Value RDD and Average Movie Ratings article, you might wonder what to do with the popular movie IDs printed at the end. A data analyst cannot ask users to manually look up those IDs in a CSV file to find the movie names. In this article, you will learn how to map those movie IDs to movie names using Apache Spark's broadcast variables.


If you want to share read-only data that fits into memory with every worker in your Spark cluster, broadcast that data. The broadcast variable is distributed only once and cached on every worker node so that it can be reused any number of times. Broadcasting is covered in more detail later in this article, after the code example.
Read More

Apache Maven for Beginners

Apache Maven is a build tool widely used by Java developers to manage project dependencies, control the build process, and automate tests. Apache Maven makes our life easier, especially when building a complex Java project. However, beginners often stay away from Apache Maven, as I did years ago, just because they find it complex to learn and use. This article simplifies the concept of Apache Maven and introduces it to beginners in a smooth way. In this article, you will see how you can use Apache Maven to manage your project dependencies, using a simple Java project as an example. The article is structured into two main topics: Apache Maven in Eclipse and Apache Maven in IntelliJ IDEA. Of course, you can use Apache Maven without any IDE. However, I stick with IDEs to keep it simple for beginners. Other applications of Apache Maven, like build management and test automation, will be covered in another article.
Let's begin with manual dependency management using a simple calculator application. Suppose you want to develop a Calculator that receives a simple arithmetic expression like "2 + 3 * 5" as input and prints the output in the console. It is a complex task to evaluate such a String input and calculate the result by ourselves. Fortunately, there is a library: exp4j which can evaluate a String expression and return the output.
Read More
