Read Carbondata Table from Apache Hive

Apache Carbondata an indexed columnar data store heavily depends on Apache Spark but also supports other Big Data frameworks like Apache Hive and Presto. This article explains how to read a Carbondata table created in Apache Spark from Apache Hive in two sections: 1. How to create a table in HDFS using Apache Spark, 2. How to read the Carbondata table from Apache Hive.

Read Carbondata Table from Apache Hive
Requirements:
  • Oracle JDK 1.8
  • Apache Spark
  • Apache Hadoop (Carbondata officially support Hive 2.x. In this article, Apache Hadoop 2.7.7 is used)
  • Apache Hive (Carbondata officially support Hive 2.x. So better to stick to 2.x version. In this article, Apache Hive 2.3.6 is used to demonstrate the integration)
  • Carbondata libraries
Please follow the Integrate Carbondata with Apache Spark Shell article to compile Carbondata from source and integrate it with Apache Spark. This article is written based on the assumption that you have already followed all the steps from the above-mentioned article.

Read More

Integrate Carbondata with Apache Spark Shell

Apache Carbondata an indexed columnar data store solution for fast analytics on big data platform, e.g.Apache Hadoop, Apache Spark, etc. This article is written to provide a quick start guide on how to integrate Carbondata with Apache Spark Shell. Why another article while there is a quick start guide on the official website? Things are not always as smooth as expected. In my experience, integrating Carbondata with Apache Spark using pre-built binaries didn't work as expected. So here is the quick start tutorial.

Integrate Carbondata with Apache Spark Shell
Requirements:
Carbondata requires Java 1.7 or 1.8 to run and Apache Maven to build from source. Please make sure that you have Oracle JDK 1.8, supporting Apache Maven and Git to setup Carbondata. If you don't have Oracle JDK or Apache Maven installed in your system, please follow the given links below to install them first.

Read More

Install the latest Oracle JDK on Linux


Even though OpenJDK is available in Linux repositories, some applications strictly require Oracle Java Development Kit. This article shows you how to manually install Oracle JDK 13 on your Linux system. This article uses JDK 13$java_update_no to demonstrate the installation. In the provided commands, replace the version specific paths and file names according to your downloaded version.
Oracle provides deb and rpm installers
If your Linux distribution is using DEB package format like Debian, you can download and install the $java_version$java_update_no_linux-x64_bin.deb file using the following command:
sudo dpkg -i $java_version$java_update_no_linux-x64_bin.deb
If your  Linux distribution is using RPM package format like Cent OS, you can download and install the $java_version$java_update_no_linux-x64_bin.rpm file using the following command:
sudo rpm -ivh $java_version$java_update_no_linux-x64_bin.rpm

However, this article explains the manual installation method which is applicable for all Linux distributions out there. Personally, I prefer the manual installation because I have more control over the changes made in the system.

Install Oracle JDK 13 on Linux

Read More

Contact Form

Name

Email *

Message *