Presto SQL for Newbies

As part of the series of Presto SQL articles, this article explains what Presto SQL is and how newcomers can use it. Presto is a high-performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. One can even query data from multiple data sources within a single query.

Let's begin with what Presto is. Presto is a massively parallel processing engine that allows users to execute SQL queries against many kinds of databases. If you define a database as software that stores data and processes it, Presto does not fall under the database category. I prefer to call it a data or computing engine, because Presto itself does not provide a storage solution. Instead, Presto focuses on how to query different data sources such as MySQL, SQL Server, Hive, Cassandra, and possibly even CSV files. Presto achieves this flexibility through its plugin architecture, as shown below:

In the future, if you find a new database that you want Presto to support, you only need to write a new connector that connects that database to Presto. Although it may look like connectors do the heavy lifting here, they actually provide only a simple API for connecting to the database. For example, a connector tells Presto which tables are available in the underlying database and how to read raw data from them. Given that information, Presto decides how to process the data and respond to a user's request. The coolest thing here is that you can join a table from one database with a table in another database. For example, if a bank keeps account details in a MySQL database and transaction history in Hive, it doesn't need to migrate data from one database to the other to join them. Presto supports SQL like the following query out of the box:

SELECT acc.account_no as account_no, trans.amount
FROM mysql.bank.accounts acc LEFT JOIN hive.bank.transactions trans
    ON acc.account_no = trans.account_no
WHERE trans.amount > 1000;
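To make the division of labour concrete, here is a minimal Python sketch of the idea, not Presto's real connector interface. The names Connector, InMemoryConnector, list_tables, read_rows and join_large_transactions are all hypothetical, and the sample rows are made up for illustration. The two "connectors" only expose tables and raw rows; the "engine" function does the join and the filter itself, mirroring the query above.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class Connector(Protocol):
    """Hypothetical connector contract: expose tables and raw rows only."""

    def list_tables(self) -> List[str]: ...

    def read_rows(self, table: str) -> Iterable[Row]: ...


class InMemoryConnector:
    """Stand-in for a real data source such as MySQL or Hive (assumption)."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self._tables = tables

    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def read_rows(self, table: str) -> Iterable[Row]:
        return iter(self._tables[table])


def join_large_transactions(
    accounts_src: Connector, transactions_src: Connector, min_amount: int
) -> List[Row]:
    """Engine-side work: hash join on account_no, then filter by amount.

    The WHERE clause on trans.amount discards accounts without a matching
    transaction, so the result is the same as an inner join.
    """
    # Index transactions by account number so each account is a dict lookup.
    by_account: Dict[object, List[Row]] = {}
    for trans in transactions_src.read_rows("transactions"):
        by_account.setdefault(trans["account_no"], []).append(trans)

    result: List[Row] = []
    for acc in accounts_src.read_rows("accounts"):
        for trans in by_account.get(acc["account_no"], []):
            if trans["amount"] > min_amount:
                result.append(
                    {"account_no": acc["account_no"], "amount": trans["amount"]}
                )
    return result


# Made-up sample data standing in for mysql.bank.accounts and
# hive.bank.transactions.
mysql = InMemoryConnector(
    {"accounts": [{"account_no": 1, "owner": "A"}, {"account_no": 2, "owner": "B"}]}
)
hive = InMemoryConnector(
    {
        "transactions": [
            {"account_no": 1, "amount": 1500},
            {"account_no": 1, "amount": 200},
            {"account_no": 2, "amount": 5000},
            {"account_no": 3, "amount": 9000},
        ]
    }
)

rows = join_large_transactions(mysql, hive, 1000)
for row in rows:
    print(row)
```

Neither connector knows anything about joins or filters. That is the point of the design: supporting a new data source means writing only the small "list tables, read rows" part, while the engine keeps all of the query processing.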



Install Oracle JDK 14 on Linux


Even though OpenJDK is available in Linux repositories, some applications strictly require the Oracle Java Development Kit. This article shows you how to manually install Oracle JDK $java_version on your Linux system. It uses JDK 14$java_update_no to demonstrate the installation. In the commands provided, replace the version-specific paths and file names to match the version you downloaded.
Oracle provides deb and rpm installers
If your Linux distribution uses the DEB package format, like Debian, you can download the jdk-$java_version$java_update_no_linux-x64_bin.deb file and install it using the following command:
sudo dpkg -i jdk-$java_version$java_update_no_linux-x64_bin.deb
If your Linux distribution uses the RPM package format, like CentOS, you can download the jdk-$java_version$java_update_no_linux-x64_bin.rpm file and install it using the following command:
sudo rpm -ivh jdk-$java_version$java_update_no_linux-x64_bin.rpm

However, this article explains the manual installation method, which works on all Linux distributions. Personally, I prefer the manual installation because it gives me more control over the changes made to the system.



Setup Presto SQL Development Environment

Presto SQL, a massively parallel processing big-data engine, has caught the attention of many big-data developers. This article is for those who want to set up a development environment for the Presto SQL community edition. The steps below apply to any Presto variant, including Presto DB, with class names and file names replaced by their equivalents.

Requirements:




