Presto SQL for Newbies

In the series of Presto SQL articles, this article explains what is Presto SQL and how to use Presto SQL for newcomers. Presto is a high performance, distributed SQL query engine for big data. Its architecture allows users to query a variety of data sources such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Kafka, and MongoDB. One can even query data from multiple data sources within a single query.

Let's begin with what is Presto. Presto is a massively parallel programming engine that allows users to execute against any databases. If you define a database as software that stores data and processes it, Presto does not fall under the database category. Rather I prefer to call it a data or computing engine because Presto itself does not provide a storage solution. Instead, Presto focuses on how to query different data sources such as MySQL, SQLServer, Hive, Cassandra even possibly CSV files. Presto achieves such flexibility of querying anything using its plugin architecture as shown below:

In the future if you find a new database to be supported by Presto, you only need to write a new connector to connect that database with Presto. Though it looks like connectors doing the heavy lifting here, actually connectors only provide simple API to connect to the database. For example, connectors tell Presto what are the tables available in the underlying database and how to read raw data from them. Given that information, Presto decides how to process those data and respond to a user's request. The coolest thing here is that you can join a table from one database with a table in another database. For example, consider a bank has account details in MySQL database and transaction history in Hive, they don't need to migrate data from one database to another to join them. Presto supports SQL like the following query out of the box:

SELECT acc.account_no as account_no, trans.amount
FROM mysql.bank.accounts acc LEFT JOIN hive.bank.transactions trans
    ON acc.account_no = trans.account_no
WHERE trans.amount > 1000;


Read More

Contact Form

Name

Email *

Message *