Presto SQL: Types of Joins

SQL Join is one of the most important and expensive SQL operation and require deep understanding from database engineers to write efficient SQL queries. From database engineers' perspective, understanding how join operation works help them to optimize them for efficient execution. This article, explains the join operations supported in the open source distributed computing engine: Presto SQL. This article is based on now archived prestodb.rocks blog which I referred to learn the Join algorithms of Presto.



Almost all well known databases support the following five types of joins: Cross Join, Inner Join, Left Join, Right Join and Full Join. This article walks through these five types of joins from the end users perspective and finally explains two more internal joins which are not obvious to the end user.

Types of Joins

Cross Join

Cross Join returns the Cartesian product of both tables without any conditions. In other words, every single row from the left table will be joined with every single row from the right table. Cross join is more expensive than other joins as it does Cartesian product between two tables.
presto:tiny> SELECT * FROM (VALUES 1, 2) t("left"), (VALUES 3, 4) u("right");
 left | right 
------+-------
    1 |     3 
    2 |     3 
    1 |     4 
    2 |     4 
(4 rows)


Inner Join

Inner join joins two tables under a condition. Every row from a table will be joined with every row from the other table only when the given condition is satisfied. In Presto SQL, INNER JOIN, JOIN and , separated tables with a WHERE clause all treated as inner join. If the query contains comma separated tables with where clause as given below, it will be read as a Cross Join with a filter applied after the join operation and internally rewritten by a Presto optimizer to inner join query.
presto:tiny> SELECT * FROM (VALUES 1, 2) t("left"), (VALUES 1, 1, 2) u("right") 
             WHERE t."left" = u."right";
 left | right 
------+-------
    1 |     1 
    1 |     1 
    2 |     2 
(3 rows)


Left Outer Join

Like inner join, Left Outer join joins every row from the left table with matching rows from the right table. If no match found for a row from the left table, it will be completed with null values filled in columns from the right table. In Presto SQL the keyword OUTER is optional in the LEFT OUTER JOIN operation. In other words LEFT JOIN and LEFT OUTER JOIN mean the same.
presto:tiny> SELECT * FROM (VALUES 1, 2) t("left") 
             LEFT OUTER JOIN (VALUES 1, 1) u("right") 
             ON t."left" = u."right";
 left | right
------+-------
    1 |     1
    1 |     1
    2 | NULL
(3 rows)


Right Outer Join

Right outer join joins every row from the right table with the left table similar to the Left Outer Join. If no match found for a row from the right table, it will be completed with null values for columns from the left table. In Presto SQL the keyword OUTER is optional in the RIGHT OUTER JOIN operation. In other words RIGHT JOIN and RIGHT OUTER JOIN mean the same.
presto:tiny> SELECT * FROM (VALUES 1, 2) t("left") 
             RIGHT OUTER JOIN (VALUES 1, 2, 3) u("right") 
             ON t."left" = u."right";
 left | right 
------+-------
    1 |     1 
    2 |     2 
 NULL |     3 
(3 rows)


Full Outer Join

Full Outer Join can be seen as the combination of Left Outer Join and Right Outer Join. It joins every row from each side with matching rows from the other side. If no match found for a row from the left table, it will be completed with null values for columns from the right table. Similarly if there is no match for a row from the right side, it will be completed with null values for the columns from the left table. In Presto SQL the keyword OUTER is optional in the FULL OUTER JOIN operation. In other words FULL JOIN and FULL OUTER JOIN mean the same.
presto:tiny> SELECT * FROM (VALUES 1, 2) t("left") 
             FULL OUTER JOIN (VALUES 1, 3) u("right") 
             ON t."left" = u."right";
 left | right
------+-------
    1 |     1
    2 | NULL
 NULL |     3
(3 rows)

In addition to the above mentioned join types, there are two join types named: Semi Join and Anti Join. These two joins are used to support some other SQL operations which an end user might not consider as a join.


Semi Join

Semi Join evaluates the join criteria as any other join criteria but only returns the records from one table if there is a match. For example, the IN/EXISTS operations are performed using a semi join internally. There are two variants of semi join: Left Semi Join and Right Semi Join. As the name suggests, Left Semi Join evaluates the condition and returns rows from the left table if there is a match for those rows. Similarly, the Right Semi Join returns only the matching rows from the right table. The following query using the IN predicate is mapped to a Semi Join operation internally in Presto database engine.
presto:tiny> SELECT * FROM (VALUES 1, 2) t(one) WHERE t.one IN (VALUES 1, 1, 3);
 one 
-----
   1 
(1 row)
As you can see, there are two tables involved in the SQL query but we want the rows only from the left table.


Anti Join

Anti Join is another implicit join to return rows from a table when the join criteria does not match. It is similar to Semi Join in returning only rows from one table but the Join criteria is opposite to the Semi Join. For example, the following NOT IN query can be evaluated using an Anti Join.
presto:tiny> SELECT * FROM (VALUES 1, 2) t(one) WHERE t.one NOT IN (VALUES 1, 1, 3);
 one 
-----
   2
(1 row)

Though Anti Join can be seen as a different join type, internally Presto treats it as a Semi Join by negating the join criteria. Not only the Semi Join, any join operation can be composed using other joins. For example, a left outer join can be rewritten as a right outer join by reordering the tables. The flexibility of rewriting a join into another join allows database engineers to do several cost based optimizations on the logical plan built from the submitted SQL query. Even for end users, if the database cannot perform cost based optimizations and join reordering automatically due to statistics limitations and configurations, the knowledge on how to rewrite a naive join operation into an optimal form can save few minutes to hours of execution time.

More technical details about how Presto executes these joins internally will be covered in another article. If you have any questions regarding join types or if you like to share your feedback, please comment below.
Previous
Next Post »

Contact Form

Name

Email *

Message *