The Optimized Row Columnar (ORC) file format provides a highly efficient way
to store Hive data. It was designed to overcome the limitations of the other Hive
file formats. Using ORC files improves performance when Hive is reading,
writing, and processing data. Many compute engines, from Hive to Presto, can read and write
ORC files. When it comes to reading or writing ORC files using core Java, however,
there is little help available beyond the
official documentation. This article is for you if you want to write your own code
to read or write ORC files.
In this article, we will create a simple ORC writer and a simple ORC reader.
Later, the writer and the reader will be
enhanced to support the common ORC types, with some minor optimizations.
Requirements:
- Oracle JDK 8 or later
- Apache Maven
- IntelliJ IDEA or Eclipse with Maven support