Understanding the Parquet file format
Published: September 27, 2021.
Apache Parquet is a columnar storage file format used by many Hadoop systems. This post describes what Parquet is and the tricks it uses to minimise file size. We also discuss how to use Parquet within an R workflow.
