Tag: ORC
Empirical: Parquet vs ORC compression benchmarks
Parquet and ORC are the heavyweights of columnar storage in modern data engineering, each designed for high-performance analytics on massive datasets. In this post, we benchmark both formats under post-2024 workloads, comparing compression ratios, read/write throughput, CPU utilization, and query latency across common engines such as Spark, Trino, and DuckDB. The results shed light on…
