Further Resources
I hope that you found this book useful and that you now have a better understanding of the internals of query engines. If you feel that any topics were not covered adequately, or at all, I would love to hear about them so that I can consider adding content in a future revision of this book.
Feedback can be posted on the public forum on the Leanpub site, or you can message me directly on Twitter at @andygrove_io.
Open-Source Projects
There are numerous open-source projects that contain query engines, and working with these projects is a great way to learn more about the topic. Here are just a few popular examples.
- Apache Arrow
- Apache Calcite
- Apache Drill
- Apache Hadoop
- Apache Hive
- Apache Impala
- Apache Spark
- Facebook Presto
- NVIDIA RAPIDS Accelerator for Apache Spark
YouTube
I only recently discovered Andy Pavlo’s lecture series, which is available on YouTube. The series covers much more than just query engines, but it includes extensive content on query optimization and execution. I highly recommend watching these videos.
Sample Data
Earlier chapters reference the New York City Taxi & Limousine Commission Trip Record Data. The yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
When this book was first published, the data was provided in CSV format, but it has since been converted to Parquet format. It is still possible to find the CSV versions of these files online. As of December 2025, the following locations contain this data:
- https://github.com/DataTalksClub/nyc-tlc-data/releases
- https://catalog.data.gov/dataset/2019-yellow-taxi-trip-data
- https://www.kaggle.com/code/haydenbailey/newyork-yellow-taxi
The KQuery project contains source code for converting these CSV files into Parquet format.
This book is also available for purchase in ePub, MOBI, and PDF formats from https://leanpub.com/how-query-engines-work
Copyright © 2020-2025 Andy Grove. All rights reserved.