Just found this article: "Batch fetching - optimizing object graph loading".
We have some tables with 15K~20K partitions. If I run a query that scans a lot of partitions, Hive can take more than 10 minutes just to submit the mapred job.
The problem is caused by ObjectStore.getPartitionsByNames, which the Hive semantic analyzer calls when it tries to prune partitions. This method sends a lot of queries to our MySQL database to retrieve ALL information about the partitions. Because each MPartition and MStorageDescriptor is converted into a Partition and StorageDescriptor, every field is accessed during the conversion; in other words, even fields that have nothing to do with partition pruning, such as BucketCols, get loaded. In our case, about 10 queries are sent to the database for each partition, and each query may take 40ms.
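To make the cost concrete, here is a hypothetical, condensed view of what the conversion does (not the actual ObjectStore code): each getter on a lazy JDO field fires its own SELECT, so copying all the fields of a single partition costs around 10 round trips.

import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
import org.apache.hadoop.hive.metastore.model.MPartition;
import org.apache.hadoop.hive.metastore.model.MStorageDescriptor;

public class ConversionCost {
  // Condensed illustration; the real convertToPart copies many more fields.
  static Partition convertToPart(MPartition mpart) {
    Partition p = new Partition();
    p.setValues(mpart.getValues());           // lazy 1-N: SELECT on PARTITION_KEY_VALS
    p.setParameters(mpart.getParameters());   // lazy 1-N: SELECT on PARTITION_PARAMS
    MStorageDescriptor msd = mpart.getSd();   // lazy 1-1: SELECT on SDS
    StorageDescriptor sd = new StorageDescriptor();
    sd.setLocation(msd.getLocation());
    sd.setBucketCols(msd.getBucketCols());    // lazy 1-N: SELECT on BUCKETING_COLS
    sd.setParameters(msd.getParameters());    // lazy 1-N: SELECT on SD_PARAMS
    p.setSd(sd);                              // ...and so on for cols, sort cols, serde
    return p;
  }
}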
This is the well-known ORM N+1 problem, but it makes for a really bad user experience.
Actually, if we assemble the Partition objects manually, we would only need about 10 queries per group of partitions (default group size is 300). In our environment, 30K partitions would take only about 40 seconds: (30K / 300) groups * 10 queries * 40ms per query = 40s.
I tried it this way:
- Fetch MPartition with a fetch group and FETCH_SIZE_GREEDY, so one query can get MPartition's primary fields and leave the MStorageDescriptor reference cached.
- Collect all the descriptors into a list "msds", then run another query with a filter like "msds.contains(this)"; all the cached descriptors are refreshed in one query instead of N queries.
This works well for 1-1 relations, but not for 1-N relations like MPartition.values; I didn't find a way to populate those fields in just one query.
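For reference, the two-query attempt looked roughly like the sketch below. The fetch-group name "partition-with-sd" and the helper method are made up for illustration; the JDO calls themselves (FetchPlan.addGroup, FETCH_SIZE_GREEDY, and a contains() filter) are standard javax.jdo API.

import java.util.ArrayList;
import java.util.List;
import javax.jdo.FetchPlan;
import javax.jdo.PersistenceManager;
import javax.jdo.Query;
import org.apache.hadoop.hive.metastore.model.MPartition;
import org.apache.hadoop.hive.metastore.model.MStorageDescriptor;

public class BatchFetchSketch {
  @SuppressWarnings("unchecked")
  static List<MPartition> fetchPartitions(PersistenceManager pm, String tblName) {
    // Query 1: load the MPartitions; the custom fetch group pulls the
    // MStorageDescriptor reference along, and FETCH_SIZE_GREEDY makes the
    // implementation materialize the whole result set in one pass.
    Query q = pm.newQuery(MPartition.class, "table.tableName == tname");
    q.declareParameters("java.lang.String tname");
    q.getFetchPlan().addGroup("partition-with-sd"); // would be declared in package.jdo
    q.getFetchPlan().setFetchSize(FetchPlan.FETCH_SIZE_GREEDY);
    List<MPartition> parts = (List<MPartition>) q.execute(tblName);

    // Collect the referenced descriptors.
    List<MStorageDescriptor> msds = new ArrayList<MStorageDescriptor>();
    for (MPartition p : parts) {
      msds.add(p.getSd());
    }

    // Query 2: one contains() query refreshes every cached descriptor,
    // replacing N lazy SELECTs with a single bulk query.
    Query q2 = pm.newQuery(MStorageDescriptor.class, "msds.contains(this)");
    q2.declareParameters("java.util.Collection msds");
    q2.execute(msds);
    q2.closeAll();
    return parts;
  }
}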
Because the JDO mapping doesn't work well with the conversion (MPartition -> Partition), I'm wondering if it is worth doing something like this:
- Query each table directly in SQL (PARTITIONS, SDS, etc.)
- Assemble the Partition objects by hand
This is a hack and the code will be really ugly, but I didn't find JDO support for "FETCH JOIN" or batch fetching.
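A rough sketch of what that hack might look like. The table and column names follow the MySQL metastore schema as far as I can tell (PARTITIONS, SDS, PART_ID, SD_ID, and so on), but treat them as assumptions; a real version would also need PARTITION_KEY_VALS, PARTITION_PARAMS, SERDES, etc.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.Partition;
import org.apache.hadoop.hive.metastore.api.StorageDescriptor;

public class DirectSqlSketch {
  static List<Partition> loadPartitions(Connection conn, long tblId) throws Exception {
    // One join over PARTITIONS and SDS instead of per-partition lazy loads.
    String sql = "SELECT p.PART_ID, p.PART_NAME, p.CREATE_TIME, "
        + "s.LOCATION, s.INPUT_FORMAT, s.OUTPUT_FORMAT "
        + "FROM PARTITIONS p JOIN SDS s ON p.SD_ID = s.SD_ID "
        + "WHERE p.TBL_ID = ?";
    List<Partition> result = new ArrayList<Partition>();
    PreparedStatement ps = conn.prepareStatement(sql);
    try {
      ps.setLong(1, tblId);
      ResultSet rs = ps.executeQuery();
      while (rs.next()) {
        StorageDescriptor sd = new StorageDescriptor();
        sd.setLocation(rs.getString("LOCATION"));
        sd.setInputFormat(rs.getString("INPUT_FORMAT"));
        sd.setOutputFormat(rs.getString("OUTPUT_FORMAT"));
        Partition part = new Partition();
        part.setCreateTime(rs.getInt("CREATE_TIME"));
        part.setSd(sd);
        // PART_NAME (e.g. "ds=2011-06-01/hr=12") would still have to be
        // split into the values list; omitted here.
        result.add(part);
      }
    } finally {
      ps.close();
    }
    return result;
  }
}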