I needed to figure out who is called
HiveMetaStore.get_partition_with_auth because it scans each partition one by one, and will be a performance issue when you have thousands partitions.
12/02/29 12:39:14 INFO ppd.OpProcFactory: ((local_dt = '2011-11-01') and (local_dt < '2011-11-08'))
12/02/29 12:39:14 INFO ppd.OpProcFactory: Processing for TS(293)
12/02/29 12:39:14 INFO ppd.OpProcFactory: Pushdown Predicates of TS For Alias : lz.lz_omniture_hit
12/02/29 12:39:14 INFO ppd.OpProcFactory: ((local_dt = '2011-11-01') and (local_dt < '2011-11-08'))
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_names : db=lz tbl=lz_omniture_hit
12/02/29 12:39:14 INFO HiveMetaStore.audit: ugi=platdev ip=unknown-ip-addr cmd=get_partition_names : db=lz tbl=lz_omniture_hit
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-01-01,AT]
12/02/29 12:39:14 INFO HiveMetaStore.audit: ugi=platdev ip=unknown-ip-addr cmd=get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-01-01,AT]
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AR]
12/02/29 12:39:14 INFO HiveMetaStore.audit: ugi=platdev ip=unknown-ip-addr cmd=get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AR]
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AT]
12/02/29 12:39:14 INFO HiveMetaStore.audit: ugi=platdev ip=unknown-ip-addr cmd=get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AT]
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AU]
12/02/29 12:39:14 INFO HiveMetaStore.audit: ugi=platdev ip=unknown-ip-addr cmd=get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,AU]
12/02/29 12:39:14 INFO metastore.HiveMetaStore: 27: get_partition_with_auth : db=lz tbl=lz_omniture_hit[2011-11-01,BE]
Unfortunately, I cannot use Eclipse on the machine where I run beeswax server because X window is not installed. JDB is the only choice, but it really not a good tool for debugging.
Here is how I use JDB to debug Beeswax server:
- Start Beeswax server in debug mode, add this in apps/beeswax/beeswax_server.sh before hadoop jar is called
export HADOOP_OPTS="-Dlog4j.configuration=log4j.properties -Xdebug -Xrunjdwp:transport=dt_socket,address=4444,server=y,suspend=n"
...
echo Executing $HADOOP_HOME/bin/hadoop jar $BEESWAX_JAR "$@"
exec $HADOOP_HOME/bin/hadoop jar $BEESWAX_JAR "$@"
- Start JDB using jdb -attach 4444
- Add source file paths
> use /etc/hive/conf:$HOME/git/hue/apps/beeswax/src/beeswax/../../../../desktop/conf:/usr/lib/hadoop/conf:/usr/lib/hadoop:$HOME/git/hue/apps/beeswax/java/src/main:$HOME/hadoop/hive-0.7.1-cdh3u1/src/ql:$HOME/hadoop/hive-0.7.1-cdh3u1/src/metastore:$HOME/hadoop/hive-0.7.1-cdh3u1/src/common
- List all threads: threads
Beeswax-8[1] threads
Group system:
(java.lang.ref.Reference$ReferenceHandler)0xec0 Reference Handler cond. waiting
(java.lang.ref.Finalizer$FinalizerThread)0xebf Finalizer cond. waiting
(java.lang.Thread)0xebe Signal Dispatcher running
Group main:
(java.lang.Thread)0xec1 main running
(java.util.TimerThread)0xebd Timer thread for monitoring ugi cond. waiting
(java.lang.Thread)0xebc MetaServerThread running
(java.lang.Thread)0xebb Evicter sleeping
(java.lang.Thread)0xeba pool-2-thread-1 running
(org.apache.hadoop.util.Daemon)0xeb9 LeaseChecker sleeping
(java.lang.Thread)0xeb8 pool-3-thread-1 running
(org.apache.hadoop.util.Daemon)0xeb7 LeaseChecker sleeping
(org.apache.hadoop.util.Daemon)0xeb6 LeaseChecker sleeping
(org.apache.hadoop.util.Daemon)0xeb5 LeaseChecker sleeping
(org.apache.hadoop.util.Daemon)0xed1 LeaseChecker sleeping
(java.lang.Thread)0xef6 Beeswax-8 running (at breakpoint)
(org.apache.hadoop.ipc.Client$Connection)0xef9 IPC Client (47) connection to chelhadedw002/10.184.39.97:8020 from platdev cond. waiting
(java.lang.Thread)0xefa sendParams-5 cond. waiting
- Go to a thread: thread 0xebc
- Add breakpoints:
- Stop at the specified line:
stop at com.cloudera.beeswax.BeeswaxServiceImpl:689
- Stop in a method:
stop in com.cloudera.beeswax.BeeswaxServiceImpl.query
- Stop in a method of nested class:
stop in org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth
Here is what I found:
MetaServerThread[1] stop in org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth
Set breakpoint org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth
MetaServerThread[1] cont
>
Breakpoint hit: "thread=Beeswax-8", org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth(), line=1,463 bci=0
Beeswax-8[1] where
[1] org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partition_with_auth (HiveMetaStore.java:1,463)
[2] org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getPartitionWithAuthInfo (HiveMetaStoreClient.java:657)
[3] org.apache.hadoop.hive.ql.metadata.Hive.getPartition (Hive.java:1,293)
[4] org.apache.hadoop.hive.ql.metadata.Hive.getPartition (Hive.java:1,258)
[5] org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune (PartitionPruner.java:229)
[6] org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan (GenMapRedUtils.java:551)
[7] org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan (GenMapRedUtils.java:514)
[8] org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan (GenMapRedUtils.java:125)
[9] org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process (GenMRRedSink1.java:76)
[10] org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch (DefaultRuleDispatcher.java:89)
[11] org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch (DefaultGraphWalker.java:88)
My Hive version is
hadoop-hive-cdh3u1. The code of
PartitionPruner is not good enough and it was replaced in
hadoop-hive-cdh4b1.
No comments:
Post a Comment