Assume your table has two rows. Will the output file /tmp/my_table_count contain just 2? Actually, it will not. On my terminal, the output starts with the control sequence "ESC[?1034h":
$ impala-shell -B -q "select count(1) from my_table" > /tmp/my_table_count
$ xxd /tmp/my_table_count
0000000: 1b5b 3f31 3033 3468 320a .[?1034h2.
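To see why the captured value is not usable as a number, the bytes from the xxd dump can be recreated with printf. This is a sketch that only simulates impala-shell's output; the temporary file stands in for /tmp/my_table_count, and no Impala connection is involved.

```shell
#!/usr/bin/env bash
# Simulate the captured output: ESC[?1034h followed by "2\n" (same bytes as the xxd dump).
tmp=$(mktemp)
printf '\033[?1034h2\n' > "$tmp"

value=$(cat "$tmp")   # command substitution strips only the trailing newline
rm -f "$tmp"

if [ "$value" = "2" ]; then
  echo "clean: $value"
else
  # 8 characters of escape sequence plus the digit: 9 characters instead of 1
  echo "polluted: ${#value} characters"
fi
```

Running it prints `polluted: 9 characters`. The digit is still there at the end, but anything that compares or stores the whole string sees the escape sequence too.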
This causes a problem when you use the result in a script, for example one that updates a partition's numRows in Impala:
local a=$(impala-shell -B -q "select count(1) from my_table where part_col='2014-08-28'")
impala-shell -q "alter table my_table partition(part_col='2014-08-28') set tblproperties('numRows'='$a')"
If you run this script, the escape sequence is passed along in $a, and #Rows is not set to the row count:
Query: show table stats my_table
+------------+-------+--------+--------+--------------+---------+
| part_col   | #Rows | #Files | Size   | Bytes Cached | Format  |
+------------+-------+--------+--------+--------------+---------+
| 2014-08-28 | -1    | 2      | 2.87KB | NOT CACHED   | PARQUET |
| Total      | -1    | 2      | 2.87KB | 0B           |         |
+------------+-------+--------+--------+--------------+---------+
Returned 2 row(s) in 0.06s
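Whatever the cause, the script can fail loudly instead of silently writing a bad numRows by checking $a before the ALTER TABLE. A minimal sketch, assuming bash; is_row_count is a hypothetical helper name, not part of impala-shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is a plain non-negative integer.
is_row_count() {
  [[ $1 =~ ^[0-9]+$ ]]
}

clean='2'
polluted=$(printf '\033[?1034h2')   # what $a holds when the escape sequence leaks in

is_row_count "$clean" && echo "clean: accepted"
is_row_count "$polluted" || echo "polluted: rejected"

# In the script, before the alter statement:
#   is_row_count "$a" || { echo "unexpected count: $a" >&2; return 1; }
```

This prints `clean: accepted` and `polluted: rejected`. A strict digits-only check is used rather than stripping the escape sequence, so any other unexpected output is caught as well.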
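The sequence depends on the terminal type in TERM (it is xterm's meta-key mode sequence, which GNU readline typically writes when it starts), so you can fix it by running impala-shell with an empty TERM: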
local a=$(TERM= impala-shell -B -q "select count(1) from my_table where part_col='2014-08-28'")
impala-shell -q "alter table my_table partition(part_col='2014-08-28') set tblproperties('numRows'='$a')"
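The snippet uses local, which bash only accepts inside a function. Here is a minimal sketch of the fixed script wrapped in a function, assuming bash; update_num_rows and its two parameters are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the function name and parameters are illustrative only.
update_num_rows() {
  local table=$1 part=$2
  local a
  # Declare first, assign second: with `local a=$(...)` the exit status would be
  # that of `local` itself, hiding an impala-shell failure.
  # TERM= gives impala-shell an empty TERM for this one command only.
  a=$(TERM= impala-shell -B -q "select count(1) from ${table} where part_col='${part}'") || return 1
  impala-shell -q "alter table ${table} partition(part_col='${part}') set tblproperties('numRows'='${a}')"
}

# Usage:
#   update_num_rows my_table 2014-08-28
```

Passing the partition value once keeps the count query and the ALTER TABLE pointed at the same partition.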