Thursday, August 28, 2014

impala-shell may emit a control sequence in its output

Suppose you have a table with three rows. What ends up in the output file /tmp/my_table_count after running the command below? Just 3? Not quite. On my terminal, the output starts with the control sequence "ESC[?1034h".
$ impala-shell -B -q "select count(1) from my_table" > /tmp/my_table_count
$ xxd /tmp/my_table_count
0000000: 1b5b 3f31 3033 3468 320a                 .[?1034h2.
This causes a problem when a script uses the result to update a partition's numRows in Impala:
  local a=$(impala-shell -B -q "select count(1) from my_table where part_col='2014-08-28'")
  impala-shell -q "alter table my_table partition(part_col='2014-08-28') set tblproperties('numRows'='$a')"
If you run this script, #Rows ends up with the wrong value, because $a contains the escape sequence in addition to the count:
Query: show table stats my_table
+------------+-------+--------+--------+--------------+---------+
| part_col   | #Rows | #Files | Size   | Bytes Cached | Format  |
+------------+-------+--------+--------+--------------+---------+
| 2014-08-28 | -1    | 2      | 2.87KB | NOT CACHED   | PARQUET |
| Total      | -1    | 2      | 2.87KB | 0B           |         |
+------------+-------+--------+--------+--------------+---------+
Returned 2 row(s) in 0.06s
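One way to catch this before the value ever reaches ALTER TABLE is to check that the captured string contains only digits. The sketch below is a minimal illustration, not part of the original script: is_count is a hypothetical helper name, and the captured value is simulated with printf using the bytes from the xxd dump rather than a real impala-shell call.

```shell
# Hypothetical helper: succeeds only if $1 consists solely of decimal digits.
is_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *) return 0 ;;
  esac
}

# Simulated capture, using the bytes from the xxd dump: ESC [ ? 1 0 3 4 h 2
a=$(printf '\033[?1034h2\n')

if is_count "$a"; then
  echo "count looks numeric: $a"
else
  echo "refusing to use non-numeric count" >&2
fi
```

With the simulated value the check fails, so the script would stop instead of writing a bad numRows.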
You can fix it by setting TERM to an empty value for the impala-shell invocation that captures the count, like this:
  local a=$(TERM= impala-shell -B -q "select count(1) from my_table where part_col='2014-08-28'")
  impala-shell -q "alter table my_table partition(part_col='2014-08-28') set tblproperties('numRows'='$a')"
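
If changing TERM is not an option, another approach is to filter the escape sequence out of the captured output. The sketch below assumes the unwanted bytes are always a CSI sequence of the form seen in the dump (ESC, "[", parameter characters, one final letter). strip_csi is a hypothetical helper name, and the input is simulated with printf instead of a real impala-shell call.

```shell
# Hypothetical helper: remove CSI escape sequences (ESC [ params final-letter) from stdin.
strip_csi() {
  local esc
  esc=$(printf '\033')   # literal ESC byte, so the pattern does not rely on sed escape support
  sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# Simulated impala-shell output: ESC[?1034h followed by the count and a newline.
printf '\033[?1034h2\n' | strip_csi   # prints 2
```

In the script above, the filter would go inside the command substitution, after the impala-shell call: a=$(impala-shell -B -q "..." | strip_csi).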
