Hive Data Definition and Data Manipulation (Part 4: Hive Data Types in Detail)


Hive data types fall into numeric types, string types, date/time types, complex types, and a few others. Each group is introduced below.

1. Hive Primitive Data Types

1.1 Numeric Types (7 in total)
  • TINYINT (1-byte signed integer, from -128 to 127)
  • SMALLINT (2-byte signed integer, from -32,768 to 32,767)
  • INT/INTEGER (4-byte signed integer, from -2,147,483,648 to 2,147,483,647)
  • BIGINT (8-byte signed integer, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)
  • FLOAT (4-byte single-precision floating point number)
  • DOUBLE (8-byte double-precision floating point number)
  • DECIMAL (exact signed decimal, declared as DECIMAL(precision, scale), with precision up to 38 digits)
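Since DECIMAL carries an explicit precision and scale, a cast into it rounds to the declared scale, while casting a floating-point value to an integer type simply drops the fraction. A minimal sketch (the literals are made up for illustration):

    -- DECIMAL(10,2) keeps two fractional digits and rounds the rest away
    hive> select cast(3.14159 as decimal(10,2));   -- returns 3.14
    -- Casting a floating-point value to INT truncates toward zero
    hive> select cast(123.456 as int);             -- returns 123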
1.2 Date/Time Types
  • TIMESTAMP: format yyyy-MM-dd HH:mm:ss.fffffffff, with up to nine fractional digits, i.e., nanosecond precision.
  • DATE: stores year, month, and day in the format yyyy-MM-dd, covering 0000-01-01 through 9999-12-31.
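As a quick, hedged sanity check of the nanosecond precision (this query is not from the original run; a bare SELECT like this needs Hive 0.13+):

    hive> select cast('2019-08-21 11:12:13.123456789' as timestamp);
    -- returns 2019-08-21 11:12:13.123456789, all nine fractional digits kept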
1.3 String Types
  • STRING: the most commonly used type; a variable-length string.

  • VARCHAR: the length specifier ranges from 1 to 65535; a value longer than the declared length is silently truncated.

  • CHAR: fixed-length; shorter values are padded with spaces. The length specifier ranges from 1 to 255.
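A small sketch contrasting the three string types (the table and values are made up; INSERT ... VALUES needs Hive 0.14+):

    hive> create table t_str_demo(s string, v varchar(5), c char(5));
    hive> insert into table t_str_demo values ('hello world', 'hello world', 'ab');
    -- s keeps the full text, v is truncated to 'hello',
    -- and c is stored fixed-length as 'ab' padded with three spaces
    hive> select s, v, c from t_str_demo;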

2. Hive Complex Data Types
  • Array type: ARRAY<data_type>
  • Map type: MAP<primitive_type, data_type>
  • Struct type: STRUCT<col_name : data_type [COMMENT col_comment], ...>
  • Union type: UNIONTYPE<data_type, data_type, ...>
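Section 3 below demonstrates ARRAY, MAP, and STRUCT; UNIONTYPE is used far less often, so here is only a hedged sketch built on the create_union function, whose first argument is the zero-based tag selecting the active branch:

    -- Illustrative only: a column that holds either an int or a string
    hive> create table t_union_demo(u uniontype<int, string>);
    hive> select create_union(0, 123, 'abc');   -- tag 0 picks the int: {0:123}
    hive> select create_union(1, 123, 'abc');   -- tag 1 picks the string: {1:"abc"}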
3. Using Hive Data Types
  • Primitive types

    hive> create table t_test(a string, b int, c bigint, d float, e double, f tinyint, g smallint);
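To double-check the declared types, desc prints each column with its type; given the statement above, the output would be:

    hive> desc t_test;
    OK
    a  string
    b  int
    c  bigint
    d  float
    e  double
    f  tinyint
    g  smallint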

  • Date type

    # Sample data, saved to /root/customer.txt
    1,zhangsan,1986-09-09
    2,lisi,1988-02-28
    3,wangwu,1993-05-16

    # Create the table
    hive> create table t_customer(id int, name string, birthday date)
        > row format delimited fields terminated by ',';

    # Load the data
    hive> load data local inpath '/root/customer.txt' into table t_customer;

    # Inspect the table schema
    hive> desc t_customer;
    OK
    id        int
    name      string
    birthday  date
    Time taken: 0.115 seconds, Fetched: 3 row(s)

    # Query the data
    hive (mydb)> select * from t_customer;
    OK
    t_customer.id  t_customer.name  t_customer.birthday
    1  zhangsan  1986-09-09
    2  lisi      1988-02-28
    3  wangwu    1993-05-16
    Time taken: 0.228 seconds, Fetched: 3 row(s)
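Because birthday is a genuine DATE column, the built-in date functions apply to it directly. For example (a sketch, not from the original run; the results depend on the current date):

    # Birth year and age in days for each customer
    hive (mydb)> select name, year(birthday) as birth_year,
               >        datediff(current_date, birthday) as age_in_days
               > from t_customer;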

  • Complex type: ARRAY<data_type>

    # Sample data, saved to /home/hadoop/movie.txt
    战狼2,吴京:吴刚:龙母,2017-08-16
    三生三世十里桃花,刘亦菲:痒痒,2017-08-20
    知否知否应是绿肥红瘦,赵丽颖:李易峰,2017-12-20

    # Create the table
    create table t_movie(name string, actors array<string>, publish_date date)
    row format delimited fields terminated by ','
    collection items terminated by ':';

    # Load the data
    hive (mydb)> load data local inpath '/home/hadoop/movie.txt' into table t_movie;
    Loading data to table mydb.t_movie
    OK
    Time taken: 0.385 seconds

    # Query the data
    hive (mydb)> select * from t_movie;
    OK
    t_movie.name  t_movie.actors  t_movie.publish_date
    战狼2  ["吴京","吴刚","龙母"]  2017-08-16
    三生三世十里桃花  ["刘亦菲","痒痒"]  2017-08-20
    知否知否应是绿肥红瘦  ["赵丽颖","李易峰"]  2017-12-20
    Time taken: 0.227 seconds, Fetched: 3 row(s)

    # Index into the array
    hive (mydb)> select name, actors[0] as main_actor from t_movie;
    OK
    name  main_actor
    战狼2  吴京
    三生三世十里桃花  刘亦菲
    知否知否应是绿肥红瘦  赵丽颖
    Time taken: 0.737 seconds, Fetched: 3 row(s)

    # Filter by array membership
    hive (mydb)> select name, actors from t_movie where array_contains(actors, '吴刚');
    OK
    name  actors
    战狼2  ["吴京","吴刚","龙母"]
    Time taken: 0.374 seconds, Fetched: 1 row(s)

    # Array length
    hive (mydb)> select name, size(actors) as size from t_movie;
    OK
    name  size
    战狼2  3
    三生三世十里桃花  2
    知否知否应是绿肥红瘦  2
    Time taken: 0.27 seconds, Fetched: 3 row(s)
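To flatten the array into one row per actor, the standard lateral view/explode pattern can be used (a sketch, not from the original run):

    # One output row per (movie, actor) pair
    hive (mydb)> select name, actor
               > from t_movie lateral view explode(actors) a as actor;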

  • Complex type: MAP<primitive_type, data_type>

    # Sample data, saved to /home/hadoop/sanguo.txt
    1,关羽,武力:97#智力:92#忠诚度:88,蜀国,29
    2,刘备,武力:77#智力:94#忠诚度:99,蜀国,34
    3,张飞,武力:94#智力:81#忠诚度:90,蜀国,28
    4,赵云,武力:98#智力:93#忠诚度:94,蜀国,35
    5,马超,武力:93#智力:78#忠诚度:85,蜀国,29
    6,黄忠,武力:91#智力:83#忠诚度:86,蜀国,60
    7,许褚,武力:90#智力:60#忠诚度:91,魏国,33
    8,典韦,武力:97#智力:55#忠诚度:94,魏国,40
    9,张辽,武力:90#智力:82#忠诚度:92,魏国,28

    # Create the table
    hive (mydb)> create table t_sanguo(id int, name string, lables map<string, int>, country string, age int)
               > row format delimited fields terminated by ','
               > collection items terminated by '#'
               > map keys terminated by ':';

    # Load the data
    hive (mydb)> load data local inpath '/home/hadoop/sanguo.txt'
               > overwrite into table t_sanguo;

    # Query
    hive (mydb)> select * from t_sanguo;
    OK
    t_sanguo.id  t_sanguo.name  t_sanguo.lables  t_sanguo.country  t_sanguo.age
    1  关羽  {"武力":97,"智力":92,"忠诚度":88}  蜀国  29
    2  刘备  {"武力":77,"智力":94,"忠诚度":99}  蜀国  34
    3  张飞  {"武力":94,"智力":81,"忠诚度":90}  蜀国  28
    4  赵云  {"武力":98,"智力":93,"忠诚度":94}  蜀国  35
    5  马超  {"武力":93,"智力":78,"忠诚度":85}  蜀国  29
    6  黄忠  {"武力":91,"智力":83,"忠诚度":86}  蜀国  60
    7  许褚  {"武力":90,"智力":60,"忠诚度":91}  魏国  33
    8  典韦  {"武力":97,"智力":55,"忠诚度":94}  魏国  40
    9  张辽  {"武力":90,"智力":82,"忠诚度":92}  魏国  28
    Time taken: 0.333 seconds, Fetched: 9 row(s)

    # Look up a single key
    hive (mydb)> select id, name, lables["武力"], country, age from t_sanguo;
    OK
    id  name  _c2  country  age
    1  关羽  97  蜀国  29
    2  刘备  77  蜀国  34
    3  张飞  94  蜀国  28
    4  赵云  98  蜀国  35
    5  马超  93  蜀国  29
    6  黄忠  91  蜀国  60
    7  许褚  90  魏国  33
    8  典韦  97  魏国  40
    9  张辽  90  魏国  28
    Time taken: 0.306 seconds, Fetched: 9 row(s)

Note: running show functions in Hive lists all built-in functions, including the map-related ones such as map_keys and map_values.
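For example, the two functions mentioned above can be applied to the table like this (a sketch, not from the original run):

    # Extract all keys and all values of the map column
    hive (mydb)> select name, map_keys(lables) as keys, map_values(lables) as vals
               > from t_sanguo;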

  • Complex type: STRUCT<col_name : data_type, ...>

    # Sample data, saved to actors.txt (four fields after the name: age:gender:city:magnum_opus)
    1,zhaowei,39:female:bejing:huanzhugege
    2,zhourunfa,62:male:hk:yingxiongbense
    3,qiushuzheng,54:female:taiwan:ludingji
    4,zhouxingchi,58:male:hk:tangbohudianqiuxiang

    # Create the table
    hive (mydb)> create table t_actors(id int, name string, info struct<age:string, city:string, magnum_opus:string>)
               > row format delimited fields terminated by ','
               > collection items terminated by ':';

    # Load the data
    hive (mydb)> load data local inpath '/home/hadoop/actors.txt' overwrite into table t_actors;
    Loading data to table mydb.t_actors
    OK
    Time taken: 0.363 seconds

    # Inspect the schema
    hive (mydb)> desc t_actors;
    OK
    col_name  data_type  comment
    id    int
    name  string
    info  struct<age:string,city:string,magnum_opus:string>
    Time taken: 0.073 seconds, Fetched: 3 row(s)

    # Query; struct fields are accessed with dot notation
    hive (mydb)> select * from t_actors;
    OK
    t_actors.id  t_actors.name  t_actors.info
    1  zhaowei  {"age":"39","city":"female","magnum_opus":"bejing"}
    2  zhourunfa  {"age":"62","city":"male","magnum_opus":"hk"}
    3  qiushuzheng  {"age":"54","city":"female","magnum_opus":"taiwan"}
    4  zhouxingchi  {"age":"58","city":"male","magnum_opus":"hk"}
    Time taken: 0.208 seconds, Fetched: 4 row(s)

    hive (mydb)> select name, info.city, info.magnum_opus from t_actors;
    OK
    name  city  magnum_opus
    zhaowei  female  bejing
    zhourunfa  male  hk
    qiushuzheng  female  taiwan
    zhouxingchi  male  hk
    Time taken: 0.221 seconds, Fetched: 4 row(s)

Note that the declared struct has only three fields while each data row carries four, so the values shift (gender lands in city, city in magnum_opus) and the fourth field is dropped, which is exactly what the output above shows.
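Struct values can also be constructed inline with the named_struct built-in, which takes alternating field names and values (a sketch; the literals are made up):

    # Build a struct value on the fly
    hive (mydb)> select named_struct('age', '39', 'city', 'beijing', 'magnum_opus', 'huanzhugege');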

4. Data Type Conversion

    # Sample data, saved to timefile.txt
    1,2019-08-21 11:12:13.1,2019-08-21,2019-08-21 11:12:13.1
    4,2019-08-21 11:12:13.123456789,2019-08-21,2019-08-21 11:12:13.123456789
    5,2019-08-21 11:12:13,2019-08-21,2019-08-21 11:12:13
    6,2020-06-09 09:09:09.1234567891,2019-06-09,2019-06-09 09:09:09.1234567891

    # Create the table
    hive (mydb)> create table tf(id int, ts timestamp, ddate date, sdate string)
               > row format delimited
               > fields terminated by ',';

    # Load the data
    hive (mydb)> load data local inpath '/home/hadoop/timefile.txt'
               > overwrite into table tf;
    Loading data to table mydb.tf
    OK
    Time taken: 0.35 seconds

    # Inspect the schema
    hive (mydb)> desc tf;
    OK
    col_name  data_type  comment
    id     int
    ts     timestamp
    ddate  date
    sdate  string
    Time taken: 0.078 seconds, Fetched: 4 row(s)

    # Query the data; row 6's ts has ten fractional digits,
    # which exceeds nanosecond precision, so it loads as NULL
    hive (mydb)> select * from tf;
    OK
    tf.id  tf.ts  tf.ddate  tf.sdate
    1  2019-08-21 11:12:13.1  2019-08-21  2019-08-21 11:12:13.1
    4  2019-08-21 11:12:13.123456789  2019-08-21  2019-08-21 11:12:13.123456789
    5  2019-08-21 11:12:13  2019-08-21  2019-08-21 11:12:13
    6  NULL  2019-06-09  2019-06-09 09:09:09.1234567891
    Time taken: 0.197 seconds, Fetched: 4 row(s)

  • timestamp -> string

    hive (mydb)> select cast(ts as string) from tf;
    OK
    ts
    2019-08-21 11:12:13.1
    2019-08-21 11:12:13.123456789
    2019-08-21 11:12:13
    NULL
    Time taken: 0.229 seconds, Fetched: 4 row(s)

    • timestamp -> date

    hive (mydb)> select cast(ts as date) from tf;
    OK
    ts
    2019-08-21
    2019-08-21
    2019-08-21
    NULL
    Time taken: 0.189 seconds, Fetched: 4 row(s)

    • date -> string

    hive (mydb)> select cast(ddate as string) from tf;
    OK
    ddate
    2019-08-21
    2019-08-21
    2019-08-21
    2019-06-09
    Time taken: 0.215 seconds, Fetched: 4 row(s)

    • date -> timestamp

    hive (mydb)> select cast(ddate as timestamp) from tf;
    OK
    ddate
    2019-08-21 00:00:00
    2019-08-21 00:00:00
    2019-08-21 00:00:00
    2019-06-09 00:00:00
    Time taken: 0.271 seconds, Fetched: 4 row(s)

    • string -> timestamp

    hive (mydb)> select cast(sdate as timestamp) from tf;
    OK
    sdate
    2019-08-21 11:12:13.1
    2019-08-21 11:12:13.123456789
    2019-08-21 11:12:13
    2019-06-09 09:09:09.123456789
    Time taken: 0.23 seconds, Fetched: 4 row(s)

    • string -> date

    hive (mydb)> select cast(sdate as date) from tf;
    OK
    sdate
    2019-08-21
    2019-08-21
    2019-08-21
    2019-06-09
    Time taken: 0.175 seconds, Fetched: 4 row(s)
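Beyond cast, the built-ins unix_timestamp and from_unixtime convert between date strings and epoch seconds (a sketch of standard Hive functions, not from the original run; a bare SELECT needs Hive 0.13+):

    # Round-trip: string -> epoch seconds -> reformatted string
    hive (mydb)> select unix_timestamp('2019-08-21 11:12:13') as epoch,
               >        from_unixtime(unix_timestamp('2019-08-21 11:12:13'), 'yyyy/MM/dd') as reformatted;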


