Alibaba internal Hive study notes, 59-page PDF, recommended reading!
1. Hive Structure
    1.1 Hive Architecture
    1.2 Relationship Between Hive and Hadoop
    1.3 Similarities and Differences Between Hive and Conventional Relational Databases
    1.4 Hive Metastore Database
        1.4.1 Derby
        1.4.2 MySQL
    1.5 Hive Data Storage
    1.6 Other Hive Operations
2. Basic Hive Operations
    2.1 Create Table
        2.1.1 Overview
        2.1.2 Syntax
        2.1.3 Basic Examples
        2.1.4 Creating Partitions
        2.1.5 Other Examples
    2.2 Alter Table
        2.2.1 Add Partitions
        2.2.2 Drop Partitions
        2.2.3 Rename Table
        2.2.4 Change Column
        2.2.5 Add/Replace Columns
    2.3 Create View
    2.4 Show
    2.5 Load
    2.6 Insert
        2.6.1 Inserting data into Hive Tables from queries
        2.6.2 Writing data into filesystem from queries
    2.7 CLI
        2.7.1 Hive Command line Options
        2.7.2 Hive interactive Shell Command
        2.7.3 Hive Resources
        2.7.4 Calling Python, shell, and other languages
    2.8 Drop
    2.9 Miscellaneous
        2.9.1 Limit
        2.9.2 Top k
        2.9.3 REGEX Column Specification
3. Hive Select
    3.1 Group By
    3.2 Order/Sort By
4. Hive Join
5. Hive Parameter Settings
6. Hive UDF
    6.1 Basic Functions
        6.1.1 Relational Operators
        6.1.2 Arithmetic Operators
        6.1.3 Logical Operators
        6.1.4 Complex Type Operators
        6.1.5 Built-in Functions
        6.1.6 Mathematical Functions
        6.1.7 Collection Functions
        6.1.8 Type Conversion
        6.1.9 Date Functions
        6.1.10 Conditional Functions
        6.1.11 String Functions
    6.2 UDTF
        6.2.1 Explode
7. Hive's Map/Reduce
    7.1 JOIN
    7.2 GROUP BY
    7.3 DISTINCT
8. Points to Note When Using Hive
    8.1 Character Sets
    8.2 Compression
    8.3 count(distinct)
    8.4 JOIN
    8.5 DML Operations
    8.6 HAVING
    8.7 Subqueries
    8.8 Semantic Differences in Handling null Values in Join
9. Optimization and Tips
    9.1 Total Ordering
        9.1.1 Example 1
        9.1.2 Example 2
    9.2 How to Do a Cartesian Product
    9.3 How to Write exist/in Clauses
    9.4 How to Decide the Number of Reducers
    9.5 Merging MapReduce Operations
    9.6 Bucket and Sampling
    9.7 Partition
    9.8 JOIN
        9.8.1 JOIN Principles
        9.8.2 Map Join
        9.8.3 Data Skew in Large-Table Joins
    9.9 Merging Small Files
    9.10 Group By
10. Hive FAQ

Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract-transform-load (ETL) work and serves as a mechanism for storing, querying, and analyzing large-scale data held in Hadoop. Hive defines a simple SQL-like query language called QL, which lets users who already know SQL query the data. The same language also lets developers familiar with MapReduce plug in their own mappers and reducers to handle complex analysis that the built-in mappers and reducers cannot do.
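To make the custom mapper idea concrete, here is a minimal sketch of a script that Hive could stream rows through with its TRANSFORM clause. By default Hive passes each row to the script as one tab-separated line and reads tab-separated lines back. The file name `split_words.py`, the table `comments`, and the columns `user_id`, `text`, and `word` are all made up for illustration and do not come from the notes.

```python
import io
import sys

# Hypothetical HiveQL that would call this script (all names are assumptions):
#   ADD FILE split_words.py;
#   SELECT TRANSFORM (user_id, text)
#          USING 'python split_words.py'
#          AS (user_id, word)
#   FROM comments;


def map_line(line):
    """Turn one tab-separated input row into zero or more output rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return []  # skip rows that do not have both columns
    user_id, text = fields[0], fields[1]
    # Emit one (user_id, lowercased word) row per whitespace-separated word.
    return [f"{user_id}\t{word.lower()}" for word in text.split()]


def run(stdin, stdout):
    """Stream rows from stdin to stdout, one output line per emitted row."""
    for line in stdin:
        for row in map_line(line):
            stdout.write(row + "\n")


if __name__ == "__main__":
    # Under Hive the call would be run(sys.stdin, sys.stdout);
    # two in-memory sample rows are used here so the sketch runs anywhere.
    sample = io.StringIO("u1\tHello Hive\nu2\tMapReduce\n")
    run(sample, sys.stdout)
```

Keeping the per-row logic in `map_line` separate from the stream handling in `run` means the mapping can be checked locally on sample rows before the script is handed to Hive.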