介绍与部署
Hive
**Hive ** | **RDBMS ** | |
---|---|---|
查询语言 | HQL | SQL |
数据存储 | HDFS | Raw Device or Local FS |
执行 | MapReduce | Excutor |
执行延迟 | 高 | 低 |
处理数据规模 | 大 | 小 |
索引 | 有复杂的索引 |
Architecture

Workflow

步骤 | 操作 |
---|---|
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
7.1 | |
8 | |
9 | |
10 |
Quick Start
Installation
笔者建议使用
$ tar zxvf apache-hive-0.14.0-bin.tar.gz
$ ls
然后需要将文件复制到
$ su -
passwd:
# cd /home/user/Download
# mv apache-hive-0.14.0-bin /usr/local/hive
# exit
然后我们需要将~/.bashrc
文件
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
export CLASSPATH=$CLASSPATH:/usr/local/Hadoop/lib/*:.
export CLASSPATH=$CLASSPATH:/usr/local/hive/lib/*:.
然后需要激活该文件
$ source ~/.bashrc
为了配置hive-env.sh
文件,其位于
$ cd $HIVE_HOME/conf
$ cp hive-env.sh.template hive-env.sh
编辑
export HADOOP_HOME=/usr/local/hadoop
Metastore
我们这里选择
$ cd ~
$ wget http://archive.apache.org/dist/db/derby/db-derby-10.4.2.0/db-derby-10.4.2.0-bin.tar.gz
$ tar zxvf db-derby-10.4.2.0-bin.tar.gz
$ ls
$ su -
passwd:
# cd /home/user
# mv db-derby-10.4.2.0-bin /usr/local/derby
# exit
下载完毕后需要为
export DERBY_HOME=/usr/local/derby
export PATH=$PATH:$DERBY_HOME/bin
Apache Hive
18
export CLASSPATH=$CLASSPATH:$DERBY_HOME/lib/derby.jar:$DERBY_HOME/lib/derbytools.jar
然后我们需要创建一个新的目录来存储
$ mkdir $DERBY_HOME/data
接下来我们需要配置
$ cd $HIVE_HOME/conf
$ cp hive-default.xml.template hive-site.xml
编辑
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby://localhost:1527/metastore_db;create=true </value>
<description>JDBC connect string for a JDBC metastore </description>
</property>
创建如下名为
javax.jdo.PersistenceManagerFactoryClass =
org.jpox.PersistenceManagerFactoryImpl
org.jpox.autoCreateSchema = false
org.jpox.validateTables = false
org.jpox.validateColumns = false
org.jpox.validateConstraints = false
org.jpox.storeManagerType = rdbms
org.jpox.autoCreateSchema = true
org.jpox.autoStartMechanismMode = checked
org.jpox.transactionIsolation = read_committed
javax.jdo.option.DetachAllOnCommit = true
javax.jdo.option.NontransactionalRead = true
javax.jdo.option.ConnectionDriverName = org.apache.derby.jdbc.ClientDriver
javax.jdo.option.ConnectionURL = jdbc:derby://hadoop1:1527/metastore_db;create = true
javax.jdo.option.ConnectionUserName = APP
javax.jdo.option.ConnectionPassword = mine
Verification
在启动
chmod g+w
完整的命令如下
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
然后我们可以使用如下命令来验证
$ cd $HIVE_HOME
$ bin/hive
如果你成功地安装了
Logging initialized using configuration in jar:file:/home/hadoop/hive-0.9.0/lib/hive-common-0.9.0.jar!/hive-log4j.properties
Hive history file=/tmp/hadoop/hive_job_log_hadoop_201312121621_1494929084.txt
………………….
hive>
然后可以使用如下命令来列举所有的表
hive> show tables;
OK
Time taken: 2.798 seconds
hive>
Docker
笔者基于sequenceiq/hadoop-docker
以及
# 使用该Hadoop镜像
FROM sequenceiq/hadoop-docker:2.7.1
MAINTAINER WxChevalier
#Based on Inmobi Hive
#Builds the InMobi Hive from trunk
#Configure Postgres DB
#Starts Hive metastore Server
#Starts Hive Server2
# 安装Postgres作为Hive元数据存储
RUN apt-get update
RUN apt-get -yq install vim postgresql-9.3 libpostgresql-jdbc-java
# 创建元数据库 创建Hive用户并且分配权限privileges
USER postgres
RUN /etc/init.d/postgresql start &&\
psql --command "CREATE DATABASE metastore;" &&\
psql --command "CREATE USER hive WITH PASSWORD 'hive';" && \
psql --command "ALTER USER hive WITH SUPERUSER;" && \
psql --command "GRANT ALL PRIVILEGES ON DATABASE metastore TO hive;"
# 切回到默认的root用户
USER root
# 构建开发工具
RUN apt-get update
RUN apt-get install -y git libprotobuf-dev protobuf-compiler
# 安装Maven
RUN curl -s http://mirror.olnevhost.net/pub/apache/maven/binaries/apache-maven-3.2.1-bin.tar.gz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s apache-maven-3.2.1 maven
ENV MAVEN_HOME /usr/local/maven
ENV PATH $MAVEN_HOME/bin:$PATH
# 下载并且编译Hive
ENV HIVE_VERSION 0.13.4-inm-SNAPSHOT
RUN cd /usr/local && git clone https://github.com/InMobi/hive.git
RUN cd /usr/local/hive && /usr/local/maven/bin/mvn clean install -DskipTests -Phadoop-2,dist
RUN mkdir /usr/local/hive-dist && tar -xf /usr/local/hive/packaging/target/apache-hive-${HIVE_VERSION}-bin.tar.gz -C /usr/local/hive-dist
# 设定Hive环境信息
ENV HIVE_HOME /usr/local/hive-dist/apache-hive-${HIVE_VERSION}-bin
ENV HIVE_CONF $HIVE_HOME/conf
ENV PATH $HIVE_HOME/bin:$PATH
# 添加Postgres JDBC连接包
RUN ln -s /usr/share/java/postgresql-jdbc4.jar $HIVE_HOME/lib/postgresql-jdbc4.jar
# to avoid psql asking password, set PGPASSWORD
ENV PGPASSWORD hive
# 初始化Hive 元数据库
RUN /etc/init.d/postgresql start &&\
cd $HIVE_HOME/scripts/metastore/upgrade/postgres/ &&\
psql -h localhost -U hive -d metastore -f hive-schema-0.13.0.postgres.sql
# 复制配置文件、SQL以及数据文件
RUN mkdir /opt/files
ADD hive-site.xml /opt/files/
ADD hive-log4j.properties /opt/files/
ADD hive-site.xml $HIVE_CONF/hive-site.xml
ADD hive-log4j.properties $HIVE_CONF/hive-log4j.properties
ADD store_sales.* /opt/files/
ADD datagen.py /opt/files/
# 为Hive 启动配置设定权限
ADD hive-bootstrap.sh /etc/hive-bootstrap.sh
RUN chown root:root /etc/hive-bootstrap.sh
RUN chmod 700 /etc/hive-bootstrap.sh
# To overcome the bug in AUFS that denies postgres permission to read /etc/ssl/private/ssl-cert-snakeoil.key file.
# https://github.com/Painted-Fox/docker-postgresql/issues/30
# https://github.com/docker/docker/issues/783
# To avoid this issue lets disable ssl in postgres.conf. If we really need ssl to encrypt postgres connections we have to fix permissions to /etc/ssl/private directory everytime until AUFS fixes the issue
ENV POSTGRESQL_MAIN /var/lib/postgresql/9.3/main/
ENV POSTGRESQL_CONFIG_FILE $POSTGRESQL_MAIN/postgresql.conf
ENV POSTGRESQL_BIN /usr/lib/postgresql/9.3/bin/postgres
ADD postgresql.conf $POSTGRESQL_MAIN
RUN chown postgres:postgres $POSTGRESQL_CONFIG_FILE