Powering Up an Apache Spark Cluster for High-Scale Data Transformations
Start the Master Node — Launch the standalone master:
$SPARK_HOME/sbin/start-master.sh
Start a Worker Node — By default, workers connect to the master at spark://<hostname>:7077
$SPARK_HOME/sbin/start-worker.sh spark://<hostname>:7077
Access the Web UI — Monitor the Spark cluster from the web UI at http://<master-IP>:8080
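Before starting the daemons, the master's bind address and ports can be pinned in $SPARK_HOME/conf/spark-env.sh. A minimal sketch follows; SPARK_MASTER_HOST, SPARK_MASTER_PORT, and SPARK_MASTER_WEBUI_PORT are standard Spark standalone settings, but the hostname value here is a placeholder:

```shell
# $SPARK_HOME/conf/spark-env.sh -- sourced by the start-master.sh/start-worker.sh scripts
export SPARK_MASTER_HOST=spark-master.example.internal  # placeholder hostname
export SPARK_MASTER_PORT=7077        # default master RPC port
export SPARK_MASTER_WEBUI_PORT=8080  # default web UI port
```

With these set, workers connect to spark://spark-master.example.internal:7077 and the web UI stays on port 8080.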
The Hive Metastore is required to store table metadata, and PostgreSQL can be used as its backend database to ensure durability and reliability.
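The steps below copy hive-site.xml into Spark's conf directory but never show its contents. A minimal sketch of the PostgreSQL-backed metastore configuration, assuming the hivemetastore database and hiveuser credentials created below:

```xml
<!-- $HIVE_HOME/conf/hive-site.xml (sketch; adjust host/port/credentials) -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/hivemetastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hivepassword</value>
  </property>
</configuration>
```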
wget https://archive.apache.org/dist/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
tar -xzf apache-hive-3.1.3-bin.tar.gz
mv apache-hive-3.1.3-bin /usr/local/hive
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
wget https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.0/postgresql-42.7.0.jar
cp postgresql-42.7.0.jar $HIVE_HOME/lib/
cp postgresql-42.7.0.jar $SPARK_HOME/jars
sudo apt update
sudo apt install postgresql -y
sudo systemctl start postgresql.service
su - postgres
psql
CREATE DATABASE hivemetastore;
CREATE USER hiveuser WITH PASSWORD 'hivepassword';
GRANT ALL PRIVILEGES ON DATABASE hivemetastore TO hiveuser;
\q
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
tar -xzf hadoop-3.3.4.tar.gz
mv hadoop-3.3.4 /usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
source ~/.bashrc
export HADOOP_OPTIONAL_TOOLS="hadoop-aws"
hadoop fs -ls s3a://
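For the s3a:// listing above to reach a real bucket, the S3A connector also needs credentials. fs.s3a.access.key and fs.s3a.secret.key are standard Hadoop S3A properties; the values below are placeholders:

```xml
<!-- $HADOOP_HOME/etc/hadoop/core-site.xml (sketch; values are placeholders) -->
<configuration>
  <property>
    <name>fs.s3a.access.key</name>
    <value>YOUR_ACCESS_KEY</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>YOUR_SECRET_KEY</value>
  </property>
</configuration>
```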
cd $HIVE_HOME/bin
$HIVE_HOME/bin/schematool -dbType postgres -initSchema
su - postgres
psql
postgres=# \c hivemetastore
# You are now connected to database "hivemetastore" as user "postgres".
hivemetastore=# GRANT CREATE ON SCHEMA public TO hiveuser;
# GRANT
hivemetastore=# \q
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
spark-shell --conf spark.sql.catalogImplementation=hive
$HIVE_HOME/bin/hive --service hiveserver2 &
/usr/local/hive/bin/hive --service metastore &
/usr/local/hive/bin/beeline -u jdbc:hive2://
-- at the Hive (Beeline) prompt
CREATE TABLE test_table (id INT, name STRING);

// at the spark-shell prompt
spark.sql("SHOW TABLES").show()
spark.sql("SELECT * FROM test_table").show()
test_table is created in the Hive Metastore (HMS), and this can be verified using Spark SQL DESCRIBE commands.
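That verification can be sketched at the spark-shell prompt, assuming the session was started with Hive support as shown earlier (DESCRIBE EXTENDED is standard Spark SQL):

```scala
// At the spark-shell prompt: confirm the table's metadata lives in the HMS.
spark.sql("DESCRIBE EXTENDED test_table").show(truncate = false)
// The extended output includes metastore-backed details such as the
// table's location, owner, and serde, confirming it is registered in HMS.
```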