Bigdata Charmers Gobblin

juju deploy bigdata-charmers-gobblin
You will need Juju 2.9 to run this command.
Channel        Version  Revision  Published    Runs on
latest/stable  0        0         18 Mar 2021  Ubuntu 14.04

Platform: Ubuntu 14.04

About

Data ingestion framework built on top of Hadoop.



Overview

"Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volume of data from a variety of data sources, e.g., databases, rest APIs, FTP/SFTP servers, filers, etc., onto Hadoop." (from the Gobblin wiki)

Usage

This charm uses the Hadoop base layer and the HDFS interface to pull in its dependencies and to act as a client of a Hadoop NameNode. To deploy the Hadoop infrastructure:

juju quickstart apache-processing-mapreduce

Deploy the Gobblin charm and relate it to the NameNode via the plugin application:

juju deploy gobblin
juju add-relation gobblin plugin

Testing the deployment

Smoke test Gobblin

From the Gobblin unit, start the Wikipedia ingestion demo job as the gobblin user:

juju ssh gobblin/0
cd /tmp
sudo su gobblin -c "gobblin-mapreduce.sh --conf wikipedia.pull --jars /usr/lib/gobblin/lib/gobblin-example.jar"

The output is written to HDFS under /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/. You can change the output directory with the --workdir flag.
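A minimal sketch of the smoke-test command with a custom work directory, wrapped in a shell function. The function name run_wikipedia_demo and the example path /user/gobblin/custom-work are hypothetical, and the sketch assumes that, as with the default, job output then lands under <workdir>/job-output/.

```shell
# Hypothetical helper: run the Wikipedia demo job with a caller-chosen
# Gobblin work directory passed through the --workdir flag.
run_wikipedia_demo() {
  local workdir="$1"
  # Run from /tmp, as in the smoke test; the subshell keeps the caller's cwd.
  # The path is single-quoted for the inner shell started by su -c.
  (
    cd /tmp &&
      sudo su gobblin -c "gobblin-mapreduce.sh --conf wikipedia.pull --workdir '$workdir' --jars /usr/lib/gobblin/lib/gobblin-example.jar"
  )
}
```

For example, run_wikipedia_demo /user/gobblin/custom-work, then look for the results under /user/gobblin/custom-work/job-output/ (assumed layout, mirroring the default).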

List and fetch the job output file(s), which are in Avro format:

hdfs dfs -ls /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/<Your_Job_Id>
hdfs dfs -get /user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput/<Your_Job_Id>/<Path_To_Output>/<Output.avro>

Convert the Avro output to JSON:

curl -O https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.7.7/avro-tools-1.7.7.jar
java -jar avro-tools-1.7.7.jar tojson --pretty <Output.avro> > output.json
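The list, fetch, and convert steps can be scripted for a whole job run. This is a minimal POSIX shell sketch that assumes hdfs and java are on the PATH and that avro-tools-1.7.7.jar is in the current directory. The function names are hypothetical, and the script relies on the file path being the last column of hdfs dfs -ls output.

```shell
# Default output root from this README; pass the <Your_Job_Id> directory name.
OUTPUT_ROOT="/user/gobblin/work/job-output/gobblin/example/wikipedia/WikipediaOutput"

# Hypothetical helper: print the HDFS path of every .avro file under one job.
list_avro_files() {
  # -R recurses into <Path_To_Output>; the file path is the last column.
  hdfs dfs -ls -R "$OUTPUT_ROOT/$1" | awk '$NF ~ /\.avro$/ { print $NF }'
}

# Hypothetical helper: copy each Avro file locally and write <name>.json for it.
fetch_and_convert() {
  local tools_jar="${2:-avro-tools-1.7.7.jar}"
  list_avro_files "$1" | while IFS= read -r f; do
    name="$(basename "$f")"
    # </dev/null keeps these commands from consuming the file list on stdin.
    hdfs dfs -get "$f" "$name" </dev/null
    java -jar "$tools_jar" tojson --pretty "$name" >"${name%.avro}.json" </dev/null
  done
}
```

For example, fetch_and_convert <Your_Job_Id>. Files that share a name in different output subdirectories would collide locally, so the sketch suits small runs like the demo.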
