
Provide documentation in Readme.md for using hadoop for storage instead of local mode #7

Open
jar349 opened this issue Dec 13, 2016 · 7 comments

Comments

jar349 commented Dec 13, 2016

I'm using your docker images to create a hadoop cluster (defined in a docker-compose file). Now, I would like to add your hbase image, but it is configured to use local storage.

I could create my own image based on yours with a custom configuration file, or I could mount the config volume and place my own config file there for hbase to read. However, I think there's a simpler path: having the image take local or hdfs as an argument and do the "right thing" on the user's behalf.

I am imagining something like command: hbase master local start or command: hbase master hdfs start, where the values needed to configure hbase-site.xml for Hadoop would come from environment variables (e.g. -e HDFS_MASTER=<hostname>).
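
To illustrate, a hypothetical compose service (neither the hdfs argument nor the HDFS_MASTER variable exists in the image today, and the image name is just a placeholder):

# purely a proposal sketch, not something the image supports yet
hbase-master:
  image: <your-hbase-image>          # placeholder image name
  command: hbase master hdfs start   # proposed "hdfs" mode argument
  environment:
    - HDFS_MASTER=namenode           # proposed: hostname of the HDFS namenode container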

What do you think?

davidonlaptop (Member) commented:

I agree with you that the documentation is not clear on how to use the hadoop and hbase docker images together. Using environment variables is an interesting approach that fits well with the Docker model.

You should consider that you may lose data locality with this method. As far as I know, Hadoop is not yet Docker-aware, so if the datanode and the regionserver run in separate containers they will have different IP addresses, and HBase will assume the two services are not on the same machine. Data access may therefore not be optimal.

However, many people use S3 in production, and Hadoop can't determine data locality with S3 either.

Can you elaborate more on your use case?

jar349 (Author) commented Dec 13, 2016

Use case:

Building a library of compose files that I can, ahem... compose together, a la: https://docs.docker.com/compose/extends/#/multiple-compose-files

I've already got a zookeeper quorum, and I've got a distributed hadoop cluster (using your hadoop image to provide a name node, data node, and secondary name node).

Now I want a set of files that I can compose on top of zookeeper/hadoop: hbase, spark, kylin, etc.

So, this would be for local development and testing. But my goal is to try to mimic a realistic setup, meaning: more than one zk instance, a hadoop secondary name node, more than one hbase region server, hbase actually using hadoop instead of the local file system, etc.
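
For example, the layering would look something like this (the file names are just placeholders):

# later files extend/override services defined in the earlier ones
docker-compose -f docker-compose.zookeeper.yml \
               -f docker-compose.hadoop.yml \
               -f docker-compose.hbase.yml \
               up -d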

dav-ell commented Feb 15, 2020

I'd also appreciate this. This is the best hbase docker repo I can find (that works with Thrift), and having this described clearly in the README would make this repository immensely powerful. Starting with no knowledge of HBase or HDFS, I'd be able to spin up a near-production-ready, HDFS-backed HBase DB in 10 minutes. You have to admit, that's pretty cool.

Don't forget all the students out there coming out of school, getting their feet wet with big data tools, and floundering because of their complexity. This would go a long way toward helping them.

davidonlaptop (Member) commented Feb 17, 2020 via email

dav-ell commented Feb 18, 2020

Thanks! I'll see what I can do.

Do you happen to know how to do it already? My progress on Hadoop in Docker has been slow. sequenceiq's is super old, big-data-europe's was giving me errors, and harisekhon's seems to work perfectly, so I was using that. However, trying to connect HBase to it hasn't been straightforward.

I had to change the configuration file (hdfs-site.xml) from the default (which was writing to /tmp) to:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <!-- single-node setup, so no block replication -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <!-- store datanode blocks under /data instead of the default /tmp location -->
        <name>dfs.datanode.data.dir</name>
        <value>file:///data</value>
    </property>
</configuration>

in order for it to write to a new directory (that's easier for me to mount). Then I run it using something like:

docker run -d --name hdfs \
    -p 8042:8042 -p 8088:8088 -p 19888:19888 -p 50070:50070 -p 50075:50075 \
    -v $HOME/hdfs-data:/data \
    -v $HOME/hdfs-site.xml:/hadoop/etc/hadoop/hdfs-site.xml \
    harisekhon/hadoop

After that, I feel pretty confident that HDFS is set up properly. However, to connect HBase to it, the best I've got so far is changing the HDFS URL to:

hdfs://ip-of-docker-container:8020/
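
i.e. something along these lines in hbase-site.xml (hbase.rootdir and hbase.cluster.distributed are the stock HBase property names; the /hbase path suffix and the distributed flag are my assumptions):

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <!-- point HBase at HDFS instead of the local filesystem -->
        <name>hbase.rootdir</name>
        <value>hdfs://ip-of-docker-container:8020/hbase</value>
    </property>
    <property>
        <!-- false would mean standalone mode on the local filesystem -->
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>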

Does that look right?

dav-ell commented Feb 18, 2020

Actually, that worked. Have any corrections before I add it to the readme?

dav-ell commented Feb 18, 2020

Pull request #10 added.
