
How to install the h5pp or the h5cpp-compiler on macOS? #78

Closed
xdotli opened this issue Feb 17, 2022 · 12 comments
@xdotli

xdotli commented Feb 17, 2022

I'm sorry if this sounds like a stupid question, but I'm very new to C++ development. Since I'm on a Mac, I couldn't follow the Linux commands provided on the download page. I wonder how I can use this library in an arbitrary project working with dlib (for example, which files to copy into the /usr/local/include directory).

@steven-varga
Owner

Hmm... I don't have a macOS box to work with; we could have a chat about this, if there's any interest? -- it shouldn't be a biggie, as the LLVM toolchain works on macOS.

@xdotli
Author

xdotli commented Feb 17, 2022

@steven-varga Hi! And thank you for your reply.

I can now compile the program successfully with the following (verbose) command:
g++ -std=c++17 -o test cca.cpp -I /usr/local/include/ -L /usr/local/lib/ -lhdf5 -lhdf5_hl

But when I run the executable, it fails with an "unable to open dataset" error:

HDF5-DIAG: Error detected in HDF5 (1.12.1) thread 0:
  #000: H5D.c line 285 in H5Dopen2(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #001: H5VLcallback.c line 1910 in H5VL_dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #002: H5VLcallback.c line 1877 in H5VL__dataset_open(): dataset open failed
    major: Virtual Object Layer
    minor: Can't open object
  #003: H5VLnative_dataset.c line 123 in H5VL__native_dataset_open(): unable to open dataset
    major: Dataset
    minor: Can't open object
  #004: H5Dint.c line 1483 in H5D__open_name(): not found
    major: Dataset
    minor: Object not found
  #005: H5Gloc.c line 442 in H5G_loc_find(): can't find object
    major: Symbol table
    minor: Object not found
  #006: H5Gtraverse.c line 837 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #007: H5Gtraverse.c line 613 in H5G__traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #008: H5Gloc.c line 399 in H5G__loc_find_cb(): object 'create then write' doesn't exist
    major: Symbol table
    minor: Object not found
libc++abi: terminating with uncaught exception of type h5::error::io::dataset::open: /usr/local/include/h5cpp/H5Dopen.hpp line#  30 : opening dataset failed...
[1]    71327 abort      ./test

Do you happen to know any common causes of this error? I have spent far too long just trying to read an HDF5 file. For now I have converted the original file to several JSON files to read into my program. However, the JSON files are much larger than the HDF5 files; do you think this will cause performance issues?

@steven-varga
Owner

Can you tell me the version of the file? Would it be possible to try it with libhdf5 v1.10.6? BTW: no need for hdf5_hl. Can you share the file?

Not sure what you are trying to do; JSON is not for HPC and has different properties/use cases. In fact, the acronym gives it away: JavaScript Object Notation. HDF5, by contrast, is like an ext4 filesystem with a convenient API and, most importantly, MPI-IO capability.

@xdotli
Author

xdotli commented Feb 17, 2022

Sure. I'm using HDF5 1.12.1. The code is below:

#include <iostream>
#include <dlib/matrix.h>
#include <dlib/statistics/cca.h>
#include <h5cpp/all>

using namespace std;
using namespace dlib;
template <class T>
using Matrix = dlib::matrix<T>;

int main()
{
  Matrix<short> M = h5::read<Matrix<short>>("1000hpa.h5", "create then write");
  return 0;
}
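
Frame #008 of the trace above reports that the object 'create then write' doesn't exist, so the first thing worth verifying is the exact dataset name stored in the file, either with the stock h5ls tool (h5ls 1000hpa.h5) or programmatically. Below is a minimal sketch of the latter, assuming the file sits in the working directory; it uses the plain HDF5 C API for the iteration, since the h5cpp handles convert to hid_t:

#include <h5cpp/all>
#include <iostream>

int main()
{
  // open the file read-only; h5::fd_t converts implicitly to the C API's hid_t
  h5::fd_t fd = h5::open("1000hpa.h5", H5F_ACC_RDONLY);
  // visit each link in the root group and print its name
  H5Literate(fd, H5_INDEX_NAME, H5_ITER_NATIVE, nullptr,
      [](hid_t, const char* name, const H5L_info_t*, void*) -> herr_t {
          std::cout << name << "\n";
          return 0; // returning zero continues the iteration
      }, nullptr);
  return 0;
}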

Regarding the choice of JSON: well, I have worked with JavaScript and Python the most, and I'm simply trying to read a 726×14729 matrix into my program, so I thought that dumping the HDF5 data into JSON and reading the JSON into my program would be possible.

By the way, I'm using nlohmann/json, which is a single-header library. Should I delete the JSON object once the data is stored in matrices?
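
On the nlohmann/json route: the parsed json value holds its own copy of the data, so once the values have been copied into a dlib matrix it can simply go out of scope (or be clear()ed) and its memory is released. A minimal sketch of such a loader, assuming the file contains a plain 2-D array of numbers; matrix.json is a placeholder name:

#include <fstream>
#include <string>
#include <dlib/matrix.h>
#include <nlohmann/json.hpp>

dlib::matrix<short> load_json_matrix(const std::string& path)
{
  std::ifstream is(path);
  nlohmann::json j;
  is >> j;                                // parse the whole document into memory
  const long rows = j.size(), cols = j.at(0).size();
  dlib::matrix<short> M(rows, cols);
  for (long r = 0; r < rows; ++r)
    for (long c = 0; c < cols; ++c)
      M(r, c) = j[r][c].get<short>();     // element-wise copy into the matrix
  return M;                               // j is destroyed here, freeing the JSON copy
}

// usage: dlib::matrix<short> M = load_json_matrix("matrix.json");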

@xdotli
Author

xdotli commented Feb 17, 2022

I think I'm not expressing my concern clearly enough. The computation-heavy part of my program will be the matrix calculations, so I wonder if I can assume that the performance of the preceding I/O part is not as relevant?

@steven-varga
Owner

It depends on the size of the matrix and on convenience. Some of us do prototyping on a statistical platform (Julia/Matlab/R) and then save/export the data to HDF5. It is convenient to load it from C++ regardless of the size, then proceed to a fast implementation in C++ with some linear algebra library -- this is one use case for H5CPP.

Alternatively, you may have datasets of 10 GB and up and need efficient, scalable I/O. In that case, I/O performance can be important.

Overall, you can think of this question as walking a Pareto front of implementation | maintenance | I/O | runtime cost, which can only be answered (as a constrained mathematical optimisation program) once you have concrete values in hand.

@xdotli
Author

xdotli commented Feb 17, 2022

Thank you so much for your answer. The matrix is 726×14965, and my output is supposed to be six 726×726 matrices.

If this prototyping-to-production workflow is so prevalent, then it's imperative for me to work out a way to make this library work on my computer: I'm a new big-data research assistant at school, and while the other team members do their prototyping in R/Python, I have to deliver C++ code that implements their algorithms in parallel.

Anyway, I used brew install hdf5 as my last attempt to make h5cpp work for me, but the library still gave me the "unable to open dataset" error. Before that I had errors like Undefined symbols for architecture x86_64: "***". Would you say I'd have a better chance of making all of this work on a Linux server? That way I wouldn't be trying to install all kinds of libraries everywhere. Thank you very much!

@steven-varga
Owner

It works on POSIX with C++17, and as I mentioned before, I am open to a conference call (I don't have a Mac). Avoid the OS package manager -- this is HPC, where Spack is more likely to be used. Instead, install the components from source (a build sketch follows the list). Here is a laundry list:

  • HDF5 1.10.6 (no C++ or high-level library support)
  • OpenMPI 4.0.7
  • OrangeFS for a parallel filesystem
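
For reference, a minimal sketch of the first item's source build, assuming the release tarball has already been downloaded from the HDF Group and the standard autotools flow applies:

tar -xzf hdf5-1.10.6.tar.gz && cd hdf5-1.10.6
./configure --prefix=/usr/local
make -j4
sudo make install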

I am working on a reference platform: a rental cluster on AWS EC2 with the proper settings and a convenient VS Code front end, but it will take a few more weeks to bring online.

Let me know about the call.

@xdotli
Author

xdotli commented Feb 17, 2022

I actually noticed issue #42, where you are testing h5cpp's compatibility with different compilers, before submitting this issue. I'd love to go through the laundry list and install these components, but I have a deadline about 12 hours from now. I'll work on this during my next assignment and report back here.

Thanks again for your help! I will try setting up a server environment to do the job as well. I'm familiar with vim, so I think I'll test the h5cpp library there before doing any further setup.

@xdotli
Author

xdotli commented Feb 18, 2022

@steven-varga Sorry to bother you again! I tried installing HDF5 1.10.6, but now I don't know which folder to put it in. Should I dump the files into /usr/local/include?

@steven-varga
Owner

H5CPP doesn't care where you install HDF5. As for the H5CPP headers: copy them to /usr/local/include/h5cpp, then use gcc -I/usr/local/include in your makefiles. It is customary to install local packages under /usr/local: ./configure --prefix=/usr/local. Below are the default settings (after configure):

Features:
---------
                   Parallel HDF5: no
Parallel Filtered Dataset Writes: no
              Large Parallel I/O: no
              High-level library: yes
                    Threadsafety: no
             Default API mapping: v110
  With deprecated public symbols: yes
          I/O filters (external): deflate(zlib)
                             MPE: no
                      Direct VFD: no
                         dmalloc: no
  Packages w/ extra debug output: none
                     API tracing: no
            Using memory checker: no
 Memory allocation sanity checks: no
             Metadata trace file: no
          Function stack tracing: no
       Strict file format checks: no
    Optimization instrumentation: no

And there is no need to link against libhdf5_hl.so; instead, use the templated h5::append operator on h5::pt_t<T> for packet tables -- it is much faster.
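
For completeness, a minimal packet-table sketch along those lines; append.h5 and "stream" are placeholder names, and the dataset is created chunked with unlimited extent, which the packet-table machinery expects:

#include <h5cpp/all>

int main()
{
  h5::fd_t fd = h5::create("append.h5", H5F_ACC_TRUNC);
  // a chunked dataset with unlimited extent behind a packet-table handle
  h5::pt_t pt = h5::create<double>(fd, "stream",
      h5::max_dims{H5S_UNLIMITED}, h5::chunk{1024});
  for (int i = 0; i < 100; ++i)
    h5::append(pt, static_cast<double>(i)); // buffered append, flushed when pt closes
  return 0;
}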

@xdotli
Author

xdotli commented Feb 18, 2022

It turned out that the "unable to open dataset" error was caused by a Python program I had forgotten to shut down, which was still reading the file and somehow made the dataset unavailable. As soon as I quit the Python script, I could use h5cpp to open the file successfully.
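
For readers who hit the same symptom: HDF5 1.10 and later lock files on open by default, so a process that keeps a file open can block others. If the concurrent reader cannot simply be closed, the lock can be disabled for a single run through an environment variable (a workaround, with the usual single-writer caveats):

HDF5_USE_FILE_LOCKING=FALSE ./test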

Thank you @steven-varga for guiding me through re-installing the libraries from source and for laying out the workflow around high-performance data I/O and computing. Your library is awesome!

xdotli closed this as completed Feb 18, 2022