
Initial code and CI for trace data pipeline #368

Merged
bantonsson merged 7 commits into main from ban/cleaned-up-data-pipeline-poc on Apr 10, 2024

Conversation

@bantonsson (Contributor) commented Mar 25, 2024

What does this PR do?

This PR contains the work on a shared data pipeline component done during the R&D week.

Motivation

Add a base build and CI that we can continue to iterate on to refine the API and functionality of the shared data pipeline component.

Additional Notes

Please don't be too hard on the public APIs and the implementation. They will evolve over time, but in the interest of not having a ginormous PR from a feature branch, we would like to get the machinery into main.

How to test the change?

Describe here in detail how the change can be validated.

For Reviewers

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

@bantonsson bantonsson force-pushed the ban/cleaned-up-data-pipeline-poc branch 4 times, most recently from e72e1a4 to ad57c31 on March 27, 2024 15:29
@bantonsson bantonsson changed the title from "[WIP] Cleaned up POC branch" to "Initial code and CI for trace data pipeline" on Mar 27, 2024
@bantonsson bantonsson requested a review from a team March 27, 2024 15:36
@bantonsson bantonsson marked this pull request as ready for review March 27, 2024 15:45
@bantonsson bantonsson requested review from a team as code owners March 27, 2024 15:45
@paullegranddc (Contributor):

Are we going to ship different shared libraries for data-pipeline and profiling?

@bantonsson (Contributor, Author):

We are not there yet, and in some languages we definitely will, since they don't use the profiler in libdatadog.

@ivoanjo (Member) commented Mar 27, 2024

Not a blocker at this point, but more of an observation: note that shipping profiling AND data-pipeline separately for the same language will probably increase the binary size a lot more than if a unified libdatadog were produced ;)

@bantonsson (Contributor, Author) commented Mar 27, 2024

Yes, we are aware of the added binary size and are going to work on reducing it and combining the features into a single binary where it makes sense.

@bantonsson bantonsson force-pushed the ban/cleaned-up-data-pipeline-poc branch from 10654c7 to 824fa21 on March 28, 2024 08:12
@paullegranddc (Contributor):

> Not a blocker at this point, but more of an observation: note that shipping profiling AND data-pipeline separately for the same language will probably increase the binary size a lot more than if a unified libdatadog were produced ;)

It's also going to be hard to load both of them in the same process because of symbols exposed in ddcommon being in both shared libraries.

@ivoanjo (Member) commented Mar 28, 2024

> It's also going to be hard to load both of them in the same process because of symbols exposed in ddcommon being in both shared libraries.

Good point! For Ruby we explicitly load the profiling native extension with RTLD_LOCAL, and I believe that's a potential solution to this issue.

I'm saying this more out of curiosity / sharing notes :)
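
As a side note (not part of this PR), here is a minimal sketch of what that RTLD_LOCAL approach could look like from Rust via the libc crate; the library file names are placeholders, not the actual artifact names shipped by libdatadog:

```rust
use std::ffi::{CStr, CString};

// Load a shared library with RTLD_LOCAL so its symbols stay out of the global
// namespace; a second library exporting the same ddcommon symbols can then be
// loaded into the same process without clashing.
fn load_local(path: &str) -> *mut libc::c_void {
    let c_path = CString::new(path).expect("path contains a NUL byte");
    let handle = unsafe { libc::dlopen(c_path.as_ptr(), libc::RTLD_NOW | libc::RTLD_LOCAL) };
    if handle.is_null() {
        // dlerror describes the most recent dlopen failure.
        let err = unsafe { CStr::from_ptr(libc::dlerror()) };
        panic!("dlopen({path}) failed: {}", err.to_string_lossy());
    }
    handle
}

fn main() {
    // Placeholder file names: substitute the real shared library paths.
    let _profiling = load_local("libdatadog_profiling.so");
    let _data_pipeline = load_local("libdata_pipeline.so");
}
```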

@bantonsson bantonsson force-pushed the ban/cleaned-up-data-pipeline-poc branch from 824fa21 to 782ae18 on March 28, 2024 16:17
set -eu

destdir="$1"

Contributor:

Do we want to fix the build logic before adding the tracing logic, or after?

Contributor (Author):

Could you please elaborate a bit on what this would mean?

Contributor:

This relates to the RFC and issue #353.
The idea would be to have a single build where we can select what needs to be built.
A good first step could be to unify the build scripts.
Ideally, all of the build logic would be embedded in cargo in the long term:
cargo build --features trace-pipeline,profiling,telemetry
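
For illustration only (a sketch, not code from this PR; the feature names are hypothetical, and `profiling_ffi` is a stand-in, while the other crate names follow the ones visible elsewhere in this PR), a unified FFI crate's lib.rs could gate each component's re-exports behind Cargo features so that a single cargo build selects what gets compiled in:

```rust
// Hypothetical lib.rs of a unified FFI crate. Each component crate is an
// optional dependency tied to a Cargo feature, so e.g.
//   cargo build --features trace-pipeline,profiling,telemetry
// compiles in exactly those components.

#[cfg(feature = "profiling")]
pub use profiling_ffi::*;

#[cfg(feature = "trace-pipeline")]
pub use data_pipeline_ffi::*;

#[cfg(feature = "telemetry")]
pub use ddtelemetry_ffi::*;
```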

Contributor:

Also, I don't want to block the PR if you feel it is already valuable to have a build out there.
We should mainly do this to ensure we can load everything together (and to have a clean build process).

Contributor:

Fixed in the latest commit. As discussed internally, the profiling library will be the main artifact and the data-pipeline symbols will be exported there.

Contributor:

Why do we export the symbols here?

Contributor:

I forgot the last commit (I pushed it, but the prompt asked for credentials while I was AFK). Now it's complete.

@bantonsson bantonsson force-pushed the ban/cleaned-up-data-pipeline-poc branch 5 times, most recently from ee61413 to e18be6b on April 4, 2024 12:23
@gleocadie (Contributor) left a comment:

LGTM.
Can you please remove this change to windows/libdatadog.csproj? It will generate a +300MB nuget package. We need to rethink how we package this on Windows. In CI (dd-trace-dotnet, for example), this can be a problem on the runners: not enough disk space.

@bantonsson bantonsson requested a review from a team as a code owner April 8, 2024 10:20
@github-actions github-actions bot added the "profiling" label (Relates to the profiling* modules) on Apr 8, 2024
@@ -18,6 +18,10 @@ pub use crashtracker::*;
#[cfg(feature = "ddtelemetry-ffi")]
pub use ddtelemetry_ffi::*;

#[cfg(feature = "data-pipeline-ffi")]
#[allow(unused_imports)]
pub use data_pipeline_ffi::*;
Contributor:

👍

@hoolioh hoolioh force-pushed the ban/cleaned-up-data-pipeline-poc branch from 4a6a251 to 4b42ada on April 10, 2024 12:51
@hoolioh hoolioh force-pushed the ban/cleaned-up-data-pipeline-poc branch from 4b42ada to e80bf85 on April 10, 2024 13:30
hoolioh and others added 7 commits on April 10, 2024 16:19

* This is to avoid building a 300MB+ nuget package.
* Add data-pipeline-ffi symbols in order to treat the profiling shared library as the main artifact.
* Add the "data-pipeline-ffi" feature flag so data-pipeline items are included based on it.
* Add a flag to handle features in build-profiling-ffi.sh.
* Remove the data-pipeline-ffi script.
* Profiling tests are built with the data-pipeline feature enabled, hence the trace_exporter source file is built linking against the profiling library.
* Update the LICENSE-3rdparty file.
@bantonsson bantonsson force-pushed the ban/cleaned-up-data-pipeline-poc branch from e80bf85 to d8dccce on April 10, 2024 14:26
ddog_CharSlice language_version = DDOG_CHARSLICE_C("10.0");
ddog_CharSlice language_interpreter = DDOG_CHARSLICE_C("X");
ddog_TraceExporter* trace_exporter = ddog_trace_exporter_new(
host,
Contributor:

We have an endpoint object that could be useful to share.

@@ -97,7 +97,7 @@ jobs:
- name: "Generate profiling FFI"
Contributor:

Minor: I think we could have a general all-features build.
The profiling workflow currently uses GitLab to release.

Contributor:

The important part is having at least one build per delivered combination.

@r1viollet (Contributor) left a comment:

LGTM
Minor suggestions. Exciting to see this progress 👏

@bantonsson bantonsson merged commit c461065 into main Apr 10, 2024
20 checks passed
@bantonsson bantonsson deleted the ban/cleaned-up-data-pipeline-poc branch April 10, 2024 14:55
Labels: ci-build, data-pipeline, profiling