upgrade to latest quickstart_etl from examples
Showing 15 changed files with 130 additions and 64 deletions.
@@ -113,27 +113,27 @@ jobs:
       # username: _json_key
       # password: ${{ secrets.GCR_JSON_KEY }}

-      # Build "example_location" location.
+      # Build "quickstart_etl" location.
       # For each code location, the "build-push-action" builds the docker
       # image and a "set-build-output" command records the image tag for each code location.
       # To re-use the same docker image across multiple code locations, build the docker image once
       # and specify the same tag in multiple "set-build-output" commands. To use a different docker
       # image for each code location, use multiple "build-push-actions" with a location specific
       # tag.
-      - name: Build and upload Docker image for "example_location"
+      - name: Build and upload Docker image for "quickstart_etl"
         if: steps.prerun.outputs.result != 'skip'
         uses: docker/build-push-action@v4
         with:
           context: .
           push: true
-          tags: ${{ env.IMAGE_REGISTRY }}:${{ env.IMAGE_TAG }}-example-location
+          tags: ${{ env.IMAGE_REGISTRY }}:${{ env.IMAGE_TAG }}-quickstart-etl

-      - name: Update build session with image tag for example_location
+      - name: Update build session with image tag for quickstart_etl
         id: ci-set-build-output-example-location
         if: steps.prerun.outputs.result != 'skip'
         uses: dagster-io/dagster-cloud-action/actions/utils/[email protected]
         with:
-          command: "ci set-build-output --location-name=data-eng-pipeline --image-tag=$IMAGE_TAG-example-location"
+          command: "ci set-build-output --location-name=data-eng-pipeline --image-tag=$IMAGE_TAG-quickstart-etl"

       # Deploy all code locations in this build session to Dagster Cloud
       - name: Deploy to Dagster Cloud
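The workflow comments above describe two tagging strategies: build one image and record the same tag for several code locations, or build one image per location with a location-specific tag. A minimal Python sketch of the first strategy follows. The helper name `build_output_commands`, the second location name, and the sample tag `abc123` are hypothetical; only the command string format is taken from the workflow step.

```python
from typing import Dict, List


def build_output_commands(location_tags: Dict[str, str], image_tag: str) -> List[str]:
    """Return one `ci set-build-output` command string per code location.

    `location_tags` maps a location name to the tag suffix of the image it
    should run. Locations that share a suffix share one Docker image.
    """
    commands = []
    for location_name, suffix in location_tags.items():
        commands.append(
            f"ci set-build-output --location-name={location_name} "
            f"--image-tag={image_tag}-{suffix}"
        )
    return commands


# Hypothetical example: two locations re-using the single image built above.
shared = build_output_commands(
    {"data-eng-pipeline": "quickstart-etl", "another-location": "quickstart-etl"},
    image_tag="abc123",
)
```

Giving each location its own suffix instead would correspond to the second strategy, which also needs one "build-push-action" step per suffix.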
@@ -1,8 +1,4 @@
 locations:
-  - location_name: example_location
+  - location_name: quickstart_etl
     code_source:
-      package_name: my_dagster_project
-    build:
-      directory: ./
-      registry: <account-id>.dkr.ecr.us-west-2.amazonaws.com/branch-deployments-gh-action-test
-
+      package_name: quickstart_etl
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.
@@ -1,3 +1,6 @@
 [build-system]
 requires = ["setuptools"]
 build-backend = "setuptools.build_meta"
+
+[tool.dagster]
+module_name = "quickstart_etl"
@@ -0,0 +1,16 @@
+from dagster import (
+    Definitions,
+    ScheduleDefinition,
+    define_asset_job,
+    load_assets_from_package_module,
+)
+
+from . import assets
+
+daily_refresh_schedule = ScheduleDefinition(
+    job=define_asset_job(name="all_assets_job"), cron_schedule="0 0 * * *"
+)
+
+defs = Definitions(
+    assets=load_assets_from_package_module(assets), schedules=[daily_refresh_schedule]
+)
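The schedule above uses the cron expression `0 0 * * *`, which reads as minute 0 of hour 0 on every day, every month, and every weekday, so the job is due once per day at midnight in the schedule's timezone. The sketch below illustrates that reading with the standard library only. `matches_cron` is a hypothetical helper and is not part of Dagster; it handles only `*` and single integers, not ranges, lists, or steps.

```python
from datetime import datetime


def matches_cron(expr: str, when: datetime) -> bool:
    """Check a datetime against a five-field cron expression.

    Simplified on purpose: each field must be either "*" or a single integer,
    and all five fields must match.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five space-separated cron fields")
    # Cron counts Sunday as 0; datetime.isoweekday() counts Sunday as 7.
    actual = [when.minute, when.hour, when.day, when.month, when.isoweekday() % 7]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# Midnight matches the daily schedule; noon on the same day does not.
midnight = matches_cron("0 0 * * *", datetime(2024, 1, 15, 0, 0))
noon = matches_cron("0 0 * * *", datetime(2024, 1, 15, 12, 0))
```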
Empty file.
@@ -0,0 +1,83 @@
+import base64
+from io import BytesIO
+from typing import List
+
+import matplotlib.pyplot as plt
+import pandas as pd
+import requests
+from dagster import MetadataValue, OpExecutionContext, asset
+from wordcloud import STOPWORDS, WordCloud
+
+
+@asset(group_name="hackernews", compute_kind="HackerNews API")
+def hackernews_topstory_ids() -> List[int]:
+    """Get up to 500 top stories from the HackerNews topstories endpoint.
+    API Docs: https://github.com/HackerNews/API#new-top-and-best-stories
+    """
+    newstories_url = "https://hacker-news.firebaseio.com/v0/topstories.json"
+    top_500_newstories = requests.get(newstories_url).json()
+    return top_500_newstories
+
+
+@asset(group_name="hackernews", compute_kind="HackerNews API")
+def hackernews_topstories(
+    context: OpExecutionContext, hackernews_topstory_ids: List[int]
+) -> pd.DataFrame:
+    """Get items based on story ids from the HackerNews items endpoint. It may take 1-2 minutes to fetch all 500 items.
+    API Docs: https://github.com/HackerNews/API#items
+    """
+    results = []
+    for item_id in hackernews_topstory_ids:
+        item = requests.get(f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json").json()
+        results.append(item)
+        if len(results) % 20 == 0:
+            context.log.info(f"Got {len(results)} items so far.")
+
+    df = pd.DataFrame(results)
+
+    # Dagster supports attaching arbitrary metadata to asset materializations. This metadata will be
+    # shown in the run logs and also be displayed on the "Activity" tab of the "Asset Details" page in the UI.
+    # This metadata would be useful for monitoring and maintaining the asset as you iterate.
+    # Read more about in asset metadata in https://docs.dagster.io/concepts/assets/software-defined-assets#recording-materialization-metadata
+    context.add_output_metadata(
+        {
+            "num_records": len(df),
+            "preview": MetadataValue.md(df.head().to_markdown()),
+        }
+    )
+    return df
+
+
+@asset(group_name="hackernews", compute_kind="Plot")
+def hackernews_topstories_word_cloud(
+    context: OpExecutionContext, hackernews_topstories: pd.DataFrame
+) -> bytes:
+    """Exploratory analysis: Generate a word cloud from the current top 500 HackerNews top stories.
+    Embed the plot into a Markdown metadata for quick view.
+    Read more about how to create word clouds in http://amueller.github.io/word_cloud/.
+    """
+    stopwords = set(STOPWORDS)
+    stopwords.update(["Ask", "Show", "HN"])
+    titles_text = " ".join([str(item) for item in hackernews_topstories["title"]])
+    titles_cloud = WordCloud(stopwords=stopwords, background_color="white").generate(titles_text)
+
+    # Generate the word cloud image
+    plt.figure(figsize=(8, 8), facecolor=None)
+    plt.imshow(titles_cloud, interpolation="bilinear")
+    plt.axis("off")
+    plt.tight_layout(pad=0)
+
+    # Save the image to a buffer and embed the image into Markdown content for quick view
+    buffer = BytesIO()
+    plt.savefig(buffer, format="png")
+    image_data = base64.b64encode(buffer.getvalue())
+    md_content = f"![img](data:image/png;base64,{image_data.decode()})"
+
+    # Attach the Markdown content as metadata to the asset
+    # Read about more metadata types in https://docs.dagster.io/_apidocs/ops#metadata-types
+    context.add_output_metadata({"plot": MetadataValue.md(md_content)})
+
+    return image_data
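The word cloud asset embeds its plot in Markdown by base64-encoding the PNG bytes and placing them in a `data:` URI. The sketch below isolates that step so it can be tried without matplotlib or Dagster. The function name `image_bytes_to_markdown` is hypothetical, and the 8-byte PNG signature is only a stand-in for a real rendered image.

```python
import base64


def image_bytes_to_markdown(png_bytes: bytes, alt_text: str = "img") -> str:
    """Wrap raw PNG bytes in a Markdown image tag backed by a base64 data URI."""
    # b64encode returns bytes; decode() turns them into ASCII text for the URI.
    encoded = base64.b64encode(png_bytes).decode()
    return f"![{alt_text}](data:image/png;base64,{encoded})"


# The PNG file signature stands in for the buffer contents of a saved plot.
png_signature = b"\x89PNG\r\n\x1a\n"
md_content = image_bytes_to_markdown(png_signature)
```

Note that the asset in the diff returns the base64-encoded bytes (`image_data`) rather than the raw PNG, so a downstream consumer would presumably need `base64.b64decode` to recover the image.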
File renamed without changes.
@@ -0,0 +1 @@
+
@@ -1,2 +1,2 @@
 [metadata]
-name = my_dagster_project
+name = quickstart_etl
@@ -1,10 +1,17 @@
 from setuptools import find_packages, setup

-if __name__ == "__main__":
-    setup(
-        name="my_dagster_project",
-        packages=find_packages(exclude=["my_dagster_project_tests"]),
-        install_requires=[
-            "dagster",
-        ],
-    )
+setup(
+    name="quickstart_etl",
+    packages=find_packages(exclude=["quickstart_etl_tests"]),
+    install_requires=[
+        "dagster",
+        "dagster-cloud",
+        "boto3",
+        "pandas",
+        "matplotlib",
+        "textblob",
+        "tweepy",
+        "wordcloud",
+    ],
+    extras_require={"dev": ["dagit", "pytest"]},
+)