63 google dataset search support #136

Draft
wants to merge 12 commits into
base: main

MBcode
Contributor

@MBcode MBcode commented Oct 12, 2022

Issue 63: early functions and a route for getting /summary.jsonld,
which can be inserted into the dataset pages so that crawlers reading
a /sitemap.xml route can find it. Those URLs will need to be made public first.
Until the JSON-LD can go into the dataset page, the sitemap could point straight at the summary.jsonld URLs.
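
For context, a minimal sketch of the kind of script string a dataset page could embed. The function name and the field choice are assumptions for illustration, not the code in this PR; only the schema.org `Dataset` type and the `application/ld+json` script type are standard.

```python
import json


def dataset_jsonld_script(name: str, description: str, url: str) -> str:
    """Return a <script> tag holding schema.org Dataset JSON-LD (hypothetical helper)."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Example values are made up.
print(dataset_jsonld_script("Rain gauges", "Hourly rainfall", "https://example.org/datasets/1"))
```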

MBcode added 10 commits October 5, 2022 16:56
should this be a route or..? right now it gets printed to stdout every time the datasets are displayed
but needs to be inserted into the frontend
I'll work that up before the next meeting, and check on dataset page placement
get_dataset_jsonld route, off of get_dataset now, so the frontend can get just one when rendering that one dataset page
but the problem is this method gets a dict from mongo that has ObjectIds and datetimes, which are not serializable as originally needed
back to easy pydantic dso.json() mapping
return script str; and /jsonld to differentiate route
remove prints
try summary.jsonld in case we want a summary.json (which I would use, and just map)
also
added /sitemap.xml that dumps (in case we want to cache it) then returns it
but none of its urls can be crawled yet, till v2 gets a public setting
sitemap() -> str:
vs response_model
used black to get past lint test
took hack advice to make /summary.jsonld and /sitemap.xml open, but not working yet
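
On the serialization problem mentioned in the commits above, a rough sketch of why a raw Mongo dict fails in `json.dumps` and one way around it. `FakeObjectId` is a stand-in so the snippet runs without pymongo, and `to_jsonable` is a hypothetical helper; the PR itself went back to the pydantic `.json()` mapping instead.

```python
import json
from datetime import datetime, timezone


class FakeObjectId:
    """Stand-in for bson.ObjectId (assumption, so the sketch needs no pymongo)."""

    def __init__(self, hex_id: str) -> None:
        self.hex_id = hex_id

    def __str__(self) -> str:
        return self.hex_id


def to_jsonable(value):
    """Fallback called by json.dumps for types it cannot encode itself."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, FakeObjectId):
        return str(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


# Made-up record shaped like a document read from Mongo.
record = {
    "_id": FakeObjectId("633f1c2e9b1d4a0012345678"),
    "name": "Rain gauges",
    "created": datetime(2022, 10, 5, 16, 56, tzinfo=timezone.utc),
}

# Without default=..., json.dumps(record) raises TypeError.
text = json.dumps(record, default=to_jsonable)
print(text)
```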
@MBcode MBcode self-assigned this Oct 19, 2022
@@ -116,6 +116,16 @@
tags=["extractors"],
dependencies=[Depends(get_current_username)],
)
api_router.include_router(
extractors.router,
Member

@longshuicy longshuicy Oct 20, 2022

Your newly added routes are in routers/datasets.py, but this new prefix is importing routes from extractors.
So I can actually access your routes under /datasets/{id}/summary.jsonld but not here.

I'd recommend you separate them out into a new router file (e.g. summary.py) and include them here as summary.router.

Contributor Author

That was left over, thanks. I will skip the tags for now and take your other advice to put those routes in another file.

@@ -116,6 +116,16 @@
tags=["extractors"],
dependencies=[Depends(get_current_username)],
)
api_router.include_router(
extractors.router,
prefix="/summary.jsonld",
Member

prefix is for the prefix :-p

e.g. prefix="/datasets" means the routes will all start with /datasets: /datasets/{id}, /datasets/{id}/files, etc.
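
To make the point concrete, a tiny sketch of how a prefix combines with a router's own paths. `full_path` is a hypothetical helper that only mimics the concatenation; it is not FastAPI code.

```python
def full_path(prefix: str, route_path: str) -> str:
    """Join a router prefix and a route path by plain concatenation (illustrative only)."""
    return prefix.rstrip("/") + route_path


dataset_routes = ["/{dataset_id}", "/{dataset_id}/files", "/{dataset_id}/summary.jsonld"]

# With prefix="/datasets" every route starts with /datasets.
for path in dataset_routes:
    print(full_path("/datasets", path))

# With prefix="/summary.jsonld" the file name ends up at the front of the URL,
# which is not what the diff above intended.
print(full_path("/summary.jsonld", "/{dataset_id}/summary.jsonld"))
```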

@@ -226,6 +270,56 @@ async def get_dataset(dataset_id: str, db: MongoClient = Depends(dependencies.ge
raise HTTPException(status_code=404, detail=f"Dataset {dataset_id} not found")


@router.get("/{dataset_id}/summary.jsonld", response_model=DatasetOut)
Member

Since this is under datasets, I can actually set breakpoints and hit this route.

But then it errors out, and you should investigate.

Contributor Author

During a cleanup I made sure it is a DatasetOut, so it is safe to call .json() on it now.

return f.read()


@router.get("/sitemap.xml")
Member

This does not work. When I try /datasets/sitemap.xml, it tries to cast sitemap.xml into an ObjectId; I'm guessing that is because it collides with the /datasets/{id} endpoint.

If you separate these out into a separate file and give them a different prefix than datasets, it might work.
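
A simplified sketch of the collision described above, assuming routes are tried in registration order and the first match wins. The matcher below is a stand-in written for illustration, not the framework's router, and the route lists are examples.

```python
import re


def compile_route(template: str) -> "re.Pattern[str]":
    """Turn '/datasets/{dataset_id}' into a regex with one named group per parameter."""
    # re.split with a capture group alternates literal text and parameter names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def first_match(routes, path):
    """Return (template, params) for the first route that matches, else None."""
    for template in routes:
        match = compile_route(template).match(path)
        if match:
            return template, match.groupdict()
    return None


# Same prefix: the parameterized route is registered first and swallows sitemap.xml.
colliding = ["/datasets/{dataset_id}", "/datasets/sitemap.xml"]
print(first_match(colliding, "/datasets/sitemap.xml"))
# ('/datasets/{dataset_id}', {'dataset_id': 'sitemap.xml'})

# Separate prefix: nothing parameterized can capture the sitemap path.
separated = ["/sitemap.xml", "/datasets/{dataset_id}"]
print(first_match(separated, "/sitemap.xml"))
# ('/sitemap.xml', {})
```

In the colliding case the handler receives `sitemap.xml` as the dataset id, which would explain the failed ObjectId cast.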

Contributor Author

Separated out into a sitemap.py file, and will make another pass soon to see if it will open up the route.
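
For reference, a small sketch of building the sitemap body itself with the standard library. `build_sitemap` and the example URLs are assumptions for illustration; the actual contents of sitemap.py are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls) -> str:
    """Return a sitemap XML document listing the given URLs (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Example URLs are made up; they point straight at the summary.jsonld routes.
sitemap_xml = build_sitemap(
    [
        "https://example.org/datasets/1/summary.jsonld",
        "https://example.org/datasets/2/summary.jsonld",
    ]
)
print(sitemap_xml)
```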

Contributor Author

Different file now, with no overlapping routes.

MBcode and others added 2 commits October 20, 2022 10:07
took out old/overlapping routes, and check that it is a DatasetOut to make sure .json() can be called on it
@lmarini lmarini modified the milestone: 2.0.alpha-2 Nov 30, 2022
Labels
None yet
Projects
Status: In Progress
3 participants