idresearch Functions

Functions provided by the idresearch package.

  • idresearch.doc_stats(raw_abs)
    Abstract Statistics

    Finds the most frequently used keywords in the abstract.

    • Parameters
      raw_abs (str) – string. Raw text of any article abstract.

    • Returns
      dictionary. Maps each word to its number of occurrences in the abstract.

    • Return type
      freq_ans (dict)
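
To illustrate the kind of result described above, here is a minimal sketch of keyword counting using only the standard library. It is not the package's implementation: the helper name `keyword_counts`, the `top_n` parameter, and the small stop-word list are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for this sketch; the package's own list may differ.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}


def keyword_counts(raw_abs: str, top_n: int = 5) -> dict:
    """Return the top_n most frequent non-stop-words with their counts."""
    # Lowercase the text and keep alphabetic tokens only.
    words = re.findall(r"[a-z]+", raw_abs.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return dict(counts.most_common(top_n))


sample = "Graph networks learn graph structure. Graph models scale to large networks."
freq_ans = keyword_counts(sample, top_n=2)
print(freq_ans)  # {'graph': 3, 'networks': 2}
```

The dictionary is ordered from most to least frequent, which matches the word-to-count shape of the documented return value.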

  • idresearch.export_reco_csv(url)
    Export Recommended Articles dataframe

    Exports a csv of all the recommended articles (url, abstract, year, publisher, citation_count)

    • Parameters
      url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Exports
      new_df (csv) – csv. Exports a csv containing information on all recommended articles. Filename: recolist.csv

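
As a rough illustration of the export step, the sketch below writes records with the columns listed above to a csv file using the standard `csv` module. The helper name `write_reco_csv` and the two placeholder records are assumptions for this example, and the package itself may build the file differently (for instance from a pandas dataframe).

```python
import csv
import os
import tempfile

# Column names taken from the description above.
COLUMNS = ["url", "abstract", "year", "publisher", "citation_count"]


def write_reco_csv(rows, path="recolist.csv"):
    """Write one csv row per recommended article and return the file path."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Made-up placeholder records, not real articles.
rows = [
    {"url": "https://example.org/paper-a", "abstract": "Placeholder abstract A.",
     "year": 2019, "publisher": "Example Venue", "citation_count": 12},
    {"url": "https://example.org/paper-b", "abstract": "Placeholder abstract B.",
     "year": 2021, "publisher": "Example Journal", "citation_count": 3},
]

# Write into a temporary directory so the sketch does not touch the working directory.
out_path = write_reco_csv(rows, os.path.join(tempfile.mkdtemp(), "recolist.csv"))

# Read the file back to confirm its shape; csv values come back as strings.
with open(out_path, newline="", encoding="utf-8") as fh:
    loaded = list(csv.DictReader(fh))
print(len(loaded), loaded[0]["year"])
```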
  • idresearch.get_doc(doc_url)
    Get-response

    Obtains a response from the Semantic Scholar API.

    • Parameters
      doc_url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns

      • doc_fox (dict) – dictionary. Contains metadata of the main queried article.

      • doc_paperId (str) – string. Contains the semanticscholar paperId.

      • reco_fox (dict) – dictionary. Contains metadata of the papers recommended by the SemanticScholar AI.

    • Return type
      doc_fox (dict), doc_paperId (str), reco_fox (dict)

  • idresearch.get_ner(raw_abs)
    Get Named Entity Recognition

    Obtains named entities of type ORG and PRODUCT from the abstract.

    • Parameters
      raw_abs (str) – string. Raw text of any article abstract.

    • Returns

      • org_ent (list) – list. ORG type entities, each given as a string.

      • prod_ent (list) – list. PRODUCT type entities, each given as a string.

    • Return type
      org_ent (list), prod_ent (list)

  • idresearch.get_reco_df(url)
    Get Recommended Articles dataframe

    Obtains a dataframe of all the recommended articles (url, abstract, year, publisher, citation_count)

    • Parameters
      url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns
      dataframe. Dataframe containing information on all recommended articles.

    • Return type
      new_df (dataframe)

  • idresearch.main_abstract(url)
    Main Paper’s Abstract

    Obtains the abstract of the queried main paper.

    • Parameters
      url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns
      string. Abstract of the main queried paper.

    • Return type
      main_abs (str)

  • idresearch.plot_CitationCount_df(new_df)
    Plot Number of Papers vs Citation Count

    Plots a Number of Papers vs Citation Count histogram.

    • Parameters
      new_df (dataframe) – Pandas dataframe returned by the get_reco_df function.

    • Returns
      Returns a matplotlib plot.

    • Return type
      plot

  • idresearch.plot_CitationCount_url(url)
    Plot Number of Papers vs Citation Count

    Plots a Number of Papers vs Citation Count histogram.

    • Parameters
      url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns
      Returns a matplotlib plot.

    • Return type
      plot

  • idresearch.plot_YearTrend_df(new_df)
    Plot Number of Papers vs Publication Year

    Plots a Number of Papers vs Publication Year histogram.

    • Parameters
      new_df (dataframe) – Pandas dataframe returned by the get_reco_df function.

    • Returns
      Returns a matplotlib plot.

    • Return type
      plot
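
The data behind a Number of Papers vs Publication Year histogram is a count of papers per year. The sketch below computes and prints those counts with the standard library only. The helper name `papers_per_year` and the sample years are made up for illustration, and the package itself draws the histogram with matplotlib rather than printing text.

```python
from collections import Counter


def papers_per_year(years):
    """Count papers per publication year, skipping missing years."""
    counts = Counter(y for y in years if y is not None)
    # Sort by year so the bars appear in chronological order.
    return dict(sorted(counts.items()))


# Made-up publication years; None stands for a paper with no year recorded.
years = [2018, 2020, 2019, 2020, None, 2020, 2018]
year_counts = papers_per_year(years)

# Print a simple text histogram, one row per year.
for year, count in year_counts.items():
    print(f"{year} {'#' * count}")
```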

  • idresearch.plot_YearTrend_url(url)
    Plot Number of Papers vs Publication Year

    Plots a Number of Papers vs Publication Year histogram.

    • Parameters
      url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns
      Returns a matplotlib plot.

    • Return type
      plot

  • idresearch.reco_abstract(i, url)
    Recommended Paper’s Abstract

    Obtains the abstract of the queried recommended paper.

    • Parameters

      • i (int) – integer. Index number of the paper as shown in the output of the get_reco_df or export_reco_csv functions.

      • url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

    • Returns
      string. Abstract of the queried recommended paper.

    • Return type
      reco_abs (str)

  • idresearch.reco_authors(url, num)
    Get list of authors

    Provides the list of authors and the number of times each author’s papers have been recommended, in descending order.

    • Parameters

      • url (str) – string. URLs from semanticscholar, arxiv, aclweb, acm, and biorxiv are supported.

      • num (int) – integer. Number of author names in the output.

    • Returns
      dictionary. Maps each author name to the number of times that author’s articles have been recommended.

    • Return type
      occurence_common (dict)
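
The counting described above can be sketched with `collections.Counter`. This is an illustration under stated assumptions, not the package's code: the helper name `top_authors` and the shape of the `papers` records (a list of dicts with an `authors` list) are hypothetical, and the author names are placeholders.

```python
from collections import Counter


def top_authors(papers, num):
    """Return the num most frequently recommended authors with their counts."""
    counts = Counter()
    for paper in papers:
        # Use a set so an author listed twice on one paper is counted once.
        counts.update(set(paper["authors"]))
    # most_common sorts by count in descending order.
    return dict(counts.most_common(num))


# Placeholder records standing in for the recommended papers.
papers = [
    {"title": "Paper 1", "authors": ["A. Author", "B. Writer"]},
    {"title": "Paper 2", "authors": ["A. Author", "C. Scholar"]},
    {"title": "Paper 3", "authors": ["A. Author", "B. Writer"]},
]

occurence_common = top_authors(papers, 2)
print(occurence_common)  # {'A. Author': 3, 'B. Writer': 2}
```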

  • idresearch.summarize_doc(raw_abs, n)
    Summarize the abstract document

    Summarizes the abstract by assigning a weight to each sentence (based on common words and sentence length).

    • Parameters

      • raw_abs (str) – string. Raw text of any article abstract.

      • n (int) – integer. Number of lines for the summary.

    • Returns
      string. Summary of the abstract in ‘n’ lines, as set by the argument.

    • Return type
      summary (str)
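
One common way to weight sentences by common words and length is to sum the document-wide frequency of each word in a sentence and divide by the sentence's word count. The sketch below follows that scheme as an assumption; the package's actual weighting may differ, and the helper name `summarize` is hypothetical.

```python
import re
from collections import Counter


def summarize(raw_abs: str, n: int) -> str:
    """Return the n highest-scoring sentences, kept in their original order."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", raw_abs) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    # Word frequencies across the whole abstract.
    freq = Counter(word for words in tokenized for word in words)

    def score(words):
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:n])  # restore document order
    return " ".join(sentences[i] for i in keep)


sample = (
    "Transformers model long sequences. "
    "Transformers scale well with data. "
    "The weather was pleasant."
)
summary = summarize(sample, 2)
print(summary)  # Transformers model long sequences. Transformers scale well with data.
```

Stop words are not filtered in this sketch, so very common function words can raise a sentence's score on longer abstracts.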