regex

daniel-kayode · Nov 28, 2021 · 70b9f64
1 parent 6e41dfd
commit 70b9f64
Showing 4 changed files with 618 additions and 0 deletions.
diff --git a/Advanced/regex/regex_tutorial_exercise_answer.ipynb b/Advanced/regex/regex_tutorial_exercise_answer.ipynb
@@ -0,0 +1,154 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**1. Extract all Twitter handles from the following text. A Twitter handle is the text that appears after https://twitter.com/ and is a single word. It contains only alphanumeric characters (A-Z, a-z, 0 to 9) and the underscore _**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['elonmusk', 'teslarati', 'dummy_tesla', 'dummy_2_tesla']"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "text = '''\n",
+    "Follow our leader Elon musk on twitter here: https://twitter.com/elonmusk, more information \n",
+    "on Tesla's products can be found at https://www.tesla.com/. Also here are leading influencers \n",
+    "for tesla related news,\n",
+    "https://twitter.com/teslarati\n",
+    "https://twitter.com/dummy_tesla\n",
+    "https://twitter.com/dummy_2_tesla\n",
+    "'''\n",
+    "pattern = r'https://twitter\\.com/([a-zA-Z0-9_]+)'\n",
+    "\n",
+    "re.findall(pattern, text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**2. Extract the Concentration of Risk types. This is the text that appears after \"Concentration of Risk:\". In the example below, your regex should extract these two strings:**\n",
+    "\n",
+    "(1) Credit Risk\n",
+    "\n",
+    "(2) Supply Risk"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['Credit Risk', 'Supply Risk']"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "text = '''\n",
+    "Concentration of Risk: Credit Risk\n",
+    "Financial instruments that potentially subject us to a concentration of credit risk consist of cash, cash equivalents, marketable securities,\n",
+    "restricted cash, accounts receivable, convertible note hedges, and interest rate swaps. Our cash balances are primarily invested in money market funds\n",
+    "or on deposit at high credit quality financial institutions in the U.S. These deposits are typically in excess of insured limits. As of September 30, 2021\n",
+    "and December 31, 2020, no entity represented 10% or more of our total accounts receivable balance. The risk of concentration for our convertible note\n",
+    "hedges and interest rate swaps is mitigated by transacting with several highly-rated multinational banks.\n",
+    "Concentration of Risk: Supply Risk\n",
+    "We are dependent on our suppliers, including single source suppliers, and the inability of these suppliers to deliver necessary components of our\n",
+    "products in a timely manner at prices, quality levels and volumes acceptable to us, or our inability to efficiently manage these components from these\n",
+    "suppliers, could have a material adverse effect on our business, prospects, financial condition and operating results.\n",
+    "'''\n",
+    "pattern = r'Concentration of Risk: ([^\\n]*)'\n",
+    "\n",
+    "re.findall(pattern, text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**3. Companies in Europe report their financial numbers on a semi-annual basis, so you can have a document like this. To extract the quarterly and semi-annual periods, you can use a regex as shown below.**\n",
+    "\n",
+    "Hint: use a non-capturing group (?:...) for the Q/S alternation so that re.findall returns the whole period, such as 2021 Q1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['2021 Q1', '2021 S1']"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "text = '''\n",
+    "Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.\n",
+    "BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.\n",
+    "'''\n",
+    "\n",
+    "pattern = r'FY(\\d{4} (?:Q[1-4]|S[1-2]))'\n",
+    "matches = re.findall(pattern, text)\n",
+    "matches"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
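A note on the hint in exercise 3: the sketch below, which is not part of the committed notebooks, compares the committed pattern with a variant that uses a plain capturing group for the alternation. It reuses the sample text from the notebook; the variable names `non_capturing` and `capturing` are illustrative only.

```python
import re

text = """
Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.
BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.
"""

# Inner group is non-capturing: the pattern has one capturing group,
# so re.findall returns a flat list of strings.
non_capturing = re.findall(r"FY(\d{4} (?:Q[1-4]|S[1-2]))", text)

# Inner group is capturing: the pattern has two capturing groups,
# so re.findall returns one tuple per match.
capturing = re.findall(r"FY(\d{4} (Q[1-4]|S[1-2]))", text)

print(non_capturing)  # ['2021 Q1', '2021 S1']
print(capturing)      # [('2021 Q1', 'Q1'), ('2021 S1', 'S1')]
```

With the capturing variant, each result has to be unpacked to get the full period, which is why the non-capturing form is the more convenient choice here.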
diff --git a/Advanced/regex/regex_tutorial_exercise_questions.ipynb b/Advanced/regex/regex_tutorial_exercise_questions.ipynb
@@ -0,0 +1,135 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<h1 align='center'>Python Regular Expression Tutorial Exercise</h1>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import re"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**1. Extract all Twitter handles from the following text. A Twitter handle is the text that appears after https://twitter.com/ and is a single word. It contains only alphanumeric characters (A-Z, a-z, 0 to 9) and the underscore _**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "text = '''\n",
+    "Follow our leader Elon musk on twitter here: https://twitter.com/elonmusk, more information \n",
+    "on Tesla's products can be found at https://www.tesla.com/. Also here are leading influencers \n",
+    "for tesla related news,\n",
+    "https://twitter.com/teslarati\n",
+    "https://twitter.com/dummy_tesla\n",
+    "https://twitter.com/dummy_2_tesla\n",
+    "'''\n",
+    "pattern = '' # todo: type your regex here\n",
+    "\n",
+    "re.findall(pattern, text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**2. Extract the Concentration of Risk types. This is the text that appears after \"Concentration of Risk:\". In the example below, your regex should extract these two strings:**\n",
+    "\n",
+    "(1) Credit Risk\n",
+    "\n",
+    "(2) Supply Risk"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "text = '''\n",
+    "Concentration of Risk: Credit Risk\n",
+    "Financial instruments that potentially subject us to a concentration of credit risk consist of cash, cash equivalents, marketable securities,\n",
+    "restricted cash, accounts receivable, convertible note hedges, and interest rate swaps. Our cash balances are primarily invested in money market funds\n",
+    "or on deposit at high credit quality financial institutions in the U.S. These deposits are typically in excess of insured limits. As of September 30, 2021\n",
+    "and December 31, 2020, no entity represented 10% or more of our total accounts receivable balance. The risk of concentration for our convertible note\n",
+    "hedges and interest rate swaps is mitigated by transacting with several highly-rated multinational banks.\n",
+    "Concentration of Risk: Supply Risk\n",
+    "We are dependent on our suppliers, including single source suppliers, and the inability of these suppliers to deliver necessary components of our\n",
+    "products in a timely manner at prices, quality levels and volumes acceptable to us, or our inability to efficiently manage these components from these\n",
+    "suppliers, could have a material adverse effect on our business, prospects, financial condition and operating results.\n",
+    "'''\n",
+    "pattern = '' # todo: type your regex here\n",
+    "\n",
+    "re.findall(pattern, text)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**3. Companies in Europe report their financial numbers on a semi-annual basis, so you can have a document like this. To extract the quarterly and semi-annual periods, write a regex in the cell below.**\n",
+    "\n",
+    "Hint: use a non-capturing group (?:...) for the Q/S alternation so that re.findall returns the whole period, such as 2021 Q1"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "text = '''\n",
+    "Tesla's gross cost of operating lease vehicles in FY2021 Q1 was $4.85 billion.\n",
+    "BMW's gross cost of operating vehicles in FY2021 S1 was $8 billion.\n",
+    "'''\n",
+    "\n",
+    "pattern = '' # todo: type your regex here\n",
+    "matches = re.findall(pattern, text)\n",
+    "matches"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "__[Solution](http://ndtv.com)__"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
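One way to check answers to exercises 1 and 2 outside the notebook is sketched below. It is not part of the commit: the sample texts are shortened versions of the notebook inputs, the variable names are illustrative, and the second pattern uses `(.*)` as an alternative to the committed `[^\n]*` (by default `.` does not match a newline, so both stop at the end of the line).

```python
import re

# Shortened sample inputs, assumed for illustration; the notebooks use longer text.
handles_text = """
Follow our leader Elon musk on twitter here: https://twitter.com/elonmusk, more information
on Tesla's products can be found at https://www.tesla.com/.
https://twitter.com/dummy_2_tesla
"""

risk_text = """
Concentration of Risk: Credit Risk
Our cash balances are primarily invested in money market funds.
Concentration of Risk: Supply Risk
We are dependent on our suppliers.
"""

# Raw strings keep backslashes such as \. intact for the regex engine.
handle_pattern = r"https://twitter\.com/([A-Za-z0-9_]+)"
risk_pattern = r"Concentration of Risk: (.*)"

handles = re.findall(handle_pattern, handles_text)
risks = re.findall(risk_pattern, risk_text)

print(handles)  # ['elonmusk', 'dummy_2_tesla']
print(risks)    # ['Credit Risk', 'Supply Risk']
```

The handle match stops at the comma after `elonmusk` because a comma is outside the character class, and the `https://www.tesla.com/` URL is skipped because it does not start with the `twitter.com` prefix.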