Table of Contents
- Cosmos DB scenario-based labs - Retail hands-on lab step-by-step
- Abstract and learning objectives
- Overview
- Solution architecture (High-level)
- Requirements
- Before the demo
- Exercise 1: Deployment and Setup
- Exercise 1: Configure Databricks and generate event data
- Task 5: Perform and deploy collaborative filtering rules calculation
- Exercise 2: Email alerts using Logic Apps
- Exercise 3: Explore Contoso Movie Store
- After the hands-on lab
In this demo you will show your audience how to use Azure services to host a movie retail store with custom AI models and Cosmos DB. Several other PaaS-based technologies are used to show how Azure can help migrate legacy applications to the cloud.
Contoso Movies, Ltd. has redesigned its website to use Azure PaaS services including Cosmos DB, Functions, Event Hubs, Stream Analytics, Power BI and Logic Apps. As part of this redesign they have also implemented a new recommendation system based on custom AI models. These AI models are computed offline and stored in Cosmos DB for reference while users browse the site. User events implicitly rank the items that users click on, and the recommendations are then adjusted based on these events.
-
Microsoft Azure subscription must be pay-as-you-go or MSDN.
- Trial subscriptions will not work.
-
Azure CLI - version 2.0.68 or later
NOTE: You can run the following PowerShell commands to install the latest version:
Invoke-WebRequest -Uri https://aka.ms/installazurecliwindows -OutFile .\AzureCLI.msi;
Start-Process msiexec.exe -Wait -ArgumentList '/I AzureCLI.msi /quiet'
Refer to the Before the hands-on lab setup guide manual before continuing to the lab exercises.
Be sure to change the script mode to demo so that the solution code is deployed to the web app and function apps.
Duration: 60 minutes
Synopsis: In this exercise you will do the necessary setup items that could not be done in the deployment scripts.
Duration: 30 minutes
Synopsis: We have pre-generated a set of events that include buy and details events. Based on this data, a Top Items recommendation will be made to users that are new to the site (aka a cold start recommendation). You will implement this top items code in the web application and function applications, then deploy the applications to test the functionality.
The algorithms for creating the offline calculations are written in Python and are executed via Azure Databricks.
-
Open the Azure portal (https://portal.azure.com) and search for your assigned lab resource group. If you were not assigned a resource group, your generated resource group will be named using the following pattern: YOURINIT-s2-retail.
-
Select your resource group, and then select your Azure Databricks instance; it should be named s2_databricks....
-
Select Launch Workspace. If prompted, log in with the account you used to create your environment.
-
In the side navigation, select Clusters.
-
Select Create Cluster.
-
On the create cluster form, provide the following:
-
Cluster Name: small
-
Cluster Type: Standard
-
Databricks Runtime Version: Runtime: 5.5 (Scala 2.11, Spark 2.4.3) (Note: the runtime version may have LTS after the version. This is also a valid selection.)
-
Python Version: 3
-
Enable Autoscaling: Uncheck this option.
-
Auto Termination: Check the box and enter 120
-
Worker Type: Standard_DS3_v2
-
Driver Type: Same as worker
-
Workers: 1
-
-
Select Create Cluster.
-
Before continuing to the next step, verify that your new cluster is running. Wait for the state to change from Pending to Running.
-
Select the small cluster, then select Libraries.
-
Select Install New.
-
In the Install Library dialog, select Maven for the Library Source.
-
In the Coordinates field type:
com.microsoft.azure:azure-cosmosdb-spark_2.4.0_2.11:1.4.1
-
Select Install.
-
Wait until the library's status shows as Installed before continuing.
-
Within Azure Databricks, select Workspace on the menu, then Users, select your user, then select the down arrow on the top of your user workspace. Select Import.
-
Within the Import Notebooks dialog, select Import from: file, then drag-and-drop the file or browse to upload it ({un-zipped repo folder}/Retail/Notebooks/02 Retail.dbc).
-
Select Import
-
After importing, select the new 02 Retail folder, then navigate to the Includes folder
-
Select the Shared-Configuration notebook
-
Update the configuration settings and set the following using the values from your lab setup script output:
- Endpoint = Cosmos DB endpoint url
- Masterkey = Cosmos DB master key
- Database = Database id of the cosmos db ('movies')
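The three settings above can be pictured as a small options dictionary. The following is a minimal sketch, not the contents of the Shared-Configuration notebook: the helper name `build_cosmos_config`, the placeholder endpoint and key, and the `events` collection choice are assumptions for illustration. The key names follow the option names used by the azure-cosmosdb-spark connector installed earlier.

```python
# Sketch only: the endpoint and key below are placeholders, not real values.
def build_cosmos_config(endpoint, master_key, database, collection):
    """Return an options dict in the shape the azure-cosmosdb-spark connector reads."""
    if not endpoint.startswith("https://"):
        raise ValueError("Endpoint should be the https:// URI of the Cosmos DB account")
    return {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
    }


config = build_cosmos_config(
    endpoint="https://<your-account>.documents.azure.com:443/",  # placeholder
    master_key="<your-primary-key>",                             # placeholder
    database="movies",
    collection="events",
)

# In a Databricks notebook with the connector library installed, a read would
# look roughly like this (left as a comment because it needs a live cluster):
# df = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**config).load()
```

Keeping the settings in one dictionary means each notebook can reuse them and only change the collection name.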
If you do not have your setup script output values available for reference, you may find the Endpoint and Masterkey values by navigating to your Cosmos DB account in the Azure portal, then selecting Keys in the left-hand menu. Copy the URI value for Endpoint, and the Primary Key for the Masterkey value.
-
Attach your cluster to the notebook using the dropdown. You will need to do this for each notebook you open. In the drop down, select the small cluster.
-
Next, navigate back up to 02 Retail and select the 01 Event Generator notebook
This notebook will simulate the browsing and purchasing activity for six users with different personality-based preferences and save the result to the events container in Cosmos DB. The movies have been pre-selected and sorted into the genres of comedy, drama and action. While the actual movie selection and activity taken is random, it is weighted to respect the user's preferences in each genre, producing a distribution that mirrors that user's taste.
For example, user 400001 has the preference of 20 for comedy, 30 for drama, 50 for action. This will result in the user logging more activity with action movies.
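As a rough illustration of this weighting (not the notebook's actual code), the sketch below draws genres with Python's standard `random.choices`, using the 20/30/50 preferences from the example. The function name `pick_genres` and the fixed seed are assumptions made for the illustration.

```python
import random
from collections import Counter

# Preference weights for user 400001, taken from the example in the text.
PREFERENCES = {"comedy": 20, "drama": 30, "action": 50}


def pick_genres(preferences, n, seed=None):
    """Draw n genres at random, weighted by the user's preference per genre."""
    rng = random.Random(seed)
    genres = list(preferences)
    weights = [preferences[g] for g in genres]
    return rng.choices(genres, weights=weights, k=n)


# Over many draws, action should appear most often and comedy least often.
counts = Counter(pick_genres(PREFERENCES, 10_000, seed=42))
print(counts.most_common())
```

Each individual draw is still random, which is why two participants running the generator end up with different events that follow the same overall distribution.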
NOTE: Your results (that is, the events generated) may be different from those of your fellow lab participants.
-
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
-
Select Run All.
-
Browse to the {un-zipped repo folder}/Retail/Solution/Contoso Movies folder and open the Contoso.Apps.Movies.sln solution.
If Visual Studio prompts you to sign in when it first launches, use the account provided to you for this lab (if applicable), or an existing Microsoft account.
-
Within the Solution Explorer, expand the /Utilities/MovieDataImport project and open the Program.cs file. Take a few moments to browse the code. You will see that it:
- Aggregates all the event data generated from the Databricks notebook
- Creates the user personalities
- Creates the movie categories/genres
- Creates the movies
-
Right-click the project, select Set as startup project.
-
Press F5 to run the project.
You may see several of the following lines output to the console window after saving the genres and before adding the movies:
Input string was not in a correct format.
You can safely ignore these; some of the movies retrieved from the API are poorly formatted.
NOTE: You must wait for the Event Generator Databricks notebook to complete before running this step. This ensures that the results of later steps in the lab match.
Synopsis: Based on the pre-calculated events in the Cosmos DB for our pre-defined personality types (Comedy fan, Drama fan, etc.), you will implement and deploy an algorithm that will generate these associations and put them in Cosmos DB for offline processing by the web and function applications.
-
Switch back to your Databricks workspace and open the 02 Association Rules notebook.
-
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
-
Run each cell of the 02 Association Rules notebook by selecting within the cell, then entering Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.
The goal of this algorithm is to compute two metrics that indicate the strength of a relationship between a source item and a target item based on event history, and then save that matrix to the associations container in Cosmos DB.
The algorithm begins by grouping events with a buy action into transactions, grouping by the sessionId. This provides the set of items bought together.
For example, a transaction with two items would look like:
'404973': ['5512872', '4172430']
where 404973 is the sessionId that is used as the transactionId, and the array contains the IDs of the items bought ('5512872' and '4172430').
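The grouping step can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: the `itemId` and `action` field names are hypothetical, and because the text does not name the two metrics, support and confidence are used here as a common pair for association rules.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical sample events; only the sessionId/buy semantics come from the text.
events = [
    {"sessionId": "404973", "action": "buy", "itemId": "5512872"},
    {"sessionId": "404973", "action": "buy", "itemId": "4172430"},
    {"sessionId": "404973", "action": "details", "itemId": "1111111"},
    {"sessionId": "404974", "action": "buy", "itemId": "5512872"},
]


def build_transactions(events):
    """Group the items of buy events by sessionId (used as the transaction id)."""
    transactions = defaultdict(list)
    for event in events:
        if event["action"] == "buy":
            transactions[event["sessionId"]].append(event["itemId"])
    return dict(transactions)


def association_metrics(transactions):
    """Compute support and confidence (assumed metrics) per (source, target) pair."""
    total = len(transactions)
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in transactions.values():
        unique = set(items)
        for item in unique:
            item_counts[item] += 1
        for source, target in permutations(unique, 2):
            pair_counts[(source, target)] += 1
    return {
        (source, target): {
            "support": count / total,                   # share of transactions with both
            "confidence": count / item_counts[source],  # share of source buys that include target
        }
        for (source, target), count in pair_counts.items()
    }


transactions = build_transactions(events)
metrics = association_metrics(transactions)
```

Note that confidence is directional: in the sample data every session that bought '4172430' also bought '5512872', but not the other way around.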
Synopsis: In this exercise you will execute the implicit ratings notebook in Azure Databricks to generate the implicit rating for each user that has event data. You will only execute this once during this lab; however, this notebook would need to run on a set schedule to keep the users' rating data up to date.
-
Within Azure Databricks, open 03 Ratings.
-
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
-
Run each cell of the 03 Ratings notebook by selecting within the cell, then entering Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.
This notebook will use the implicit events captured in the events container in Cosmos DB to calculate what a user would rate a given item, based on their actions. In other words, it converts a user's buy, addToCart and details actions into a numeric score for the item. The resulting user-to-item ratings matrix will be saved to the ratings container in Cosmos DB.
-
Switch back to the Azure portal.
-
In your resource group, navigate to your Cosmos DB instance.
-
Open the ratings container and review the items in the container.
NOTE: These ratings are generated as part of this notebook as an 'offline' operation. If you collect a significant amount of user data, you would need to re-evaluate the events using this notebook and populate the ratings container again for the online calculations to utilize.
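The conversion from actions to an implicit rating performed by the 03 Ratings notebook can be pictured with a small sketch. The weights, the rating cap, and the `userId`/`itemId`/`action` field names below are assumptions chosen for illustration; the notebook defines its own weightings.

```python
from collections import defaultdict

# Hypothetical weights: stronger signals of interest earn a higher score.
ACTION_WEIGHTS = {"details": 1.0, "addToCart": 3.0, "buy": 5.0}
MAX_RATING = 10.0  # assumed cap so repeated actions cannot grow without bound


def implicit_ratings(events, weights=ACTION_WEIGHTS, max_rating=MAX_RATING):
    """Sum the action weights per (user, item) pair and cap the total."""
    scores = defaultdict(float)
    for event in events:
        key = (event["userId"], event["itemId"])
        scores[key] += weights.get(event["action"], 0.0)
    return {key: min(score, max_rating) for key, score in scores.items()}


sample_events = [
    {"userId": 400001, "itemId": "A", "action": "details"},
    {"userId": 400001, "itemId": "A", "action": "addToCart"},
    {"userId": 400001, "itemId": "A", "action": "buy"},
    {"userId": 400001, "itemId": "B", "action": "buy"},
    {"userId": 400001, "itemId": "B", "action": "buy"},
    {"userId": 400001, "itemId": "B", "action": "buy"},
]
ratings = implicit_ratings(sample_events)
```

Because the ratings are derived entirely from stored events, rerunning the calculation on a schedule is enough to pick up new user activity.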
-
Within Azure Databricks, open 04 Similarity.
-
Attach your cluster to the notebook using the dropdown. In the drop down, select the small cluster.
-
Run each cell of the 04 Similarity notebook by selecting within the cell, then entering Ctrl+Enter on your keyboard. Pay close attention to the instructions within the notebook so you understand each step of the data preparation process.
The notebook logic uses the user to item ratings previously created to calculate a score indicating the similarity between a source item and a target item. The process begins by loading the ratings matrix and for each user to item rating, calculating a new normalized rating (to adjust for the user's bias).
An overlap matrix is calculated that identifies, for any pair of items, how many users rated both items. First, the normalized ratings matrix is converted to a Boolean matrix. That is, if an item has a rating from a user (regardless of the value of the rating), the cell has a value of 1; otherwise it is 0. Then the dot product of this Boolean matrix with its transpose is calculated. This yields a simpler matrix in which each cell contains the number of users who rated both items. Cells without any overlap have a value of zero.
Separately, the cosine similarity of the normalized ratings matrix is computed. It's easiest to understand the cosine similarity calculation as being done between an item i and another item j. The cosine similarity is a ratio:
- The numerator is computed as the sum of the products of the normalized rating of item i and the normalized rating of item j, for all users who have provided ratings.
- The denominator is computed as the square root of the sum of the squares of the normalized ratings of item i, multiplied by the square root of the sum of the squares of the normalized ratings of item j.
In Python, the logic uses the cosine_similarity function from scikit-learn to compute the similarity between items by providing it our normalized user-to-items ratings matrix.
The result is then filtered to remove entries with a similarity score lower than the configured minimum and entries whose item pair has fewer overlapping ratings in the overlap matrix than the configured minimum. Just before saving, any remaining similarities with scores below the configured minimum similarity are removed, so that weaker similarities are not recommended.
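The calculation described above can be sketched without any third-party libraries. The notebook itself uses scikit-learn's cosine_similarity; the version below is a dependency-free stand-in written for illustration. The sample ratings, the thresholds, and the choice of mean-centering as the normalization are assumptions.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "u1": {"A": 5.0, "B": 4.0, "C": 1.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 2.0},
    "u3": {"A": 1.0, "C": 5.0},
}


def normalize(ratings):
    """Subtract each user's mean rating to adjust for that user's bias (assumed method)."""
    normalized = {}
    for user, items in ratings.items():
        mean = sum(items.values()) / len(items)
        normalized[user] = {item: value - mean for item, value in items.items()}
    return normalized


def overlap(ratings, i, j):
    """Count the users who rated both item i and item j."""
    return sum(1 for items in ratings.values() if i in items and j in items)


def cosine(normalized, i, j):
    """Cosine similarity between items i and j; a missing rating counts as 0."""
    rows = normalized.values()
    numerator = sum(r.get(i, 0.0) * r.get(j, 0.0) for r in rows)
    norm_i = math.sqrt(sum(r.get(i, 0.0) ** 2 for r in rows))
    norm_j = math.sqrt(sum(r.get(j, 0.0) ** 2 for r in rows))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return numerator / (norm_i * norm_j)


def similar_items(ratings, min_similarity=0.2, min_overlap=2):
    """Keep item pairs that pass both the overlap and the similarity thresholds."""
    normalized = normalize(ratings)
    items = sorted({item for row in ratings.values() for item in row})
    result = {}
    for i in items:
        for j in items:
            if i == j or overlap(ratings, i, j) < min_overlap:
                continue
            score = cosine(normalized, i, j)
            if score >= min_similarity:
                result[(i, j)] = score
    return result


similarities = similar_items(ratings)
```

In this sample only items A and B survive the filters: both are rated highly by the same two users, while item C is rated low by those users and therefore has a negative similarity to each.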
-
Open the Azure portal and navigate to the Stream Analytics job that was created for you by the setup script
-
Select Inputs
-
Select +Add stream input, then select Event Hub
-
For the alias, type s2events
-
Select your subscription
-
Select the s2ns.. Event Hubs namespace
-
For the event hub, select store
-
For the policy name, select RootManageSharedAccessKey
-
Select Save
-
Select Outputs
-
Select +Add, then select Power BI
-
For the output alias, type eventOrdersLastHour
-
For the dataset, type eventOrdersLastHour
-
For the table name, type eventOrdersLastHour
-
Select Authorize, then log in to your Power BI instance
-
Select Save
-
Repeat steps 11-16, but replace eventOrdersLastHour with:
- eventSummary
- failureCount
- eventData
-
Select Query
-
Update the query to the following:
SELECT COUNT(*) AS FailureCount
INTO failureCount
FROM s2events
WHERE Event = 'paymentFailure'
GROUP BY TumblingWindow(second, 10)

SELECT COUNT(DISTINCT UserId) AS UserCount, System.TimeStamp AS Time, COUNT(*) AS EventCount
INTO eventData
FROM s2events
GROUP BY TumblingWindow(second, 10)

SELECT System.TimeStamp AS Time, Event, COUNT(*)
INTO eventSummary
FROM s2events
GROUP BY Event, TumblingWindow(second, 10)

SELECT DateAdd(second, -10, System.Timestamp()) AS WinStartTime, System.Timestamp() AS WinEndTime, 0 AS Min, COUNT(*) AS Count, 10 AS Target
INTO eventOrdersLastHour
FROM s2events
WHERE event = 'buy'
GROUP BY SlidingWindow(second, 10)
- The Query windows should look similar to this:
-
Select Save query
-
Select Overview, then select Start in the menu to start your Stream Analytics job
- In the dialog, ensure that Now is selected, then select Start
NOTE: If your job fails for any reason, you can use the Activity Log to see what the error(s) were.
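To make the 10-second tumbling windows used in the query more concrete, the sketch below reproduces the counting logic in plain Python. It is an illustration only: the function name and the (timestamp, event name) tuples are hypothetical, and it does not model sliding windows or late-arriving events. It does follow the Stream Analytics convention that a window excludes its start time and includes its end time.

```python
import math
from collections import Counter


def tumbling_window_counts(events, window_seconds=10, event_type=None):
    """Count events per fixed, non-overlapping window, keyed by window end time.

    events is an iterable of (timestamp_in_seconds, event_name) tuples.
    A window covers (end - window_seconds, end]: start excluded, end included.
    """
    counts = Counter()
    for timestamp, name in events:
        if event_type is not None and name != event_type:
            continue
        window_end = math.ceil(timestamp / window_seconds) * window_seconds
        counts[window_end] += 1
    return dict(counts)


sample = [
    (1, "buy"),
    (4, "paymentFailure"),
    (9, "buy"),
    (12, "details"),
    (15, "paymentFailure"),
    (20, "buy"),
]

all_events = tumbling_window_counts(sample)                            # like the eventData query
failures = tumbling_window_counts(sample, event_type="paymentFailure")  # like failureCount
```

Each event lands in exactly one window, which is the property that distinguishes a tumbling window from the sliding window used for eventOrdersLastHour.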
-
Browse to the {un-zipped repo folder}/Retail/Solution/Contoso Movies folder and open the Contoso.Apps.Movies.sln solution
-
Right-click the DataGenerator project, select Set as startup project
-
Press F5 to run the project
-
Notice events will be generated based on a set of users and their preferred movie type
-
Buy events will be generated for the first 30 seconds with random payment failures also generated. After 30 seconds, you will notice the orders per hour will fall below the target of 10. This would signify that something is wrong with the front end web site or order processing.
-
After about 1 minute, close the DataGenerator console program
-
Open a new browser window to Power BI
-
Click Sign In, sign in using the same credentials you used to authorize your outputs for Stream Analytics above.
-
Select My workspace
-
Select +Create, then select Dashboard
-
For the name, type Contoso Movies, select Create
-
Select the ... ellipses, then select +Add tile
-
Select Custom Streaming Data, select Next
-
Select the eventData data set, then select Next
Important: If the eventData data set does not appear, it is because there is a lag of several minutes between when you first configure the Stream Analytics Power BI output and when data first appears in the streaming data set. Please ensure the data generator is running and that you have started the Stream Analytics query. You may also try restarting the Function App.
-
For the visualization type, select Card
-
For the Fields, select EventCount
-
Select Next
-
For the title, type Event Count, then select Apply
-
Select +Add tile, you may need to select the ... ellipses first
-
Select Custom Streaming Data, select Next. Use the following table to create the needed tiles:
| Dataset | Type | Fields | Title |
| --- | --- | --- | --- |
| eventData | Card | UserCount | User Count |
| failureCount | Card | FailureCount | Payment Failures |
| eventSummary | Line chart | Axis = UserCount, Legend = Event, Values = Count | Count By Event |
| eventOrdersLastHour | Gauge | Value = Count, Minimum = Min, Target = Target | Orders Per Hour |
- Your dashboard should look similar to the following:
-
Switch back to Visual Studio, press F5 to run the data generator project
-
Switch to your Power BI dashboard, after a few minutes, you should see it update with the event data:
Duration: 30 minutes
In this exercise you will configure your change feed function to call an HTTP Logic App endpoint that will then send an email when an order event occurs. The function uses Polly to handle retries in case the endpoint is not available.
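Polly is a .NET library, so the lab's retry logic lives in the C# function app. The general idea of retrying with a growing delay can still be sketched in Python. Everything below is a hypothetical illustration rather than the lab's code: the function name, the attempt count, and the delays are assumptions.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation, retrying on failure with exponential backoff.

    The delay doubles after each failed attempt (0.5 s, 1 s, 2 s, ...).
    The last failure is re-raised so the caller can log or handle it.
    Real code should catch only errors known to be transient.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage with a stand-in for the HTTP call: fails once, then succeeds.
attempts = []


def send_order_email():
    attempts.append(1)
    if len(attempts) < 2:
        raise ConnectionError("endpoint not available yet")
    return "accepted"


result = call_with_retries(send_order_email, sleep=lambda seconds: None)
```

Passing the sleep function in as a parameter keeps the sketch quick to run and easy to test, since the waits can be replaced with a no-op.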
-
Open the Azure portal to your resource group and select the Logic App in your resource group; it should be named s2logicapp...
-
Click Edit
- Click +New step
- Search for send an email, then select the Office 365 Outlook connector
- Click Sign in, then log in using your Azure AD credentials
-
Set the To as your email
-
Set the Subject as Thank you for your order
-
Set the Body as Your order is being processed
-
Click Save
- Click on the When a HTTP request is received action, copy the HTTP POST URL for the logic app and save it for the next task
-
Open the Azure portal to your resource group and select the Function App in your resource group; it should be named s2func...
-
Click Configuration
-
Add or update the LogicAppUrl configuration variable, setting it to the Logic App HTTP endpoint you recorded above
-
Click Save
-
Switch back to the Azure Portal
-
Select your Databricks instance, then click Launch Workspace
-
Browse to each of the notebooks that were deployed as part of your deployment script and review the contents with your audience. Note the following:
-
01 Event Generator - this notebook will generate a random set of events for each target user and their personality. This is then used to generate the 'ratings'. Most of the generation code is in Cmd 9 and you can focus your conversation around that cell.
-
02 Association Rules - Review the comments in Cmd 7, which describe what is happening in the rest of the notebook
-
03 Ratings - Review Cmd 9, point out the weightings for each action and then where the implicit rating is created.
-
04 Similarity - Review the comments in Cmd 7, which describe what is happening in the rest of the notebook
-
Switch to Visual Studio and open the Contoso.Apps.FunctionApp project, then open the RecommendationHelper.cs file
-
Navigate to the Get method signature:
public static List<Item> Get(string algo, int userId, int contentId, int take)
Point out that this is the entry point where a recommendation starts, based on the algorithm requested.
-
Review the following methods and their code:
- TopRecommendation - this is the basic method for randomly selecting a set of top purchased items.
- AssociationRecommendationByUser
- CollaborativeBasedRecommendation
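The idea behind a top-items recommendation (randomly selecting from the most purchased items) can be sketched as follows. This is a Python illustration, not a port of the C# method: the function name, the `pool_size` parameter, and the sample purchase counts are assumptions.

```python
import random


def top_recommendation(buy_counts, take, pool_size=20, seed=None):
    """Pick `take` items at random from the `pool_size` most purchased items.

    buy_counts maps an item id to the number of buy events recorded for it.
    """
    rng = random.Random(seed)
    ranked = sorted(buy_counts, key=buy_counts.get, reverse=True)
    pool = ranked[:pool_size]
    return rng.sample(pool, min(take, len(pool)))


# Hypothetical aggregated purchase counts.
sample_counts = {"a": 10, "b": 8, "c": 1, "d": 7}
picks = top_recommendation(sample_counts, take=2, pool_size=3, seed=1)
```

Sampling from a pool rather than always returning the top few items keeps the cold start recommendations from looking identical on every visit.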
-
Switch to Visual Studio and open the Contoso.Apps.FunctionApp project, then open the FuncChangeFeed.cs file
-
Review the Dependency Injection for the IHttpClientFactory and the CosmosClient objects:
// Use Dependency Injection to inject the HttpClientFactory service that was configured in Startup.cs.
public FuncChangeFeed(IHttpClientFactory httpClientFactory, CosmosClient cosmosClient)
{
_httpClientFactory = httpClientFactory;
_cosmosClient = cosmosClient;
}
- Review the following methods and their code:
-
DoAggregateCalculations - This method updates the item aggregations for the buy events to keep track of the top items purchased. This will continually update and drive the top suggestions. You will see this when you execute the Data Generator tool. These aggregations will be stored in the object table as an ItemAggregation object type.
-
AddEventToEventHub - This method will forward the changefeed item to the event hub where Stream Analytics will then process the data.
-
CallLogicApp - This method will forward the changefeed item to the logic app's HTTP endpoint, which will generate an email
-
Switch to Visual Studio, right-click the DataGenerator project, select Set as startup project
-
Press F5 to run the project
-
For each buy event, you should receive an email
NOTE: You could receive quite a few emails.
Duration: 15 minutes
Synopsis: You will show your attendees the Contoso Movies store. It is an e-commerce site set up using Cosmos DB as its data store. In addition, Azure Functions monitor the changefeed of Cosmos DB to execute reporting and notification activities. A second function is in charge of providing recommendations based on the logged-in user. This function calls logic and pre-calculated offline AI models based on user behavior to make movie recommendations.
- Open the deployed Contoso Movies web site
NOTE: This should have opened as part of the demo mode setup script.
-
Mention that you are not logged in as any user and that the results being displayed are based on the top purchased items in the Cosmos DB database.
-
In the top navigation, select the Login link
-
Mention that there are several pre-populated personalities. Select the [email protected] personality
-
Mention that you now have targeted movies based on two different algorithms (Association and Collaborative)
-
In the top navigation, select the [email protected] link, then select SWITCH
-
Change the user to the [email protected] user. Note how the recommendations are different from the comedy user.
-
In the top navigation, select the [email protected] link, then select SWITCH
-
Select New User. This will create a session as a new user that has no implicit ratings (no actions have been generated).
-
Point out that you have no Association or Collaborative recommendations.
-
Click on a few movies in the portal, then select Add to Cart for a random set. These actions will generate events for the new user.
-
Click Home, you should now see recommendations displayed.
NOTE: Some movies may not have corresponding similarities or associations, depending on the randomness of the Databricks notebook execution. You may need to click on a few movies before you see any recommendations.
Duration: 10 minutes
In this exercise, attendees will deprovision any Azure resources that were created in support of the lab.
-
Using the Azure portal, navigate to the Resource group you used throughout this hands-on lab by selecting Resource groups in the menu.
-
Search for the name of your resource group, and select it from the list.
-
Select Delete in the command bar, and confirm the deletion by re-typing the Resource group name and selecting Delete.
You should follow all steps provided after attending the Hands-on lab.