Skip to content

Latest commit

 

History

History
154 lines (104 loc) · 7.44 KB

deepdive-custom-r-functions-using-rxexec.md

File metadata and controls

154 lines (104 loc) · 7.44 KB
titledescriptionauthorms.authorms.datems.servicems.subservicems.topicmonikerRange
Custom R functions using rxExec
Learn to use simulated data to demonstrate execution of a custom R function that runs on a remote server.
VanMSFT
vanto
11/27/2018
sql
machine-learning-services
tutorial
>=sql-server-2016||>=sql-server-linux-ver15

Run custom R functions on SQL Server using rxExec (SQL Server and RevoScaleR tutorial)

[!INCLUDE SQL Server 2016 and later]

This is tutorial 14 of the RevoScaleR tutorial series on how to use RevoScaleR functions with SQL Server.

In this tutorial, you'll use simulated data to demonstrate execution of a custom R function that runs on a remote server.

You can run custom R functions in the context of SQL Server by passing your function via rxExec, assuming that any libraries your script requires are also installed on the server and those libraries are compatible with the base distribution of R.

The rxExec function in RevoScaleR provides a mechanism for running any R script you require. Additionally, rxExec is able to explicitly distribute work across multiple cores in a single server, adding scale to scripts that are otherwise limited to the resource constraints of the native R engine.

Prerequisites

The R distribution on the client workstation provides a built-in Rgui tool that you can use to run the R script in this tutorial. You can also use an IDE such as RStudio or R Tools for Visual Studio.

Create the remote compute context

Run the following R commands on a client workstation. For example, you are using Rgui, start it from this location: C:\Program Files\Microsoft\R Client\R_SERVER\bin\x64.

  1. Specify the connection string for the SQL Server instance where computations are performed. The server must be configured for R integration. The database name is not used in this exercise, but the connection string requires one. If you have a test or sample database, you can use that.

    Using a SQL login

    sqlConnString<-"Driver=SQL Server;Server=<SQL-Server-instance-name>; Database=<database-name>;Uid=<SQL-user-name>;Pwd=<password>"

    Using Windows authentication

    sqlConnString<-"Driver=SQL Server;Server=<SQL-Server-instance-name>;Database=<database-name>;Trusted_Connection=True"
  2. Create a remote compute context to the SQL Server instance referenced in the connection string.

    sqlCompute<- RxInSqlServer(connectionString=sqlConnString)
  3. Activate the compute context and then return the object definition as a confirmation step. You should see the properties of the compute context object.

    rxSetComputeContext(sqlCompute) rxGetComputeContext()

Create the custom function

In this exercise, you will create a custom R function that simulates a common casino consisting of rolling a pair of dice. Rules of the game determine a win or loss outcome:

  • Roll a 7 or 11 on your initial roll, you win.
  • Roll 2, 3, or 12, you lose.
  • Roll a 4, 5, 6, 8, 9, or 10, that number becomes your point, and you continue rolling until you either roll your point again (in which case you win) or roll a 7, in which case you lose.

The game is easily simulated in R, by creating a custom function, and then running it many times.

  1. Create the custom function using the following R code:

    rollDice<-function() { result<-NULLpoint<-NULLcount<-1while (is.null(result)) { roll<- sum(sample(6, 2, replace=TRUE)) if (is.null(point)) { point<-roll } if (count==1&& (roll==7||roll==11)) { result<-"Win" } elseif (count==1&& (roll==2||roll==3||roll==12)) { result<-"Loss" } elseif (count>1&&roll==7 ) { result<-"Loss" } elseif (count>1&&point==roll) { result<-"Win" } else { count<-count+1 } } result }
  2. Simulate a single game of dice by running the function.

    rollDice()

    Did you win or lose?

Now that you have an operational script, let's see how you can use rxExec to run the function multiple times to create a simulation that helps determine the probability of a win.

Pass rollDice() in rxExec

To run an arbitrary function in the context of a remote SQL Server, call the rxExec function.

  1. Call the custom function as an argument to rxExec, together with other parameters that modify the simulation.

    sqlServerExec<- rxExec(rollDice, timesToRun=20, RNGseed="auto") length(sqlServerExec)
    • Use the timesToRun argument to indicate how many times the function should be executed. In this case, you roll the dice 20 times.

    • The arguments RNGseed and RNGkind can be used to control random number generation. When RNGseed is set to auto, a parallel random number stream is initialized on each worker.

  2. The rxExec function creates a list with one element for each run; however, you won't see much happening until the list is complete. When all the iterations are complete, the line starting with length will return a value.

    You can then go to the next step to get a summary of your win-loss record.

  3. Convert the returned list to a vector using R's unlist function, and summarize the results using the table function.

    table(unlist(sqlServerExec))

    Your results should look something like this:

    Loss Win12 8

Conclusion

Although this exercise is simplistic, it demonstrates an important mechanism for integrating arbitrary R functions in R script running on SQL Server. To summarize the key points that make this technique possible:

  • SQL Server must be configured for machine learning and R integration: SQL Server Machine Learning Services with the R feature, or SQL Server 2016 R Services (in-Database).

  • Open-source or third-party libraries used in your function, including any dependencies, must be installed on SQL Server. For more information, see Install new R packages.

  • Moving script from a development environment to a hardened production environment can introduce firewall and network restrictions. Test carefully to make sure your script is able to perform as expected.

Next steps

For a more complex example of using rxExec, see this article: Coarse grain parallelism with foreach and rxExec

close