title | description | author | ms.author | ms.date | ms.service | ms.subservice | ms.topic | monikerRange |
---|---|---|---|---|---|---|---|---|
Tutorial: Regex string search in Java | This tutorial shows you how to use SQL Server Language Extensions and run Java code that search a string with regular expressions (regex). | rwestMSFT | randolphwest | 04/29/2024 | sql | language-extensions | tutorial | >=sql-server-ver15 || >=sql-server-linux-ver15 |
[!INCLUDE sqlserver2019-and-later]
This tutorial shows you how to use SQL Server Language Extensions to create a Java class that receives two columns (ID and text) from SQL Server and a regular expression (regex) as an input parameter. The class returns two columns back to SQL Server (ID and text).
For a given text in the text column sent to the Java class, the code checks if the given regular expression is fulfilled, and returns that text together with the original ID.
This sample code uses a regular expression that checks if a text contains the word Java
or java
.
Database Engine instance on [!INCLUDE sssql19-md] and later versions, with the extensibility framework and Java programming extension on Windows or on Linux. For more information, see SQL Server Language Extensions. For more information about coding requirements, see How to call the Java runtime in SQL Server Language Extensions.
SQL Server Management Studio or Azure Data Studio for executing T-SQL.
Java SE Development Kit (JDK) 8 or JRE 8 on Windows or Linux.
The
mssql-java-lang-extension.jar
file from the Microsoft Extensibility SDK for Java for SQL Server .
Command-line compilation using javac
is sufficient for this tutorial.
First, create a new database and populate a testdata
table with ID
and text
columns.
CREATEDATABASEjavatest; GO USE javatest; GO CREATETABLEtestdata ( [id] INTNOT NULL, [text] NVARCHAR(100) NOT NULL ); GO -- Insert data into test tableINSERT INTO testdata ([id], [text]) VALUES (1, 'This sentence contains java'); INSERT INTO testdata ([id], [text]) VALUES (2, 'This sentence does not'); INSERT INTO testdata ([id], [text]) VALUES (3, 'I love Java!'); GO
In this step, create a class file called RegexSample.java
and copy the following Java code into that file.
This main class is importing the SDK, which means that the jar file downloaded in step1 needs to be discoverable from this class.
packagepkg; importcom.microsoft.sqlserver.javalangextension.PrimitiveDataset; importcom.microsoft.sqlserver.javalangextension.AbstractSqlServerExtensionExecutor; importjava.util.LinkedHashMap; importjava.util.LinkedList; importjava.util.ListIterator; importjava.util.regex.*; publicclassRegexSampleextendsAbstractSqlServerExtensionExecutor { privatePatternexpr; publicRegexSample() { // Setup the expected extension version, and class to use for input and output datasetexecutorExtensionVersion = SQLSERVER_JAVA_LANG_EXTENSION_V1; executorInputDatasetClassName = PrimitiveDataset.class.getName(); executorOutputDatasetClassName = PrimitiveDataset.class.getName(); } publicPrimitiveDatasetexecute(PrimitiveDatasetinput, LinkedHashMap<String, Object> params) { // Validate the input parameters and input column schemavalidateInput(input, params); int[] inIds = input.getIntColumn(0); String[] inValues = input.getStringColumn(1); introwCount = inValues.length; StringregexExpr = (String)params.get("regexExpr"); expr = Pattern.compile(regexExpr); System.out.println("regex expression: " + regexExpr); // Lists to store the output dataLinkedList<Integer> outIds = newLinkedList<Integer>(); LinkedList<String> outValues = newLinkedList<String>(); // Evaluate each rowfor(inti = 0; i < rowCount; i++) { if (check(inValues[i])) { outIds.add(inIds[i]); outValues.add(inValues[i]); } } intoutputRowCount = outValues.size(); int[] idOutputCol = newint[outputRowCount]; String[] valueOutputCol = newString[outputRowCount]; // Convert the list of output columns to arraysoutValues.toArray(valueOutputCol); ListIterator<Integer> it = outIds.listIterator(0); introwId = 0; System.out.println("Output data:"); while (it.hasNext()) { idOutputCol[rowId] = it.next().intValue(); System.out.println("ID: " + idOutputCol[rowId] + " Value: " + valueOutputCol[rowId]); rowId++; } // Construct the output datasetPrimitiveDatasetoutput = newPrimitiveDataset(); output.addColumnMetadata(0, "ID", java.sql.Types.INTEGER, 0, 0); output.addColumnMetadata(1, "Text", java.sql.Types.NVARCHAR, 0, 0); output.addIntColumn(0, idOutputCol, null); output.addStringColumn(1, valueOutputCol); returnoutput; } privatevoidvalidateInput(PrimitiveDatasetinput, LinkedHashMap<String, Object> params) { // Check for the regex expression input parameterif (params.get("regexExpr") == null) { thrownewIllegalArgumentException("Input parameter 'regexExpr' is not found"); } // The expected input schema should be at least 2 columns, (INTEGER, STRING)if (input.getColumnCount() < 2) { thrownewIllegalArgumentException("Unexpected input schema, schema should be an (INTEGER, NVARCHAR or VARCHAR)"); } // Check that the input column types are expectedif (input.getColumnType(0) != java.sql.Types.INTEGER && (input.getColumnType(1) != java.sql.Types.VARCHAR && input.getColumnType(1) == java.sql.Types.NVARCHAR )) { thrownewIllegalArgumentException("Unexpected input schema, schema should be an (INTEGER, NVARCHAR or VARCHAR)"); } } privatebooleancheck(Stringtext) { Matcherm = expr.matcher(text); returnm.find(); } }
Package your classes and dependencies into a .jar
files. Most Java IDEs (for example, Eclipse or IntelliJ) support generating .jar
files when you build or compile the project. Name the .jar
file regex.jar
.
If you aren't using a Java IDE, you can manually create a .jar
file. For more information, see Create a Java .jar file from class files.
Note
This tutorial uses packages. The package pkg;
line at the top of the class ensures that the compiled code is saved in a sub folder called pkg
. If you use an IDE, the compiled code is automatically saved in this folder. If you use javac
to manually compile the classes, you need to place the compiled code in the pkg
folder.
You need to create an external language in the database. The external language is a database scoped object, which means that external languages like Java need to be created for each database you want to use it in.
If you're using Windows, follow these steps to create an external language for Java.
Create a .zip file containing the extension.
As part of the SQL Server setup on Windows, the Java extension
.zip
file is installed in this location:[SQL Server install path]\MSSQL\Binn\java-lang-extension.zip
. This zip file contains thejavaextension.dll
.Create an external language Java from the .zip file:
CREATE EXTERNAL LANGUAGE Java FROM (CONTENT = N'[SQL Server install path]\MSSQL\Binn\java-lang-extension.zip', FILE_NAME ='javaextension.dll', ENVIRONMENT_VARIABLES = N'{"JRE_HOME":"<path to JRE>"}' ); GO
As part of setup, the extension .tar.gz
file is saved under the following path: /opt/mssql-extensibility/lib/java-lang-extension.tar.gz
.
To create an external language Java, run the following T-SQL statement on Linux:
CREATE EXTERNAL LANGUAGE Java FROM (CONTENT = N'/opt/mssql-extensibility/lib/java-lang-extension.tar.gz', file_name ='javaextension.so', ENVIRONMENT_VARIABLES = N'{"JRE_HOME":"<path to JRE>"}' ); GO
To execute Java code, a user needs to be granted external script execution on that specific language.
For more information, see CREATE EXTERNAL LANGUAGE.
Use CREATE EXTERNAL LIBRARY to create an external library for your .jar
files. SQL Server has access to the .jar
files and you don't need to set any special permissions to the classpath
.
In this sample, you create two external libraries. One for the SDK and one for the RegEx Java code.
The SDK jar file
mssql-java-lang-extension.jar
is installed as part of [!INCLUDE sssql19-md] and later versions, on both Windows and Linux.Default installation path on Windows:
<instance installation home directory>\MSSQL\Binn\mssql-java-lang-extension.jar
Default installation path on Linux:
/opt/mssql/lib/mssql-java-lang-extension.jar
The code is also open sourced and can be found on the SQL Server Language Extensions GitHub repository. For more information, see Microsoft Extensibility SDK for Java for SQL Server.
Create an external library for the SDK.
CREATE EXTERNAL LIBRARY sdk FROM (CONTENT ='<OS specific path from above>/mssql-java-lang-extension.jar') WITH (LANGUAGE ='Java'); GO
Create an external library for the RegEx code.
CREATE EXTERNAL LIBRARY regex FROM (CONTENT ='<path>/regex.jar') WITH (LANGUAGE ='Java'); GO
Note
Skip this step, if you use external libraries in the previous step. The recommended way is to create an external library from your .jar
file.
If you don't want to use external libraries, you need to set the necessary permissions. Script execution only succeeds if the process identities have access to your code. You can find more information about setting permissions in the installation guide.
Grant read/execute permissions on the classpath to the mssql_satellite
user.
Grant 'Read and Execute' permissions to SQLRUserGroup and the All application packages SID on the folder containing your compiled Java code.
The entire tree must have permissions, from root parent to the last sub folder.
- Right-click the folder (for example,
C:\myJavaCode
) and choose Properties > Security. - Select Edit.
- Select Add.
- In Select Users, Computer, Service Accounts, or Groups:
- Select Object Types and make sure Built-in security principles and Groups are selected.
- Select Locations to select the local computer name at the top of the list.
- Enter SQLRUserGroup, check the name, and select OK to add the group.
- Enter ALL APPLICATION PACKAGES, check the name, and select OK to add. If the name doesn't resolve, revisit the Locations step. The SID is local to your machine.
Make sure both security identities have Read and Execute permissions on the folder and the pkg
sub folder.
Create a stored procedure that calls sp_execute_external_script
to call the Java code from SQL Server. In the script
parameter, define which package.class
you want to call. In the following code, the class belongs to a package called pkg
and a class file called RegexSample.java
.
Note
The code isn't defining which method to call. By default, the execute
method will be called. This means that you need to follow the SDK interface and implement an execute method in your Java class, if you want to be able to call the class from SQL Server.
The stored procedure takes an input query (input dataset) and a regular expression and returns the rows that fulfilled the given regular expression. It uses a regular expression [Jj]ava
that checks if a text contains the word Java
or java
.
CREATE OR ALTER PROCEDURE [dbo].[java_regex] @expr NVARCHAR(200), @query NVARCHAR(400) ASBEGIN--Call the Java program by giving the package.className in @script--The method invoked in the Java code is always the "execute" method EXEC sp_execute_external_script @language = N'Java', @script = N'pkg.RegexSample', @input_data_1 = @query, @params = N'@regexExpr nvarchar(200)', @regexExpr = @expr WITH result sets(( ID INT, TEXT NVARCHAR(100) )); END GO --Now execute the above stored procedure and provide the regular expression and an input query EXECUTE [dbo].[java_regex] N'[Jj]ava', N'SELECT id, text FROM testdata' GO
After executing the call, you should get a result set with two of the rows.
:::image type="content" source="../media/java/java-sample-results.png" alt-text="Screenshot of Results from Java sample.":::
When you compile your classes, the
pkg
sub folder should contain the compiled code for all three classes.If you aren't using external libraries, check permissions on each folder, from the
root
topkg
sub folder, to ensure that the security identities running the external process have permission to read and execute your code.