Use PATINDEX to match data from another table
One of my favorite SQL Server functions is PATINDEX. It can be used to get the position of pattern in an expression or a table column.
As per the official docs, the most basic usage is as follows.
SELECT PATINDEX('%er%', 'interesting data');
I have used
PATINDEX similarly to find whether a pattern exists in an expression or to extract a pattern out of an expression.
Say, I have a column of data which contains PID (process ID) information and may contain the word PID more than once. I need to get the PID 88920, which is the real PID, in my contrived case.
DECLARE @UnstructuredText VARCHAR(100) SET @UnstructuredText='PIDXWWEPID88920;-LSKPIDD199922;PID9911AKKAA'
To check the existence of the pattern, we can use
IIF is a recent addition, so if you are using SQL Server 2012 and below, you can use a
CASE WHEN END to do the same thing.
SELECT IIF(PATINDEX('%PID[0-9;]%', @UnstructuredText) > 0, 'yes', 'no')
which would return
To extract the PID, we could employ
SUBSTRING along with
DECLARE @UnstructuredText VARCHAR(100) SET @UnstructuredText='PIDXWWEPID88920;-LSKPIDD199922;PID9911AKKAA' DECLARE @Start INT DECLARE @End INT SET @Start = PATINDEX('%PID[0-9;]%', @UnstructuredText) SET @End = CHARINDEX(';', @UnstructuredText, @start) SELECT REPLACE(SUBSTRING(@UnstructuredText, @Start, @End-@Start),'PID','') AS REALPID
This returns the real PID.
REALPID ------ 88920
Recently, I came across a scenario where I would need to match a set of strings from one table to another. I gave PATINDEX a try and amazed that this could be done and that I never even thought about this.
Say, I have a table,
CompaniesByInvestmentType, that holds a list of types of investments. This data is derived from an XML file and is a bit unstructured.
CREATE TABLE CompaniesByInvestmentType (InvestmentType VARCHAR(MAX)) INSERT INTO CompaniesByInvestmentType VALUES ('Funds - ABC Insurance and Funds Co'), ('Bonds - XYX Financial Holdings'), ('Alternative - GLS Investments'), ('Stocks - ABA Stocks and Insurances'), ('Real estate - Bluth Company'), ('Real estate - AAA Realtors'), ('Stocks - B2 Home Loan and Stock Management'), ('Real estate - Pioneer Housing Projects'), ('Funds - BAK Stocks and Funds'), ('Stocks - Newstock Management'), ('Real estate - BBP Classic Realty'), ('Funds - Greenrock Infra and Mutual Funds'), ('Alternative - MS Options and Futures') SELECT InvestmentType from CompaniesByInvestmentType
InvestmentType ------------------------ Funds - ABC Insurance and Funds Co Bonds - XYX Financial Holdings Alternative - GLS Investments Stocks - ABA Stocks and Insurances Real estate - Bluth Company Real estate - AAA Realtors Stocks - B2 Home Loan and Stock Management Real estate - Pioneer Housing Projects Funds - BAK Stocks and Funds Stocks - Newstock Management Real estate - BBP Classic Realty Funds - Greenrock Infra and Mutual Funds Alternative - MS Options and Futures (13 row(s) affected)
I have another table,
AllowedInvestmentTypes, that has the allowed investment types.
CREATE TABLE AllowedInvestmentTypes (AllowedType VARCHAR(50)) INSERT INTO AllowedInvestmentTypes VALUES ('Funds'), ('Alternative') SELECT AllowedType FROM AllowedInvestmentTypes
AllowedType ------ Funds Alternative (2 row(s) affected)
To get all the companies that fall under these two types, we can join both the tables on the condition that we match the start of the type and that the match ends with a hyphen.
SELECT InvestmentType AS AllowedInvestmentType FROM CompaniesByInvestmentType C INNER JOIN AllowedInvestmentTypes AIT ON PATINDEX(AIT.AllowedType + '-%', REPLACE(C.InvestmentType, " ", "") = 1
This query will fetch us only
Alternative investment types.
AllowedInvestmentType -------------------- Funds - ABC Insurance and Funds Co Alternative - GLS Investments Funds - BAK Stocks and Funds Funds - Greenrock Infra and Mutual Funds Alternative - MS Options and Futures