Contact Us

Optimize your sql query for best search result

Database Processes

Search Optimization and query optimization in SQL Server
Platform

MS SQL Server

Background

When developing solutions that require querying large amount of data, you may want to consider user Stored Procedures. When accessing multiple "query patterns", you want to program your system to use the most efficient algorithm to return your data to the user. While using dynamic SQL may work for small databases, a preferred method would be to use a Stored Procedure to optimize performance.

Call us for Free Consultation for Remote DBA services at our SQL Consulting Firm at: 732-536-4765

The Solution

We recently implemented this technique for a client who is planning to target a large national audience. His database will grow form a few hundred thousand records to potentially millions in rather short order. We had to create a search engine for this business to deal with both his current query needs but keep in mind the growth potential.
See how the search engine was implemented via Alpha Five Software Development and within SQL database.

The Search Concept

The idea of this search technique is to create a search to accommodate all of the possible scenarios that might be based on some or any of the following fields: FirstName, LastName, Address, City, State, Zip,Phone, Email, SerialNumber, PO and UserID. Some of the fields are from related tables (Address data, PO, SerialNumber, etc.)

We achieve search optimization by passing parameters into a stored procedure.

MS SQL and Alpha Five - a partnership for high performance for large database searches. Using ms sql stored procedure to optimize the performance for quick search results, while using Alpha Five as the front end for the software development. Please see below video:

Caching Query Plans

Stored procedure compile the first time they are run, and the procedure is then cached and re-used by subsequent queries. This caching is one of the primary benefits of using stored procedures. The SQL server does not need to keep recreating an execution plan - it simply re-uses the cached plan with new parameters submitted to the query.
To ensure optimization we also recommended running the Update Statistic process, at least once a week. This process helps the server to make the best decisions about which indexes to use when a query is processed - keeping up to date the key values in indexes and columns. It should be run manually whenever a large amount of data has been modified or added, changed, or deleted (such as batch imports, global price changes, etc.). It should also be run if there are many DML operations.

Note: (DML is abbreviation of Data Manipulation Language. These are command used to retrieve, store, modify, delete, insert and update data in database. For instance, SELECT, UPDATE and INSERT statements.

It may also be advisable to compress the data in order to increase performance. You may want to read this article: Data compression in large tables in MS SQL server.

As mentioned earlier, when you run any query the very first time, SQL Server builds a query plan and stores the plan in the sys table. The next time that specific query is run, the plan is re-used. If the plan is not used for an extended period of time it is automatically cleared from the cache and the plan is recreated when needed.

How Dynamic Query works

Your application may also build a query plan from a dynamic query as shown below:

EXEC Sp_ExecuteSql 
    N'SELECT  
        CheckNumber,
        CheckDate,
        CheckAmount
   FROM
        Checks
   WHERE CheckNumber = @CHECKNO', N'@CHECKNO AS INT',14550

In the above case MS SQL server may Auto Parameterized the query to improve the performance of the Dynamic SQL. SELECT * FROM tb WHERE col=@1. The @1 is basically a place holder for the parameter values like @1. This way the same query plan could be used. Dynamic SQL could not be complied as static sql. In a more complicated dynamic sql it would be more complicated to trouble shoot.

If you don't parameterized your dynamic sql then each time a new CHECKNO is submitted a new query plan is rebuilt by the SQL server and thus may be slower than a comparable stored procedure.

Dynamic SQL can also potentially introduce security vulnerabilities that you do not want. Hackers may try to inject data that causes the sql script to execute statements that you did not intend.

Note: I had a situation that one day I got more than a 100 emails via my web site contact form while someone tried to inject some values into the sql script. I was fortunate that my data was not exposed since I was not using any dynamic SQL for that contact form, but they always try.

Check our Custom Software Development Services.

Why static SQL is better optimized for searches

See above explanation of Caching Query Plans. I could easily make decisions based on parameters value which query to execute. See this topic about Decision making based on parameters values for more details. The below SQL statements are basically optimized based on any search possibility that may come up. The idea is to keep track of the primary key or each record that is searched and insert that into a table to store temporarily the result set. In this scenario we store the CustomerID into a table named SearchResults as shown below:

CREATE TABLE [dbo].[SearchResults2](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [UserID] [varchar](40) NULL,
    [ResultID] [int] NULL,
    [SearchType] [varchar](50) NULL
)

In addition, I also created a clustered index on the ResultID field. Since after the first SQL run we need to run another sql statement and JOIN it into the Customer table as follows:

SELECT 
    FirstName ,
    MiddleName,
    LastName,
    Gender
FROM
    Customer
        INNER JOIN SearchResults
    ON Customer.ID = SearchResults.ResultID

Based on the above we show the actual customer record and all details per customers including multiple addresses, contact points, purchase order and much more. If the search is too broad then a large number of CustomeIDs will be inserted into the SearchResults table.

If the search is done based on the PO or a Serial Number which is unique and also related to one customer than only one record is found. The same idea works for Serial Number as well. The idea is to refine the search based on the parameters that are passed into the specific table and no need to add additional over head when adding an addition JOIN condition.

In each search we provide the approximate capability and if you don't know how the Last Name is spelled properly then using the wild card % on the right field will provide that capability in most cases. See my article as String manipulation on large data using LIKE operator or PATINDEX.

The UserID is also passed per each search so for every new search we delete the last search result for the same user. See below code for the actual search to get the idea.

Decision making based on parameters values

By studying the code below you can see that this routine only runs the simplest sql needed by checking for parameters that are not empty. Based on the values of the parameter the right sql statement is running to providing maximum performance.

If you search only for FirstName and LastName then the sql statement works only the Customer's table. The IF statement will make sure that only one simple is sql statement is executed as shown below:

IF @Address IS NULL AND @City IS NULL AND @State IS NULL AND @Zip IS NULL
BEGIN
        INSERT INTO SearchResults
        SELECT
            UserID = @UserID ,  
            Cu.ID,
            'Customer' AS SearchType
        FROM 
            Customer Cu 
        WHERE 
            (Cu.FirstName LIKE @FirstName + '%' OR @FirstName IS NULL) AND 
            (Cu.LastName LIKE  @LastName  + '%' OR @LastName IS NULL)   
        RETURN      
    END

I was using the same strategy for much more complicated search. The difference is per search scenario I was creating a separate stored procedure while the main stored procedure made the decisions of which one to execute.

When is the time to de-normalize the data

If you scroll down below there is only one select statement that is running on two tables and it is the last one. You can allow yourself to JOIN another table if you have maybe several millions of records and also depends on the server hardware configuration.
If you deal with hundreds Gigabytes or Terabytes of data you will want to combine the address into the First Name and Last Name fields and this way no JOIN condition is required. Working with large scale of data requires a complete different strategy. You may check some of the articles I have written on how to manipulate big data in sql starting with Processing hundreds of millions records got much easier.

Keeping track on each sql search

Your application may also require you to collect usage data - we store all the parameter values and timestamp the query. This data can help provide statistics identifying the most popular searches and when those queries are most often run.

INSERT INTO SearchParams(
            FirstName,CustomerID,LastName,[Address],City,[State],Zip,Phone,Email,SerialNumber,PO,UserID)
      VALUES (
            @FirstName,@CustomerID,@LastName,@Address,@City,@State,@Zip,@Phone,@Email,@SerialNumber,@PO,@UserID)

Limit the top 100 records search:

If you have many records, than you may want to limit the search for the first X number of records as follows.

SELECT TOP(100)    
    UserID = @UserID ,  
    Ad.CustomerID,
    'Customer' AS SearchType
FROM 
    [Address] Ad
WHERE 
    (Ad.StreetNumber = @StreetNumber OR @StreetNumber IS NULL) AND 
    (Ad.StreetName LIKE @StreetName + '%' OR @StreetName IS NULL) AND
    (Ad.City LIKE @City + '%' OR @City IS NULL) AND
    (Ad.StateProvince = @State OR @State IS NULL) AND
(Ad.PostalCode = @Zip OR @Zip IS NULL)

Parameter sniffing allows SQL to compile a plan that is tailored to the type of parameter that is actually passed into the stored procedure. One way to resolve that is to use a query hint as shown above. We basically use the OPTION(RECOMPILE) query hint to solve the parameter sniffing problems. This one is designed to give you the control at the actual query level and no need to recompile the entire stored procedure. This can happen when the stored procedure re-uses the previously cached plan but may not be optimal for the current input parameter values.

Another way to fix this problem is to convert the parameters to local variables and use them at the qeury level.

DECLARE @MyStreetName AS VARCHAR(50),
@MyStreetNumber AS VARCHAR(20)

SET @MyStreetNumber = @StreetNumber 
SET @MyStreetName =  @StreetName

SELECT
    UserID = @UserID ,  
    Ad.CustomerID,
    'Customer' AS SearchType
FROM 
    [Address] Ad
WHERE 
    (Ad.StreetNumber = @MyStreetNumber OR @MyStreetNumber IS NULL) AND 
    (Ad.StreetName LIKE @MyStreetName + '%' OR @MyStreetName IS NULL) 
Parameter sniffing should be addresed only if you see some slow down in performnace with certain parameters values. One problem that I noticed that may cause this issue is when in some searches you will get up to 100 records and another search you may retrieve 10,000. Obiously I would limit the max search into the TOP 100 to be on the safe side.

How do we split the Address into Street Number and Street Name
The same concept works when we search for Addresses. We split the address on the fly for Street Number and Street Name and search on these parameters only within the Address table. We also created 2 additional fields named: StreetNumber and StreetName to support this kind of search. I wrote a whole article that explains how that code works and why we split the address which is found here: Splitting address into street number and street name in sql.

In the address split code we added additional code to check if only a Street Number entered to be searched on. In reality some of the searches might enforce validation to require a street number. In the implementation for our client we allowed the user to enter just a street number.

See below the sql script to perform an optimized search:
-- =============================================
-- Author:      Doron Farber
-- Create date: February 15,2013
-- Description: Search for customers

-- Calling Sample:
-- EXEC SearchEngine 
--      @FirstName = NULL,
--      @LastName = NULL ,
--      @Address = NULL,
--      @City = NULL ,
--      @State = NULL ,
--      @Zip  = NULL ,
--      @Phone = NULL ,
--      @Email = NULL ,
--      @SerialNumber = NULL ,
--      @PO = NULL ,
--      @UserID = NULL,
--      @CustomerID = NULL 

-- Search for the following fields:
-- FirstName, LastName, StreetNumber,StreetName,City,State,Zip,Phone,Email,SerialNumber,PO,UserID

-- =============================================
ALTER PROCEDURE [dbo].[SearchEngine] 
    @FirstName AS VARCHAR(20)= NULL,
    @LastName AS VARCHAR(30)= NULL  ,
    @Address AS VARCHAR(50)= NULL ,
    @City AS VARCHAR(50) = NULL ,
    @State AS CHAR(2) = NULL ,
    @Zip AS VARCHAR(10) = NULL ,
    @Phone AS VARCHAR(14) = NULL ,
    @Email AS VARCHAR(100) = NULL ,
    @SerialNumber AS VARCHAR(50) = NULL ,
    @PO AS VARCHAR(50) = NULL ,
    @UserID AS VARCHAR(40) = NULL,
    @CustomerID AS INT = NULL 
AS
BEGIN
    SET NOCOUNT ON;
    
    DECLARE @StreetNumber AS VARCHAR(20) = NULL,
        @StreetName AS VARCHAR(50) = NULL

    ---------------------------------------------------------------------------------
    ------ Make sure all parameters are NULL if empty                             ---
    ---------------------------------------------------------------------------------
    SET @FirstName = NULLIF(@FirstName,'')
    SET @LastName =  NULLIF(@LastName,'')
    SET @Address  =  NULLIF(@Address,'')    
    SET @City =   NULLIF(@City,'')
    SET @State=   NULLIF(@State,'')
    SET @Zip =   NULLIF(@Zip,'')
    SET @Phone =   NULLIF(@Phone,'')
    SET @Email =   NULLIF(@Email,'')
    SET @SerialNumber =   NULLIF(@SerialNumber,'')
    SET @PO = NULLIF(@PO,'')
    SET @UserID = NULLIF(@UserID,'')
    SET @CustomerID = NULLIF(@CustomerID,0)

    --- Remove the last search results for the current user ---
    IF @UserID IS NOT NULL
        BEGIN
            DELETE 
                FROM 
            SearchResults 
            WHERE UserID = @UserID
        END
    
    ---------------------------------------------------------------------------------
    -- Track down any parameter sent. Date and time also in entered for the       ---
    -- CreateDate field which gets the GETDATE() function as default.             ---
    ---------------------------------------------------------------------------------
    INSERT INTO SearchParams(
            FirstName,CustomerID,LastName,[Address],City,[State],Zip,Phone,Email,SerialNumber,PO,UserID)
      VALUES (
            @FirstName,@CustomerID,@LastName,@Address,@City,@State,@Zip,@Phone,@Email,@SerialNumber,@PO,@UserID)    

    ---------------------------------------------------------------------------------
    --- Grab CustomerID based on CustomerID                                               ---
    ---------------------------------------------------------------------------------
    IF @CustomerID IS NOT NULL
        BEGIN
            INSERT INTO SearchResults
            SELECT
                UserID = @UserID , 
                ID,
                'Customer' AS SearchType 
            FROM 
                Customer
            WHERE ID = @CustomerID
        
            SELECT @@ROWCOUNT AS TotalRecs
            RETURN
        END 
    ---------------------------------------------------------------------------------   
    --- Grab CustomerID based on PO                                               ---
    ---------------------------------------------------------------------------------
    IF @PO IS NOT NULL
        BEGIN
            INSERT INTO SearchResults
            SELECT
                UserID = @UserID , 
                CustomerID,
                'Customer' AS SearchType 
            FROM 
                ServiceRequest  
            WHERE PO = @PO  
        
            SELECT @@ROWCOUNT AS TotalRecs
            RETURN
        END
    
    ---------------------------------------------------------------------------------
    --- Grab SerialNumber based on SerialNumber                                              ---    
    ---------------------------------------------------------------------------------
    IF @SerialNumber IS NOT NULL
        BEGIN
            INSERT INTO SearchResults
            SELECT
                UserID = @UserID ,  
                CustomerID,
                'Customer' AS SearchType 
            FROM 
                ServiceRequest  
            WHERE SerialNumber = @SerialNumber  
        
            SELECT @@ROWCOUNT AS TotalRecs
            RETURN
        END 
        
    ---------------------------------------------------------------------------------
    --- When ContactPointsTypesID is: 
    --- Phone = 1 , Email = 4 
    ---------------------------------------------------------------------------------
    IF @Phone IS NOT NULL
        BEGIN
            INSERT INTO SearchResults
            SELECT 
                UserID = @UserID ,  
                CustomerID,
                'Customer' AS SearchType 
            FROM
                CustContactPoint    
            WHERE ContactPoint = @Phone AND ContactPointTypeID = 1  
        
            SELECT @@ROWCOUNT AS TotalRecs
            RETURN
        END 
    
    ---------------------------------------------------------------------------------
    --- Email is a uniqe record so it is enough to find the CustomerID            ---
    --------------------------------------------------------------------------------- 
    IF @Email IS NOT NULL
        BEGIN
            INSERT INTO SearchResults
            SELECT 
                UserID = @UserID ,  
                CustomerID,
                'Customer' AS SearchType 
            FROM
                CustContactPoint    
            WHERE ContactPoint = @Email AND ContactPointTypeID = 4  
        
            SELECT @@ROWCOUNT AS TotalRecs
            RETURN
        END 
        
        ---------------------------------------------------------------------------------   
        --- Check only for the FirstName and LastName only.                           ---
        ---------------------------------------------------------------------------------
        IF @Address IS NULL AND @City IS NULL AND @State IS NULL AND @Zip IS NULL
            BEGIN
                INSERT INTO SearchResults
                SELECT TOP 100
                    UserID = @UserID ,  
                    Cu.ID,
                    'Customer' AS SearchType
                FROM 
                    Customer Cu 
                WHERE 
                    (Cu.FirstName LIKE @FirstName + '%' OR @FirstName IS NULL) AND 
                    (Cu.LastName LIKE  @LastName  + '%' OR @LastName IS NULL)   
                    
                SELECT @@ROWCOUNT AS TotalRecs
                RETURN      
            END
        
        ---------------------------------------------------------------------------------
        --- Split the Address to StreetNumber and StreetName                          ---
        ---------------------------------------------------------------------------------
        IF @Address IS NOT NULL
            BEGIN
                ---------------------------------------------------------------------------------
                --- Check if only StreetNumber has entered in the Address parameter           ---
                ---------------------------------------------------------------------------------
                IF ISNUMERIC(RTRIM(LTRIM(@Address))) = 1
                    BEGIN
                        SET @StreetNumber = RTRIM(LTRIM(@Address))
                    END
                ELSE
                    BEGIN
                        --- Split the Address into StreetNumber and StreetName ---
                        SELECT 
                            @StreetNumber = CASE WHEN  ISNUMERIC(LEFT(LTRIM(@Address), CHARINDEX(' ', LTRIM(@Address)))) = 1 THEN LEFT(LTRIM(@Address), 
                            CHARINDEX(' ', LTRIM(@Address)))
                            ELSE NULL END,
                        @StreetName = CASE WHEN ISNUMERIC(LEFT(LTRIM(@Address), CHARINDEX(' ', LTRIM(@Address)))) = 1 
                            THEN LTRIM(RTRIM(SUBSTRING(LTRIM(@Address),CHARINDEX(' ', LTRIM(@Address)) + 1, LEN(LTRIM(@Address)))))
                            ELSE LTRIM(RTRIM(@Address)) END
                    END
            END  --> IF @Address IS NOT NULL     
        
            ---------------------------------------------------------------------------------
            --- In case only the Address is passed then search only within the Address    ---
            --- table.                                                                    ---
            ---------------------------------------------------------------------------------   
        IF @City IS NULL AND @State IS NULL AND @Zip IS NULL AND @FirstName IS NULL AND @LastName IS NULL      
            BEGIN
                INSERT INTO SearchResults
                SELECT TOP 100
                    UserID = @UserID ,  
                    Ad.CustomerID,
                    'Customer' AS SearchType
                FROM 
                    [Address] Ad
                WHERE 
                    (Ad.StreetNumber = @StreetNumber OR @StreetNumber IS NULL) AND 
                    (Ad.StreetName LIKE @StreetName + '%' OR @StreetName IS NULL) 
                    
                    SELECT @@ROWCOUNT AS TotalRecs              
                    RETURN
                END
                
        ---------------------------------------------------------------------------------
        --- In case the Address is passed, City, StateProvince or Zip. No FirstName   ---
        --- or LastName will be handled.                                              ---
        ---------------------------------------------------------------------------------   
        IF @FirstName IS NULL AND @LastName IS NULL   
            BEGIN
                INSERT INTO SearchResults
                SELECT TOP 100
                    UserID = @UserID ,  
                    Ad.CustomerID,
                    'Customer' AS SearchType
                FROM 
                    [Address] Ad
                WHERE 
                    (Ad.StreetNumber = @StreetNumber OR @StreetNumber IS NULL) AND 
                    (Ad.StreetName LIKE @StreetName + '%' OR @StreetName IS NULL) AND
                    (Ad.City LIKE @City + '%' OR @City IS NULL) AND
                    (Ad.StateProvince = @State OR @State IS NULL) AND
                    (Ad.PostalCode = @Zip OR @Zip IS NULL)      
                    
                    SELECT @@ROWCOUNT AS TotalRecs
                    RETURN
                END
        
        ---------------------------------------------------------------------------------
        ---  In a small database with millions of records we can create a JOIN        ---
        ---  condition between the 2 tables. But with large scale of data such as:    ---
        ---  with hundereds of GB we need to denormilize the data, so the JOIN        --- 
        ---  condition is removed.                                                    ---
        ---------------------------------------------------------------------------------
        INSERT INTO SearchResults
        SELECT TOP 100
            UserID = @UserID ,  
            Cu.ID,
            'Customer' AS SearchType 
        FROM 
            Customer Cu 
                INNER JOIN [Address] Ad
            ON Cu.ID = Ad.CustomerID        
        WHERE 
            (Cu.FirstName LIKE @FirstName + '%' OR @FirstName IS NULL) AND 
            (Cu.LastName LIKE @LastName + '%'  OR @LastName IS NULL) AND        
            (Ad.StreetNumber = @StreetNumber OR @StreetNumber IS NULL) AND 
            (Ad.StreetName LIKE @StreetName + '%' OR @StreetName IS NULL) AND
            (Ad.City LIKE @City + '%' OR @City IS NULL) AND
            (Ad.StateProvince = @State OR @State IS NULL) AND
            (Ad.PostalCode = @Zip OR @Zip IS NULL)      
            
            SELECT @@ROWCOUNT AS TotalRecs
       ---------------------------------------------------------------------------------            
END

Call us for Free Consultation for Remote DBA services at our SQL Consulting Firm at: 732-536-4765

Go Back to Our: Main Blog

 

Written by: Doron Farber - The Farber Consulting Group, Inc

There are 0 comments for "Optimize your sql query for best search result"

Add your comment