Wednesday, 7 September 2011

SQL Tips & Tricks


Stored procedures have many advantages. I was once asked which single feature of stored procedures I consider the most important. I could pick only ONE, which makes it a tough question.
I answered: execution plan retention and reuse (a stored procedure is compiled, and its execution plan is cached and used again when the same procedure is executed again).
Needless to say, a second question followed my answer: why? Because all the other well-known advantages of stored procedures (listed below) can be achieved without using a stored procedure, whereas execution plan retention and reuse can only be achieved by using one.
  • Execution plan retention and reuse
  • Query auto-parameterization
  • Encapsulation of business rules and policies
  • Application modularization
  • Sharing of application logic between applications
  • Access to database objects that is both secure and uniform
  • Consistent, safe data modification
  • Network bandwidth conservation
  • Support for automatic execution at system start-up
  • Enhanced hardware and software capabilities
  • Improved security
  • Reduced development cost and increased reliability
  • Centralized security, administration, and maintenance for common routines

Function and SP differences

1)      A FUNCTION always returns a value using the RETURN statement. A PROCEDURE may return one or more values through output parameters, or may return nothing at all.
2)      Functions are normally used for computations, whereas procedures are normally used for executing business logic.
3)      A function returns one value only. A procedure can return multiple values through its parameters (max 1024).
4)      A stored procedure always returns an integer status value, zero by default, whereas a function's return type can be a scalar or a table.
5)      A stored procedure has a precompiled, cached execution plan, whereas a function does not.
6)      A function can be called directly from a SQL statement, for example SELECT func_name FROM dual in Oracle, while a procedure cannot.
7)      Stored procedures offer security and reduce network traffic, and the same stored procedure can be called from any number of applications at the same time.
8)      A function can be used in SQL queries while a procedure cannot; this is a major difference between functions and procedures.
 
Restore Database From SQL Script
Delete  Duplicate Rows from Table

SQL SERVER – Delete Duplicate Records – Rows

The following code is useful for deleting duplicate records. The table must have an identity column, which is used to tell the duplicate records apart. The table in the example has ID as its identity column, and the columns that hold duplicate data are DuplicateColumn1, DuplicateColumn2 and DuplicateColumn3.
DELETE
FROM
MyTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
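
To see what the statement keeps and what it removes, here is a minimal sketch that runs the same DELETE through Python's built-in sqlite3 module. SQLite stands in for SQL Server here, and the sample rows are made up for illustration; only the table and column names come from the example above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable ("
    " ID INTEGER PRIMARY KEY AUTOINCREMENT,"
    " DuplicateColumn1 TEXT, DuplicateColumn2 TEXT, DuplicateColumn3 TEXT)"
)

# Hypothetical sample data: IDs 1, 2, 4 are duplicates of each other, as are 3 and 5.
sample_rows = [
    ("a", "b", "c"),
    ("a", "b", "c"),
    ("x", "y", "z"),
    ("a", "b", "c"),
    ("x", "y", "z"),
    ("p", "q", "r"),
]
conn.executemany(
    "INSERT INTO MyTable (DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)"
    " VALUES (?, ?, ?)",
    sample_rows,
)

# Keep only the row with the highest ID in each group of identical values.
conn.execute(
    "DELETE FROM MyTable WHERE ID NOT IN ("
    " SELECT MAX(ID) FROM MyTable"
    " GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)"
)

remaining = conn.execute(
    "SELECT ID, DuplicateColumn1 FROM MyTable ORDER BY ID"
).fetchall()
print(remaining)
```

Because the subquery uses MAX(ID), the newest copy of each duplicate group survives; using MIN(ID) instead would keep the oldest copy.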

SQL Optimization

Some techniques for SQL Optimization (MySQL and SQL Server) [Part 2]

SQL optimization may not seem useful when a programmer starts developing an application, but once the software has been running for a few months it starts to prove its necessity. Without SQL optimization, software may slow down within a few months and may be obsolete within a year. So SQL optimization is necessary.

Choose your data types carefully

Where possible, avoid column attributes and types such as `UNSIGNED`, `NULL` and `VARCHAR`, because they may take extra time to process. If you are sure a field will never contain a null value, declare it `NOT NULL`, which saves the time spent on null checking. With `VARCHAR`, some time is spent calculating and storing the text length, so `CHAR` can be used instead of `VARCHAR` if space is not an issue.
It is much easier and cheaper to add storage space than to add speed, so saving CPU time is more valuable than saving space.

Try to avoid automatic type conversion

This will only yield a tiny speed boost, but every little helps. If you declare a field such as Age with type INT, this query is in fact valid:
UPDATE user_table SET Age = '1' WHERE userid = 'ram'
The query states that `Age` should be set to the string value '1', not the number 1. Here MySQL/SQL Server will recognize that `Age` should be an integer, and automatically convert the string '1' to the integer 1 before updating the table. This type conversion is indeed very helpful, but since we already know that `Age` is an integer, we could save the DB some pointless work by not using the quotes.

Few other things about SQL optimization

  • Use an index wherever lookups are necessary, but remember that unnecessary indexes can slow down system performance.
  • If inserts, updates and deletes are very common, too many indexes will hamper DB performance, because every insert/update/delete operation has to maintain the indexes as well.
  • Always try to join on indexed fields.
  • Spot the slow queries and give your attention to those.

Temporary tables and table variables are both stored in tempdb (not in memory)

exec sp_who
exec sp_who2
exec sp_helpdb  -- gives database size, date of creation, status
exec sp_lock

Improve Performance:
SQL Tuning or SQL Optimization
SQL statements are used to retrieve data from the database. We can get the same results by writing different SQL queries, but using the best query matters when performance is considered. So you need to tune your SQL queries based on the requirement. Here is a list of queries that we use regularly and how these SQL queries can be optimized for better performance.
SQL Tuning/SQL Optimization Techniques:
1) The SQL query becomes faster if you use the actual column names in the SELECT statement instead of '*'.
For Example: Write the query as
SELECT id, first_name, last_name, age, subject FROM student_details;
Instead of:
SELECT * FROM student_details;
2) The HAVING clause filters rows after all the rows have been selected and grouped. It is just like a filter. Do not use the HAVING clause for anything that a WHERE clause can filter before grouping.
For Example: Write the query as

SELECT subject, count(subject)
FROM student_details
WHERE subject != 'Science'
AND subject != 'Maths'
GROUP BY subject;

Instead of:
SELECT subject, count(subject)
FROM student_details
GROUP BY subject
HAVING subject != 'Science' AND subject != 'Maths';
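
As a quick check that moving a plain column filter from HAVING to WHERE does not change the result, the sketch below runs both forms with the same filter against Python's built-in sqlite3 module. SQLite is only a stand-in engine, and the rows in student_details are hypothetical.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details (id INTEGER, subject TEXT)")
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?)",
    [(1, "Science"), (2, "Maths"), (3, "History"),
     (4, "History"), (5, "Art"), (6, "Science")],
)

# Filter rows first, then group: fewer rows reach the GROUP BY step.
where_rows = conn.execute(
    "SELECT subject, COUNT(subject) FROM student_details"
    " WHERE subject != 'Science' AND subject != 'Maths'"
    " GROUP BY subject ORDER BY subject"
).fetchall()

# Group every row first, then discard whole groups.
having_rows = conn.execute(
    "SELECT subject, COUNT(subject) FROM student_details"
    " GROUP BY subject"
    " HAVING subject != 'Science' AND subject != 'Maths'"
    " ORDER BY subject"
).fetchall()

print(where_rows)
print(having_rows)
```

Both queries return the same groups; the difference is only in how much work is done before the filter is applied. HAVING is still the right tool when the condition involves an aggregate such as COUNT(subject) > 1.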

3) Sometimes you may have more than one subquery in your main query. Try to minimize the number of subquery blocks in your query.
For Example: Write the query as

SELECT name
FROM employee
WHERE (salary, age ) = (SELECT MAX (salary), MAX (age)
FROM employee_details)
AND dept = 'Electronics';

Instead of:

SELECT name
FROM employee
WHERE salary = (SELECT MAX(salary) FROM employee_details)
AND age = (SELECT MAX(age) FROM employee_details)
AND dept = 'Electronics';

4) Use operator EXISTS, IN and table joins appropriately in your query.
a) Usually IN has the slowest performance.
b) IN is efficient when most of the filter criteria is in the sub-query.
c) EXISTS is efficient when most of the filter criteria is in the main query.

For Example: Write the query as
Select * from product p
where EXISTS (select * from order_items o
where o.product_id = p.product_id)

Instead of:

Select * from product p
where product_id IN
(select product_id from order_items)

5) Use EXISTS instead of DISTINCT when using joins that involve tables with a one-to-many relationship.
For Example: Write the query as

SELECT d.dept_id, d.dept
FROM dept d
WHERE EXISTS ( SELECT 'X' FROM employee e WHERE e.dept = d.dept);

Instead of:
SELECT DISTINCT d.dept_id, d.dept
FROM dept d,employee e
WHERE e.dept = d.dept;

6) Try to use UNION ALL in place of UNION.
For Example: Write the query as

SELECT id, first_name
FROM student_details_class10
UNION ALL
SELECT id, first_name
FROM sports_team;

Instead of:
SELECT id, first_name
FROM student_details_class10
UNION
SELECT id, first_name
FROM sports_team;
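
UNION ALL is faster because it skips the duplicate-removal step, which also means the two operators can return different results. The sketch below shows the difference using Python's built-in sqlite3 module as a stand-in engine; the student names are hypothetical.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_details_class10 (id INTEGER, first_name TEXT)")
conn.execute("CREATE TABLE sports_team (id INTEGER, first_name TEXT)")
conn.executemany(
    "INSERT INTO student_details_class10 VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi"), (3, "Meena")],
)
# Ravi (id 2) appears in both tables.
conn.executemany(
    "INSERT INTO sports_team VALUES (?, ?)",
    [(2, "Ravi"), (4, "Kiran")],
)

union_all_rows = conn.execute(
    "SELECT id, first_name FROM student_details_class10"
    " UNION ALL"
    " SELECT id, first_name FROM sports_team"
).fetchall()

union_rows = conn.execute(
    "SELECT id, first_name FROM student_details_class10"
    " UNION"
    " SELECT id, first_name FROM sports_team"
).fetchall()

print(len(union_all_rows), len(union_rows))
```

UNION ALL returns Ravi twice, while UNION returns him once. So prefer UNION ALL when duplicates cannot occur or do not matter, and keep UNION when the result really has to be distinct.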

7) Be careful while using conditions in WHERE clause.
For Example: Write the query as

SELECT id, first_name, age FROM student_details WHERE age > 10;
Instead of:
SELECT id, first_name, age FROM student_details WHERE age != 10;
Write the query as

SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE 'Cha%';

Instead of:
SELECT id, first_name, age
FROM student_details
WHERE SUBSTR(first_name,1,3) = 'Cha';

Write the query as
SELECT id, first_name, age
FROM student_details
WHERE first_name LIKE NVL ( :name, '%');

Instead of:
SELECT id, first_name, age
FROM student_details
WHERE first_name = NVL ( :name, first_name);

Write the query as
SELECT product_id, product_name
FROM product
WHERE unit_price BETWEEN (SELECT MIN(unit_price) FROM product)
AND (SELECT MAX(unit_price) FROM product)

Instead of:
SELECT product_id, product_name
FROM product
WHERE unit_price >= (SELECT MIN(unit_price) FROM product)
and unit_price <= (SELECT MAX(unit_price) FROM product)

Write the query as

SELECT id, name, salary
FROM employee
WHERE dept = 'Electronics'
AND location = 'Bangalore';

Instead of:
SELECT id, name, salary
FROM employee
WHERE dept || location= 'ElectronicsBangalore';

Keep the column alone on one side of the comparison and put any constant arithmetic on the other side, so that the expression is evaluated once rather than for every row.
Write the query as

SELECT id, name, salary
FROM employee
WHERE salary < 25000;

Instead of:
SELECT id, name, salary
FROM employee
WHERE salary + 10000 < 35000;
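
One way to see why the bare column matters is to ask the engine for its query plan. The sketch below does this with Python's built-in sqlite3 module and EXPLAIN QUERY PLAN; SQLite is only a stand-in here (SQL Server has its own execution plan tools), and the index name idx_employee_salary and the generated rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database with generated rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.execute("CREATE INDEX idx_employee_salary ON employee (salary)")
conn.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [(f"emp{i}", 10000 + 50 * i) for i in range(1000)],
)

def plan(sql):
    """Return the textual detail column of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

bare_sql = "SELECT id, name, salary FROM employee WHERE salary < 25000"
wrapped_sql = "SELECT id, name, salary FROM employee WHERE salary + 10000 < 35000"

bare_plan = plan(bare_sql)        # column alone: the index on salary is usable
wrapped_plan = plan(wrapped_sql)  # column inside an expression: full table scan

bare_count = len(conn.execute(bare_sql).fetchall())
wrapped_count = len(conn.execute(wrapped_sql).fetchall())

print(bare_plan)
print(wrapped_plan)
```

Both queries return the same rows, but only the first plan mentions the index. The same reasoning applies to the SUBSTR and concatenation examples above: once the column is wrapped in an expression, the engine generally cannot use an ordinary index on it.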

Write the query as
SELECT id, first_name, age
FROM student_details
WHERE age > 10;

Instead of:
SELECT id, first_name, age
FROM student_details
WHERE NOT age = 10;

8) Use DECODE to avoid scanning the same rows or joining the same table repetitively. DECODE can also be used in place of a GROUP BY or ORDER BY clause.
For Example: Write the query as

SELECT id FROM employee
WHERE name LIKE 'Ramesh%'
and location = 'Bangalore';

Instead of:
SELECT DECODE(location,'Bangalore',id,NULL) id FROM employee
WHERE name LIKE 'Ramesh%';

9) To store large binary objects, first place them in the file system and add the file path in the database.
10) To write queries which provide efficient performance follow the general SQL standard rules.
a) Use single case for all SQL verbs
b) Begin all SQL verbs on a new line
c) Separate all words with a single space
d) Right or left aligning verbs within the initial SQL verb


How to get database name?
SELECT DB_NAME()

Query to Get List of Views?
select * from information_schema.views

Query to Get List of Table Valued Functions?
select * from Sys.Objects where Type='tf'


1)      What is normalization? - A relational database is composed of tables that contain related data. The process of organizing this data into tables is referred to as normalization.
2)     What is a Stored Procedure? - It is a set of T-SQL statements combined to perform a single task or several tasks. It is basically like a macro: when you invoke the stored procedure, you actually run a set of statements.
3)     Can you give an example of Stored Procedure? - sp_helpdb, sp_who2 and sp_renamedb are system-defined stored procedures. We can also have user-defined stored procedures, which can be called in a similar way.
4)     What is a trigger? - Triggers are basically used to implement business rules. A trigger is similar to a stored procedure; the difference is that it is activated automatically when data is added, edited or deleted in a table.
5)     What is a view? - If we have several tables in a db and want to see only specific columns from specific tables, we can use views. A view can also serve security needs, allowing specific users to see only specific columns based on the permissions we configure on the view. Views also reduce the effort required to write queries that access the same columns every time.
6)     What is an Index? - An index on a table organizes the data so that queries run against the db can locate rows without scanning the whole table, which makes data retrieval much faster.
7)     What are the types of indexes available with SQL Server? - There are basically two types of indexes that we use with SQL Server: clustered and non-clustered.
8)     What is the basic difference between clustered and a non-clustered index? - We can have only one clustered index on a table. The leaf level of a clustered index is the actual data, and the data is physically ordered by the clustered index key. In a non-clustered index the leaf level is a pointer to the data rows, so we can have many non-clustered indexes on a table.
9)     What are cursors? - Cursors let us operate row by row on a set of data that we retrieve with a command such as SELECT columns FROM table. For example: if we have duplicate records in a table, we can remove them by declaring a cursor that checks the records one by one during retrieval and removes rows which have duplicate values.
10)  When do we use the UPDATE STATISTICS command? - This command is used after large data changes. If we do a large number of deletions, modifications or bulk copies into the tables, the statistics that the optimizer keeps on the indexes need to be refreshed to take these changes into account. UPDATE STATISTICS updates those statistics for the tables accordingly.
11)   Which TCP/IP port does SQL Server run on? - SQL Server runs on port 1433 but we can also change it for better security.
12)  From where can you change the default port? - From the Network Utility: TCP/IP properties -> Port number, on both the client and the server.
13)  Can you tell me the difference between DELETE & TRUNCATE commands? - The DELETE command removes rows from a table based on the condition that we provide in a WHERE clause. TRUNCATE removes all the rows from a table, and there will be no data in the table after we run the TRUNCATE command.
14)  Can we use Truncate command on a table which is referenced by FOREIGN KEY? - No. We cannot use the TRUNCATE command on a table referenced by a foreign key, because of referential integrity.
15)  What is the use of DBCC commands? - DBCC stands for database consistency checker. We use these commands to check the consistency of the databases, i.e., for maintenance, validation tasks and status checks.
16)  Can you give me some DBCC command options? (Database consistency check) - DBCC CHECKDB - ensures that tables in the db and the indexes are correctly linked. DBCC CHECKALLOC - checks that all pages in a db are correctly allocated. DBCC SQLPERF - reports the current usage of the transaction log as a percentage. DBCC CHECKFILEGROUP - checks all tables in a filegroup for any damage.
17)  What command do we use to rename a db? - sp_renamedb 'oldname', 'newname'
18)  Sometimes sp_renamedb may not work, because if someone is using the db it will not accept this command. What can you do in such cases? - In such cases we can first bring the db to single-user mode using sp_dboption, then rename the db, and then rerun sp_dboption to remove the single-user mode.
19)  What is the difference between a HAVING CLAUSE and a WHERE CLAUSE? - The HAVING clause is used only with GROUP BY in a query and filters the groups. The WHERE clause is applied to each row before the rows become part of the GROUP BY.
20) What do you mean by COLLATION? - Collation is basically the sort order. There are three types of sort order: dictionary case-sensitive, dictionary case-insensitive and binary.
21)  What is a Join in SQL Server? - Join actually puts data from two or more tables into a single result set.
22) Can you explain the types of Joins that we can have with Sql Server? - There are three types of joins: Inner Join, Outer Join, Cross Join
23) When do you use SQL Profiler? - The SQL Profiler utility allows us to track connections to the SQL Server and to determine activities such as which SQL scripts are running, failed jobs, etc.
24) What is a Linked Server? - Linked servers are a concept in SQL Server by which we can add another SQL Server to a group and query both SQL Server dbs using T-SQL statements.
25) Can you link only other SQL Servers or any database servers such as Oracle? - We can link any server provided we have an OLE DB provider to allow the link. For Oracle, Microsoft provides an OLE DB provider for Oracle that lets us add it as a linked server to the SQL Server group.
26) Which stored procedure will you be running to add a linked server? - sp_addlinkedserver, sp_addlinkedsrvlogin
27) What are the OS services that the SQL Server installation adds? - MS SQL SERVER SERVICE, SQL AGENT SERVICE, DTC (Distributed Transaction Coordinator)
28) Can you explain the role of each service? - SQL SERVER is for running the databases. SQL AGENT is for automation such as jobs, db maintenance and backups. DTC is for linking and connecting to other SQL Servers.
29) How do you troubleshoot SQL Server if it is running very slowly? - First, check processor and memory usage to see that the processor is not above 80% utilization and memory not above 40-45% utilization, then check disk utilization using Performance Monitor. Second, use SQL Profiler to check the users, current SQL activities and running jobs which might be a problem. Third, run the UPDATE STATISTICS command to update the index statistics.
30) Let's say that due to network or security issues the client is not able to connect to the server or vice versa. How do you troubleshoot? - First I will make sure that the port settings are correct in the server and client Network Utility, and that ODBC is properly configured at the client end. Makepipe and readpipe are utilities to check for connection: makepipe is run on the server and readpipe on the client to check for any connection issues.
31)  What are the authentication modes in SQL Server? - Windows mode and mixed mode (SQL & Windows).
32) Where do you think the users names and passwords will be stored in sql server? - They get stored in master db in the sysxlogins table.
33) What is log shipping? Can we do log shipping with SQL Server 7.0? - Log shipping is a new feature of SQL Server 2000. We should have two SQL Servers, both Enterprise Edition. Log shipping can be configured from Enterprise Manager. In log shipping the transaction log from one server is automatically applied to the backup database on the other server. If one server fails, the other server will have the same db and we can use this as the DR (disaster recovery) plan.
34) Let us say the SQL Server crashed and you are rebuilding the databases including the master database. What procedure do you follow? - To restore the master db we have to stop SQL Server first and then, from the command line, type sqlservr -m, which starts it in single-user (maintenance) mode, after which we can restore the master db.
35) Let us say master db itself has no backup. Now you have to rebuild the db so what kind of action do you take? - (I am not sure- but I think we have a command to do it).
36) What is BCP? When do we use it? - BCP (bulk copy) is a tool used to copy huge amounts of data from tables and views. It does not copy their structures.
37) What should we do to copy the tables, schema and views from one SQL Server to another? - We have to write some DTS packages for it.
38) What are the different types of joins and what does each do?
39) What are the four main query statements?
40) What is a sub-query? When would you use one?
41) What is a NOLOCK?
42) What are three SQL keywords used to change or set someone’s permissions?
43) What is the difference between HAVING clause and the WHERE clause?
44) What is referential integrity? What are the advantages of it?
45) What is database normalization?
46) Which command using Query Analyzer will give you the version of SQL Server and operating system?
47) Using Query Analyzer, name 3 ways you can get an accurate count of the number of records in a table?
48) What is the purpose of using COLLATE in a query?
49) What is a trigger?
50) What is one of the first things you would do to increase performance of a query? For example, a boss tells you that “a query that ran yesterday took 30 seconds, but today it takes 6 minutes”
51) What is an execution plan? When would you use it? How would you view the execution plan?
52) What is the STUFF function and how does it differ from the REPLACE function?
53) What does it mean to have quoted_identifier on? What are the implications of having it off?
54) What are the different types of replication? How are they used?
55) What is the difference between a local and a global variable?
56) What is the difference between a Local temporary table and a Global temporary table? How is each one used?
57) What are cursors? Name four types of cursors and when each one would be applied?
58) What is the purpose of UPDATE STATISTICS?
59) How do you use DBCC statements to monitor various aspects of a SQL Server installation?
60) How do you load large data to the SQL Server database?
61) How do you check the performance of a query and how do you optimize it?
62) How are SQL Server 2000 and XML linked? Can XML be used to access data?
63) What is SQL Server Agent?
64) What is referential integrity and how is it achieved?
65) What is indexing?
66) What is normalization and what are the different forms of normalization?
67) Difference between server.transfer and server.execute method?
68) What is de-normalization and when do you do it?
69) What is better - 2nd Normal form or 3rd normal form? Why?
70) Can we rewrite subqueries into simple select statements or with joins? Example?
71) What is a function? Give some examples?
72) What is a stored procedure?
73) Difference between Function and Procedure in general?
74) Difference between Function and Stored Procedure?
75) Can a stored procedure call another stored procedure? If yes, to what level, and can it be controlled?
76) Can a stored procedure call itself (recursively)? If yes, to what level, and can it be controlled?
77) How do you find the number of rows in a table?
78) Difference between Clustered and Non-clustered index?
79) What is a table called if it has neither a clustered nor a non-clustered index?
80) Explain DBMS, RDBMS?
81) Explain basic SQL queries with SELECT, FROM, WHERE, ORDER BY, GROUP BY and HAVING?
82) Explain the basic concepts of SQL Server architecture?
83) Explain a couple of features of SQL Server (scalability, availability, integration with the internet, etc.)?
85) Explain fundamentals of data warehousing & OLAP?
86) Explain the new features of SQL Server 2000?
87) How do we upgrade from SQL Server 6.5 to 7.0 and 7.0 to 2000?
88) What is data integrity? Explain constraints?
89) Explain some DBCC commands?
90) Explain sp_configure commands, set commands?
91) Explain what db_options are used for?
92) What are the basic functions of the master, msdb and tempdb databases?
93) What is a job?
94) What are tasks?
95) What are primary keys and foreign keys?
96) How would you update the rows which are divisible by 10, given a set of numbers in a column?
97) If a stored procedure is taking a table data type, how does it look?
98) How are many-to-many (m-m) relationships implemented?
99) How do you know which index a table is using?
100) How will you test a stored procedure that takes two parameters, namely first name and last name, and returns the full name?
101) How do you find the error, and how can you know the number of rows affected by the last SQL statement?
102) How can you get @@error and @@rowcount at the same time?
103) What are sub-queries? Give example? In which case are sub-queries not feasible?
104) What are the types of joins? When do we use Outer and Self joins?
105) Which virtual table does a trigger use?
106) How do you measure the performance of a stored procedure?
107) Questions regarding RAISERROR?
108) Questions on identity?
109) If there is a failure during the update of certain rows, what will be the state?
To get Index size, Row Count of table

Check object exists in database:
IF OBJECT_ID('tempdb..#Alert') IS NOT NULL
    PRINT '#Alert exists.'
ELSE
    PRINT '#Alert does not exist.'


To set table identity to required value

DBCC CHECKIDENT('MailAuditTrail', RESEED, 24999)

To get the row count, data size, index size, … of a table

EXEC sp_spaceused 'MailAuditTrail'



To get all tables in the database and the number of rows in each table
SELECT
    [TableName] = so.name,
    [RowCount] = MAX(si.rows)
FROM
    sysobjects so,
    sysindexes si
WHERE
    so.xtype = 'U'
    AND
    si.id = OBJECT_ID(so.name)
GROUP BY
    so.name
ORDER BY
    2 DESC


SQL DML and DDL

SQL can be divided into two parts: The Data Manipulation Language (DML) and the Data Definition Language (DDL).
The query and update commands form the DML part of SQL:
  • SELECT - extracts data from a database
  • UPDATE - updates data in a database
  • DELETE - deletes data from a database
  • INSERT INTO - inserts new data into a database
The DDL part of SQL permits database tables to be created or deleted. It also defines indexes (keys), specifies links between tables, and imposes constraints between tables. The most important DDL statements in SQL are:
  • CREATE DATABASE - creates a new database
  • ALTER DATABASE - modifies a database
  • CREATE TABLE - creates a new table
  • ALTER TABLE - modifies a table
  • DROP TABLE - deletes a table
  • CREATE INDEX - creates an index (search key)
  • DROP INDEX - deletes an index

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the differences between them.
  • JOIN: Return rows when there is at least one match in both tables
  • LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
  • RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
  • FULL JOIN: Return rows when there is a match in one of the tables
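
The difference between JOIN and LEFT JOIN is easiest to see on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, with the Persons and Orders tables that appear later in this post; the rows themselves are hypothetical. RIGHT JOIN and FULL JOIN are left out only because older SQLite versions do not support them.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (P_Id INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute(
    "CREATE TABLE Orders (O_Id INTEGER PRIMARY KEY, OrderNo INTEGER, P_Id INTEGER)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?)",
    [(1, "Hansen"), (2, "Svendson"), (3, "Pettersen")],
)
# Order 34764 points at P_Id 15, which has no matching person.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 77895, 3), (2, 44678, 3), (3, 22456, 1), (4, 34764, 15)],
)

inner_rows = conn.execute(
    "SELECT p.LastName, o.OrderNo FROM Persons p"
    " INNER JOIN Orders o ON p.P_Id = o.P_Id"
    " ORDER BY p.LastName, o.OrderNo"
).fetchall()

left_rows = conn.execute(
    "SELECT p.LastName, o.OrderNo FROM Persons p"
    " LEFT JOIN Orders o ON p.P_Id = o.P_Id"
    " ORDER BY p.LastName, o.OrderNo"
).fetchall()

print(inner_rows)
print(left_rows)
```

The inner join drops Svendson, who has no orders, while the left join keeps her with a NULL (None in Python) order number. The unmatched order 34764 appears in neither result; a RIGHT JOIN or FULL JOIN would be needed to see it.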

SQL INNER JOIN Keyword

The INNER JOIN keyword returns rows when there is at least one match in both tables.

SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches in the right table (table_name2).

SQL RIGHT JOIN Keyword

The RIGHT JOIN keyword returns all rows from the right table (table_name2), even if there are no matches in the left table (table_name1).

SQL FULL JOIN Keyword

The FULL JOIN keyword returns rows when there is a match in one of the tables.

The SQL SELECT INTO Statement

The SELECT INTO statement selects data from one table and inserts it into a different table.
The SELECT INTO statement is most often used to create backup copies of tables.
SELECT *
INTO Persons_Backup
FROM Persons
In MS Access, we can also use the IN clause to copy the table into another database:
SELECT *
INTO Persons_Backup IN 'Backup.mdb'
FROM Persons

SQL SELECT INTO - With a WHERE Clause

We can also add a WHERE clause.
The following SQL statement creates a "Persons_Backup" table with only the persons who live in the city "Sandnes":
SELECT LastName,Firstname
INTO Persons_Backup
FROM Persons
WHERE City='Sandnes'


SQL SELECT INTO - Joined Tables

Selecting data from more than one table is also possible.
The following example creates a "Persons_Order_Backup" table that contains data from the two tables "Persons" and "Orders":
SELECT Persons.LastName,Orders.OrderNo
INTO Persons_Order_Backup
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id

SQL NULL Values

If a column in a table is optional, we can insert a new record or update an existing record without adding a value to this column. This means that the field will be saved with a NULL value.
NULL values are treated differently from other values.
NULL is used as a placeholder for unknown or inapplicable values.

NULL values represent missing unknown data.
By default, a table column can hold NULL values.
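
The practical consequence of NULL being treated differently is that it cannot be tested with the ordinary comparison operators. The sketch below illustrates this with Python's built-in sqlite3 module as a stand-in engine; the Address column and the sample rows are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database with made-up rows (assumption, not real data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons ("
    " P_Id INTEGER PRIMARY KEY, LastName TEXT NOT NULL, Address TEXT)"
)
conn.execute(
    "INSERT INTO Persons (LastName, Address) VALUES ('Hansen', 'Timoteivn 10')"
)
# Address is optional, so leaving it out stores NULL.
conn.execute("INSERT INTO Persons (LastName) VALUES ('Svendson')")

# Comparing with = NULL is never true, so no rows come back.
equals_null = conn.execute(
    "SELECT LastName FROM Persons WHERE Address = NULL"
).fetchall()

# IS NULL and IS NOT NULL are the operators that actually test for NULL.
is_null = conn.execute(
    "SELECT LastName FROM Persons WHERE Address IS NULL"
).fetchall()
is_not_null = conn.execute(
    "SELECT LastName FROM Persons WHERE Address IS NOT NULL"
).fetchall()

print(equals_null, is_null, is_not_null)
```

Only the IS NULL query finds the person without an address; the = NULL query silently returns nothing, which is a common source of bugs.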


SSIS Package:

SSIS (SQL Server Integration Services) is an upgrade of DTS (Data Transformation Services), which was a feature of previous versions of SQL Server. SSIS packages can be created in BIDS (Business Intelligence Development Studio). They can be used to merge data from heterogeneous data sources into SQL Server, to populate data warehouses, to clean and standardize data, and to automate administrative tasks.
SQL Server Integration Services (SSIS) is a component of Microsoft SQL Server 2005. It replaces Data Transformation Services, which had been a feature of SQL Server since version 7.0. Unlike DTS, which was included in all editions, SSIS is only available in the "Standard" and "Enterprise" editions. Integration Services provides a platform for building data integration and workflow applications. The primary use for SSIS is data warehousing, as the product features a fast and flexible tool for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases, update multidimensional cube data, and perform other functions.

Parameter Sniffing refers to a process whereby SQL Server’s execution environment “sniffs” the parameter values during first invocation, and passes it along to the query optimizer so that they can be used to generate optimal query execution plans.
“First invocation” also refers to the first invocation after a plan was removed from cache for lack of reuse or for any other reason. The optimizer “knows” what the values of the input parameters are, and it generates an adequate plan for those input parameters. SQL Server internally maintains the statistics and distribution of the values in the columns used for filtering.
While parameter sniffing is certainly a powerful feature, it can cause problems when a procedure’s plan happens to have been kicked out of the procedure cache (or was never in it) just prior to the procedure being called with atypical parameter values. This can result in a plan that is skewed toward atypical use, one that is not optimal when called with typical values. Since, once cached, a query plan can be reused for parameter values that vary widely, the ideal situation is to have a plan in the cache that covers the typical usage of the procedure as much as possible. If a plan makes it into the cache that is oriented toward atypical parameter values, it can have a devastating effect on performance when executed with typical values.
An example would probably help here. Suppose we had a stored procedure that returns sales data by country. In our case, three-fourths of our sales is in the UK. The procedure takes a single parameter, @country, indicating the country for which to return sales info. It uses this parameter to filter a simple SELECT statement that returns the requested sales data.
CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country 
GO
The optimizer would most likely choose a clustered index scan when creating the execution plan for this query because (given that “UK” would normally be passed in for @country) so much of the table would be traversed anyway that scanning it would require less I/O and be faster than repeated nonclustered index lookups. However, what happens if the plan happens to have been kicked out of the cache (let’s say due to an auto-statistics update) just prior to a user calling it with, say, “Spain”, where we have almost no sales? Assuming a suitable index exists, the optimizer may decide to use a nonclustered index seek in the new query plan. Subsequent executions of the procedure would reuse this plan, even if they passed in “UK” for @country. This could result in performance that is much slower than with the scan-based plan.
As a workaround prior to SQL Server 2005, local variables can be used instead of stored procedure parameters. Please note that SQL Server cannot sniff the value of a local variable. This leads SQL Server to use the statistics on the filter column and create a plan that is best for average values in that column. Such a plan does well for typical values, but it can still lead to serious performance problems when the same procedure is called with an atypical value.
CREATE PROCEDURE uspGetCountrySale
(@Country Varchar(50))
AS 
DECLARE @_Country Varchar(50) -- same length as @Country, so the value is never truncated
SET @_Country = @Country 
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @_Country 
GO
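The difference between a sniffed value and an unknown local variable can be sketched with a little arithmetic. This is a simplified illustration, assuming the estimate for an unknown value is roughly total rows divided by the number of distinct values; the row counts are hypothetical, apart from the UK holding three-fourths of the rows.

```python
# Hypothetical row counts per country; only the UK share (three-fourths)
# comes from the example in the text.
ROWS_BY_COUNTRY = {"UK": 750, "Spain": 5, "France": 120, "Germany": 125}


def estimate_sniffed(country):
    """Value known at compile time: use the row count for that value."""
    return ROWS_BY_COUNTRY.get(country, 0)


def estimate_unknown():
    """Value unknown at compile time: total rows / distinct values."""
    return sum(ROWS_BY_COUNTRY.values()) / len(ROWS_BY_COUNTRY)
```

With these numbers the unknown-value estimate is 250 rows whichever country is passed in, while the sniffed estimates are 750 for "UK" and 5 for "Spain". The plan built from the average is the same for every caller, which is the whole point of the workaround.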
SQL Server 2005 provides a new query hint to tackle the problem: the OPTIMIZE FOR query hint. This hint lets you give SQL Server a literal that reflects the selectivity of the typical input, and the optimizer builds the plan for that literal whatever value is actually passed in. For example, if you know that the parameter will typically be “UK”, you can provide that literal in the hint:
CREATE PROCEDURE uspGetCountrySale (@Country Varchar(50))
AS 
-- Filter on the parameter itself so the hint and the predicate refer to the same variable
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM dbo.SaleOrders
WHERE Country = @Country
OPTION (OPTIMIZE FOR (@Country = 'UK'));
GO


SQL Server 2005 ranking functions - RANK(), DENSE_RANK(), NTILE()

RANK(), DENSE_RANK() and NTILE() are ranking functions newly added to T-SQL in SQL Server 2005. Another ranking function is ROW_NUMBER(), which I have blogged about earlier.
RANK() returns the rank of each row within the partition of a result set. When there is a tie, the same rank is assigned to the tied rows.
For example, 1, 2, 3, 3, 3, 6, 7, 7, 9, 10
DENSE_RANK() works like RANK(), except that the numbers being returned are packed (do not have gaps) and always have consecutive ranks.
For example, 1, 2, 3, 3, 3, 4, 5, 5, 6, 7
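The two numbering schemes are easy to compare side by side in a few lines of Python. This is just a sketch of the semantics over an already sorted list, not how SQL Server implements them; the function names and the sample values are my own.

```python
def rank(sorted_values):
    """RANK(): tied rows share a rank; the next rank skips past the ties."""
    ranks = []
    for index, value in enumerate(sorted_values):
        if index > 0 and value == sorted_values[index - 1]:
            ranks.append(ranks[-1])      # tie: repeat the previous rank
        else:
            ranks.append(index + 1)      # otherwise: 1-based row position
    return ranks


def dense_rank(sorted_values):
    """DENSE_RANK(): tied rows share a rank; ranks have no gaps."""
    ranks = []
    for index, value in enumerate(sorted_values):
        if index > 0 and value == sorted_values[index - 1]:
            ranks.append(ranks[-1])      # tie: repeat the previous rank
        else:
            ranks.append(ranks[-1] + 1 if ranks else 1)
    return ranks


# Hypothetical sorted values with one three-way tie and one two-way tie.
values = [10, 20, 30, 30, 30, 40, 50, 50, 60, 70]
```

For these values, rank(values) gives 1, 2, 3, 3, 3, 6, 7, 7, 9, 10 and dense_rank(values) gives 1, 2, 3, 3, 3, 4, 5, 5, 6, 7, the same two sequences as in the examples above.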
NTILE(integer_expression) breaks the rows within a partition into groups, while the number of groups is specified by "integer_expression". This is useful when percentile rank is required.
NTILE(N) returns 1 for the rows in the first group, 2 for those in the second group, and returns N for the last (N-th) group. Each group contains the same number of rows, or, if the number of rows in a partition is not divisible by "integer_expression", lower-numbered groups (starting from 1, 2, ...) will each contain one more row. For example,
NTILE(2): 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
NTILE(3): 1, 1, 1, 1, 2, 2, 2, 3, 3, 3
NTILE(4): 1, 1, 1, 2, 2, 2, 3, 3, 4, 4
NTILE(5): 1, 1, 2, 2, 3, 3, 4, 4, 5, 5
NTILE(6): 1, 1, 2, 2, 3, 3, 4, 4, 5, 6
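The distribution rule can be written down in a few lines of Python. This is a sketch of the rule as described above, with a hypothetical function name, not SQL Server's implementation.

```python
def ntile(row_count, groups):
    """Return the NTILE group number for each of row_count ordered rows."""
    base, extra = divmod(row_count, groups)
    result = []
    for group in range(1, groups + 1):
        # The first `extra` groups each take one more row than the rest.
        size = base + (1 if group <= extra else 0)
        result.extend([group] * size)
    return result
```

With 10 rows, ntile(10, 3) yields groups of 4, 3 and 3 rows, and ntile(10, 6) yields four groups of 2 rows followed by two groups of 1 row, matching the listing above.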

The basic syntax follows.
ROW_NUMBER() OVER ([<partition_by_clause>] <order_by_clause>)
RANK() OVER ([<partition_by_clause>] <order_by_clause>)
DENSE_RANK() OVER ([<partition_by_clause>] <order_by_clause>)
NTILE(integer_expression) OVER ([<partition_by_clause>] <order_by_clause>)
==========
Putting it all together, the following query shows all ranking functions in action using the famous AdventureWorks OLTP database!

SELECT i.ProductID
  , ProductName = p.Name
  , i.LocationID
  , i.Quantity
  , RowNumber = ROW_NUMBER()
                  OVER (PARTITION BY i.LocationID
                        ORDER BY i.Quantity)
  , Quartile = NTILE(4)
                  OVER (PARTITION BY i.LocationID
                        ORDER BY i.Quantity)
  , Rank = RANK()
                  OVER (PARTITION BY i.LocationID
                        ORDER BY i.Quantity)
  , DenseRank = DENSE_RANK()
                  OVER (PARTITION BY i.LocationID
                        ORDER BY i.Quantity)
 FROM Production.ProductInventory i
       INNER JOIN Production.Product p
             ON i.ProductID = p.ProductID
 WHERE i.LocationID in (3, 4, 5)
 ORDER BY i.LocationID, RowNumber;

----- here comes the query result -----
(ProductName is intentionally removed, and Quartile is shortened to Qtile, for better presentation)
ProductID   LocID Quantity RowNumber  Qtile Rank       DenseRank
----------- ----- -------- ---------- ----- ---------- ---------
492         3     17       1          1     1          1
496         3     30       2          1     2          2
493         3     41       3          2     3          3
494         3     49       4          3     4          4
495         3     49       5          4     4          4
494         4     12       1          1     1          1
492         4     14       2          1     2          2
493         4     24       3          2     3          3
496         4     25       4          3     4          4
495         4     35       5          4     5          5
317         5     158      1          1     1          1
318         5     171      2          1     2          2
351         5     179      3          1     3          3
319         5     184      4          1     4          4
952         5     192      5          1     5          5
400         5     260      6          1     6          6
815         5     265      7          1     7          7
401         5     283      8          1     8          8
352         5     300      9          1     9          9
488         5     318      10         1     10         10
477         5     323      11         1     11         11
476         5     324      12         1     12         12
949         5     336      13         1     13         13
487         5     337      14         2     14         14
950         5     342      15         2     15         15
332         5     344      16         2     16         16
945         5     347      17         2     17         17
948         5     347      18         2     17         17
951         5     348      19         2     19         18
802         5     350      20         2     20         19
803         5     356      21         2     21         20
804         5     363      22         2     22         21
399         5     366      23         2     23         22
398         5     372      24         2     24         23
320         5     372      25         2     24         23
484         5     374      26         2     26         24
481         5     374      27         3     26         24
479         5     390      28         3     28         25
816         5     406      29         3     29         26
327         5     408      30         3     30         27
819         5     409      31         3     31         28
482         5     427      32         3     32         29
485         5     427      33         3     32         29
818         5     428      34         3     34         30
821         5     432      35         3     35         31
817         5     443      36         3     36         32
820         5     446      37         3     37         33
486         5     515      38         3     38         34
480         5     515      39         3     38         34
483         5     531      40         4     40         35
316         5     532      41         4     41         36
321         5     540      42         4     42         37
330         5     548      43         4     43         38
329         5     558      44         4     44         39
328         5     568      45         4     45         40
323         5     568      46         4     45         40
324         5     568      47         4     45         40
478         5     568      48         4     45         40
331         5     574      49         4     49         41
322         5     587      50         4     50         42
350         5     622      51         4     51         43
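To see what PARTITION BY does to the numbering, the rows for locations 3 and 4 can be pushed through a small Python sketch. The function names are hypothetical and this only mimics the semantics; note also that the order of tied rows (and therefore their ROW_NUMBER) is not guaranteed in SQL Server, whereas the stable sort used here happens to keep the input order.

```python
from itertools import groupby

# (ProductID, LocationID, Quantity) copied from the result for locations 3 and 4.
ROWS = [
    (492, 3, 17), (496, 3, 30), (493, 3, 41), (494, 3, 49), (495, 3, 49),
    (494, 4, 12), (492, 4, 14), (493, 4, 24), (496, 4, 25), (495, 4, 35),
]


def rank_partition(quantities, tiles=4):
    """Return (row_number, ntile, rank, dense_rank) for sorted quantities."""
    base, extra = divmod(len(quantities), tiles)
    tile_of = []
    for tile in range(1, tiles + 1):
        tile_of.extend([tile] * (base + (1 if tile <= extra else 0)))
    out, rank, dense = [], 0, 0
    for i, qty in enumerate(quantities):
        if i == 0 or qty != quantities[i - 1]:
            rank = i + 1      # new value: rank jumps to the row position
            dense += 1        # new value: dense rank moves up by one
        out.append((i + 1, tile_of[i], rank, dense))
    return out


def rank_by_location(rows):
    """Restart all four numberings for every LocationID partition."""
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    result = []
    for location, group in groupby(ordered, key=lambda r: r[1]):
        group = list(group)
        stats = rank_partition([r[2] for r in group])
        for (product_id, _, qty), s in zip(group, stats):
            result.append((product_id, location, qty) + s)
    return result
```

rank_by_location(ROWS) returns tuples in the same column order as the result listing, and every counter starts again at 1 when the location changes from 3 to 4.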

==========
For more information

- NTILE { http://msdn2.microsoft.com/en-us/library/ms175126.aspx }
- RANK { http://msdn2.microsoft.com/en-us/library/ms176102.aspx }
- DENSE_RANK { http://msdn2.microsoft.com/en-us/library/ms173825.aspx }
- ROW_NUMBER { http://msdn2.microsoft.com/en-us/library/ms186734.aspx }
- Ranking functions { http://msdn2.microsoft.com/en-us/library/ms189798.aspx }
- OVER clause { http://msdn2.microsoft.com/en-us/library/ms189461.aspx }
- ORDER BY clause { http://msdn2.microsoft.com/en-us/library/ms188385.aspx }
- What's New in SQL Server 2005 { http://www.microsoft.com/sql/prodinfo/overview/whats-new-in-sqlserver2005.mspx }
- AdventureWorks - SQL Server 2005 Samples and Sample Databases (July 2006) { http://www.microsoft.com/downloads/details.aspx?FamilyId=E719ECF7-9F46-4312-AF89-6AD8702E4E6E&displaylang=en }
