I’m still barely scratching the surface working with spatial data in SQL Server 2008. We’ve ported some of the data into a table where we built a geography spatial data column and we’re begginning to work with point data. The requirements from the developers are, so far, very simple. They’ll feed me a point and I find all the locations “close” to it. We had to go round & round on what defines “close” but finally settled on, I think, 15km.
The query to answer a question like this is ridiculously simple (a few object names have been changed):
SELECT ebe.[Location].STDistance(@Location) AS Distance, ebe.[InterestId], ebe.[Location].Lat AS Latitude, ebe.[Location].Long AS Longitude, ebe.[OrgId] FROM dbo.[ebe] AS ebe WHERE ebe[OrgId] = @OrgId AND ebe.[Location].STDistance(@Location) < @CloseDistance
I’m not even hard-coding the “close” value so it can change when they change their minds. It retrieves exactly what’s needed. But… Well, look at the STATISTICS IO & TIME:
Table 'ebe'. Scan count 3, logical reads 40179
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 376 ms, elapsed time = 373 ms.
Not exactly snappy, is it? So, the obvious answer is to provide an index it can use so that it doesn’t have to scan the clustered index. Spatial indexes only support certain functions with the spatial column, STIntersects, STEquals, STDistance. Since I’m dealing with STDistance, I have a shot at this working. Here’s the script:
CREATE SPATIAL INDEX [ebe_spatial] ON [dbo].[ebe]
(
[Location]
)
USING GEOGRAPHY_GRID
WITH
( GRIDS =( LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = MEDIUM),
CELLS_PER_OBJECT= 128,
PAD_INDEX = OFF,
SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON [PRIMARY]
When I then run the query, well, here’s the STATISTICS IO & TIME again:
Table 'Worktable'. Scan count 0, logical reads 0
Table 'extended_index_469576711_384000'. Scan count 2076
Table 'ebe'. Scan count 0, logical reads 448064
Table 'Worktable'. Scan count 0, logical reads 0
CPU time = 1155 ms, elapsed time = 1393 ms.
You read that correctly, it used the spatial index which caused the performance to decrease. There are adjustments you can make to spatial indexes. You can change the number of cells per object or set the detail level on the grid, but it’s hard to understand what this does to the index.
After fighting with it for a bit, I sent a tweet out on twitter, just whining about the index causing a slow-down. Surprised as anything, I get a response from Paul Randall, uh, wow, suggesting an approach. This is responded to by Isaac Kunen, you know the Microsoft PM & spatial expert. You just have to love Twitter. I took the information they gave me and did a few searches on the web. This lead to Bob Beauchemin’s blog and my introduction to some new Dynamic Management Views, sp_help_spatial_geometry_index and sp_help_spatial_geography_index. Now we’re talking. A mechanism to identify how useful the spatial data index is, similar to looking at sys.dm_db_index_physical_stats to see information about a clustered index. This lead to the following query and output:
DECLARE @LocationGEOGRAPHY SET @Location = GEOGRAPHY::STPointFromText('POINT (-86.674582 36.148843)',4326) EXEC sp_help_spatial_geography_index @tabname = 'EWH_BIM_Extract2' ,@indexname='ebe_spatial' ,@verboseoutput = 1 ,@query_sample = @Location
The question is, how efficient is the index? In my case, the Primary_Filter_Efficiency was listed as 3.33333. Yes, that’s 3% efficient. No wonder it killed the performance. What do I do to make the index more efficient? I haven’t a clue. That may be the next blog post. But at least now I know how to evaluate the usefulness of a spatial index.
The post Spatial Data Hurts My Brain appeared first on Home Of The Scary DBA.