hotspot_dual_kde() produces invalid results with lat/lon data #67

@mpjashby

library(sfhotspot)

# Running dual KDE on lat/lon data produces missing KDE estimates
hotspot_dual_kde(memphis_robberies, memphis_population) |> summary()
#> Cell size set to 0.00524 degrees automatically
#> Bandwidth set automatically based on rule of thumb.
#> ℹ Bandwidth for `x` = 0.06128371 degrees.
#> Bandwidth set automatically based on rule of thumb.
#> ℹ Bandwidth for `y` = 0.05693893 degrees.
#> Data transformed to "WGS 84 / UTM zone 16N" co-ordinate system.
#> ℹ CRS code: "EPSG:32616".
#> ℹ Unit of measurement: metre.
#>        n                kde                geometry   
#>  Min.   : 0.0000   Min.   : NA    POLYGON      :2926  
#>  1st Qu.: 0.0000   1st Qu.: NA    epsg:4326    :   0  
#>  Median : 0.0000   Median : NA    +proj=long...:   0  
#>  Mean   : 0.7673   Mean   :NaN                        
#>  3rd Qu.: 1.0000   3rd Qu.: NA                        
#>  Max.   :28.0000   Max.   : NA                        
#>                    NA's   :2926

# But the KDE estimates are as expected when the data are first transformed to
# a projected CRS
hotspot_dual_kde(
  sf::st_transform(memphis_robberies, "EPSG:6410"), 
  sf::st_transform(memphis_population, "EPSG:6410")
) |> 
  summary()
#> Cell size set to 500 metres automatically
#> Bandwidth set automatically based on rule of thumb.
#> ℹ Bandwidth for `x` = 5,588 metres.
#> Bandwidth set automatically based on rule of thumb.
#> ℹ Bandwidth for `y` = 5,178 metres.
#>        n                kde                   geometry   
#>  Min.   : 0.0000   Min.   :0.02357   POLYGON      :3300  
#>  1st Qu.: 0.0000   1st Qu.:0.14901   epsg:6410    :   0  
#>  Median : 0.0000   Median :0.23398   +proj=lcc ...:   0  
#>  Mean   : 0.6803   Mean   :0.23669                       
#>  3rd Qu.: 1.0000   3rd Qu.:0.28704                       
#>  Max.   :24.0000   Max.   :1.10445

Created on 2025-08-01 with reprex v2.1.1
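Until the bug is fixed, the workaround in the reprex above (transforming to a projected CRS first) could be wrapped in a small guard. This is only a sketch: `project_if_lonlat()` is a hypothetical helper, not part of sfhotspot, and the target CRS must be chosen by the caller ("EPSG:6410" is used here only because the reprex uses it).

```r
library(sf)

# Hypothetical helper (not part of sfhotspot): transform `x` to `crs` only if
# it currently uses a geographic (lon/lat) co-ordinate system.
project_if_lonlat <- function(x, crs) {
  if (isTRUE(sf::st_is_longlat(x))) {
    sf::st_transform(x, crs)
  } else {
    x
  }
}

# A single lon/lat point, made up for illustration
pts <- sf::st_sfc(sf::st_point(c(-90.05, 35.15)), crs = "EPSG:4326")
projected <- project_if_lonlat(pts, "EPSG:6410")

# Intended use with the data from the reprex (requires sfhotspot):
# hotspot_dual_kde(
#   project_if_lonlat(memphis_robberies, "EPSG:6410"),
#   project_if_lonlat(memphis_population, "EPSG:6410")
# )
```

Data that are already projected are returned unchanged, so the helper can be applied to both inputs unconditionally.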

While fixing this, it might also be worth checking whether hotspot_dual_kde() needs to verify that the two datasets overlap. This check is already done for x and grid because of #39, but it probably also needs to be done for x and y (and possibly for y and grid).

https://github.com/mpjashby/sfhotspot/blob/6b3caf7d27690a15072cf5d77d2fbd6f33185808/R/hotspot_dual_kde.R#L185C3-L186C73
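As a rough illustration of what an overlap check between x and y might look like, the sketch below compares bounding boxes after putting both layers in the same CRS. `bboxes_overlap()` is a hypothetical helper and is not the check currently used for x and grid in sfhotspot; a bounding-box test is only an approximation, since two layers can have intersecting boxes without any features actually coinciding.

```r
library(sf)

# Hypothetical helper (not part of sfhotspot): TRUE if the bounding boxes of
# `x` and `y` intersect once `y` is expressed in the CRS of `x`.
bboxes_overlap <- function(x, y) {
  stopifnot(inherits(x, c("sf", "sfc")), inherits(y, c("sf", "sfc")))
  if (sf::st_crs(x) != sf::st_crs(y)) {
    y <- sf::st_transform(y, sf::st_crs(x))
  }
  bb_x <- sf::st_bbox(x)
  bb_y <- sf::st_bbox(y)
  # Two rectangles intersect if they overlap on both axes
  bb_x[["xmin"]] <= bb_y[["xmax"]] && bb_y[["xmin"]] <= bb_x[["xmax"]] &&
    bb_x[["ymin"]] <= bb_y[["ymax"]] && bb_y[["ymin"]] <= bb_x[["ymax"]]
}

# Made-up points for illustration
a <- sf::st_sfc(
  sf::st_point(c(-90.0, 35.1)), sf::st_point(c(-89.9, 35.2)),
  crs = "EPSG:4326"
)
b <- sf::st_sfc(
  sf::st_point(c(-89.95, 35.15)), sf::st_point(c(-89.8, 35.3)),
  crs = "EPSG:4326"
)
far <- sf::st_sfc(
  sf::st_point(c(10, 50)), sf::st_point(c(11, 51)),
  crs = "EPSG:4326"
)

bboxes_overlap(a, b)   # boxes share the area around (-89.95..-89.9, 35.15..35.2)
bboxes_overlap(a, far) # boxes are on different continents
```

The same helper could in principle be called for each pair (x and y, y and grid), with an error or warning raised when it returns FALSE.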

Labels: bug (Something isn't working)