Black Rail Species Habitat Model, Categorical - CWHR [ds3249]

The Range and Distribution Mapping and Analysis Project (RADMAP) in the California Department of Fish and Wildlife’s (CDFW) Biogeographic Data Branch (BDB) develops and maintains spatial models for use in conservation decision making, including species range maps and species habitat models. RADMAP is building a library of vetted species range maps and habitat models within California for use by CDFW staff and partners. The categorical species habitat model (SHM) is derived from a continuous SHM by splitting the continuous model output into predicted habitat and non-habitat. This simplifies the continuous SHM by categorizing predicted habitat into areas with high, medium, and low relative probabilities of supporting the species. Habitat (values higher than the minimum training presence threshold) is divided into low, medium, and high categories using expert-chosen thresholds on the predicted values. The focal taxon may or may not actually occur in areas predicted to have a high relative probability of habitat use; habitat may be suitable but unoccupied, particularly for taxa with small or declining populations or limited mobility. Where presence data are sparse, models may not capture all habitats a taxon uses, because the available records represent only part of the environmental space the taxon occupies. Users should consult the validation metrics and consider the level of uncertainty associated with the model when interpreting model outputs. Areas with high relative predicted values are those where the habitat is most likely to support the species, and they may be prioritized when locating survey or monitoring sites for scientific studies aimed at conserving and protecting focal taxa.

Occurrence records were obtained from the sources listed below. To reduce sampling bias and avoid model overfitting, which can reduce model applicability to unsampled areas, we excluded spatially autocorrelated presence records.
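As an illustration of how a categorical SHM can be derived from a continuous one, the following Python sketch computes a minimum training presence (MTP) threshold from the model values at training presence locations and bins continuous values into non-habitat, low, medium, and high. The function names and the cutoff values are hypothetical, since the expert-chosen thresholds for this model are not stated here. The sketch also treats values at or above the MTP as habitat so that every training presence falls in a habitat class.

```python
from bisect import bisect_right

# Hypothetical labels for the categorical SHM classes above the MTP threshold.
HABITAT_LABELS = ["low", "medium", "high"]


def minimum_training_presence(presence_scores):
    """MTP threshold: lowest continuous SHM value at any training presence."""
    return min(presence_scores)


def categorize(value, mtp, cutoffs):
    """Classify one continuous SHM value.

    Values below the MTP threshold are non-habitat. Values at or above it
    are split by two ascending cutoffs (hypothetical stand-ins for the
    expert-chosen thresholds) into low, medium, and high.
    """
    if len(cutoffs) != 2 or not (mtp <= cutoffs[0] < cutoffs[1]):
        raise ValueError("cutoffs must be two ascending values at or above the MTP")
    if value < mtp:
        return "non-habitat"
    # bisect_right counts how many cutoffs the value meets or exceeds (0, 1, or 2).
    return HABITAT_LABELS[bisect_right(cutoffs, value)]


# Example with made-up scores.
presence = [0.22, 0.35, 0.61, 0.80]
mtp = minimum_training_presence(presence)  # 0.22
cutoffs = [0.45, 0.70]  # hypothetical expert-chosen cutoffs
cells = [0.05, 0.22, 0.30, 0.50, 0.90]
print([categorize(v, mtp, cutoffs) for v in cells])
# ['non-habitat', 'low', 'low', 'medium', 'high']
```

In a real workflow the same rule would be applied cell by cell to the 30 m continuous raster; the list here stands in for raster cells.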
Because the study region is large, we accounted for topographic heterogeneity using a raster derived from a digital elevation model and thinned occurrence records at three distinct distances based on that raster: in areas of high topographic heterogeneity, occurrence records were filtered at shorter Euclidean distances. For potentially ecologically relevant environmental covariates, we computed a Pearson correlation coefficient matrix to assess the strength of association among variables; among variables that were highly correlated (|r| ≥ 0.7), the most ecologically relevant variable was kept and the others were removed from further analyses. All covariates were continuous and formatted at a resolution of 30 m.

Potential habitat use was estimated with a maximum entropy approach (Maxent), implemented in R, that relies on presence data and compares environmental covariate values at presence localities with those at randomly selected background sites (Phillips et al. 2006). To delimit the geographic area used for model calibration, background locations were selected within local adaptive convex-hull polygons based on the species’ known dispersal and movement limitations. This allowed RADMAP to exclude suitable but uncolonized habitat, which may be unoccupied because of dispersal barriers or inhibitory biotic interactions; it also precluded overfitting models to environmental conditions immediately adjacent to occurrence records, thus improving predictive performance. For each model, we randomly sampled more than 10,000 background locations. Rather than using default Maxent settings, we tuned five feature-class combinations and five regularization-multiplier settings to improve model fit. We considered regularization multipliers in integer increments from 1 to 5, then divided the data into training and test groups using geographically structured k-fold cross-validation (k = 4) to reduce overfitting to environmental conditions among spatial partitions.
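The covariate screening step described above can be sketched as a greedy collinearity filter. This minimal example assumes covariate values have already been sampled at a common set of points and that the analyst supplies a ranking from most to least ecologically relevant (hypothetical here); a covariate is kept only if its absolute Pearson correlation with every already-kept covariate is below 0.7. The greedy ordering is one simple way to encode "keep the most ecologically relevant variable" and is not necessarily RADMAP's exact procedure. The covariate names and values are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant covariate: correlation undefined")
    return cov / (sx * sy)


def screen_covariates(samples, priority, threshold=0.7):
    """Greedy collinearity screen.

    samples:  covariate name -> values sampled at the same points
    priority: names ordered from most to least ecologically relevant
              (a hypothetical, analyst-supplied ranking)
    A covariate is dropped if |r| >= threshold with any covariate already kept.
    """
    kept = []
    for name in priority:
        if all(abs(pearson(samples[name], samples[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up covariate values at five sample points.
samples = {
    "elevation": [10, 20, 30, 40, 50],
    "temperature": [25, 23, 20, 18, 15],  # strongly (negatively) correlated with elevation
    "ndvi": [0.2, 0.5, 0.1, 0.4, 0.3],
}
print(screen_covariates(samples, ["elevation", "temperature", "ndvi"]))
# ['elevation', 'ndvi']
```

Because the filter is order-dependent, changing the priority list (for example, ranking temperature above elevation) changes which member of a correlated pair survives, which is why the ecological ranking matters.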
A total of 25 models (five feature-class combinations × five regularization multipliers) were run during this phase of model development. Additionally, 25 full models were run using all available presence data. Models were evaluated with multiple statistics. First, we used the test omission error rate (OER) and the true skill statistic (TSS), which are threshold-dependent metrics. Second, we generated receiver operating characteristic (ROC) curves and assessed model performance on test data with the area under the curve (AUC), a threshold-independent metric. Test AUC was averaged across the k folds, and the difference between training and test AUC (AUCdiff) was used to quantify overfitting (lower values indicate less overfitting). The model with the lowest OER and the highest AUC and TSS values for both test and training data was selected as the top model. We calculated the percent contribution of each predictor variable to the top model to assess the relative influence of each on the species’ predicted distribution. Models were extrapolated across the taxon’s range to predict habitat in all potentially occupied areas within California.

Results of the Maxent model are included in the PDF attachment, including the top model review score, validation metrics, model output details, covariate response curves, percent contribution of covariates to the top model, and a full list of covariates included in the model. The top model results are linked here: https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=239507.

Location data for each focal taxon were extracted (where applicable) and collated from the data sources below. BIOS datasets are bracketed with their “ds” numbers and can be located in CDFW’s BIOS viewer: https://wildlife.ca.gov/Data/BIOS. Sources: California Natural Diversity Database, Terrestrial Species Monitoring [ds2826], North American Bat Monitoring Data Portal, VertNet, Breeding Bird Survey, Wildlife Insights, eBird, iNaturalist, and other available CDFW or partner data.
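The model evaluation statistics named in the methods (OER, TSS, AUC, and AUCdiff) can be sketched as follows. This is a minimal, standard-library illustration, not the Maxent or RADMAP implementation. It assumes continuous model scores at test presence and background points, treats background points as pseudo-absences when computing specificity for TSS, and takes a caller-supplied threshold because the threshold used for OER and TSS is not specified here. AUC is computed as the probability that a random presence outscores a random background point, which equals the area under the ROC curve.

```python
def omission_rate(presence_scores, threshold):
    """Omission error rate: share of presence points scored below threshold."""
    missed = sum(1 for s in presence_scores if s < threshold)
    return missed / len(presence_scores)


def true_skill_statistic(presence_scores, background_scores, threshold):
    """TSS = sensitivity + specificity - 1.

    Assumption: background points are treated as pseudo-absences, so
    specificity is the share of background points scored below threshold.
    """
    sensitivity = 1 - omission_rate(presence_scores, threshold)
    specificity = sum(1 for s in background_scores if s < threshold) / len(background_scores)
    return sensitivity + specificity - 1


def auc(presence_scores, background_scores):
    """ROC AUC as the probability that a random presence outscores a random
    background point (ties count one half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def auc_diff(train_auc, test_auc):
    """AUCdiff: training AUC minus test AUC; larger gaps suggest overfitting."""
    return train_auc - test_auc


# Made-up scores for one test fold; the 0.3 threshold is hypothetical.
test_presence = [0.9, 0.7, 0.4, 0.2]
background = [0.1, 0.3, 0.5, 0.05]
print(omission_rate(test_presence, 0.3))                     # 0.25
print(true_skill_statistic(test_presence, background, 0.3))  # 0.25
print(auc(test_presence, background))                        # 0.8125
```

With k = 4 spatial folds, these functions would be applied to each fold's held-out data and the per-fold test AUC values averaged. The pairwise AUC loop scales with the product of presence and background counts, so with more than 10,000 background points per model a rank-based computation would be faster.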
Please refer to the Range Map and Species Habitat Model Use Case Guidance document for how best to interpret RADMAP outputs, including range maps, continuous SHMs, and categorical SHMs. In particular, users should follow these guidelines to determine which products to use for conservation, management, and policy decision-making use cases: https://nrm.dfg.ca.gov/FileHandler.ashx?DocumentID=222269.

Data files

Data title and description | Access data | File details | Last updated
Source download (File Geodatabase) | Download | ZIP | 12/11/25

Supporting files

Data title and description | Access data | File details | Last updated
ArcGIS Hub Dataset | | HTML | 12/11/25
ArcGIS GeoService | | ArcGIS GeoServices REST API | 12/11/25
CDFW BIOS viewer | | | 12/11/25
Range and Species Habitat Model Use Case Guidance | | | 12/11/25
SHM results metadata | | | 12/11/25