Data exploration
Raw data set
Imbalanced classes
The classes are not balanced in the data set. Even after checking and correcting the pseudo-absence points, only 20.3% of the data points record white spruce presence.
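The imbalance can be quantified directly from the presence/absence labels. The sketch below assumes a 0/1 label array (1 = white spruce present) and a hypothetical helper `class_summary`. It also computes per-class weights with the same formula scikit-learn uses for `class_weight="balanced"` (n_samples / (n_classes × count)). Weighting is only one common remedy for imbalance and is not necessarily what this study applies. The labels are synthetic, chosen only to match the reported 20.3% prevalence.

```python
import numpy as np

def class_summary(presence):
    """Return the prevalence of class 1 and 'balanced' weights per class.

    presence: 1-D array of 0 (absent / pseudo-absent) and 1 (present) labels.
    """
    y = np.asarray(presence, dtype=int)
    counts = np.bincount(y, minlength=2)   # [n_absent, n_present]
    prevalence = counts[1] / counts.sum()
    # Same formula as scikit-learn's class_weight="balanced":
    # n_samples / (n_classes * count_per_class)
    weights = counts.sum() / (2 * counts)
    return prevalence, weights

# Synthetic labels with the same 20.3% prevalence (not the real data)
labels = np.array([1] * 203 + [0] * 797)
prev, w = class_summary(labels)
print(f"prevalence = {prev:.3f}")            # prevalence = 0.203
print("weights (absent, present) =", w.round(3))
```

With these weights, each presence point counts roughly four times as much as an absence point in a weighted loss, which offsets the 20/80 split.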
Variances between the two classes
Fig 4 and Fig 5 tell the same story: white spruce occurrence is closely related to low temperature, and the species tends to occur in cold, continental regions. However, the precipitation-related climatic variables (AHM, SHM, GSP, and MAP) show no obvious differences between the two classes. Fig 3 and Fig 6 lead to the same conclusion. Moreover, locations without the modeled species show greater within-class variance (a wider range of variable values).
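The class comparison read off the figures can also be checked numerically by computing each variable's mean and sample variance within each class. The sketch below is a minimal version with a hypothetical helper `per_class_stats`. The two columns named MAT and MAP are filled with invented values that only mimic the pattern described above (presences colder and less variable in MAT, no class difference in MAP). They are not the study's data.

```python
import numpy as np

def per_class_stats(X, y, names):
    """Mean and sample variance of each climatic variable within each class.

    X: (n_samples, n_vars) array of climate values
    y: (n_samples,) array of 0/1 presence labels
    names: variable names, one per column of X
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    stats = {}
    for j, name in enumerate(names):
        absent = X[y == 0, j]
        present = X[y == 1, j]
        stats[name] = {
            "mean_absent": absent.mean(),
            "mean_present": present.mean(),
            # ddof=1 gives the sample (unbiased) variance
            "var_absent": absent.var(ddof=1),
            "var_present": present.var(ddof=1),
        }
    return stats

# Synthetic example: 800 absences, 200 presences (illustrative values only)
rng = np.random.default_rng(0)
n_abs, n_pres = 800, 200
y = np.array([0] * n_abs + [1] * n_pres)
mat = np.concatenate([rng.normal(5, 6, n_abs), rng.normal(-3, 2, n_pres)])
map_ = np.concatenate([rng.normal(600, 150, n_abs), rng.normal(600, 150, n_pres)])
X = np.column_stack([mat, map_])

stats = per_class_stats(X, y, ["MAT", "MAP"])
for name, s in stats.items():
    print(name, {k: round(float(v), 1) for k, v in s.items()})
```

A large gap in means together with a larger variance in the absence class would correspond to the "colder, narrower range" pattern seen in Fig 4 and Fig 5.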
Outlier? In Fig 4 and Fig 5, AHM shows one extremely unusual point (#828). It is likely an outlier or an erroneous record; more information about this point is needed.
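Besides visual inspection, suspicious values can be flagged with a simple rule. The sketch below uses the modified z-score, 0.6745 × (x − median) / MAD, with the commonly used cutoff of 3.5. Because the median and the median absolute deviation (MAD) are robust, a single extreme point cannot inflate them the way it inflates the mean and standard deviation. This rule is a swapped-in illustration and not necessarily how point #828 was identified. The helper name and the small AHM array are hypothetical.

```python
import numpy as np

def robust_z_outliers(values, threshold=3.5):
    """Flag points whose modified z-score exceeds `threshold`.

    Modified z-score: 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation from the median.
    Returns (indices_of_flagged_points, z_scores).
    """
    x = np.asarray(values, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0:
        # More than half the values are identical; the score is undefined
        return np.array([], dtype=int), np.zeros_like(x)
    z = 0.6745 * (x - med) / mad
    return np.flatnonzero(np.abs(z) > threshold), z

# Hypothetical AHM values with one extreme entry at position 8
ahm = np.array([20, 25, 18, 30, 22, 27, 24, 19, 350])
idx, z = robust_z_outliers(ahm)
print(idx)   # [8]
```

A flagged point is only a candidate: as noted above, checking the original record (coordinates, climate extraction) is still needed before deciding whether to drop or correct it.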
Based on the PCA results, white spruce points cluster in the area of high TD and high DD<0 values. The first two principal components explain roughly 81% of the variance in the data set. The first component mostly reflects temperature, and the second mainly reflects precipitation. The correlations among the climatic variables are clear from Fig 6 and Fig 7: TD, DD<0, MAT, DD>5, and MCMT are closely related, as are AHM, SHM, MAP, and GSP.
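The "variance explained" figure and the correlation structure can be computed as sketched below. This is a minimal PCA on standardized variables via NumPy's SVD; scikit-learn's `PCA` exposes the same quantity as `explained_variance_ratio_`. The helper name is hypothetical. The data are synthetic: two invented latent factors ("temperature" for the first five variables, "precipitation" for the last four) plus noise, with signs chosen for illustration only. The printed numbers therefore illustrate the method and are not the study's 81%.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA on standardized variables via SVD.

    Returns (explained_variance_ratio, loadings), where loadings[k] holds
    the weight of each original variable on principal component k.
    """
    X = np.asarray(X, dtype=float)
    # Standardizing first makes this a PCA of the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes; singular values give component variances
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s**2 / (Z.shape[0] - 1)
    return var / var.sum(), Vt

names = ["TD", "DD<0", "MAT", "DD>5", "MCMT", "AHM", "SHM", "MAP", "GSP"]
rng = np.random.default_rng(42)
n = 500
temp = rng.normal(size=n)     # hypothetical latent "temperature" factor
precip = rng.normal(size=n)   # hypothetical latent "precipitation" factor
signs = np.array([-1, -1, 1, 1, 1, -1, -1, 1, 1])   # illustrative only
factor = np.where(np.arange(9) < 5, 0, 1)           # latent factor per variable
latent = np.column_stack([temp, precip])[:, factor]  # shape (n, 9)
X = signs * latent + 0.3 * rng.normal(size=(n, 9))

ratio, loadings = pca_explained_variance(X)
print("variance explained by PC1+PC2:", round(float(ratio[:2].sum()), 3))
for k in range(2):
    top = np.argsort(-np.abs(loadings[k]))[:3]
    print(f"PC{k + 1} top loadings:", [names[i] for i in top])

# Pairwise Pearson correlations between variables (columns of X)
corr = np.corrcoef(X, rowvar=False)
print("corr(MAT, MCMT) =", round(float(corr[2, 4]), 2))
```

Standardizing matters here because the variables are on very different scales (degree-days versus millimetres of precipitation); without it, the variables with the largest raw variance would dominate the first components.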