Biodiversity and Conservation Analysis

Overview / Goal

Biodiversity loss is a pressing global concern, and U.S. National Parks serve as critical habitats for many vulnerable species. This project analyzed the conservation statuses of species across U.S. National Parks using datasets provided by Codecademy, which are modeled on National Park Service data but partly fictionalized.

The primary aim was to identify endangered species, detect patterns across parks, and explore ecological factors that might influence conservation status. The full data lifecycle was covered: cleaning, exploratory analysis, statistical testing, visualization, and interpretation.

Key Findings

Certain parks have disproportionately high endangered species counts, pointing to geographic hotspots where conservation resources should be focused
Mammals and birds had the highest percentage of endangered listings in the dataset, and a chi-squared test indicated that conservation status is associated with species category (p = 3.1e-98)
Observation frequency varied significantly by park, which may reflect differences in monitoring effort or local abundance rather than differences in overall population size
Species with lower observation frequency were not always those with the highest conservation concern, revealing gaps in monitoring coverage

Methods / Process

The two datasets (species_info.csv and observations.csv) were cleaned with Pandas: missing values filled, species names standardized, datasets merged, and categorical variables encoded for analysis.
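The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the column names (scientific_name, category, conservation_status, park_name, observations), the "No Intervention" fill label, the is_protected flag, and all of the rows are assumptions made up for the example, and small in-memory tables stand in for the two CSV files.

```python
import pandas as pd

# Tiny stand-ins for species_info.csv and observations.csv.
# Column names and every value below are hypothetical, for illustration only.
species = pd.DataFrame({
    "category": ["Mammal", "Bird", "Vascular Plant"],
    "scientific_name": [" Genus alpha", "Genus beta ", "Genus  gamma"],
    "conservation_status": ["Endangered", "Species of Concern", None],
})
observations = pd.DataFrame({
    "scientific_name": ["Genus alpha", "Genus alpha", "Genus beta", "Genus gamma"],
    "park_name": ["Park A", "Park B", "Park A", "Park B"],
    "observations": [60, 27, 57, 109],
})


def standardize_names(names: pd.Series) -> pd.Series:
    """Trim whitespace and collapse internal runs of spaces."""
    return names.str.strip().str.replace(r"\s+", " ", regex=True)


species["scientific_name"] = standardize_names(species["scientific_name"])
observations["scientific_name"] = standardize_names(observations["scientific_name"])

# Fill missing statuses with an explicit label (the label itself is an assumption).
species["conservation_status"] = species["conservation_status"].fillna("No Intervention")

# Encode a simple binary flag for later statistical testing.
species["is_protected"] = species["conservation_status"] != "No Intervention"

# Each observation row should match exactly one species row.
merged = observations.merge(
    species, on="scientific_name", how="left", validate="many_to_one"
)
print(merged[["scientific_name", "park_name", "observations", "is_protected"]])
```

Passing validate="many_to_one" makes the merge fail loudly if the species table still contains duplicate names after standardization, which is an easy problem to miss otherwise.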

Exploratory analysis covered species distribution by conservation status, common characteristics among endangered species, and observation patterns across parks. Chi-squared tests via SciPy were used to test whether certain species categories were statistically more likely to be endangered.
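A chi-squared test of this kind might look like the sketch below. The counts are invented for illustration and are not the project's results, and the column labels (protected, not_protected) are assumptions. It uses scipy.stats.chi2_contingency on a category-by-status contingency table.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Illustrative counts only; these are NOT the project's actual numbers.
contingency = pd.DataFrame(
    {"protected": [20, 40, 10], "not_protected": [80, 160, 990]},
    index=["Mammal", "Bird", "Vascular Plant"],
)

# Share of species in each category that carry a protected status.
protection_rate = contingency["protected"] / contingency.sum(axis=1)

# Null hypothesis: protection status is independent of species category.
chi2, p_value, dof, expected = chi2_contingency(contingency)

print(protection_rate.round(3))
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p_value:.3g}")
```

A small p-value here says only that status and category are associated across the whole table. It does not say which categories differ from each other, so pairwise comparisons would need their own tests.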

Visualizations were built with Matplotlib and Seaborn. The entire project was written and executed in Jupyter Notebook.

Reflections / Next Steps

Even with fictionalized data, the project made clear how complex conservation work is. Quantitative analysis alone isn't enough without ecological context to interpret what the numbers actually mean.

Next steps would include applying the same techniques to real NPS datasets and integrating geospatial analysis for deeper geographic insights.
