
Socioeconomic and Ecological Drivers of Birdwatching Engagement in China

This research investigates how birdwatching engagement across Chinese provinces relates to actual bird species diversity, revealing that socioeconomic factors—especially GDP per capita—play a stronger role than biodiversity in shaping participation.

Background and hypothesis

Wild birds attract a group of people who pay close attention to them: birdwatchers. Birdwatching is both a recreational activity and a form of citizen science in which people observe wild birds. It connects people with nature, and it can also foster tourism and boost a region’s economy. Social media posts suggest that people in some Chinese provincial-level administrative regions (hereafter “provinces”) take part in birdwatching more than people in others. However, provinces such as Shanghai and Beijing are not as rich in bird biodiversity as provinces like Yunnan, yet people in Shanghai and Beijing seem to post more about birdwatching. This observation raises the question of whether bird biodiversity in each province correlates directly with birdwatching participation, a question that matters for the development of sustainable ecotourism. I therefore formulated the following research question: To what extent does public engagement in birdwatching align with the actual distribution of bird species diversity across different provinces in China, and what other factors could affect birding engagement in different provinces?


Methodology

Figure 1. Workflow of the analysis. The three main steps are data acquisition, data cleaning and integration, and data analysis.

Global Biodiversity Information Facility (GBIF)

GBIF is an open-access data platform, funded by governments around the world, that provides biodiversity data worldwide. It draws on diverse data sources, from museum specimens to smartphone photo records, and integrates them using data standards such as Darwin Core. From GBIF, I downloaded two datasets, one from eBird and one from iNaturalist; apart from the China Bird Report Center, these are the platforms where most birdwatchers in China upload their observations. The filters used to find the eBird dataset were “China”, “2014-2025”, and “EOD-eBird Observation Dataset”, and the filters for the iNaturalist dataset were “China”, “2014-2025”, “Aves”, and “iNaturalist Research-grade Observations.”

https://www.gbif.org/
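The filters above were applied on the GBIF website, but the same restrictions can be re-checked after download. The sketch below assumes a tab-separated GBIF “simple” occurrence export containing the Darwin Core columns countryCode, class, year, and stateProvince; the function name and the sample rows are hypothetical, and real exports may need extra read_csv options.

```python
import io

import pandas as pd

# Darwin Core columns from a GBIF "simple" occurrence export used in this sketch.
COLUMNS = ["datasetKey", "class", "species", "countryCode", "stateProvince", "year"]


def load_gbif_birds(source, first_year=2014, last_year=2025):
    """Read a tab-separated GBIF export and keep bird records from China in the study period.

    `source` can be a file path or any file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(source, sep="\t", usecols=COLUMNS)
    keep = (
        (df["countryCode"] == "CN")
        & (df["class"] == "Aves")
        & df["year"].between(first_year, last_year)
    )
    # Records without a province cannot be assigned to a row of the provincial table.
    return df[keep].dropna(subset=["stateProvince"])


# Made-up rows; real datasetKey values are UUIDs, not these placeholders.
sample = io.StringIO(
    "datasetKey\tclass\tspecies\tcountryCode\tstateProvince\tyear\n"
    "ebird\tAves\tPica serica\tCN\tBeijing\t2020\n"
    "ebird\tAves\tPica serica\tCN\tBeijing\t2012\n"
    "inat\tInsecta\tApis cerana\tCN\tYunnan\t2021\n"
    "inat\tAves\tPasser montanus\tCN\t\t2022\n"
)
birds = load_gbif_birds(sample)
print(birds[["species", "stateProvince", "year"]])
```

Only the first sample row survives: the second is outside the study period, the third is not a bird, and the fourth has no province.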

China Bird Report Center

China Bird Report Center is a national bird-report site that Chinese birders use in addition to eBird and iNaturalist. It provides each province’s total number of observation reports and the bird species recorded there. From the website’s front page, I obtained each province’s total number of birding reports from 2014 to 2025 and its total species count.

https://www.birdreport.cn/

National Data

National Data is a Chinese government website that publishes national statistics, ranging from quarterly GDP to the number of high schools in each province. From National Data, I obtained each province’s GDP, number of high school graduates, population, and natural reserve area over 10 years.

https://data.stats.gov.cn/index.htm

Provincial area

Each province’s area was obtained from the provincial government’s official website.

Data Acquisition

To investigate this problem, I collected data from the platforms described above.

Data Cleaning and Integration

I used Google Colab to perform the data analysis. I computed the yearly average of each province’s high school graduates and population, then combined these with GDP, natural reserve area, China Bird Report Center report counts, and species counts into a single data frame in which each row is a province and each column is a variable. In the raw iNaturalist and eBird data, each row records one sighting of a bird species, so I counted the reports for each province and added these totals to the data frame. Afterwards, I calculated GDP per capita and natural reserve area per 10 thousand square kilometers, and summed the report counts from all three platforms to obtain total reports and reports per capita. I then used the pandas dropna() method to remove rows with missing values. After cleaning and integration, 31 of China’s 34 provinces remained for further analysis.
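The integration steps above can be sketched with pandas roughly as follows. The column names (province, platform, population, graduates, gdp, reserve_area, area_km2, brc_reports, species), the function name, and all numbers are hypothetical, and units are left as they come from the sources.

```python
import pandas as pd


def build_province_table(records, yearly, province_info):
    """Return one row per province with raw and derived variables."""
    # Yearly averages of population and high school graduates for each province.
    averages = yearly.groupby("province")[["population", "graduates"]].mean()

    # Each GBIF row is one sighting, so counting rows gives reports per platform.
    counts = records.groupby(["province", "platform"]).size().unstack(fill_value=0)

    table = province_info.set_index("province").join(averages).join(counts)

    # Derived variables.
    table["gdp_per_capita"] = table["gdp"] / table["population"]
    table["reserve_per_10k_km2"] = table["reserve_area"] / (table["area_km2"] / 10_000)
    table["total_reports"] = table["ebird"] + table["inaturalist"] + table["brc_reports"]
    table["reports_per_capita"] = table["total_reports"] / table["population"]

    # Provinces missing any value (e.g. no GBIF records at all) are dropped.
    return table.dropna()


records = pd.DataFrame({
    "province": ["Beijing", "Beijing", "Yunnan", "Yunnan", "Yunnan"],
    "platform": ["ebird", "inaturalist", "ebird", "ebird", "inaturalist"],
})
yearly = pd.DataFrame({
    "province": ["Beijing", "Beijing", "Yunnan", "Yunnan", "Tibet"],
    "year": [2020, 2021, 2020, 2021, 2020],
    "population": [2000, 2200, 4700, 4700, 360],
    "graduates": [5, 7, 40, 42, 3],
})
province_info = pd.DataFrame({
    "province": ["Beijing", "Yunnan", "Tibet"],
    "gdp": [42000, 30000, 2400],
    "reserve_area": [10, 290, 4100],
    "area_km2": [16410, 394000, 1228400],
    "brc_reports": [100, 50, 5],
    "species": [500, 950, 600],
})
table = build_province_table(records, yearly, province_info)
print(table[["total_reports", "gdp_per_capita"]])
```

In this toy input Tibet has no GBIF records, so its platform counts are missing after the join and dropna() removes it, mirroring how incomplete provinces were excluded.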

Data Analysis

The first analysis is descriptive and shows each province’s share of the total species counts and total report counts, which I plotted as two pie charts. Next, I plotted a percentage bar graph showing the proportion of birding reports from the three platforms together with the provincial species counts. Then, I performed a correlation analysis between birding activity and bird species diversity, choosing between the Spearman and Pearson coefficients according to whether the data were normally distributed, and plotted a linear regression line to show the relationship. Subsequently, I performed multiple linear regression to model the relationship between several independent variables, such as GDP and population, and the dependent variable, the total report count. I plotted two correlation heat maps of the independent variables, the second being a modified version of the first after standardizing the variables. I also made 3D wireframe plots of six pairs of independent variables. Finally, I calculated a mismatch index between the Z-scores of bird report counts and species counts, as well as between the Z-scores of bird report counts and GDP. The Z-score is z = (x - μ) / σ, where x is an individual data value, μ is the population mean, and σ is the population standard deviation.
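Written out for the N provinces in the cleaned table, with σ taken as the population standard deviation as defined above, the Z-score and the two mismatch indices used in the Results take the following form. The symbols M and the superscript labels are notation introduced here for illustration.

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}, \qquad
z_i = \frac{x_i - \mu}{\sigma} \\
M_i^{\mathrm{species}} &= z_i^{\mathrm{reports}} - z_i^{\mathrm{species}}, \qquad
M_i^{\mathrm{GDP}} = z_i^{\mathrm{reports}} - z_i^{\mathrm{GDP}}
\end{aligned}
```

Because each set of Z-scores has mean 0 and standard deviation 1 across provinces, a mismatch above 1 means a province sits more than one standard deviation higher in the report distribution than it does in the species (or GDP) distribution.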

Results

The two pie charts (Figure 2) show each province’s share of the total report count and of the total bird species count across the 31 provinces. Beijing (17.9%), Zhejiang (9.8%), Yunnan (9.4%), and Shanghai (9.4%) account for the largest shares of birding reports. Species counts are distributed fairly evenly across provinces (about 3% each), with Yunnan holding a higher share (6%), which signals higher bird species diversity.

Figure 2. Pie charts showing provincial proportion of report numbers (left) and bird species numbers (right). Each slice represents a province.

The percentage bar graph (Figure 3) shows, for each province in China, the proportion of birding reports from each reporting platform (iNaturalist, eBird, and BRC) alongside the province’s bird species count. The contribution of each platform varies across provinces, and eBird is the largest contributor in every province except Hunan. Ningxia has the highest proportion of species richness relative to its report counts, followed by Heilongjiang, Chongqing, and Gansu. These graphs show how much data each source contributes, which clarifies the composition of the total data. However, species richness is hard to evaluate in this graph because its share is small compared with the report counts, so the graph alone cannot show whether two provinces differ significantly in species richness. I therefore investigated the hypothesis further with other methods.
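A percentage bar graph rescales each province’s bar so that its components sum to 100%. A minimal pandas sketch, assuming one row per province with columns for the three platforms plus the species count; the column names, province labels, and counts are made up.

```python
import pandas as pd


def percentage_table(counts):
    """Scale every row (one province) so that its columns sum to 100%."""
    return counts.div(counts.sum(axis=1), axis=0) * 100


def plot_percentage_bars(pct):
    """Draw a 100% stacked bar chart with one bar per province."""
    import matplotlib.pyplot as plt  # imported here so the table logic does not need matplotlib

    ax = pct.plot(kind="bar", stacked=True, figsize=(12, 5))
    ax.set_ylabel("Share of bar (%)")
    ax.legend(title="Component", bbox_to_anchor=(1.0, 1.0), loc="upper left")
    plt.tight_layout()
    return ax


# Made-up counts for two hypothetical provinces.
counts = pd.DataFrame(
    {"iNaturalist": [20, 10], "eBird": [60, 25], "BRC": [15, 10], "species": [5, 5]},
    index=["Province A", "Province B"],
)
pct = percentage_table(counts)
print(pct.round(1))
```

Because species counts are usually much smaller than report counts, the species segment stays thin in every bar, which is the readability limitation noted above.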

Figure 3. Percentage bar graph of provincial distribution of bird reports from the three platforms with provincial species diversity.

Figure 4. Scatterplot and regression line of bird report counts and bird species counts

Figure 5. Data for the regression of bird report counts and bird species counts scatterplot.

I then explored the core question of my analysis: the correlation between regional bird species diversity and birdwatching activity. Because the two variables are not normally distributed, I chose the Spearman correlation over the Pearson correlation. The Spearman correlation coefficient is 0.5113, which indicates a moderate positive correlation: provinces with higher bird diversity tend to have more birdwatching engagement, and vice versa, though not necessarily in a linear way. The p-value is below 0.05, so the correlation is statistically significant. I then made a scatterplot (Figure 4) of the two variables with a simple linear regression line, taking total report counts as the dependent variable and species richness as the independent variable. The R² value of 0.16 shows that only 16% of the variation in report counts is explained by bird species richness. The F-statistic of 5.515 and its p-value (Prob (F-statistic)) of 0.0259 indicate that the overall regression model is statistically significant, so species richness explains a small but statistically significant share of the variation in report counts. The equation of the regression line is y = 448.6x - 112,000 (Figure 5).
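The two statistics in this paragraph can be reproduced from first principles. This is a minimal NumPy/pandas sketch: Spearman’s ρ computed as the Pearson correlation of average ranks, and a one-predictor least-squares fit with its R² and F-statistic. The function names and toy data are hypothetical, and p-values are left to a statistics library.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks (ties get their average rank)."""
    rx = pd.Series(x).rank().to_numpy()
    ry = pd.Series(y).rank().to_numpy()
    return np.corrcoef(rx, ry)[0, 1]


def simple_linear_fit(x, y):
    """Least-squares fit of y = intercept + slope * x, with R^2 and the F-statistic."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    (intercept, slope), *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - (intercept + slope * x)
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    n = len(y)
    f_stat = r2 / ((1 - r2) / (n - 2))  # one predictor: df = (1, n - 2)
    return slope, intercept, r2, f_stat


# Toy data: x could be species counts and y report counts.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
rho = spearman_rho(x, y)
slope, intercept, r2, f_stat = simple_linear_fit(x, y)
print(round(rho, 4), round(slope, 2), round(intercept, 2), round(r2, 2), round(f_stat, 2))
```

For one predictor, F = R²(n - 2) / (1 - R²). With n = 31 provinces and R² ≈ 0.16 this gives about 0.16 × 29 / 0.84 ≈ 5.5, consistent with the reported F-statistic of 5.515. In practice, scipy.stats.spearmanr and a statsmodels OLS summary report the same point estimates together with p-values.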

After the simple linear regression model, I performed multiple linear regression to examine how several factors might jointly influence birdwatching engagement (Figure 6). The independent variables are population, province area, provincial GDP, provincial high school graduates, provincial natural reserve area, and species counts, and the dependent variable is report counts. Most of the predictors were not statistically significant; only species count was (p < 0.01), indicating that it is a strong predictor of total report counts. The model has R² = 0.553, so it explains about 55.3% of the variation in total report counts. A likely reason that so few predictors are significant is severe multicollinearity among some of them, indicated by the condition number of 1.80e+05. This is plausible: a province’s GDP should be related to its population, because a larger population generally produces more output, and a larger province can contain more natural reserve area. I then used a Seaborn heat map to visualize the correlations among the independent variables and check whether they might be multicollinear.

The first Seaborn heat map (Figure 7) was based on six independent variables: population, province area, provincial GDP, provincial high school graduates, provincial natural reserve area, and species counts. Many of the coefficients have absolute values close to 1, showing strong multicollinearity.
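The correlation matrix behind such a heat map and the condition number can be computed as in the following sketch (NumPy and pandas; the column names and numbers are hypothetical). As far as I know, the “Cond. No.” in a statsmodels summary is the ratio of the largest to the smallest singular value of the design matrix including the constant, which is what np.linalg.cond returns, so the two should agree for the same design matrix.

```python
import numpy as np
import pandas as pd


def collinearity_report(predictors):
    """Pairwise Pearson correlations plus the condition number of the design matrix."""
    corr = predictors.corr()
    # Design matrix as used in OLS: an intercept column followed by the raw predictors.
    design = np.column_stack([np.ones(len(predictors)), predictors.to_numpy(dtype=float)])
    return corr, np.linalg.cond(design)


# Hypothetical predictors: GDP is almost exactly 10 times population.
predictors = pd.DataFrame({
    "population": [1, 2, 3, 4, 5],
    "gdp": [10.1, 19.8, 30.2, 39.9, 50.0],
    "area": [3, 1, 4, 1, 5],
})
corr, cond = collinearity_report(predictors)
print(corr.round(3))
print(f"condition number: {cond:.3g}")
```

A heat map can then be drawn with seaborn.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="RdBu_r"); a diverging colormap keeps correlations near 0 pale and those near ±1 saturated. The condition number also grows when variables differ greatly in scale, so it is a rough diagnostic rather than a pure measure of collinearity.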

Figure 6. Data for the first Seaborn heatmap.

Figure 7. Seaborn heatmap of 5 factors. A brighter color (blue or red) with a number closer to 1 or -1 signals that two independent factors are more multicollinear. A whiter color with a number closer to 0 signals that two independent factors are not very multicollinear.

Figure 8. Data for the second Seaborn heatmap.

Figure 9. Seaborn heatmap of 4 factors. A brighter color (blue or red) with a number closer to 1 or -1 signals that two independent factors are more multicollinear. A whiter color with a number closer to 0 signals that two independent factors are not very multicollinear.

To improve the model, I normalized the size-dependent variables before running the multiple linear regression again. Dividing GDP by population and natural reserve area by province area, I obtained each province’s GDP per capita and reserve area per ten thousand km². The second model used four independent variables: provincial GDP per capita, provincial high school graduates, provincial natural reserve area per ten thousand km², and species counts. The dependent variable was still report counts. This time the results were better (Figure 8). The overall model is statistically significant (p = 0.000118 < 0.001), meaning the predictors collectively explain a significant share of the variation in report counts. The condition number fell to 2.80e+03, indicating much less severe multicollinearity than in the first model. Among the independent variables, GDP per capita has the smallest p-value, making it the strongest predictor. The second heat map (Figure 9) also shows reduced multicollinearity among the four independent variables.
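To show how a single predictor such as GDP per capita can be singled out by its p-value, the sketch below fits an ordinary least squares model with NumPy and reports each coefficient’s t-statistic. Every coefficient shares the same residual degrees of freedom, so a larger |t| means a smaller two-sided p-value and the ranking matches the p-value ranking in a statsmodels summary. The function name and the synthetic data are hypothetical; in the study, the predictors would be the four normalized variables.

```python
import numpy as np
import pandas as pd


def ols_t_stats(df, response, predictors):
    """Fit OLS with an intercept; return coefficients, standard errors and t-statistics."""
    X = np.column_stack([np.ones(len(df))] + [df[p].to_numpy(dtype=float) for p in predictors])
    y = df[response].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]  # residual degrees of freedom
    sigma2 = residuals @ residuals / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return pd.DataFrame(
        {"coef": beta, "std_err": std_err, "t": beta / std_err},
        index=["const", *predictors],
    )


# Synthetic data: y = 3 + 2*x1 + 0.5*x2 plus a small fixed perturbation.
x1 = np.arange(1, 9, dtype=float)
x2 = np.array([2, 1, 4, 3, 6, 5, 8, 7], dtype=float)
noise = np.array([0.1, 0.0, -0.1, 0.05, 0.0, -0.05, 0.1, -0.1])
data = pd.DataFrame({"x1": x1, "x2": x2, "y": 3 + 2 * x1 + 0.5 * x2 + noise})
result = ols_t_stats(data, "y", ["x1", "x2"])
print(result.round(3))
```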

I also examined pairs of independent variables. The four independent variables (GDP per capita, average number of high school graduates, reserve area per ten thousand square kilometers, and bird species counts) were paired with each other, and for each of the six pairs I made a 3D wireframe plot against the dependent variable, report counts (Figure 10). In every plot, report counts appear roughly linearly related to the paired variables.
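One way to build such a plot, assuming the wireframe is a least-squares plane of report counts on the two paired variables with the provinces drawn as points around it, is sketched below with NumPy and Matplotlib. The function names, axis labels, and toy data are hypothetical. Because a fitted plane is linear by construction, linearity is judged by how closely the points follow the wireframe, not by the wireframe itself.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend, e.g. for saving figures from a script
import matplotlib.pyplot as plt
import numpy as np


def fitted_plane_grid(x, y, z, steps=20):
    """Fit z = b0 + b1*x + b2*y by least squares and evaluate it on a regular grid."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    design = np.column_stack([np.ones_like(x), x, y])
    (b0, b1, b2), *_ = np.linalg.lstsq(design, z, rcond=None)
    gx, gy = np.meshgrid(np.linspace(x.min(), x.max(), steps),
                         np.linspace(y.min(), y.max(), steps))
    return gx, gy, b0 + b1 * gx + b2 * gy


def plot_pair(x, y, z, xlabel, ylabel):
    """Scatter the provinces and overlay the fitted plane as a wireframe."""
    gx, gy, gz = fitted_plane_grid(x, y, z)
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.scatter(x, y, z)
    ax.plot_wireframe(gx, gy, gz, linewidth=0.5)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_zlabel("report counts")
    return fig


# Toy data lying exactly on the plane z = 1 + 2x + 3y.
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)
z = 1 + 2 * x + 3 * y
gx, gy, gz = fitted_plane_grid(x, y, z)
fig = plot_pair(x, y, z, "GDP per capita", "species counts")
```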

To compare species counts and report counts quantitatively, I performed Z-score standardization on both, then subtracted the Z-score of species counts from the Z-score of report counts to obtain the mismatch index shown in Table 1. The mismatch index shows whether a province’s bird biodiversity receives too much attention from birdwatchers, too little, or a balanced amount, which could inform each province’s policies to encourage ecotourism or redirect public attention. Provinces like Beijing and Shanghai receive over-attention from birdwatchers, while provinces like Tibet and Yunnan have the biodiversity to support more birding-related tourism.
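A minimal pandas sketch of this index, assuming report counts and species counts are two Series indexed by province; the function names and the four-province toy data are hypothetical, and the ±1 cut-offs follow Table 1.

```python
import pandas as pd


def zscore(values):
    """Z-score using the population standard deviation (ddof=0), matching the definition of sigma."""
    return (values - values.mean()) / values.std(ddof=0)


def mismatch_index(reports, reference, threshold=1.0):
    """Z(reports) - Z(reference), labelled with the Table 1 cut-offs."""
    m = zscore(reports) - zscore(reference)
    label = pd.Series("balanced", index=m.index)
    label[m > threshold] = "over-attention"
    label[m < -threshold] = "undervalued"
    return pd.DataFrame({"mismatch": m, "label": label})


provinces = ["A", "B", "C", "D"]
reports = pd.Series([100, 10, 10, 10], index=provinces, dtype=float)
species = pd.Series([10, 10, 10, 100], index=provinces, dtype=float)
result = mismatch_index(reports, species)
print(result)
```

Note that pandas’ Series.std defaults to ddof=1 (the sample standard deviation), so ddof=0 has to be passed explicitly to use the population standard deviation; scipy.stats.zscore already defaults to ddof=0.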

Table 1. Mismatch index of report counts Z-score minus species counts Z-score. A mismatch index > 1 marks an “over-attention zone”, where there is more birdwatching than the region’s bird biodiversity would suggest. A mismatch index < -1 indicates that the biodiversity is undervalued and that more people could come to birdwatch in the region. Provinces with an index in between can maintain their current balance of human-nature interaction.

Figure 10. 3D wireframe plots for six pairs of independent factors.

I also visualized this mismatch index on a map of China, where colors represent the index values, making the regional pattern easy to see.

Table 2. Mismatch index of report counts Z score minus GDP Z score. A mismatch index > 1.5 shows the province is performing exceedingly well in birdwatching when considering its GDP, and an index <-1.5 shows that the province has the potential for more birdwatching engagement considering its GDP. The ones in the middle have a birdwatching engagement that aligns with their GDP performance.

I repeated the analysis for another pair of variables, average provincial GDP and report counts. This second mismatch index shows how strong each province’s birding engagement is relative to its GDP (Table 2). Beijing and Yunnan have the two largest indices, meaning their report counts are high relative to their GDP, whereas Shandong and Guangdong have the smallest. In other words, Beijing and Yunnan show a lot of birdwatching engagement for their GDP, while Shandong and Guangdong have the economic capacity to do more. I visualized this index on a map as well.
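Ranking the provinces by this second index could look like the following pandas sketch. The column names, province labels, and numbers are hypothetical; the ±1.5 cut-off is the one stated for Table 2, and the Z-scores again use the population standard deviation.

```python
import pandas as pd


def gdp_mismatch_ranking(table, k=2, threshold=1.5):
    """Sort provinces by Z(reports) - Z(gdp); return top k, bottom k and flagged provinces."""
    z = (table - table.mean()) / table.std(ddof=0)  # column-wise population Z-scores
    m = (z["reports"] - z["gdp"]).sort_values(ascending=False)
    flagged = m[m.abs() > threshold]
    return m.head(k), m.tail(k), flagged


table = pd.DataFrame(
    {"reports": [50, 10, 20, 30, 45], "gdp": [10, 50, 20, 35, 40]},
    index=["P1", "P2", "P3", "P4", "P5"],
)
top, bottom, flagged = gdp_mismatch_ranking(table)
print(top.round(2), bottom.round(2), flagged.round(2), sep="\n\n")
```

Here P1 (many reports, low GDP) ranks first and P2 (few reports, high GDP) ranks last, and only those two exceed the ±1.5 cut-off.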

Discussion

Several conclusions emerge from these analyses. GDP per capita plays the most significant role in predicting birdwatching participation, while actual bird species diversity is less strongly related to it. This suggests that wealthier provinces tend to report more bird observations. Greater economic resources likely support better infrastructure, education, and citizen science participation, which could explain this outcome. Interestingly, the number of high school graduates is negatively correlated with report counts, which is counterintuitive; this could reflect regional differences in occupation or in free time.

The role of GDP per capita suggests that, at its core, birdwatching is driven more by socioeconomic factors than by environmental ones, and it underscores the importance of economic prosperity in enabling recreational activities like birdwatching. I also found that some provinces may be over-engaged in birdwatching, while others have great ecological potential but low engagement. These findings can inform provincial governments’ policy-making. For example, in provinces whose bird biodiversity is undervalued, such as Yunnan, the local government could promote ecotourism for birdwatchers to help drive the economy forward. As GDP rises, birdwatching engagement may increase as well, since the two are strongly associated. In provinces with over-attention, the government could work to redirect public attention and ensure that enthusiastic birdwatchers do not cause excessive environmental damage.