Sports Culture & Student Engagement

OIM 463 · Final Written Report

Author

Group 2

Published

May 6, 2026

1 Introduction

This report examines which sport (Football, Basketball, or Soccer) drives the highest student engagement, what student characteristics predict cultural adoption, how action type and intensity shape reward outcomes, and how peer influence and social media relate to both engagement and adoption. The analysis draws on 1400 student-timestep records from a behavioral simulation of sports culture on a university campus.

The central problem: institutions invest heavily in sports programming to build campus culture, but the levers that drive surface-level engagement and those that drive deep cultural adoption are rarely the same. A student who shows up to three games (high engagement) may not integrate the sport into their identity (low adoption), and vice versa. This report separates those two outcome tracks and examines what moves the needle on each.

1.1 Data Source

The Sports Culture Dataset (OIM 463) is a publicly available synthetic dataset that simulates student behavior using reinforcement-learning dynamics. It was obtained from the course data repository. Each observation represents one student at one timestep and captures input signals (peer influence, social media activity, baseline interest), the sport type chosen, and outcome metrics (awareness, engagement, cultural adoption, reward). The raw file contains 16 variables (see the Data Overview table); two had minor missingness (≤5%), which the dashboard handles with median imputation.
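The median imputation mentioned above can be sketched in base R. The report does not name the two affected columns, so the helper `impute_median` and the toy data frame below are hypothetical; the sketch assumes the affected columns are numeric.

```r
# Hypothetical helper: replace NA values in every numeric column with that
# column's median (computed with NAs excluded). Non-numeric columns are untouched.
impute_median <- function(data) {
  for (col in names(data)) {
    x <- data[[col]]
    if (is.numeric(x) && anyNA(x)) {
      x[is.na(x)] <- median(x, na.rm = TRUE)
      data[[col]] <- x
    }
  }
  data
}

# Toy example (not the course data): two numeric columns with one NA each.
toy <- data.frame(
  peer_influence_score = c(0.2, NA, 0.6, 0.4),
  baseline_interest    = c(NA, 0.5, 0.7, 0.9),
  sport                = c("Football", "Soccer", "Basketball", "Soccer")
)
imputed <- impute_median(toy)
imputed
```

Median imputation keeps the column's center unchanged and is robust to outliers, but it shrinks the variance slightly, which is acceptable at the ≤5% missingness reported here.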

1.2 Ethical Considerations

The dataset is entirely synthetic with no real personally identifiable information. The student_id field is a surrogate key with no link to real individuals. The source requires no scraping, is not paywalled, and has no robots.txt restrictions on academic use. If this methodology were applied to a real student population, consent and de-identification protocols would be required before any individual-level analysis.


2 Data Overview

tibble(
  Metric = c("Observations", "Variables", "Sports", "Year Groups", "Avg Engagement", "Avg Reward", "Adoption Rate"),
  Value  = c(
    as.character(nrow(df)),
    as.character(ncol(raw)),
    as.character(n_distinct(df$sport)),
    as.character(n_distinct(df$year_label)),
    sprintf("%.3f", mean(df$engagement_level)),
    sprintf("%.3f", mean(df$reward)),
    percent(mean(df$adopted), accuracy = 1)
  )
) |> kable(caption = "Dataset at a glance")
Dataset at a glance
Metric         | Value
Observations   | 1400
Variables      | 16
Sports         | 3
Year Groups    | 4
Avg Engagement | 0.550
Avg Reward     | 0.699
Adoption Rate  | 52%

The sample is well balanced: Football accounts for 34% of records, Basketball 35%, and Soccer 31%. Each academic year is similarly represented (~25% each), so raw group comparisons are reasonable without reweighting.


3 Research Questions

We organized the analysis around eight questions in two tracks: a core engagement track (Q1–Q5) and three deeper investigations added to understand the adoption side of the picture (Q6–Q8).

#  | Question | Dashboard Tab
Q1 | Which sport produces the highest average engagement? | Overview
Q2 | Does engagement differ by academic year, and does sport interact? | Deep Dive
Q3 | Is there an optimal action intensity for each sport? | Deep Dive
Q4 | Which influence drivers best predict engagement? | Influence Drivers
Q5 | Are engagement and cultural adoption the same construct? | Adoption Predictors
Q6 | Which student characteristics best predict high cultural adoption? | Adoption Predictors
Q7 | Do action type and intensity drive reward outcomes? | Action & Reward
Q8 | How do peer influence and social media relate to engagement and adoption? | Peer & Social

4 Findings

4.1 Q1: Which Sport Leads on Engagement?

sport_summary <- df |>
  group_by(sport) |>
  summarise(
    n         = n(),
    avg       = mean(engagement_level),
    se        = sd(engagement_level) / sqrt(n()),
    ci_lo     = avg - 1.96 * se,
    ci_hi     = avg + 1.96 * se,
    high_pct  = mean(engaged_high),
    adopt_pct = mean(adopted)
  )

sport_summary |>
  ggplot(aes(reorder(sport, avg), avg, fill = sport)) +
  geom_col(width = 0.55, alpha = 0.9) +
  # Dark error bars: white would be invisible on theme_minimal's white background
  geom_errorbar(aes(ymin = ci_lo, ymax = ci_hi), width = 0.1, color = "grey20") +
  geom_text(aes(label = sprintf("%.3f", avg)), vjust = -0.6, fontface = "bold", size = 4) +
  scale_fill_manual(values = sport_colors) +
  scale_y_continuous(limits = c(0, 0.7)) +
  labs(title = "Average Engagement Level by Sport", subtitle = "Error bars = 95% CI",
       x = NULL, y = "Average Engagement Level") +
  guides(fill = "none") + theme_minimal(base_size = 13)

Basketball has the highest mean engagement (0.557), ahead of Soccer (0.549) and Football (0.544), but the gaps are small. A one-way ANOVA shows that the differences are not statistically significant (F(2, 1397) = 0.46, p = 0.632):

summary(aov(engagement_level ~ sport, data = df))
              Df Sum Sq Mean Sq F value Pr(>F)
sport          2   0.04 0.01857   0.458  0.632
Residuals   1397  56.59 0.04051               

51% of Basketball participants fall in the high-engagement class, versus 50% for Soccer and 49% for Football.
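Recomputing the test from the printed mean squares makes the effect size concrete. The base-R sketch below uses only numbers from the ANOVA table; eta-squared (between-group sum of squares divided by total sum of squares) is our addition rather than part of the original analysis. It comes out at roughly 0.0007, meaning sport accounts for well under 0.1% of the variance in engagement.

```r
# Values copied from the ANOVA table above.
ms_sport <- 0.01857   # mean square for sport (df = 2)
ms_resid <- 0.04051   # residual mean square (df = 1397)
df_sport <- 2
df_resid <- 1397

# F statistic and its upper-tail p-value.
f_stat  <- ms_sport / ms_resid
p_value <- pf(f_stat, df_sport, df_resid, lower.tail = FALSE)

# Eta-squared: share of total variance in engagement explained by sport.
ss_sport <- ms_sport * df_sport
ss_resid <- ms_resid * df_resid
eta_sq   <- ss_sport / (ss_sport + ss_resid)

round(c(F = f_stat, p = p_value, eta_sq = eta_sq), 4)
```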

4.2 Q2: Engagement by Year of Study

Show code
df |>
  group_by(sport, year_label) |>
  summarise(avg = mean(engagement_level), .groups = "drop") |>
  ggplot(aes(year_label, avg, color = sport, group = sport)) +
  geom_line(linewidth = 1.4) + geom_point(size = 3.5) +
  scale_color_manual(values = sport_colors) +
  labs(title = "Engagement Level Across Academic Years",
       x = NULL, y = "Avg Engagement Level", color = "Sport") +
  theme_minimal(base_size = 13)

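All three sports show a Junior-year engagement peak. Basketball has the strongest Freshman engagement (0.558), which makes it a candidate for early campus integration programs, although the overall sport differences in Q1 are not statistically significant. Soccer's engagement is roughly flat from Sophomore through Senior year, edging down from 0.566 to 0.559, so these figures do not show the steady growth that would indicate a payoff from longer-term immersion.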
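Q2's interaction question can be tested formally with a two-way ANOVA that includes a sport × year term. The sketch below runs on simulated stand-in data (engagement is random noise, so no real pattern is expected). It assumes the report's `df` has the `sport`, `year_label` and `engagement_level` columns used in the plotting code; to apply it, pass `df` instead of `sim`. The p-value on the `sport:year_label` row is the one that answers whether sport and year interact.

```r
set.seed(463)
# Simulated stand-in for df: 1400 rows with sport, year and engagement.
n <- 1400
sim <- data.frame(
  sport      = factor(sample(c("Football", "Basketball", "Soccer"), n, replace = TRUE)),
  year_label = factor(sample(c("Freshman", "Sophomore", "Junior", "Senior"), n, replace = TRUE),
                      levels = c("Freshman", "Sophomore", "Junior", "Senior")),
  engagement_level = rnorm(n, mean = 0.55, sd = 0.2)
)

# sport * year_label expands to both main effects plus their interaction.
fit <- aov(engagement_level ~ sport * year_label, data = sim)
summary(fit)
```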

4.3 Q3: Action Intensity and Its Limits

Show code
df |>
  group_by(sport, action_intensity) |>
  summarise(avg = mean(engagement_level), .groups = "drop") |>
  ggplot(aes(action_intensity, avg, color = sport, group = sport)) +
  geom_line(linewidth = 1.3) + geom_point(size = 3) +
  scale_color_manual(values = sport_colors) + scale_x_continuous(breaks = 3:9) +
  labs(title = "Engagement vs. Action Intensity by Sport",
       x = "Action Intensity (3=Low, 9=High)", y = "Avg Engagement Level", color = "Sport") +
  theme_minimal(base_size = 13)

The relationship between intensity and engagement is non-linear and sport-specific. Basketball peaks at intensity 8 (0.581); intensity 9 actually lowers engagement, consistent with over-saturation. Soccer is most responsive at low intensities (3–4 = 0.567), and its engagement declines after intensity 5. Football shows the most monotonic profile, gradually improving with intensity. More programming is not universally better. Because each point averages a single sport-by-intensity subgroup, these peaks are suggestive rather than firmly established.
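Because Q3 asks for an optimal intensity, one way to put a number on a peak is to fit a quadratic in intensity and take its vertex, x* = −b1 / (2·b2). The sketch below uses noise-free toy values (an inverted U built to peak at 8), not the course data. With the real data you would fit within each sport and check that the quadratic coefficient is negative and clearly nonzero before trusting the estimated peak.

```r
# Toy data: engagement follows an inverted U that peaks at intensity 8.
intensity  <- 3:9
engagement <- 0.5 + 0.02 * intensity - 0.00125 * intensity^2

# Fit engagement = b0 + b1*x + b2*x^2; a negative b2 means an interior peak.
fit <- lm(engagement ~ intensity + I(intensity^2))
b   <- coef(fit)

# Vertex of the fitted parabola.
optimal_intensity <- unname(-b[2] / (2 * b[3]))
optimal_intensity
```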

4.4 Q4: What Drives Engagement?

cor_df <- df |>
  select(engagement_level, peer_influence_score, social_media_interactions,
         baseline_interest, channel_reach_score, event_participation_freq,
         awareness_level, cultural_adoption, action_intensity) |>
  drop_na() |> cor() |> as.data.frame() |>
  rownames_to_column("variable") |> select(variable, engagement_level) |>
  filter(variable != "engagement_level") |>
  mutate(label = str_replace_all(variable, "_", " ") |> str_to_title(),
         direction = ifelse(engagement_level >= 0, "Positive", "Negative"),
         label = reorder(label, engagement_level))

cor_df |>
  ggplot(aes(engagement_level, label, fill = direction)) +
  geom_col(width = 0.55) +
  geom_vline(xintercept = 0, color = "grey40") +
  scale_fill_manual(values = c(Positive = "#F5A623", Negative = "#3A7BD5")) +
  labs(title = "Pearson Correlation of Drivers with Engagement Level",
       x = "Correlation Coefficient", y = NULL, fill = NULL) +
  theme_minimal(base_size = 13)

No single driver strongly predicts engagement: the highest absolute correlation is action_intensity at 0.025. This appears to be a structural feature of the dataset, in which engagement reflects the interplay of many inputs rather than being dominated by one. For practitioners, this means there is no silver bullet; each lever contributes at most a small marginal amount.
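For a Pearson correlation, the test statistic is t = r·sqrt(n − 2) / sqrt(1 − r²), which rearranges to a critical value |r| = t / sqrt(n − 2 + t²). The base-R sketch below evaluates it for n = 1400, assuming no rows are lost to `drop_na()`. The threshold is about 0.052, so even the strongest engagement correlation (0.025) falls well short of significance at the 5% level.

```r
# Smallest |r| that is significant at the two-sided 5% level for n = 1400.
n      <- 1400
t_crit <- qt(0.975, df = n - 2)
r_crit <- t_crit / sqrt(n - 2 + t_crit^2)
round(r_crit, 3)

# Compare with the strongest driver correlation reported above.
abs(0.025) < r_crit
```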

4.5 Q5: Engagement vs. Cultural Adoption

df |>
  ggplot(aes(engagement_level, cultural_adoption, color = sport)) +
  geom_point(alpha = 0.2, size = 1.3) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.3) +
  scale_color_manual(values = sport_colors) +
  facet_wrap(~ sport) +
  labs(title = "Engagement Level vs. Cultural Adoption by Sport",
       x = "Engagement Level", y = "Cultural Adoption") +
  guides(color = "none") + theme_minimal(base_size = 13)

A weak negative relationship appears between engagement and cultural adoption in all three sports. A student can be highly engaged without fully internalizing the sport's culture, and vice versa, which suggests that these are two distinct outcome tracks that may require different interventions.

4.6 Q6: What Predicts Cultural Adoption?

df |>
  group_by(sport, year_label) |>
  summarise(rate = mean(adopted, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(year_label, rate, fill = sport)) +
  geom_col(position = position_dodge(0.75), width = 0.65) +
  geom_hline(yintercept = 0.5, linetype = "dashed", color = "grey50") +
  scale_fill_manual(values = sport_colors) +
  scale_y_continuous(labels = percent_format(), limits = c(0, 0.75)) +
  labs(title = "Cultural Adoption Rate by Sport & Year",
       subtitle = "Dashed = 50% threshold", x = NULL, y = "Adoption Rate", fill = "Sport") +
  theme_minimal(base_size = 13)

df |>
  select(target_adoption_class, baseline_interest, event_participation_freq,
         peer_influence_score, social_media_interactions, channel_reach_score,
         awareness_level, engagement_level, action_intensity) |>
  drop_na() |> cor() |> as.data.frame() |>
  rownames_to_column("variable") |> select(variable, target_adoption_class) |>
  filter(variable != "target_adoption_class") |>
  mutate(label = str_replace_all(variable, "_", " ") |> str_to_title(),
         direction = ifelse(target_adoption_class >= 0, "Positive", "Negative"),
         label = reorder(label, target_adoption_class)) |>
  ggplot(aes(target_adoption_class, label, fill = direction)) +
  geom_col(width = 0.55) + geom_vline(xintercept = 0, color = "grey40") +
  scale_fill_manual(values = c(Positive = "#F5A623", Negative = "#3A7BD5")) +
  labs(title = "Pearson Correlation of Predictors with Cultural Adoption",
       x = "Correlation Coefficient", y = NULL, fill = NULL) +
  theme_minimal(base_size = 13)

Cultural adoption rates by sport are Soccer (54.7%) > Football (53.2%) > Basketball (49.5%). Basketball, first on engagement, is last on adoption, so the adoption ranking largely reverses the engagement ranking: Basketball edges ahead on engagement volume, while Soccer leads on deeper cultural adoption. The strongest correlates of high adoption are peer_influence_score (−0.063) and baseline_interest (−0.062), both negative: students who arrive with less prior sports interest and weaker initial peer influence are slightly more likely to reach high adoption. This counterintuitive pattern is consistent with a conversion-style explanation, in which students with more to learn invest more deeply in the process, but the correlations are small and do not establish that mechanism.

Freshman-year adoption rates are highest for all three sports and decline through Senior year, suggesting that early-career immersion programs may offer the highest cultural adoption ROI.
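The correlations above look at one predictor at a time. A logistic regression on the binary target_adoption_class is one way to check whether baseline_interest and peer_influence_score keep their direction when considered together. The sketch below runs on simulated data; the negative slopes are assumptions built into the simulation to show the mechanics, not estimates from the course data. To apply it, pass `df` instead of `sim`, assuming target_adoption_class is coded 0/1.

```r
set.seed(463)
# Simulated stand-in: a binary adoption flag and two standardized predictors.
n <- 2000
sim <- data.frame(
  baseline_interest    = rnorm(n),
  peer_influence_score = rnorm(n)
)
lin <- 0.1 - 0.3 * sim$baseline_interest - 0.3 * sim$peer_influence_score
sim$target_adoption_class <- rbinom(n, size = 1, prob = plogis(lin))

# Each coefficient is a log-odds effect holding the other predictor fixed,
# which a one-at-a-time Pearson correlation cannot provide.
fit <- glm(target_adoption_class ~ baseline_interest + peer_influence_score,
           family = binomial, data = sim)
round(summary(fit)$coefficients, 3)
```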

4.7 Q7: Do Action Type and Intensity Drive Reward?

df |>
  group_by(sport, action_intensity) |>
  summarise(avg_reward = mean(reward, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(factor(action_intensity), sport, fill = avg_reward)) +
  geom_tile(color = "white", linewidth = 1) +
  geom_text(aes(label = sprintf("%.2f", avg_reward)), fontface = "bold", size = 4) +
  scale_fill_gradient(low = "#d0e4f7", high = "#E05C2A") +
  labs(title = "Avg Reward by Sport × Action Intensity",
       x = "Action Intensity", y = NULL, fill = "Avg Reward") +
  theme_minimal(base_size = 13)

Football earns the highest average reward (0.705), followed by Soccer (0.699) and Basketball (0.694), exactly the reverse of the engagement ranking. One possible explanation is that the simulation rewards less-engaged sports more to sustain exploration, although the data alone cannot confirm this. The heatmap shows that reward is most intensity-sensitive for Soccer: intensity 3 produces the single highest reward cell in the entire matrix (0.760), while Football's reward is relatively stable across intensities (0.691–0.720). Basketball shows a mild peak at intensity 4 (0.715) that fades at higher intensities. These patterns suggest that the best reward strategy may differ by sport, although the sport × intensity interaction has not been formally tested.
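The "single highest reward cell" is the maximum of the sport × intensity matrix of mean rewards that the heatmap colours. A base-R sketch on simulated data (uniform random rewards, not the course data) shows the mechanics. It also illustrates a caution: the maximum of many subgroup means is biased upward, because the top cell is also the one most likely to have benefited from noise.

```r
set.seed(463)
# Simulated stand-in for df with sport, action_intensity (3-9) and reward.
n <- 1400
sim <- data.frame(
  sport            = sample(c("Football", "Basketball", "Soccer"), n, replace = TRUE),
  action_intensity = sample(3:9, n, replace = TRUE),
  reward           = runif(n, 0.5, 0.9)
)

# Sport x intensity matrix of mean reward: the numbers the heatmap displays.
cell_means <- tapply(sim$reward, list(sim$sport, sim$action_intensity), mean)

# Locate the single highest cell (row = sport, column = intensity).
best <- which(cell_means == max(cell_means, na.rm = TRUE), arr.ind = TRUE)
c(sport     = rownames(cell_means)[best[1, "row"]],
  intensity = colnames(cell_means)[best[1, "col"]],
  reward    = round(max(cell_means, na.rm = TRUE), 3))
```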

4.8 Q8: Peer Influence & Social Media → Engagement and Adoption

df |>
  select(sport, peer_influence_score, social_media_interactions,
         engagement_level, cultural_adoption) |>
  drop_na() |>
  pivot_longer(c(peer_influence_score, social_media_interactions),
               names_to = "driver", values_to = "driver_val") |>
  pivot_longer(c(engagement_level, cultural_adoption),
               names_to = "outcome", values_to = "outcome_val") |>
  mutate(driver  = str_replace_all(driver,  "_", " ") |> str_to_title(),
         outcome = str_replace_all(outcome, "_", " ") |> str_to_title()) |>
  ggplot(aes(driver_val, outcome_val, color = sport)) +
  geom_point(alpha = 0.12, size = 1) +
  geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
  scale_color_manual(values = sport_colors) +
  facet_grid(outcome ~ driver, scales = "free_x") +
  labs(title = "Peer Influence & Social Media → Engagement and Adoption",
       x = "Driver Value", y = "Outcome", color = "Sport") +
  theme_minimal(base_size = 13) +
  theme(strip.text = element_text(face = "bold"))

All four driver-outcome pairings show weak relationships. Peer influence correlates slightly negatively with both engagement (−0.005) and adoption (−0.042), while social media correlates negatively with engagement (−0.023) but positively with adoption (+0.035). With 1400 observations, correlations this small are not statistically distinguishable from zero, so the divergence is suggestive at best. One interpretation is that social media exposure builds cultural familiarity and identification (adoption) without translating into behavioral participation (engagement): heavy social media users may know the sport culture without necessarily showing up. Peer influence, meanwhile, might create social pressure that reduces autonomous engagement, with students who feel pushed by peer norms resisting active participation. Both explanations are hypotheses that these data cannot test.
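A Fisher z interval makes the "weak relationships" concrete. The base-R sketch below computes a 95% confidence interval for each of the four Q8 correlations quoted above, assuming all 1400 rows enter each correlation (`drop_na()` could lower n slightly). Every interval spans zero.

```r
# 95% confidence interval for a Pearson correlation via Fisher's z transform.
fisher_ci <- function(r, n, level = 0.95) {
  z    <- atanh(r)                     # Fisher z
  se   <- 1 / sqrt(n - 3)              # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)
  tanh(z + c(-1, 1) * crit * se)       # back-transform to the r scale
}

n  <- 1400
# Correlations reported above for Q8.
q8 <- c(peer_engagement   = -0.005, peer_adoption   = -0.042,
        social_engagement = -0.023, social_adoption =  0.035)
ci <- t(sapply(q8, fisher_ci, n = n))
colnames(ci) <- c("lower", "upper")
round(ci, 3)
```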


5 Dashboard Design Decisions

The dashboard is built in R Shiny with bslib, using a dark theme (bg = #0F1117) suited to a sports-analytics context. Of its seven tabs, the six described below map to the eight research questions:

Overview (Q1) presents the primary finding immediately: three KPI cards, a bar chart with 95% CIs, a violin-plus-boxplot showing full distributions, and a heatmap of high-engagement rates by sport × year. Filters for year and intensity allow real-time subgroup exploration.

Deep Dive (Q2–Q3) uses a grouped line chart for the year trajectory and an intensity curve per sport. Sport and year checkboxes allow team members to isolate any sub-group comparison in real time.

Influence Drivers (Q4) offers a selectable x-axis dropdown so the audience can interrogate each driver without switching tabs. A bar chart of all driver correlations anchors the interpretation.

Adoption Predictors (Q5–Q6) directly contrasts engagement and adoption. KPI cards surface the adoption rate and top sport. The predictor correlation chart makes the counterintuitive negative baseline_interest finding visible. A scatter faceted by sport shows the engagement–adoption divergence.

Action & Reward (Q7) centers on the heatmap of Sport × Intensity → Reward, with supporting bar and violin charts. The KPI cards surface the peak reward sport and intensity.

Peer & Social (Q8) is the most flexible tab: a radio button toggles the outcome (engagement vs. adoption) so the user can see both relationships without visual clutter. Quartile bar charts make the non-linear peer influence pattern visible even without statistical modeling.
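The quartile bar charts can be reproduced outside Shiny with base R: bin the driver at its quartiles and average the outcome within each bin. The sketch below uses toy uniform data; the variable names mirror the dataset's columns, but the values are simulated, and the dashboard's actual implementation may differ.

```r
set.seed(463)
# Toy stand-in for the dashboard data (not the course data).
peer_influence_score <- runif(400)
engagement_level     <- runif(400)

# Cut the driver at its quartiles; include.lowest keeps the minimum in Q1.
breaks   <- quantile(peer_influence_score, probs = seq(0, 1, 0.25))
quartile <- cut(peer_influence_score, breaks = breaks,
                labels = c("Q1", "Q2", "Q3", "Q4"), include.lowest = TRUE)

# Mean outcome per quartile: the bar heights the chart would plot.
bar_heights <- tapply(engagement_level, quartile, mean)
bar_heights
```

Binning makes no linearity assumption, which is why quartile bars can reveal a non-linear pattern that a single correlation coefficient would average away.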

Visualization principles applied: Consistent sport color encoding (Football = #E05C2A, Basketball = #F5A623, Soccer = #3A7BD5) is maintained across all 20+ plots so the legend becomes implicit after the first view. Bar length encodes comparison; heatmap cell color supplements printed values for accessibility. Error bars appear only where uncertainty around a central claim matters (Overview bar, Reward bar). Violin plots layer with boxplots to show both shape and summary statistics simultaneously.
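The plotting code references `sport_colors` without showing where it is defined (presumably in a setup chunk not included here). A definition consistent with the hex codes listed above is a single named vector. Naming the entries is what keeps the encoding stable: ggplot2's manual scales match a named `values` vector by name rather than by position.

```r
# One named vector, defined once and reused by every chart, keeps the
# sport -> colour mapping fixed regardless of factor order or subsetting.
sport_colors <- c(Football   = "#E05C2A",
                  Basketball = "#F5A623",
                  Soccer     = "#3A7BD5")

# With ggplot2, scale_fill_manual(values = sport_colors) matches by name,
# so a filtered plot showing only Soccer still draws it in blue.
sport_colors[c("Soccer", "Football")]
```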


6 Limitations

The dataset is synthetic. The sport labels are interpretations we imposed on the chosen_action codes (0/1/2); the original documentation does not confirm these mappings. All numerical findings should be treated as illustrative patterns rather than population estimates.
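One way to keep the imposed code-to-sport mapping auditable is to store it in a single lookup vector, so it can be changed in one place if documentation later contradicts it. The order below is hypothetical: the report does not state which `chosen_action` code was assigned to which sport.

```r
# HYPOTHETICAL mapping: the source documentation does not confirm which code is
# which sport, so this order is an assumption for illustration only.
action_labels <- c("0" = "Football", "1" = "Basketball", "2" = "Soccer")

chosen_action <- c(0L, 2L, 1L, 1L, 0L)   # toy codes
sport <- factor(action_labels[as.character(chosen_action)],
                levels = c("Football", "Basketball", "Soccer"))
table(sport)
```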

The weak correlations in Q4 and Q8 likely reflect the simulation's design: engagement is distributed across many inputs rather than driven by one, which is appropriate for reinforcement-learning research but limits causal inference. Generalizing to real campus programming decisions requires a real dataset with experimental variation in the levers being tested.

Finally, the negative correlations between peer influence/baseline interest and adoption (Q6) are intriguing but cannot be interpreted causally without longitudinal data and a proper identification strategy.


7 Conclusion

Basketball has the highest average student engagement (0.557) and the strongest Freshman engagement of any sport, which makes it a plausible focus for early campus integration; however, its lead over Soccer and Football is small and not statistically significant (ANOVA p = 0.632). Soccer leads on cultural adoption (54.7%), suggesting that culture-building may follow a different pathway from high-frequency engagement.

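Football earns the highest average simulation reward (0.705), with reward fairly stable across intensities, while the single highest-reward combination is Soccer at intensity 3 (0.760). Reward ranks the sports in the reverse order of engagement; one possible explanation is that the system compensates lower-engagement sports with higher rewards to maintain exploration.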

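Peer influence and social media show weak and possibly divergent associations with engagement versus adoption: social media correlates slightly positively with adoption but negatively with engagement, while peer influence correlates slightly negatively with both. None of these correlations is distinguishable from zero at this sample size, so they point to hypotheses (for example, that social media builds cultural identification without driving participation) rather than established effects.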

The most actionable finding may be Q6: students with lower baseline interest were slightly more likely to reach high cultural adoption than those who arrived already interested (r = −0.062). If this pattern holds in real data, programming aimed at the unconverted, not just the enthusiastic, may deliver the highest cultural adoption ROI.