Shaping Dietary Perceptions: An LLM-based Content Analysis of Protein Discourse on Instagram

Dario Moratti
Master Digital Healthcare, St. Pölten University of Applied Sciences 2025

Aim and Research Question(s)

This thesis explored the portrayal of the "Protein Hype" on Instagram, examining its communicative tone and thematic focus. The analysis systematically investigated patterns across different influencer types, protein sources, and food processing levels to provide a comprehensive picture of the discourse. A key methodological objective was to evaluate the plausibility of using Large Language Model (LLM)-assisted coding as a transparent and reliable tool for social media content analysis.

Background

The “Protein Hype” has become one of the most visible nutrition trends of the past decade. While protein is physiologically essential [1], its online portrayal often exaggerates benefits and obscures risks, particularly through the promotion of ultra-processed high-protein products [2]. Instagram plays an important role in shaping health discourses, especially for younger audiences. Influencers act as informal health communicators who often blend scientific claims with commercial promotion. Despite this prominence, the symbolic construction of protein in digital culture has so far received little academic attention.

Methods

A dataset of 272 public Instagram posts was collected in early 2025. Coding was based on a predefined categorical framework, developed from existing literature and piloted on sample posts. The framework covered eleven dimensions, including Tonality, Contextual Framing, Protein Source, NOVA Classification, Advertising Type, Creator Type, Gender Representation, Visual and Linguistic Framing. A hybrid workflow was applied: LLM-assisted classification with GPT-4o using structured prompts and annotated examples; manual coding by the author of 79 posts; and an intercoder reliability test, in which 30 posts were independently analysed by a second coder.
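The LLM-assisted step described above can be illustrated with a minimal sketch of how a structured prompt with annotated examples might be assembled and how a model's answer might be checked against the categorical framework. Everything named here is an assumption for illustration only: the `CODEBOOK` labels, the dimension keys, and the functions `build_prompt` and `validate_response` are hypothetical and do not reproduce the thesis's actual codebook or prompts. No model is called; the sketch only covers the text that would be sent and the validation of a returned JSON string.

```python
import json

# Hypothetical excerpt of a codebook: two of the eleven dimensions,
# with illustrative labels (NOT the labels used in the thesis).
CODEBOOK = {
    "tonality": ["promotional", "motivational_personal", "critical", "neutral"],
    "nova_class": ["1", "2", "3", "4", "unclear"],
}


def build_prompt(caption, examples):
    """Assemble a structured prompt: allowed labels, annotated examples, target post."""
    lines = ["Classify the Instagram post on each dimension.", "Allowed labels:"]
    for dimension, labels in CODEBOOK.items():
        lines.append(f"- {dimension}: {', '.join(labels)}")
    for example_caption, example_labels in examples:
        lines.append(f"Example post: {example_caption}")
        lines.append(f"Example answer: {json.dumps(example_labels, sort_keys=True)}")
    lines.append(f"Post: {caption}")
    lines.append("Answer with a JSON object only.")
    return "\n".join(lines)


def validate_response(raw):
    """Parse the model's JSON answer and list dimensions with missing or unknown labels."""
    data = json.loads(raw)
    needs_review = [
        dimension
        for dimension, labels in CODEBOOK.items()
        if data.get(dimension) not in labels
    ]
    return data, needs_review


if __name__ == "__main__":
    examples = [("New whey flavour, link in bio!", {"tonality": "promotional", "nova_class": "4"})]
    print(build_prompt("My high-protein breakfast routine", examples))
    print(validate_response('{"tonality": "promotional", "nova_class": "5"}'))
```

Dimensions returned in `needs_review` would be candidates for manual coding, which is one way a human-supervised workflow could catch answers that fall outside the predefined categories.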

Results and Discussion

The portrayal of protein was dominated by explicitly promotional and motivational/personal tones. Fitness and lifestyle contexts contained the highest share of promotional posts, while scientific or pseudo-scientific contexts showed more balance and included critical perspectives. Animal- and plant-based proteins were most often presented in promotional ways, whereas mixed sources leaned towards motivational/personal framing. Ultra-processed products (NOVA 4) were strongly linked with promotional content. Intercoder reliability was high between author and LLM (M = 90.0%, κ = 0.74), while comparison with a second coder produced moderate agreement (M = 78.3%, κ = 0.49). Peer reviewers judged the plausibility of LLM classifications as very high across all tested dimensions (M > 4.65/5).
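For readers unfamiliar with the two reliability figures reported above, the following sketch shows how percent agreement and Cohen's κ are commonly computed for two coders on a single categorical dimension. It is an illustration of the standard formulas, not the thesis's own analysis script; the function names and the toy labels are assumptions, and the toy data are not drawn from the study.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions, summed over labels.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(coder_a) | set(coder_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both coders used a single identical label throughout; kappa is undefined,
        # so perfect agreement is reported by convention in this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Toy labels for illustration only (hypothetical, not study data).
    author = ["promotional", "promotional", "motivational", "motivational"]
    second = ["promotional", "promotional", "motivational", "promotional"]
    print(percent_agreement(author, second))  # 0.75
    print(cohens_kappa(author, second))       # 0.5
```

Because κ subtracts the agreement expected by chance, it can be considerably lower than raw percent agreement when one category dominates, which is consistent with the gap between the M and κ values reported above.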

Conclusion

Protein on Instagram is framed less as a nutrient and more as a lifestyle commodity, reinforcing simplified and commercialised health ideals. The hybrid coding workflow showed that LLMs, when embedded in a human-supervised process, can reliably support social media content analysis. Beyond this study, LLMs hold potential as scalable tools for analysing complex, multimodal datasets in digital health research. The findings highlight the importance of strengthening media and nutrition literacy in digital health communication and open paths for further cross-platform research.

References

[1] Ortega et al. (2024). Nutrients, 16(11):1697.

[2] Denniss et al. (2023). Nutrients, 15(10):2332.