YouTube Transcripts Word Frequency Measure

Authors

  • Vincent Smith, University of Charleston
  • Michael Garrett, University of Charleston
  • Austin Harwood, University of Charleston, USA
  • James Shamblin, University of Charleston, USA

DOI:

https://doi.org/10.61320/jolcc.v1i2.91-99

Keywords:

Word Frequency, YouTube, Computer Science, Data, Data Analytics, Analysis

Abstract

Many YouTube videos include written transcripts of their audio, which offer a record of the language used on YouTube. One important measure of language usage is word frequency. Using student-developed software and libraries in R, Python, and Microsoft Excel, the transcripts of one million YouTube videos from the YouTube-8M data set were scraped and analyzed. The word frequencies of the YouTube data set were shown to correlate with commonly used word frequency measures from established studies, such as the subtitle word frequency and the HAL word frequency.
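The two steps named in the abstract, counting word frequencies and correlating them with an established norm, can be illustrated with a minimal Python sketch. This is not the authors' software: the function names, the tokenization rule, and the choice of a Pearson correlation on log10(count + 1) values are all assumptions made for illustration, and the `reference` dictionary merely stands in for a published norm such as a subtitle or HAL frequency list.

```python
import math
import re
from collections import Counter


def word_counts(transcripts):
    """Count lowercase word tokens across an iterable of transcript strings."""
    counts = Counter()
    for text in transcripts:
        # Assumed tokenization: runs of ASCII letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def per_million(counts):
    """Convert raw counts to occurrences per million tokens."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlate_log_frequencies(counts, reference):
    """Correlate log10(count + 1) values over the words both sources share."""
    shared = sorted(set(counts) & set(reference))
    xs = [math.log10(counts[w] + 1) for w in shared]
    ys = [math.log10(reference[w] + 1) for w in shared]
    return pearson(xs, ys)


# Toy data only; real input would be the scraped transcripts and a published norm.
transcripts = ["the cat sat on the mat", "The dog saw the cat"]
counts = word_counts(transcripts)
reference = {"the": 50000, "cat": 60, "dog": 120, "mat": 5}  # hypothetical values
r = correlate_log_frequencies(counts, reference)
```

The log transform is a common choice because raw word counts are heavily skewed; a rank-based correlation would be a reasonable alternative under the same assumptions.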

References

Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). YouTube-8M: A large-scale video classification benchmark. arXiv preprint. https://doi.org/10.48550/arXiv.1609.08675

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445-459. https://doi.org/10.3758/bf03193014

Brysbaert, M., & New, B. (2009). Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods, 41(4), 977-990. https://doi.org/10.3758/brm.41.4.977

Brysbaert, M., Mandera, P., & Keuleers, E. (2018). The word frequency effect in word processing: An updated review. Current Directions in Psychological Science, 27(1), 45-50. https://doi.org/10.1177/0963721417727521

Ceci, L. (2022). YouTube – Statistics & Facts. Statista. https://www.statista.com/topics/2019/youtube/#topicHeader__wrapper

Cicconet, M. (2013, April 7). YouTube is not just a site for entertainment, but education. Washington Square News. https://nyunews.com/2013/04/07/cicconet-13/

Dryer

Johns, B. T., Dye, M., & Jones, M. N. (2016). The influence of contextual diversity on word learning. Psychonomic Bulletin & Review, 23(4), 1214-1220. https://doi.org/10.3758/s13423-015-0980-7

Mohan, S., & Punathambekar, A. (2019). Localizing YouTube: Language, cultural regions, and digital platforms. International Journal of Cultural Studies, 22(3), 317-333. https://doi.org/10.1177/1367877918794681

Zhao, K., Shi, N., Sa, Z., Wang, H. X., Lu, C. H., & Xu, X. Y. (2020). Text mining and analysis of treatise on febrile diseases based on natural language processing. World Journal of Traditional Chinese Medicine, 6(1), 67. https://doi.org/10.4103/wjtcm.wjtcm_28_19

Published

2023-09-11

How to Cite

Smith, V., Garrett, M., Harwood, A., & Shamblin, J. (2023). YouTube Transcripts Word Frequency Measure. Journal of Linguistics, Culture and Communication, 1(2), 91–99. https://doi.org/10.61320/jolcc.v1i2.91-99
