Due to the rapid growth of text data from Internet sources such as Twitter, social media platforms have become increasingly popular sources for information extraction. The extracted text is converted into numerical form through a series of data transformations and then analyzed with text analytics models to support decision-making. Among these models, one of the most common is based on Latent Dirichlet Allocation (LDA), a topic model in which each topic is a cluster of words in the documents associated with a fitted multivariate statistical distribution. However, such models are often poor estimators of topic proportions. This paper therefore proposes a timely topic score technique for social media text data visualization, based on a point system built on topic model outputs to support text signaling. The importance score system is intended to mitigate this weakness of topic models by taking the estimated topic proportions and assigning importance points that reveal topic trends. The technique then generates visualizations of topic trends over the study period to further support decision-making. Finally, the paper applies the methodology to two real-life case studies drawn from Twitter and illustrates its efficiency.
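The abstract describes the point system only at a high level. The sketch below illustrates the general idea under stated assumptions: each tweet has already been reduced to a topic-proportion vector θ by a fitted LDA model and tagged with a time period. The scoring rule itself (the top-ranked topic in each document earns `top_n` points, the next earns `top_n - 1`, and topics below a minimum proportion `min_prop` earn nothing), along with every function and parameter name, is a hypothetical choice made for this example and is not the paper's actual scheme.

```python
from collections import defaultdict


def document_points(theta, top_n=3, min_prop=0.1):
    """Award points to one document's topics (HYPOTHETICAL rank-based rule).

    theta: the document's topic proportions, assumed to come from an LDA fit.
    The highest-proportion topic gets top_n points, the next top_n - 1, etc.;
    topics whose proportion is below min_prop get no points.
    """
    ranked = sorted(range(len(theta)), key=lambda k: theta[k], reverse=True)
    points = [0] * len(theta)
    for rank, k in enumerate(ranked[:top_n]):
        if theta[k] >= min_prop:
            points[k] = top_n - rank
    return points


def topic_scores_by_period(docs, n_topics, top_n=3, min_prop=0.1):
    """Sum document points for each (period, topic) pair.

    docs: iterable of (period, theta) pairs, e.g. ("day1", [0.7, 0.2, 0.1]).
    Returns {period: [score of topic 0, score of topic 1, ...]}, sorted by period.
    """
    scores = defaultdict(lambda: [0] * n_topics)
    for period, theta in docs:
        for k, p in enumerate(document_points(theta, top_n, min_prop)):
            scores[period][k] += p
    return dict(sorted(scores.items()))


def print_trend(scores, topic):
    """Crude text bar chart of one topic's score across periods."""
    for period, row in scores.items():
        print(f"{period}  {'#' * row[topic]} {row[topic]}")


if __name__ == "__main__":
    # Toy topic proportions for four tweets over two periods (made-up data).
    docs = [
        ("day1", [0.70, 0.20, 0.10]),
        ("day1", [0.15, 0.80, 0.05]),
        ("day2", [0.60, 0.30, 0.10]),
        ("day2", [0.55, 0.05, 0.40]),
    ]
    scores = topic_scores_by_period(docs, n_topics=3, top_n=2)
    print(scores)  # {'day1': [3, 3, 0], 'day2': [4, 1, 1]}
    print_trend(scores, topic=0)
```

Because points depend on each topic's rank within a document rather than on its raw proportion, small errors in the estimated proportions change the scores only when they change the ranking. Whether this matches the paper's own rationale for its point system is an assumption.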
Published in: American Journal of Management Science and Engineering (Volume 4, Issue 3)
DOI: 10.11648/j.ajmse.20190403.12
Page(s): 49-55
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2019. Published by Science Publishing Group
Keywords: Text Analytics, Natural Language Processing, Cyber Security, Signaling, Pattern Detection, Social Media
APA Style
Sui, Z. (2019). Social media text data visualization modeling: A timely topic score technique. American Journal of Management Science and Engineering, 4(3), 49-55. https://doi.org/10.11648/j.ajmse.20190403.12
BibTeX
@article{10.11648/j.ajmse.20190403.12,
  author = {Zhenhuan Sui},
  title = {Social Media Text Data Visualization Modeling: A Timely Topic Score Technique},
  journal = {American Journal of Management Science and Engineering},
  volume = {4},
  number = {3},
  pages = {49-55},
  year = {2019},
  doi = {10.11648/j.ajmse.20190403.12},
  url = {https://doi.org/10.11648/j.ajmse.20190403.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmse.20190403.12}
}
RIS
TY  - JOUR
T1  - Social Media Text Data Visualization Modeling: A Timely Topic Score Technique
AU  - Zhenhuan Sui
Y1  - 2019/07/26
PY  - 2019
N1  - https://doi.org/10.11648/j.ajmse.20190403.12
DO  - 10.11648/j.ajmse.20190403.12
T2  - American Journal of Management Science and Engineering
JF  - American Journal of Management Science and Engineering
JO  - American Journal of Management Science and Engineering
SP  - 49
EP  - 55
PB  - Science Publishing Group
SN  - 2575-1379
UR  - https://doi.org/10.11648/j.ajmse.20190403.12
VL  - 4
IS  - 3
ER  -