Real-world networks are growing in scale day by day, and this growth also brings sparsity problems. Large amounts of product information usually have to be maintained, and one feasible way to organize it is to attach semantic tags to it. In this article, we address the semantic annotation of on-demand printing products. Random walks have favorable properties on the global structure of a network, so we apply them to the sparsity problem and propose an efficient algorithm, ProRWR. First, ProRWR processes the text descriptions of printing products with the TF-IDF algorithm and builds a “product-term” bipartite network. Second, it builds a square matrix from the TF-IDF weight matrix, rewrites the random walk equation, and uses the normalized square matrix as the input to the rewritten ProRWR iteration. Once the random walk converges, the terms with the highest convergence probability in each product document are selected as that product's most relevant feature terms. Extensive experiments on an Amazon dataset show that ProRWR achieves a precision of 73.5% and a recall of 60%, indicating that it discovers latent semantic associations and accomplishes the semantic annotation of on-demand printing products.
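The abstract describes the ProRWR pipeline only at a high level. The Python sketch below illustrates one plausible reading of it, and several details are assumptions rather than the paper's method: a smoothed TF-IDF variant (tf = count / document length, idf = ln(N / df) + 1); the square matrix built as the block matrix [[0, W], [W^T, 0]] with column normalization; and a random walk with restart (RWR) update r = (1 - c)Ar + ce with an assumed restart probability c = 0.15, which the name ProRWR suggests but the abstract does not spell out. The sketch also ranks every term node rather than only the terms in the product's own description. All function names (`tfidf_matrix`, `bipartite_square`, `rwr`, `annotate`, `precision_recall`) and the toy product descriptions are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a product-by-term TF-IDF matrix (assumed variant:
    tf = count / doc length, idf = ln(N / df) + 1).

    Returns (vocab, W), where W[i][j] is the weight of vocab[j] in product i.
    """
    vocab = sorted({t for d in docs for t in d})
    index = {t: j for j, t in enumerate(vocab)}
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    W = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for t, c in Counter(d).items():
            W[i][index[t]] = (c / len(d)) * (math.log(n / df[t]) + 1.0)
    return vocab, W


def bipartite_square(W):
    """Embed the n x m product-term weights in the (n + m) x (n + m) block
    matrix [[0, W], [W^T, 0]] and column-normalize it, so column k holds the
    transition probabilities out of node k (products first, then terms)."""
    n, m = len(W), len(W[0])
    size = n + m
    A = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(m):
            A[i][n + j] = W[i][j]  # term j -> product i
            A[n + j][i] = W[i][j]  # product i -> term j
    for col in range(size):
        s = sum(A[row][col] for row in range(size))
        if s > 0:  # isolated nodes keep an all-zero column
            for row in range(size):
                A[row][col] /= s
    return A


def rwr(A, start, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate r = (1 - c) A r + c e from the
    start node until the L1 change drops below tol (c = restart)."""
    size = len(A)
    e = [0.0] * size
    e[start] = 1.0
    r = e[:]
    for _ in range(max_iter):
        new = [(1 - restart) * sum(A[i][k] * r[k] for k in range(size))
               + restart * e[i] for i in range(size)]
        if sum(abs(a - b) for a, b in zip(new, r)) < tol:
            return new
        r = new
    return r


def annotate(docs, product, k=3, restart=0.15):
    """Return the k term nodes with the highest converged score for a product."""
    vocab, W = tfidf_matrix(docs)
    n = len(docs)
    scores = rwr(bipartite_square(W), product, restart)
    ranked = sorted(range(len(vocab)), key=lambda j: scores[n + j], reverse=True)
    return [(vocab[j], scores[n + j]) for j in ranked[:k]]


def precision_recall(predicted, relevant):
    """Precision and recall of predicted tags against the relevant tags."""
    hits = len(set(predicted) & set(relevant))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy descriptions; not data from the paper.
    docs = [
        ["mug", "ceramic", "custom", "photo"],
        ["shirt", "cotton", "custom", "print"],
        ["mug", "photo", "print"],
    ]
    for term, score in annotate(docs, product=0, k=4):
        print(f"{term}: {score:.4f}")
```

Because term nodes are reached from the starting product only through odd-length paths, the product's own terms usually dominate its ranking. Terms from related products (here, "print", reached through the shared terms "mug" and "photo") still receive a smaller but nonzero score, which is how a random walk can surface associations that a sparse description alone does not contain.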
Published in: American Journal of Neural Networks and Applications (Volume 5, Issue 1)
DOI: 10.11648/j.ajnna.20190501.15
Page(s): 28-35
Creative Commons license: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2019. Published by Science Publishing Group
Keywords: TF-IDF, Random Walk, Semantic Annotation
APA Style
Mingxi Zhang, Guanying Su. (2019). Random Walk-Based Semantic Annotation for On-demand Printing Products. American Journal of Neural Networks and Applications, 5(1), 28-35. https://doi.org/10.11648/j.ajnna.20190501.15
BibTeX
@article{10.11648/j.ajnna.20190501.15, author = {Mingxi Zhang and Guanying Su}, title = {Random Walk-Based Semantic Annotation for On-demand Printing Products}, journal = {American Journal of Neural Networks and Applications}, volume = {5}, number = {1}, pages = {28-35}, doi = {10.11648/j.ajnna.20190501.15}, url = {https://doi.org/10.11648/j.ajnna.20190501.15}, eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajnna.20190501.15}, year = {2019} }
RIS
TY  - JOUR
T1  - Random Walk-Based Semantic Annotation for On-demand Printing Products
AU  - Mingxi Zhang
AU  - Guanying Su
Y1  - 2019/07/04
PY  - 2019
N1  - https://doi.org/10.11648/j.ajnna.20190501.15
DO  - 10.11648/j.ajnna.20190501.15
T2  - American Journal of Neural Networks and Applications
JF  - American Journal of Neural Networks and Applications
JO  - American Journal of Neural Networks and Applications
SP  - 28
EP  - 35
PB  - Science Publishing Group
SN  - 2469-7419
UR  - https://doi.org/10.11648/j.ajnna.20190501.15
VL  - 5
IS  - 1
ER  - 