The amount of information on the web is tremendous. A large part of it is presented as articles, descriptions, posts, and comments, i.e. free text in natural language, which is hard to make use of in that form. In structured form, the same information could serve many purposes. This paper therefore proposes an approach for extracting data given as free text in natural language into structured data, for example a table. Structured information is easy to search and analyze. Structured data is quantitative, whereas unstructured data is qualitative. A tool that converts text into structured data will not only provide an automatic mechanism for data extraction but will also save many of the resources needed to process and store the extracted data. Data extraction from text will also automate the derivation of useful insights from data that is usually processed by people. The efficiency and accuracy of the process will increase, and the probability of human error will be minimized. The amount of data that can be processed will no longer be limited by the available human resources.
Published in: International Journal of Intelligent Information Systems (Volume 10, Issue 4)
DOI: 10.11648/j.ijiis.20211004.16
Page(s): 74-80
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2021. Published by Science Publishing Group
Keywords: Data Extraction, Structured Data, Unstructured Data, Automation, NLP, RASA
APA Style
Zheni Mincheva, Nikola Vasilev, Ventsislav Nikolov, Anatoliy Antonov. (2021). Extracting Structured Data from Text in Natural Language. International Journal of Intelligent Information Systems, 10(4), 74-80. https://doi.org/10.11648/j.ijiis.20211004.16
ACS Style
Zheni Mincheva; Nikola Vasilev; Ventsislav Nikolov; Anatoliy Antonov. Extracting Structured Data from Text in Natural Language. Int. J. Intell. Inf. Syst. 2021, 10(4), 74-80. doi: 10.11648/j.ijiis.20211004.16
AMA Style
Zheni Mincheva, Nikola Vasilev, Ventsislav Nikolov, Anatoliy Antonov. Extracting Structured Data from Text in Natural Language. Int J Intell Inf Syst. 2021;10(4):74-80. doi: 10.11648/j.ijiis.20211004.16
@article{10.11648/j.ijiis.20211004.16,
  author = {Zheni Mincheva and Nikola Vasilev and Ventsislav Nikolov and Anatoliy Antonov},
  title = {Extracting Structured Data from Text in Natural Language},
  journal = {International Journal of Intelligent Information Systems},
  volume = {10},
  number = {4},
  pages = {74-80},
  doi = {10.11648/j.ijiis.20211004.16},
  url = {https://doi.org/10.11648/j.ijiis.20211004.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijiis.20211004.16},
  year = {2021}
}
TY  - JOUR
T1  - Extracting Structured Data from Text in Natural Language
AU  - Zheni Mincheva
AU  - Nikola Vasilev
AU  - Ventsislav Nikolov
AU  - Anatoliy Antonov
Y1  - 2021/08/31
PY  - 2021
N1  - https://doi.org/10.11648/j.ijiis.20211004.16
DO  - 10.11648/j.ijiis.20211004.16
T2  - International Journal of Intelligent Information Systems
JF  - International Journal of Intelligent Information Systems
JO  - International Journal of Intelligent Information Systems
SP  - 74
EP  - 80
PB  - Science Publishing Group
SN  - 2328-7683
UR  - https://doi.org/10.11648/j.ijiis.20211004.16
VL  - 10
IS  - 4
ER  -