Initial Progress of Identification of the Appropriate NLP Technique for Content Evaluation in Textual Conversations of People Infected by SARS-CoV-2
DOI:
https://doi.org/10.53591/easi.v2i3.2488
Keywords:
Python, Google Colab, NLP, LSTM
Abstract
When COVID-19 became a pandemic in March 2020, an urgent need arose for reliable information and advice, so virtual assistants were created to help teach the public how to avoid the Alpha variant. When new variants such as Beta, Delta, and Omicron appeared with different symptoms, they caused new waves of infections and deaths. To address this, a Natural Language Processing (NLP) prototype was created to analyze the experiences of 4422 people who had been infected in Ecuador and to detect which symptoms were most common in their conversations. This study led to the creation of the NLP prototype, built in Python on the Google Colab platform. Two combinations of NLP techniques were considered, and their results were measured with the quality metrics accuracy, recall, and F1. As a first advance of the study, the more appropriate of the two combinations tested, the one that gave the highest effectiveness for a multi-label classifier model, was stop-word removal, tokenization, and stemming with an LSTM (Long Short-Term Memory) classifier.
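The preprocessing steps and quality metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the stop-word list, the suffix-stripping stemmer, the example sentence, the label names, and all function names are hypothetical, and the LSTM classifier itself is omitted. Accuracy is interpreted here as exact-match (subset) accuracy, which is only one possible reading for a multi-label model.

```python
import re

# Hypothetical, deliberately tiny Spanish stop-word list (not the study's list).
STOP_WORDS = {"el", "la", "los", "las", "de", "y", "me", "mi", "un", "una",
              "que", "en", "con", "tuve"}

# Hypothetical plural suffixes for a naive stemmer; a real pipeline would
# more likely rely on an established stemmer.
SUFFIXES = ("es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


def micro_scores(y_true, y_pred):
    """Micro-averaged precision, recall, and F1 over per-sample label sets."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # labels correctly predicted
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))  # labels predicted but absent
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))  # labels present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set matches exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true) if y_true else 0.0


# Hypothetical conversation fragment and symptom labels.
tokens = preprocess("Tuve fiebre y dolores de cabeza")
y_true = [{"fiebre", "tos"}, {"dolor"}]
y_pred = [{"fiebre"}, {"dolor", "tos"}]
precision, recall, f1 = micro_scores(y_true, y_pred)
accuracy = subset_accuracy(y_true, y_pred)
print(tokens, precision, recall, f1, accuracy)
```

In a setup like this, the token lists would be mapped to integer sequences and fed to the classifier, and the predicted label sets would come from thresholding its per-label outputs.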
License
Copyright (c) 2023 Ivan L. Acosta-Guzmán, Eleanor A. Varela-Tapia, Alexandra E. Piza-Guale, Nory X. Acosta-Guzmán, Christopher I. Acosta Varela
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Contributions published in the EASI journal follow the open access license CC BY-NC-ND 4.0 (Creative Commons Attribution-NonCommercial-NoDerivs 4.0). This license empowers you as an author and ensures wide dissemination of your research while still protecting your rights.
For authors:
- Authors retain copyright without restrictions under the CC BY-NC-ND 4.0 license.
- The journal obtains a license to publish the first original manuscript.
For readers/users:
- Free access and distribution: Anyone can access, download, copy, print, and share the published article freely under the terms of the CC BY-NC-ND 4.0 license.
- Attribution required: Any third party that uses the published material must credit the creator by providing the author name, article title, and journal name. This protects the intellectual property of the author(s) and helps build their scholarly reputation.
- Non-commercial use: Only noncommercial use of the published work is permitted. Noncommercial means not primarily intended for or directed towards commercial advantage or monetary compensation by any third party.
- No modifications allowed: The published article cannot be changed, remixed, or built upon. This ensures the integrity and accuracy of the research findings.