Journal of Information Systems Engineering and Management

Automated Readability Assessment for Spanish e-Government Information
Jorge Morato 1*, Ana Iglesias 1, Adrián Campillo 1, Sonia Sanchez-Cuadrado 2
1 Computer Science Department, Universidad Carlos III de Madrid, Leganes, SPAIN
2 Library and Information Sc. Dep., Universidad Complutense de Madrid, Madrid, SPAIN
* Corresponding Author
Research Article

Journal of Information Systems Engineering and Management, 2021 - Volume 6 Issue 2, Article No: em0137
https://doi.org/10.29333/jisem/9620

Published Online: 21 Jan 2021

How to cite this article
APA 6th edition
In-text citation: (Morato et al., 2021)
Reference: Morato, J., Iglesias, A., Campillo, A., & Sanchez-Cuadrado, S. (2021). Automated Readability Assessment for Spanish e-Government Information. Journal of Information Systems Engineering and Management, 6(2), em0137. https://doi.org/10.29333/jisem/9620
ABSTRACT
This paper automatically evaluates the readability of Spanish e-government websites, specifically pages that explain e-government administrative procedures. The evaluation analyzes linguistic characteristics that are presumably associated with better comprehension of these resources. To this end, texts that clarify the procedures published on the Spanish Government’s websites were collected from non-government websites; these texts form the part of the corpus treated as the set of easy documents. The rest of the corpus consists of the counterpart documents from government websites. The text of the documents was processed, and its difficulty was evaluated with several classic readability metrics. At a later stage, machine learning algorithms were applied to predict the difficulty of the text. The results show that government web pages score high on comprehension difficulty. This work contributes a new Spanish-language corpus of official e-government websites. In addition, it applies a large number of combined linguistic attributes, which identify the comprehensibility level of a text better than classic metrics do.
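The abstract does not list the linguistic characteristics that were analyzed. As a minimal sketch of what extracting such attributes from a text might look like, the Python example below computes three common surface features: average sentence length, average word length, and type-token ratio. These features, the function name `surface_features`, and the sample sentence are illustrative assumptions, not the attribute set or tooling used in the paper.

```python
import re


def surface_features(text: str) -> dict:
    """Compute a few illustrative surface features for one document.

    Hypothetical helper: the features chosen here are assumptions,
    not the attributes used by the authors.
    """
    # Rough sentence split on ., ! and ? (a heuristic, not a real tokenizer).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters, so accented Spanish letters are kept.
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    return {
        "words_per_sentence": len(words) / len(sentences),
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }


# Invented sample text, used only to exercise the function.
sample = "El trámite es sencillo. Rellene el formulario y preséntelo en la oficina."
features = surface_features(sample)
print(features)
```

Feature vectors of this kind, computed for both the easy and the government documents, could then be passed to a classifier; which learning algorithms the study used is not stated in the abstract.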
LICENSE
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.