A Constraint Identification Method for Predicate Node Identification in Clustered XML Documents

B.A. Bodinga, A. Roko, A.B. Muhammad, I. Saidu

Abstract

A large number of documents on the web are now represented and stored as XML. These documents may come from the same source (homogeneous) or from different sources (heterogeneous), which makes managing and retrieving them challenging. Existing systems return irrelevant predicates, and the predicate node identification methods they employ use only a simple constraint. To improve the effectiveness of XML retrieval, an Effective Constraint Identification Algorithm (E_CIA) is developed to identify relevant predicates. E_CIA uses a Constraints Operator Generator (COG) to identify the constraints that should be imposed so that the most relevant predicate nodes are generated, which improves the effectiveness of the retrieval process. Experiments were conducted to evaluate the performance of the proposed E_CIA, and the results show that it outperforms StruX and StruXPlus in terms of precision.

Keywords:

XML retrieval, constraint identification, predicate node, constraint operator
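The abstract does not describe how COG chooses an operator, so the following is only a hypothetical Python sketch of the general contrast it draws: a simple constraint (every value becomes an equality test) versus a constraint whose comparison operator is derived from the wording of the query. The cue-word table, the Predicate class, the function names, and the assumption that the constrained node ("year") is already known are all illustrative inventions, not E_CIA or COG as published.

```python
from dataclasses import dataclass

# Hypothetical cue-word table (an illustrative assumption, not the paper's COG):
# the word immediately before a number in the query selects the comparison operator.
OPERATOR_CUES = {
    "after": ">", "since": ">=", "over": ">", "above": ">",
    "before": "<", "under": "<", "below": "<", "until": "<=",
}


@dataclass
class Predicate:
    node: str      # element the constraint is imposed on, e.g. "year" (assumed known)
    operator: str  # comparison operator, e.g. ">"
    value: str     # literal taken from the query, e.g. "2010"

    def to_xpath(self) -> str:
        # Numeric literals are emitted bare; any other literal is quoted.
        literal = self.value if self.value.isdigit() else f'"{self.value}"'
        return f"[{self.node} {self.operator} {literal}]"


def simple_constraints(query: str, node: str) -> list[Predicate]:
    """Baseline: every number in the query becomes an equality constraint."""
    return [Predicate(node, "=", t) for t in query.lower().split() if t.isdigit()]


def operator_constraints(query: str, node: str) -> list[Predicate]:
    """Choose the operator from the word before each number; fall back to '='."""
    tokens = query.lower().split()
    predicates = []
    for i, token in enumerate(tokens):
        if not token.isdigit():
            continue
        cue = tokens[i - 1] if i > 0 else ""
        predicates.append(Predicate(node, OPERATOR_CUES.get(cue, "="), token))
    return predicates


if __name__ == "__main__":
    query = "science fiction films after 2010"
    print([p.to_xpath() for p in simple_constraints(query, "year")])
    print([p.to_xpath() for p in operator_constraints(query, "year")])
```

For the query "science fiction films after 2010", the equality-only baseline produces [year = 2010], while the operator-aware version produces [year > 2010], which is closer to what the user asked for. A real constraint generator would also need to decide which node to constrain; the sketch sidesteps that by taking the node name as a parameter.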
