INTERNATIONAL JOURNAL OF APPLIED SCIENCES AND MATHEMATICAL THEORY (IJASMT)

E-ISSN 2489-009X
P-ISSN 2695-1908
VOL. 5 NO. 2 2019


Efficient Algorithms for Data Extraction in Big Data

Abiye-Suku, Ominini Monima & E. O. Bennett


Abstract


This paper presents an information system that extracts the email addresses of staff and students from an organization's big data, so that vital information can be disseminated in a very short time. The system uses two algorithms: a rule-based algorithm that relies on regular expressions, and a machine-learning algorithm implemented with a decision tree classifier. The proposed tool has applications in several sectors. In banking, it can extract customers' email addresses for sending transaction details and goodwill messages that improve customer relations. In education, it can be used to send students' progress reports, registration details and payment receipts. In health care, it can be used to send medical reports and to support globalization and the monitoring of hospital quality. The dataset was generated online because most organizations do not disclose staff details. The generated data is tokenized immediately, and the domain of each address is determined at that stage. The constructive research methodology and an object-oriented design technique were used to analyze, design and implement the tool. A customized software application was developed in the Python programming language, and the system was evaluated against the system requirements and the expectations of potential users. The tool was tested extensively on data sets of different sizes; the tests showed its limitations and the potential for further work on the software package.
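As an illustration of the rule-based approach described in the abstract, the following is a minimal sketch of regular-expression email extraction followed by a domain tally. It is not the authors' implementation: the pattern `EMAIL_RE`, the helper names `extract_emails` and `count_domains`, and the sample text are all assumptions made for this example, and the pattern is deliberately simpler than a full RFC 5322 address grammar.

```python
import re
from collections import Counter

# Simplified email pattern (an assumption for illustration, not the paper's rule).
# Local part, "@", then one or more dot-separated domain labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def extract_emails(text):
    """Return unique, lower-cased addresses in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)

def count_domains(emails):
    """Tally how many extracted addresses belong to each domain."""
    return Counter(address.rsplit("@", 1)[1] for address in emails)

# Hypothetical sample text standing in for the generated data set.
sample = (
    "Contact jane.doe@example.edu or John@Example.edu; "
    "bursar: pay@bank.example.com."
)

emails = extract_emails(sample)
domains = count_domains(emails)
print(emails)
print(domains.most_common())
```

Lower-casing before deduplication treats `John@Example.edu` and `john@example.edu` as the same address, which suits a mailing-list use case; a stricter tool might preserve the original case of the local part.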


Keywords:

Algorithm, Data Extraction, Rule Based, Machine Learning



