High Performance CDR Processing with MapReduce

Mulya Agung; Achmad Imam Kistijantoro

doi:10.5614/itbj.ict.res.appl.2016.10.2.1

Authors

Mulya Agung, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jalan Ganesha No. 10, Bandung 40132, Indonesia
Achmad Imam Kistijantoro, School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jalan Ganesha No. 10, Bandung 40132, Indonesia

DOI:

https://doi.org/10.5614/itbj.ict.res.appl.2016.10.2.1

Abstract

A call detail record (CDR) is a data record produced by telecommunication equipment that logs the details of a call transaction. CDRs contain valuable information for several domains, such as billing, fraud detection, and analytics. In practice, however, these uses face a big data challenge: billions of CDRs are generated every day, and processing systems are expected to deliver results in a timely manner. The capacity of our current production system is not sufficient to meet these needs. Therefore, a better-performing system based on MapReduce and running on a Hadoop cluster was designed and implemented. This paper presents an analysis of the previous system and the design and implementation of the new system, called MS2. It also provides empirical evidence of the efficiency and linearity of MS2. Tests show that MS2 reduces overhead by 44% and nearly doubles processing speed compared to the previous system. In benchmarks against several related technologies for large-scale data processing, MS2 also performed better for CDR batch processing. Running on a cluster with eight CPU cores and two conventional disks, MS2 is able to process 67,000 CDRs/second.
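
To put the reported throughput next to the "billions of CDRs per day" figure, the arithmetic can be sketched as follows. This is only an illustration: it assumes the 67,000 CDRs/second rate is sustained for a full 24 hours and that work divides evenly across the eight cores, neither of which the abstract states. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def cdrs_per_day(cdrs_per_second: int) -> int:
    """Scale a per-second rate to one day (assumes the rate is sustained)."""
    return cdrs_per_second * SECONDS_PER_DAY


def per_core_rate(cdrs_per_second: int, cores: int) -> float:
    """Average rate per core (assumes an even split across cores)."""
    return cdrs_per_second / cores


if __name__ == "__main__":
    reported_rate = 67_000  # CDRs/second, from the abstract
    cores = 8               # CPU cores in the test cluster, from the abstract

    print(f"{cdrs_per_day(reported_rate):,} CDRs/day")            # 5,788,800,000 CDRs/day
    print(f"{per_core_rate(reported_rate, cores):,.0f} CDRs/s per core")  # 8,375 CDRs/s per core
```

Under these assumptions, the small test cluster alone would handle roughly 5.8 billion CDRs per day.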
