DATA MINING USING CRISP-DM PROCESS FRAMEWORK ON OFFICIAL STATISTICS: A CASE STUDY OF EAST JAVA PROVINCE

A case analysis of East Java Province

Authors

  • Gunawan, University of Surabaya

DOI:

https://doi.org/10.14203/JEP.29.2.2021.183-198

Keywords:

macroeconomics, data mining, CRISP-DM, cluster, East Java

Abstract

Conventional data analysis in economics is based on models derived from economic theory. In contrast, data mining is a data-driven approach that extracts patterns describing the empirical interaction between variables. The emerging field of data mining offers an opportunity to extract information from macroeconomic data. However, embracing data mining remains a challenge for economic researchers and policymakers because it is closely tied to the information technology discipline. This study responds to the limited use of data mining in economics by analyzing macroeconomic indicators published by the Indonesian Central Bureau of Statistics. Its primary purpose is to offer a case for applying data mining to macroeconomic indicators. The specific objectives were (1) to introduce the Cross-Industry Standard Process for Data Mining (CRISP-DM) as a process framework and the Knime Analytics Platform as data mining software for macroeconomic data analysis; and (2) to characterize East Java regencies/municipalities based on their macroeconomic indicators and region profiles. The study is secondary, quantitative research, with the regency/municipality as the unit of analysis. Five macroeconomic indicators were selected as variables: Human Development Index (HDI), Gross Regional Domestic Product (GRDP), poverty rate, Gini ratio, and open unemployment rate. Four region profiles were also included in the analysis: area, population, population density, and number of villages. The clustering model was implemented as a Knime workflow and grouped the 38 regions into three clusters. The applicability and simplicity of the CRISP-DM process framework indicated that it is appropriate for analyzing structured official data.
Furthermore, the predictive model, applied to past years’ datasets, revealed which regions had improved and shifted their cluster membership over three years. Moreover, including region profiles provided a better understanding of the underlying factors that explain the association between macroeconomic indicators. This study suggests that the East Java Government consider different facilitation-focused programs based on the characteristics of the three clusters for better budget efficiency. This research adds to the literature on economic development, particularly by introducing data mining, the CRISP-DM method, and Knime software for analyzing macroeconomic indicators at the regency/municipality level.
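As a rough illustration of the workflow described in the abstract, the sketch below clusters regions on standardized indicators and then assigns an earlier year's record to the fitted clusters to see whether a region changed membership. The abstract does not name the clustering algorithm or the Knime nodes used, so k-means on z-scores is an assumption here. All indicator values are made up for illustration and are not BPS data, and every function and variable name is hypothetical.

```python
import math
import statistics

# Column order: HDI, GRDP, poverty rate, Gini ratio, open unemployment rate.
# Synthetic values for illustration only (NOT official statistics).
CURRENT_YEAR = {
    "Region A": [81.0, 550.0, 4.5, 0.37, 6.0],
    "Region B": [80.2, 520.0, 4.9, 0.36, 5.8],
    "Region C": [72.5, 90.0, 9.8, 0.33, 4.1],
    "Region D": [71.9, 85.0, 10.4, 0.32, 4.3],
    "Region E": [64.0, 25.0, 20.5, 0.29, 2.9],
    "Region F": [63.1, 22.0, 21.7, 0.28, 3.1],
}
# A hypothetical earlier-year record for one region.
PAST_YEAR = {"Region C": [66.0, 30.0, 18.9, 0.30, 3.0]}


def fit_scaler(rows):
    """Return per-column means and population standard deviations."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return means, stds


def scale(row, scaler):
    """Z-score one row with a previously fitted scaler."""
    means, stds = scaler
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                [statistics.fmean(col) for col in zip(*members)] if members else centroids[j]
            )
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


regions = list(CURRENT_YEAR)
scaler = fit_scaler(list(CURRENT_YEAR.values()))
points = [scale(CURRENT_YEAR[r], scaler) for r in regions]
centroids, labels = kmeans(points, k=3)
labels_by_region = dict(zip(regions, labels))

# Reuse the fitted scaler and centroids on the earlier year's data.
past_labels = {r: nearest(scale(row, scaler), centroids) for r, row in PAST_YEAR.items()}
shifted = [r for r, old in past_labels.items() if old != labels_by_region[r]]

print(labels_by_region)
print("Changed cluster since the earlier year:", shifted)
```

Scaling the earlier year with the scaler fitted on the current year keeps both years on the same axes, so a change of label reflects movement of the region rather than a change in the reference frame. Cluster numbers themselves are arbitrary; only shared or differing membership is meaningful.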

Published

2025-07-01

How to Cite

Gunawan. (2025). DATA MINING USING CRISP-DM PROCESS FRAMEWORK ON OFFICIAL STATISTICS: A CASE STUDY OF EAST JAVA PROVINCE: A case analysis of East Java Province. Jurnal Ekonomi Dan Pembangunan, 29(2), 183–198. https://doi.org/10.14203/JEP.29.2.2021.183-198