Cost-Sensitive Big Data Analytics

ABSTRACT

We are developing an algorithm that works in a distributed environment with the goal of reducing the overall misclassification cost. It is also intended to address the problem of learning from highly imbalanced datasets, since cost-sensitive classification is widely applied to the class imbalance problem.

Description

Data mining classification algorithms can be divided into two categories: error-based models (EBM) and cost-based models (CBM). An EBM does not incorporate the cost of misclassification in the model-building phase, while a CBM does. An EBM treats all errors as equally costly, which does not hold for many real-world applications such as credit card fraud detection and medical diagnosis, where one kind of error (for example, missing a fraudulent transaction) is far more expensive than the other. Shopping carts, credit card fraud detection systems, loan approval systems, and medical diagnosis systems are examples of systems whose data is largely spread across multiple sites. Classifying such data therefore requires a distributed system. Moreover, the volume of data in such applications is very high. Applying a CBM in a distributed environment helps reduce the overall misclassification cost. As part of our research, we are developing an algorithm that works in a distributed environment with the goal of reducing the overall misclassification cost. It is also intended to address the problem of learning from highly imbalanced datasets, since cost-sensitive classification is widely applied to the class imbalance problem.
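
To illustrate the difference between the two model families, the sketch below contrasts an error-based decision (pick the most probable class) with a cost-based decision (pick the class with the lowest expected misclassification cost). It is not the algorithm under development, only a minimal single-machine example of the general idea. The cost matrix values, the class labels, and the function names are assumptions chosen for illustration.

```python
# Minimal sketch of error-based vs. cost-based decisions.
# All names and cost values here are hypothetical, for illustration only.

# COST[j][i] = cost of predicting class i when the true class is j.
# Assumed classes: 0 = legitimate transaction, 1 = fraudulent transaction.
COST = [
    [0.0, 1.0],   # true legitimate: a false alarm costs 1
    [50.0, 0.0],  # true fraud: a missed fraud costs 50
]


def error_based_class(probs):
    """Error-based decision: choose the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def cost_based_class(probs, cost):
    """Cost-based decision: choose the class with minimum expected cost.

    probs[j] is the estimated probability that the true class is j.
    """
    n_classes = len(cost[0])
    expected = [
        sum(probs[j] * cost[j][i] for j in range(len(probs)))
        for i in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


def total_cost(y_true, y_pred, cost):
    """Overall misclassification cost of a set of predictions."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))


if __name__ == "__main__":
    probs = [0.9, 0.1]  # the classifier is 90% sure the transaction is legitimate
    print("error-based:", error_based_class(probs))
    print("cost-based: ", cost_based_class(probs, COST))
```

With these assumed costs, a transaction with only a 10% estimated fraud probability is still flagged by the cost-based rule, because the expected cost of missing a fraud (0.1 × 50) exceeds the expected cost of a false alarm (0.9 × 1). This is also why cost-sensitive decisions tend to help on imbalanced data, where the rare class rarely wins on probability alone.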

Keywords: Data Science: Cloud Computing, Data Analytics and Machine Learning