Data Science

Data Science: Cloud Computing, Data Analytics and Machine Learning

Online Outlier Detection Algorithm on FPGA

The project involves a comparative study, implementation, and analysis of various anomaly detection algorithms. The goal is to build hardware (an FPGA module) that runs an algorithm designed to detect anomalies in data streams. A typical use case is generating real-time alerts.
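To make the idea of detecting anomalies on a stream concrete, here is a minimal software sketch of one simple online detector: a running z-score test that uses Welford's update for the mean and variance. The project description does not say which algorithms were compared, so the technique, the class name `StreamingZScoreDetector`, and the `threshold` and `warmup` parameters are all assumptions made for illustration, not the project's FPGA design.

```python
import math


class StreamingZScoreDetector:
    """Flags a sample whose z-score against the running statistics is too large.

    Hypothetical illustration: one pass, constant memory per stream.
    """

    def __init__(self, threshold=3.0, warmup=10):
        self.threshold = threshold  # z-score above which a sample is flagged
        self.warmup = warmup        # samples to observe before flagging anything
        self.n = 0                  # number of samples seen so far
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the statistics."""
        is_outlier = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_outlier = True
        # Welford's online update of the mean and the squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = StreamingZScoreDetector()
readings = [10.0, 10.1, 9.9, 10.2, 9.8] * 4 + [50.0]
alerts = [i for i, value in enumerate(readings) if detector.update(value)]
```

Each sample needs only a handful of arithmetic operations and three stored values, which is the kind of property that makes a detector a plausible candidate for a hardware pipeline. Note that this sketch also folds flagged samples into the statistics; a real design might exclude them.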

Ratnik Gandhi, Ativ Joshi, Pratik Padalia

Fast Implementation of Face Recognition on GPU

In this information era, most of our data is secured by computers using security mechanisms such as passwords, encryption keys, fingerprints, faces, and iris data. Over the last three decades, face recognition has been a pervasive research problem in computer vision due to its wide applicability. Processing high-dimensional data in real time is computationally expensive, and one way to cut the computation time is to use hardware with more processing power. High-end CPUs can reduce computation time, but GPUs can reduce it significantly more, because their streaming multiprocessors are designed to perform arithmetic operations faster than traditional processors.

However, such high-end hardware (CPUs and GPUs) can cost a fortune. We are designing faster face recognition algorithms that run on commodity hardware. We use Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) as the training model. Training time increases linearly on the GPU, whereas on the CPU it remains quadratic. We also use incremental algorithms for PCA and LDA to learn from videos in an online fashion. By sacrificing some frames, we have been able to preserve frame rates and retain a recognition accuracy of around 90%.
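As an illustration of what learning PCA "in an online fashion" can look like, the sketch below uses Oja's rule, a classic one-sample-at-a-time estimator of the leading principal direction. The abstract does not name the incremental PCA algorithm actually used, so Oja's rule is a stand-in, and the function name `oja_first_component`, the learning rate, and the toy two-dimensional data are assumptions. Real face data would be mean-centred image vectors of much higher dimension.

```python
import math


def oja_first_component(samples, learning_rate=0.01, epochs=1):
    """Estimate the leading principal direction of zero-mean samples online.

    Each sample updates the estimate once and can then be discarded, so the
    full data set (for example, every video frame) never has to be stored.
    """
    dim = len(samples[0])
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary unit-length starting guess
    for _ in range(epochs):
        for x in samples:
            # Projection of the sample onto the current estimate.
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Oja's rule: w <- w + lr * y * (x - y * w).
            w = [wi + learning_rate * y * (xi - y * wi) for wi, xi in zip(w, x)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Toy zero-mean data: large spread along (1, 1), small spread along (1, -1).
frames = [(2.0, 2.0), (0.5, -0.5), (-2.0, -2.0), (-0.5, 0.5)] * 500
direction = oja_first_component(frames)
```

On this toy data the estimate settles on the (1, 1) diagonal, up to sign. Skipping some samples in the loop, as the frame-dropping strategy above does with video frames, only slows convergence rather than breaking the update.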

Axat Chaudhary, Ratnik Gandhi, Mayank Jobanputra, Saumil Shah

Recommendation System for HR

Human Resource (HR) management is one of the most essential parts of a myriad of institutions across the globe. From an institution’s viewpoint, recruiting talent that fits the job description is one of the crucial problems faced by the HR department. From a professional’s perspective, job hunting is also challenging: one has to spend hours finding a job that matches one's skills and meets one's expectations.

Most of this hiring and job-hunting process is manual and needs human intervention to make decisions; we have attempted to automate it. This project aims to provide a solution for both kinds of users, institutions and professionals, recommending prospective employees to the former and jobs to the latter.

Axat Chaudhary, Ratnik Gandhi, Mayank Jobanputra, Saumil Shah

Protein-Protein interaction networks

Multiple databases provide free access to protein-protein interaction data, and graph theory provides powerful tools to analyse such data. The analysis has multiple possible applications, such as prediction of the interactions of a new protein with the proteins in the database (how would a new disease protein affect human biochemistry?), identification of the roles of special proteins in processes (which proteins should be targeted to inhibit or enhance certain processes?), and identification of functional groups of proteins (which proteins play a role in metabolic processes?). We have studied the network representation of protein data to identify proteins with special roles and their relation to the structure of the network.

Outcomes: Protein-protein interaction networks were analysed to identify proteins with high betweenness values and their occurrence in the network. Subgraphs of the human protein interactome were studied to identify important groups of proteins based on various centralities. A disease-disease network was created with edge weights based on shared proteins. The degree distribution of the network was compared with standard network models.
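The two graph constructions mentioned in the outcomes can be sketched in a few lines. The code below computes betweenness centrality on an unweighted, undirected interaction graph using Brandes' algorithm, and builds a disease-disease network from disease-to-protein sets. The exact weighting used in the project is not stated; taking the edge weight to be the number of shared proteins is an assumption, as are the function names and the toy identifiers (`P1`, `D1`, and so on).

```python
from collections import deque
from itertools import combinations


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to an iterable of its neighbours (listed both ways).
    """
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        order = []                               # nodes in order of distance
        preds = {v: [] for v in adj}             # shortest-path predecessors
        paths = {v: 0 for v in adj}              # number of shortest paths
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                             # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):                # accumulate from far to near
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Every unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


def disease_network(disease_proteins):
    """Link two diseases when they share proteins; weight = number shared."""
    edges = {}
    for a, b in combinations(sorted(disease_proteins), 2):
        shared = disease_proteins[a] & disease_proteins[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


interactions = {"P1": ["P2"], "P2": ["P1", "P3"], "P3": ["P2"]}
scores = betweenness_centrality(interactions)
links = disease_network({"D1": {"P1", "P2"}, "D2": {"P2", "P3"}, "D3": {"P4"}})
```

In the three-protein chain, only the middle protein lies on a shortest path between two others, so it is the only one with non-zero betweenness. For a full interactome, a graph library would be a more practical choice than this pure-Python version.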

Mitaxi Mehta, Manish Datt, Seema Aswani, Priyanka Nimawat

Speaker verification 

Speech biometrics has great potential in the security, service, medical, and human-computer interaction sectors. Although speech data is typically smaller than imaging data, its high variability creates classification and processing challenges that have given rise to interesting algorithms and a quest for improvements. We extract features from short, fixed, one-word utterances and study the effectiveness of these features for speaker verification.

Outcomes: MATLAB code for speaker identification has been developed. A comparative study of features is in progress.

Mitaxi Mehta, Pooja Patel

Analysis and visualization of health data

Visualisation can give a consolidated view of a large dataset, highlighting multiple key features of the data and aiding the decision process. Often the analysis is meaningful only when the data is partitioned according to one or more parameters, to answer questions such as: What are the gender-based differences in the occurrence of anemia, high blood pressure, and diabetes? What are the demography-based (rural/urban) differences in the same parameters? The data may be further analysed to report sample sizes and variance, which convey reliability. The analysis may also be extended to other databases, such as education and business.

Outcomes: The CAB database was analysed for gender-based and demography-based partitions. A Shiny GUI was created for interactive visualization of the data. Work on partitioning and analysing the data based on a threshold value of a continuous variable (for example, Hemoglobin > 13) is in progress.
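The partition-and-threshold analysis described above can be sketched as follows: records are grouped by a categorical field, and for each group the code reports the sample size, the share of records above a threshold, and a standard error that indicates how reliable that share is. The field names `sex` and `hemoglobin`, the function name `share_above_by_group`, and the sample values are hypothetical and are not taken from the CAB database. The project's own tool is a Shiny GUI, so this Python version is only an illustration of the computation.

```python
import math
from collections import defaultdict


def share_above_by_group(records, group_key, value_key, threshold):
    """Per group: sample size, share of values above threshold, standard error."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [count above, count total]
    for record in records:
        value = record.get(value_key)
        if value is None:                  # skip records with a missing value
            continue
        tally = tallies[record[group_key]]
        tally[1] += 1
        if value > threshold:
            tally[0] += 1
    summary = {}
    for group, (above, total) in tallies.items():
        share = above / total
        # Standard error of a proportion; large values signal small samples.
        std_error = math.sqrt(share * (1 - share) / total)
        summary[group] = {"n": total, "share_above": share, "std_error": std_error}
    return summary


# Made-up records for illustration only.
survey = [
    {"sex": "F", "hemoglobin": 11.5},
    {"sex": "F", "hemoglobin": 12.8},
    {"sex": "F", "hemoglobin": 13.4},
    {"sex": "F", "hemoglobin": 14.0},
    {"sex": "M", "hemoglobin": 13.5},
    {"sex": "M", "hemoglobin": 15.1},
    {"sex": "M", "hemoglobin": 12.9},
    {"sex": "M", "hemoglobin": None},
]
result = share_above_by_group(survey, "sex", "hemoglobin", 13)
```

Passing a different `group_key`, such as a rural/urban field, gives the demography-based partition with the same function.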

Mitaxi Mehta, Yesha Bhavsar



