- Data Stream Mining Techniques
- X-ray Diffraction in Crystallography
- Face and Expression Recognition
- Crystallization and Solubility Studies
- Cloud Computing and Resource Management
- Anomaly Detection Techniques and Applications
- Machine Learning and Data Classification
- Artificial Intelligence in Healthcare
- Spam and Phishing Detection
- Data Mining Algorithms and Applications
- Time Series Analysis and Forecasting
- Big Data and Business Intelligence
- Advanced Malware Detection Techniques
- Network Security and Intrusion Detection
- Sentiment Analysis and Opinion Mining
- Gambling Behavior and Treatments
- Customer Service Quality and Loyalty
- Metaheuristic Optimization Algorithms Research
- Gene expression and cancer classification
- Topic Modeling
- Energy Load and Power Forecasting
- Innovation in Digital Healthcare Systems
- AI in cancer detection
- Text and Document Classification Technologies
- Customer churn and segmentation
Indian Institute of Information Technology Allahabad
2014-2019
Streaming data are a potentially infinite sequence of incoming data arriving at very high speed and may evolve over time. This causes several challenges in mining large-scale data streams in real time. Hence, this field has gained a lot of attention from researchers in previous years. This paper discusses various challenges associated with mining such data streams. Several available data stream mining algorithms for classification and clustering are specified along with their key features and significance. Also, significant performance evaluation measures relevant to streaming data are explained...
Data analytics and machine learning have always been of great importance in almost every field, especially in business decision making and strategy building, the healthcare domain, text mining and pattern identification on the web, the meteorological department, etc. The daily exponential growth of data today has shifted normal data analytics to the new paradigm of Big Data Analytics and Big Data Machine Learning. We need tools to perform online analysis of streaming data for achieving faster response as well as maintaining scalability in terms of the huge volume of data. SAMOA (Scalable...
Web logs provide useful insight into large-scale web-based applications and are helpful in deriving usage patterns. Since these usage patterns are available at a high rate and volume and are also continuously updating in a real-time environment, web logs must be handled through modern big data architectures supported by powerful processing tools. The generated log streams have the most significant impact when it is feasible to analyze them as they are emitted. In the proposed research work, an advanced stream analytics framework especially for web logs has been...
The rise of globalization and market liberalization is changing the face of competitiveness significantly. The appearance of modern technology in business processes has intensified the competition and put forth new challenges for service-providing companies. To cope with the new scenarios, companies are shifting their attention to retaining existing customers rather than acquiring new ones. This is more cost effective and requires fewer resources as well. The phenomenon of a customer abandoning a company is known as churn in this context,...
This paper presents an efficient Parkinson's disease diagnosis system using Least Squares Twin Support Vector Machine (LSTSVM) and Particle Swarm Optimization (PSO). LSTSVM is a promising binary classifier and has shown better generalization ability and faster computational speed. PSO is used for feature selection and parameter optimization. The Parkinson's disease dataset is taken from the UCI repository. The performance of the proposed system is compared with other existing approaches in terms of accuracy, sensitivity and specificity. Experimental results...
There is a huge abundance of e-commerce websites, and online reviews have become crucial these days. These reviews help customers in making decisions, but one must go through a huge pile of reviews on many sites. We summarized the reviews into stars on a scale of 1-5, which are easy to perceive. So, for a given customer review, we predict the star rating of the review. The proposed approach in this research work is to first pre-process the review data and then train different classifiers such as Multinomial Naïve Bayes, Bigram Multinomial Naïve Bayes, Trigram Multinomial Naïve Bayes, Bigram-Trigram Multinomial Naïve Bayes and Random Forest. Finally,...
Smart buildings equipped with sensors and electronic devices, as Cyber-Physical Systems (CPS), offer a great research perspective to explore the communication, computation and controlling of physical devices by using real-time Complex Event Processing (CEP) and analytics. Since a CPS like a smart building involves the integration of several types of equipment, interoperability, maintainability, signaling, bandwidth, reliability, security, privacy, authentication, data storage, heterogeneity and cost effectiveness are critical...
Cancer is the most common death-causing disease, and breast cancer is the deadliest disease affecting women universally. Survivability forecasting for a breast cancer patient after surgery has become a challenging and difficult task in reducing the death rate. This survival prediction is associated with the life of a woman, and hence efficient algorithms must be used for the purpose. Many algorithms have been published in the field of post-surgical survival (PSS) during the past three decades. These approaches involve statistical or machine learning methods for patients advised lumpectomy/mastectomy....
Over the years, the use of Online Social Networks (OSNs) has exploded, creating a need for studying and understanding users' behavior online. The excessive use of online social networking causes a great increase in anomalies. Anomalies in OSNs can signify irregular and often illegal behavior. Detection of such anomalies has been used to identify malicious individuals, including spammers, sexual predators and fraudsters. For detecting anomalies, a dataset of the Twitter network is analyzed for user behavior via analyzing their tweets to find whether it...
In the current era, we are experiencing tremendous growth in database sizes, types, users, working environments and data access speeds. This situation coined a new term, Big Data, which refers to large and complex datasets used for extracting meaningful knowledge. One of the main challenges in processing Big Data is its huge volume, which is a common characteristic of collections of textual data as well. Handling such voluminous big data using conventional data mining techniques such as clustering becomes impractical because of algorithmic incompetence to address...
Feature selection is one of the most significant steps in machine learning; it reduces the feature space in order to achieve faster learning and yield simpler models with high accuracy and interpretability. With the rapid development of technologies, large-scale, high-dimensional datasets are common today, which degrades the performance of traditional feature selection techniques as they suffer from scalability issues. Parallel feature selection is an obvious solution to deal with this problem. Due to the advent of many distributed computing frameworks, scalable feature selection has become a...
Transfer learning is an emerging research area which extracts knowledge from one or more source domains and utilizes this gained knowledge to perform some task in a target domain. It has emerged as a popular topic in recent years, because this technique is considered to be helpful in reducing the cost of labeling. It has many applications in different domains such as Natural Language Processing, Image Processing, Video Processing, etc. The aim of this study is to study transfer learning and implement it for Sentiment Analysis of Tweets by using Yelp reviews. We find that the Transfer Learning approach...
Over the past ten years, the area of Feature Selection has been a favorite among researchers because of the increasing size of datasets. Several existing feature selection algorithms do not scale well with large datasets. Sophisticated tools also do not work well with large datasets on a low-configuration machine. This work implements feature selection using Spark cluster computing to have a scalable solution. In the proposed algorithm, the dataset is partitioned vertically into several small datasets. Spark distributes it in the form of an RDD (Resilient Distributed Dataset) to the machines in the cluster. The...
Ego-networks represent social circles and associations within graphs. The interconnections between an individual (ego) in a network and other users (alters) that exist in various groups or clusters are community driven and continuously evolving. Manual identification of social circles is time consuming for large networks due to their bursty nature and unpredictable growth. In this research work, we propose clustering-based real-time mining of ego networks to explore social circles and associations. Real-time data has been acquired with the help of the Twitter API. By applying...
Machine learning (ML) on Big Data has gone beyond the capacity of traditional machines and technologies. ML for large-scale datasets is the current focus of researchers. Most ML algorithms primarily suffer from memory constraints, complex computation, and scalability issues. The least squares twin support vector machine (LSTSVM) technique is an extended version of the support vector machine (SVM). It is much faster as compared to SVM and is widely used for classification tasks. However, when applied to large-scale datasets having millions or billions of samples and/or a large number...
As human society enters the era of the knowledge economy, "industrial economic development Fuzzy Clustering Algorithm (FCM), which promotes science and technology and social progress, has undergone profound changes in both connotation and form, and has gradually become an important part of national strategy." In order to investigate the effects of various policy alternatives on the energy industry, a system simulation model of the expansion of the energy industry from the position of a low-carbon economy is created based on DPSIR theory and the perspectives of energy,...