- Web Data Mining and Analysis
- Complex Network Analysis Techniques
- Digital Marketing and Social Media
- Information Retrieval and Search Behavior
- Technology Adoption and User Behaviour
- Information and Cyber Security
- Text and Document Classification Technologies
- Advanced Text Analysis Techniques
- Data Management and Algorithms
- Web Visibility and Informetrics
- Social Media and Politics
- Data Mining Algorithms and Applications
- FinTech, Crowdfunding, Digital Finance
- Natural Language Processing Techniques
- Opinion Dynamics and Social Influence
- Spam and Phishing Detection
- Wikis in Education and Collaboration
- Caching and Content Delivery
- Mental Health via Writing
- Semantic Web and Ontologies
- Microfinance and Financial Inclusion
- Cybercrime and Law Enforcement Studies
- Sentiment Analysis and Opinion Mining
- Scientific Computing and Data Management
- Network Security and Intrusion Detection
University of Hong Kong
2016-2025
Cooper University Health Care
2023-2024
Cooper Medical School of Rowan University
2024
Binus University
2024
George Washington University
2021
Chinese University of Hong Kong
2006-2020
Institut de Recherche Technologique SystemX
2016
Illinois Wesleyan University
2012
Arizona State University
2008
Education University of Hong Kong
2008
A major challenge facing all law-enforcement and intelligence-gathering organizations is accurately and efficiently analyzing the growing volumes of crime data. Detecting cybercrime can likewise be difficult because busy network traffic and frequent online transactions generate large amounts of data, only a small portion of which relates to illegal activities. Data mining is a powerful tool that enables criminal investigators who may lack extensive training as data analysts to explore databases quickly...
The increasing popularity of Web 2.0 has led to exponential growth of user-generated content in both volume and significance. One important type of user-generated content is the blog. Blogs encompass useful information (e.g., insightful product reviews and information-rich consumer communities) that could potentially be a gold mine for business intelligence, bringing great opportunities for academic research and business applications. However, performing business intelligence on blogs is quite challenging because of the vast amount of information and the lack of commonly adopted...
The authors investigated censorship practices and the use of microblogs (or weibos, in Chinese) using 111 million microblogs collected between 1 January and 30 June 2012. To better control for alternative explanations for censorship decisions attributable to an individual's characteristics and choices, they used a matched case-control study design to determine a list of Chinese terms that discriminate censored and uncensored posts written by the same microbloggers. This list includes homophones and puns created by microbloggers to circumvent...
We study the problem of clustering data objects whose locations are uncertain. A data object is represented by an uncertainty region over which a probability density function (pdf) is defined. One method to cluster uncertain objects of this sort is to apply the UK-means algorithm, which is based on the traditional K-means algorithm. In UK-means, an object is assigned to the cluster whose representative has the smallest expected distance to the object. For an arbitrary pdf, calculating the expected distance between an object and a cluster representative requires expensive integration computation. Various pruning methods are studied to avoid such expensive calculation.
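The assignment rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each uncertain object is given as a finite list of points sampled from its pdf, so the expected distance becomes a sample average instead of an integral, and it assumes squared Euclidean distance. The names `expected_sq_distance`, `mean_point`, and `uk_means` are hypothetical, and no pruning is attempted.

```python
import random


def expected_sq_distance(samples, center):
    """Estimate E[||X - center||^2] by averaging over points sampled from the object's pdf."""
    total = 0.0
    for point in samples:
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total / len(samples)


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def uk_means(objects, k, iterations=20, seed=0):
    """Cluster uncertain objects (each a list of sample points) into k clusters.

    Assumption: sample averages stand in for the integration over each pdf.
    """
    rng = random.Random(seed)
    # Under squared Euclidean distance, an object's mean is all the update step needs.
    centroids = [mean_point(samples) for samples in objects]
    centers = rng.sample(centroids, k)
    labels = [0] * len(objects)
    for _ in range(iterations):
        # Assignment step: pick the representative with the smallest expected distance.
        labels = [
            min(range(k), key=lambda j: expected_sq_distance(samples, centers[j]))
            for samples in objects
        ]
        # Update step: move each representative to the mean of its members' centroids.
        new_centers = []
        for j in range(k):
            members = [centroids[i] for i, label in enumerate(labels) if label == j]
            new_centers.append(mean_point(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    def box(cx, cy, r=1.0):
        # Four corner points standing in for samples from a uniform pdf over a square.
        return [(cx - r, cy - r), (cx + r, cy - r), (cx - r, cy + r), (cx + r, cy + r)]

    demo = [box(0, 0), box(1, 0), box(10, 10), box(11, 10)]
    print(uk_means(demo, k=2))
```

Every object is compared against every representative in each iteration, which is the cost that pruning is meant to reduce.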
Background: Severe social withdrawal behaviors among young people have been a subject of public and clinical concerns. Aims: This study aimed to explore the prevalence of social withdrawal behaviors among young people aged 12–29 years in Hong Kong. Methods: A cross-sectional telephone-based survey was conducted with 1,010 young individuals. Social withdrawal behaviors were measured with the proposed research diagnostic criteria for hikikomori and were categorized according to the (a) international proposed duration criterion (more than 6 months), (b) local proposed criterion (less than 6 months), and (c) but self-perceived as...
Chinese microblogs have drawn global attention to this online application's potential impact on the country's social and political environment. However, representative and reliable statistics on Chinese microbloggers are limited. Using a random sampling approach, this study collected Chinese microblog data from the service provider, analyzing the profile and the pattern of usage for 29,998 microblog accounts. From our analysis, 57.4% (95% CI 56.9%, 58.0%) of the accounts' timelines were empty. Among the 12,774 samples with non-zero statuses, 86.9% (95% CI 86.2%, 87.4%)...
The limited information provided by peer-to-peer (P2P) lending platforms often is not sufficient for lenders to determine if a borrower is trustworthy and able to repay the loan. Using a unique dataset from a P2P lending platform, which allows lenders to seek information directly from borrowers and borrowers to respond to lenders' questions and comments, we examine the impact of lender-borrower communication on funding outcomes and loan performance. Our results show that not only the amount but also the content of such direct communication matters. Specifically, the number of lender comments negatively...
Many people face problems of emotional distress. Early detection of high-risk individuals is the key to prevent suicidal behavior. There is increasing evidence that the Internet and social media provide clues to people's emotional distress. In particul
Background: Internet-based learning programs provide people with massive health care information and self-help guidelines on improving their health. The advent of Web 2.0 and social networks renders significant flexibility to embedding highly interactive components, such as games, to foster learning processes. The effectiveness of game-based learning on social networks has not yet been fully evaluated. Objectives: The aim of this study was to assess the effectiveness of a fully automated, Web-based, social network electronic game on enhancing mental health knowledge and problem-solving skills...
Valuable criminal-justice data in free texts such as police narrative reports are currently difficult for intelligence investigators to access and use in crime analyses. It would be desirable to automatically identify from text reports meaningful entities, such as person names, addresses, narcotic drugs, or vehicle names, to facilitate crime investigation. In this paper, we report our work on a neural network-based entity extractor, which applies named-entity extraction techniques to identify useful entities from police narrative reports. Preliminary...
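A large number of studies have investigated the transaction log of general‐purpose search engines such as Excite and AltaVista, but few have reported on the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research on analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, of Web users are the same for general‐purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. Possible reasons for the differences include the focused domain of Web site search engines and users'...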
The Web's dynamic, unstructured nature makes locating resources difficult. Vertical search engines solve part of the problem by keeping indexes only in specific domains. They also offer more opportunity to apply domain knowledge in the spider applications that collect content for their databases. The authors used three approaches to investigate algorithms for improving the performance of vertical search engine spiders: a breadth-first graph-traversal algorithm with no heuristics to refine the search process, a best-first traversal algorithm that uses...
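The first of the approaches named in this abstract, breadth-first traversal with no heuristics, can be sketched as follows. This is only an illustration, not the authors' spider: the function name `breadth_first_crawl` and the `get_links` callback are hypothetical, and an in-memory link graph stands in for real page fetching so the example runs without network access.

```python
from collections import deque


def breadth_first_crawl(seeds, get_links, max_pages=100):
    """Visit pages in breadth-first order starting from the seed URLs.

    get_links(url) is an assumed callback returning the outgoing links of a page.
    """
    ordered_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    seen = set(ordered_seeds)
    queue = deque(ordered_seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO order is what makes the traversal breadth-first
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # no heuristic: every unseen link is queued
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # Toy link graph standing in for the Web.
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}
    print(breadth_first_crawl(["a"], lambda url: graph.get(url, [])))
```

A best-first variant would replace the FIFO queue with a priority queue ordered by some relevance score. The scoring function is not given in the text above, so it is left out here.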
Web mining: Machine learning for web applications. Hsinchun Chen (University of Arizona), Michael Chau. Annual Review of Information Science and Technology, Volume 38, Issue 1, p. 289-329. First published: 22 September 2005. https://doi.org/10.1002/aris.1440380107 Citations: 41
Text analysis of personal documents provides insight into the cognition of those who complete suicide. Many personal documents are digitalized and easily found on the Internet, which can be used to advance suicide research. (1) To examine the temporal relationships between posting intensity and language use to sketch the suicidal process of a young man on the basis of his blog entries. (2) To investigate whether blog and paper cases have similar or different language patterns. Firstly, 193 blog entries of a 13-year-old boy posted during the year prior to his suicide were analyzed using the Chinese...
Purpose: A new business model, online to offline (O2O), has emerged in recent years. Similar to many new business models at an early stage, O2O has inconsistent definitions, which not only inhibit its adoption but also poorly differentiate it from other existing business models. To resolve the two issues, the authors propose an approach of definition development. Design/methodology/approach: To show the usefulness of the approach, the authors demonstrate differences among and with use distinctive thereby evaluate a practical perspective identify research...
It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta‐search and document categorization. Meta‐search engines improve precision by selecting and integrating search results from generic or domain‐specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta‐search engine that has real‐time...