D. S. Song

ORCID: 0009-0007-8666-8966
Research Areas
  • Nuclear and radioactivity studies
  • Functional Brain Connectivity Studies
  • Marine and Coastal Research
  • Natural Language Processing Techniques
  • Risk and Safety Analysis
  • Nuclear reactor physics and engineering
  • Web Application Security Vulnerabilities
  • Radiative Heat Transfer Studies
  • Calibration and Measurement Techniques
  • Digital Media Forensic Detection
  • Web Data Mining and Analysis
  • Embedded Systems Design Techniques
  • Advanced MRI Techniques and Applications
  • Nuclear Materials and Properties
  • Engineering Applied Research
  • Semantic Web and Ontologies
  • Anomaly Detection Techniques and Applications
  • Scientific Computing and Data Management
  • Real-time simulation and control systems
  • Medical Imaging Techniques and Applications
  • Nuclear Engineering Thermal-Hydraulics
  • Neural dynamics and brain function
  • Hydraulic and Pneumatic Systems
  • Parallel Computing and Optimization Techniques
  • EEG and Brain-Computer Interfaces

George Mason University
2024

Seoul Women's University
2024

Korea Electric Power Corporation (South Korea)
1997-2001

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously...

10.48550/arxiv.2403.17297 preprint EN arXiv (Cornell University) 2024-03-25

Large Language Models (LLMs) have demonstrated remarkable capabilities in various tasks, yet code generation remains a major challenge. Current approaches for obtaining high-quality data primarily focus on (i) collecting large-scale pre-training data and (ii) synthesizing instruction data through prompt engineering with powerful models. While pre-training data faces quality consistency issues, instruction-based synthesis suffers from limited diversity and the inherent biases of LLMs. To address this gap, we introduce...

10.48550/arxiv.2502.11460 preprint EN arXiv (Cornell University) 2025-02-17

Recently, there has been a revived interest in systems neuroscience causation models due to their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, our goal is to verify the feasibility and effectiveness of using a causality-based approach for fMRI fingerprinting. Specifically, we propose an innovative method that utilizes causal dynamics activities to identify the cognitive patterns of individuals (e.g., subject fingerprint) and of tasks (e.g., task fingerprint). The key novelty...

10.48550/arxiv.2409.18298 preprint EN arXiv (Cornell University) 2024-09-26

Programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-training corpora, we introduce a novel data augmentation method that generates comments for existing code, coupled with a data filtering strategy that filters out code data poorly correlated...

10.48550/arxiv.2402.13013 preprint EN arXiv (Cornell University) 2024-02-20

Open-source Large Language Models (LLMs) and their specialized variants, particularly Code LLMs, have recently delivered impressive performance. However, previous LLMs are typically fine-tuned on single-source data with limited quality and diversity, which may insufficiently elicit the potential of pre-trained LLMs. In this paper, we present AlchemistCoder, a series of Code LLMs with enhanced code generation and generalization capabilities fine-tuned on multi-source data. To achieve this, we are the first to unveil the inherent conflicts among...

10.48550/arxiv.2405.19265 preprint EN arXiv (Cornell University) 2024-05-29

Recently, there has been a revived interest in systems neuroscience causation models due to their unique capability to unravel complex relationships in multi-scale brain networks. In this paper, our goal is to verify the feasibility and effectiveness of using a causality-based approach for fMRI fingerprinting. Specifically, we propose an innovative method that utilizes causal dynamics activities to identify the cognitive patterns of individuals (e.g., subject fingerprint) and of tasks (e.g., task fingerprint). The key novelty...

10.1145/3698587.3701342 article EN 2024-11-22