large language model | VALIANT /valiant Vanderbilt Advanced Lab for Immersive AI Translation (VALIANT)

ComCat: Expertise-Guided Context Generation to Enhance Code Comprehension /valiant/2026/03/26/comcat-expertise-guided-context-generation-to-enhance-code-comprehension/ Thu, 26 Mar 2026 19:50:56 +0000 /valiant/?p=6346

Skyler Grandel; Scott Thomas Andersen; Yu Huang; Kevin Leach (2026). ACM Transactions on Software Engineering and Methodology, 35(3), Article 82.

Software maintenance makes up a large share of the total cost of software over its lifetime, and a big part of that cost comes from understanding existing code. One way to make code easier to understand is through documentation, especially comments that summarize what the code does or explain why it does it. In this work, we introduce ComCat, a system that uses large language models (LLMs, which are AI models trained on very large amounts of text) together with expert guidance to automatically generate useful comments for source code. ComCat is designed to choose the most relevant and informative comment for a specific piece of code. For C/C++ files, the system works in three steps: it first finds places where comments would be most helpful, then decides what kind of comment is needed, and finally writes the comment. In a study with human participants, ComCat’s comments improved code understanding on three software engineering tasks by up to 13% for most participants. The generated comments were also judged to be at least as accurate and readable as human-written comments, and they were preferred over standard ChatGPT-generated comments for up to 92% of code snippets. We also released a dataset containing code snippets, human-written comments, and human-labeled comment categories. Overall, ComCat shows that LLMs can be used to meaningfully improve how well people understand code.

Fig. 1.

ComCat pipeline and study procedure. We use three instances of HSR to inform ComCat’s design (1) and evaluate developer performance (2) and preference (3) with our tool. ComCat takes C/C++ code as input, using a Code Parser to identify code Snippets to be commented. These Snippets are classified, and the class of each Snippet is used in combination with our Template Catalog to create a prompt for each Snippet. These prompts are sent to ChatGPT, which outputs the commented code. This pipeline is informed by developer expertise, but it is fully automated and requires no human intervention.
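The three-stage flow described above (identify snippets, classify them, build a template-driven prompt) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the templates, category names, and toy classifier heuristic are all assumptions.

```python
# Hypothetical sketch of ComCat's three-stage flow. The template catalog,
# category names, and classifier heuristic are illustrative assumptions,
# not the paper's actual implementation.
import re

# Assumed comment categories with per-category prompt templates.
TEMPLATE_CATALOG = {
    "summary": "Write a one-line comment summarizing what this function does:\n{code}",
    "rationale": "Write a comment explaining why this code is written this way:\n{code}",
}

def find_snippets(c_source: str) -> list:
    """Very rough stand-in for the Code Parser: grab function definitions."""
    pattern = re.compile(r"\w[\w\s\*]*\([^)]*\)\s*\{[^}]*\}", re.S)
    return pattern.findall(c_source)

def classify_snippet(snippet: str) -> str:
    """Toy classifier: magic numbers suggest a 'rationale' comment is needed."""
    return "rationale" if re.search(r"\b\d{2,}\b", snippet) else "summary"

def build_prompt(snippet: str) -> str:
    """Combine the snippet's class with the template catalog into an LLM prompt."""
    return TEMPLATE_CATALOG[classify_snippet(snippet)].format(code=snippet)

src = "int add(int a, int b) { return a + b; }"
prompt = build_prompt(src)
```

In the real system, the classification step is learned from developer-labeled comment categories rather than a keyword heuristic, and the resulting prompt is sent to ChatGPT.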

Demystifying the Power of Large Language Models in Graph Structure Generation /valiant/2026/02/25/demystifying-the-power-of-large-language-models-in-graph-structure-generation/ Wed, 25 Feb 2026 02:21:16 +0000 /valiant/?p=6108 Wang, Yu; Rossi, Ryan A.; Park, Namyong; Ahmed, Nesreen K.; Koutra, Danai; Dernoncourt, Franck; & Derr, Tyler. (2025). 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025, 8189–8204.

Large Language Models (LLMs) have been very successful at analyzing graphs—for tasks such as node classification (labeling items in a network) and link prediction (predicting missing connections). However, little research has explored whether LLMs can actually generate new graph structures. This study investigates that question.

We designed prompts that guide LLMs to write code that creates graphs with specific structural properties, using ideas from network science. Different types of networks—such as social networks or transportation networks—have different structural patterns. For example, the clustering coefficient measures how often triangles appear in social networks, while square patterns may reflect road layouts in transportation systems. We first tested whether LLMs could generate graphs that match these kinds of domain-specific structural properties.
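To make the clustering-coefficient property mentioned above concrete, here is a small stdlib-only computation of the global clustering coefficient (transitivity), the ratio of closed triples (triangle corners) to all connected triples. This is a generic illustration of the metric, not code from the study.

```python
# Global clustering coefficient (transitivity) of an undirected graph stored
# as an adjacency dict. Generic illustration, not the study's code.
from itertools import combinations

def clustering_coefficient(adj: dict) -> float:
    triangles = 0  # triangle corners (each triangle counted 3 times)
    triples = 0    # connected triples (paths of length 2, centered per node)
    for node, neighbors in adj.items():
        k = len(neighbors)
        triples += k * (k - 1) // 2
        for u, v in combinations(neighbors, 2):
            if v in adj[u]:  # the triple is closed => triangle corner
                triangles += 1
    return triangles / triples if triples else 0.0

# A triangle (a, b, c) with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

On this graph the coefficient is 3/5 = 0.6: one triangle contributes three closed corners out of five connected triples. Social networks tend to score high on this metric, while grid-like road networks score low, which is exactly the kind of domain signature the prompts target.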

Next, we selected the best-performing configurations and compared LLM-generated graphs with those produced by established graph generative models across multiple domains. Our results provide insight into how well LLMs can generate realistic network structures and where their strengths and limitations lie.

Knowledge distillation and dataset distillation of large language models: emerging trends, challenges, and future directions /valiant/2025/12/19/knowledge-distillation-and-dataset-distillation-of-large-language-models-emerging-trends-challenges-and-future-directions/ Fri, 19 Dec 2025 16:23:50 +0000 /valiant/?p=5547 Fang, L., Yu, X., Cai, J., Chen, Y., Wu, S., Liu, Z., Yang, Z., Lu, H., Gong, X., Liu, Y., Ma, T., Ruan, W., Abbasi, A., Zhang, J., Wang, T., Latif, E., Liu, W., Zhang, W., Kolouri, S., Zhai, X., Zhu, D., Zhong, W., Liu, T., & Ma, P. (2026). Artificial Intelligence Review, 59(1), 17.

The exponential growth of Large Language Models (LLMs) continues to highlight the need for efficient strategies to meet ever-expanding computational and data demands. This survey provides a comprehensive analysis of two complementary paradigms: Knowledge Distillation (KD) and Dataset Distillation (DD), both aimed at compressing LLMs while preserving their advanced reasoning capabilities and linguistic diversity. We first examine key methodologies in KD, such as task-specific alignment, rationale-based training, and multi-teacher frameworks, alongside DD techniques that synthesize compact, high-impact datasets through optimization-based gradient matching, latent space regularization, and generative synthesis. Building on these foundations, we explore how integrating KD and DD can produce more effective and scalable compression strategies. Together, these approaches address persistent challenges in model scalability, architectural heterogeneity, and the preservation of emergent LLM abilities. We further highlight applications across domains such as healthcare and education, where distillation enables efficient deployment without sacrificing performance. Despite substantial progress, open challenges remain in preserving emergent reasoning and linguistic diversity, enabling efficient adaptation to continually evolving teacher models and datasets, and establishing comprehensive evaluation protocols. By synthesizing methodological innovations, theoretical foundations, and practical insights, our survey charts a path toward sustainable, resource-efficient LLMs through the tighter integration of KD and DD principles.

 

Fig. 2

Overview of knowledge distillation in LLMs. Knowledge is distilled from a teacher LLM, which has been trained on a large existing database. This knowledge, potentially enriched with current, task-specific data, is transferred to a smaller student LLM. By learning from both the teacher’s guidance and the current data, the student LLM becomes more efficient and effective at performing downstream tasks
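The teacher-to-student transfer in Fig. 2 can be sketched with the classic response-based distillation loss: the student is trained to match the teacher's temperature-softened output distribution. A minimal stdlib sketch, assuming plain logit vectors; the survey covers many richer variants (rationale-based, multi-teacher, etc.) that this does not capture.

```python
# Minimal response-based knowledge distillation loss (a generic sketch, not
# the survey's code): KL divergence between temperature-softened teacher and
# student output distributions.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; disagreement gives a positive penalty
# that gradient descent on the student would then reduce.
```

The temperature above 1 flattens the teacher's distribution so the student also learns from the relative ranking of wrong answers ("dark knowledge"), not just the argmax.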

Enhancing Code LLM Training with Programmer Attention /valiant/2025/09/26/enhancing-code-llm-training-with-programmer-attention/ Fri, 26 Sep 2025 19:52:44 +0000 /valiant/?p=5162 Zhang, Yifan, Huang, Chen, Karas, Zachary, Nguyen, Thuy Dung, Leach, Kevin, & Huang, Yu. (2025). Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering.

Human attention, such as where programmers look while reading or writing code, provides valuable signals that are not yet fully used in training large language models (LLMs) for code. These signals offer insights that go beyond machine-driven attention. However, collecting eye-tracking data is complex and costly, and there has been little progress in systematically applying these signals for training code LLMs.

To address this, we propose a full pipeline that combines data augmentation and reward-based fine-tuning. Specifically, we introduce: (1) an eye-tracking path augmentation method to expand programmer attention datasets, (2) a pattern abstraction step that transforms raw fixations into learnable attention motifs, and (3) a reward-guided strategy that integrates these insights into a CodeT5 supervised fine-tuning process.
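Step (1), the eye-tracking path augmentation, might look like the following toy sketch: expand a small set of recorded fixation sequences by swapping adjacent fixations and jittering durations. The specific perturbations are illustrative assumptions; the paper's actual augmentation method is not reproduced here.

```python
# Illustrative sketch of eye-tracking path augmentation (step 1): produce
# perturbed copies of a fixation sequence. The swap/jitter perturbations are
# assumptions for illustration, not the authors' exact method.
import random

def augment_scanpath(fixations, n_variants=3, seed=0):
    """fixations: list of (token_index, duration_ms) pairs.
    Returns n_variants perturbed copies preserving the visited tokens."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    variants = []
    for _ in range(n_variants):
        path = list(fixations)
        if len(path) > 1:  # swap one adjacent pair of fixations
            i = rng.randrange(len(path) - 1)
            path[i], path[i + 1] = path[i + 1], path[i]
        # Jitter durations by +/-20%, clamped to a 20 ms minimum fixation.
        path = [(tok, max(20, int(d * rng.uniform(0.8, 1.2))))
                for tok, d in path]
        variants.append(path)
    return variants

original = [(0, 180), (3, 240), (1, 120)]
augmented = augment_scanpath(original)
```

Steps (2) and (3) would then abstract these augmented scanpaths into recurring attention motifs and fold them into the CodeT5 fine-tuning objective as a reward signal.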

Our experiments show a +7.16 improvement in CodeBLEU on the CodeXGlue benchmark for code summarization, demonstrating that combining human and machine attention can significantly enhance code intelligence. We hope this work encourages further exploration of human-centered approaches in next-generation AI for Software Engineering (AI4SE).

What are the future directions for microplastics characterization? A regex-llama data mining approach for identifying emerging trends /valiant/2025/08/25/what-are-the-future-directions-for-microplastics-characterization-a-regex-llama-data-mining-approach-for-identifying-emerging-trends/ Mon, 25 Aug 2025 21:03:50 +0000 /valiant/?p=5043 Gomes, Fernando, Bhansali, Shekhar, da Silveira Maranhão, Fabiola, Valladão, Viviane Silva, & Velasco, Karine. (2025). Anais da Academia Brasileira de Ciências, 97, e20241345.

This study presents a new hybrid method to identify and analyze techniques used to study microplastics. By combining pattern-recognition software (regex) with the Llama 3.2:3b language model, we can better detect and understand both traditional and emerging techniques. Established methods like Raman and FTIR spectroscopy are examined alongside advanced tools such as X-ray Photoelectron Spectroscopy (XPS) and Surface-Enhanced Raman Spectroscopy (SERS). This approach improves both the speed and accuracy of identifying complex terms used in microplastics research. Using VOSDataAnalyzer and VOSviewer, we mapped connections and trends among related terms, identifying the 15 most commonly used and emerging techniques. Our analysis shows a shift toward more sensitive and innovative methods in microplastic studies. This Regex-Llama approach, introduced here for the first time, can be applied broadly to tasks such as studying pollutants in the environment, evaluating material breakdown in engineering, and assessing the health impacts of tiny contaminants. Overall, this strategy helps support environmental assessments and guide pollution reduction efforts across multiple fields.
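The regex half of the hybrid pipeline can be sketched as pattern-matching technique mentions in abstracts and counting them. The pattern set below is an illustrative assumption; in the paper, the Llama 3.2:3b stage would then disambiguate and group hits (note how the sketch misses "SERS" because no literal pattern covers the abbreviation, which is exactly the gap an LLM stage fills).

```python
# Sketch of the regex stage of a regex + LLM mining pipeline. The patterns
# are illustrative; the LLM disambiguation stage is omitted.
import re
from collections import Counter

TECHNIQUE_PATTERNS = {
    "Raman": r"\b(?:surface-enhanced\s+)?raman(?:\s+spectroscopy)?\b",
    "FTIR": r"\bftir\b|fourier[- ]transform infrared",
    "XPS": r"\bxps\b|x-ray photoelectron spectroscopy",
}

def count_techniques(texts):
    """Count technique mentions across a corpus of abstracts."""
    counts = Counter()
    for text in texts:
        low = text.lower()
        for name, pattern in TECHNIQUE_PATTERNS.items():
            counts[name] += len(re.findall(pattern, low))
    return counts

abstracts = [
    "Particles were characterized by FTIR and Raman spectroscopy.",
    "X-ray photoelectron spectroscopy (XPS) complements SERS measurements.",
]
counts = count_techniques(abstracts)
```

Frequency counts like these are what tools such as VOSviewer then map into co-occurrence networks to surface emerging techniques.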

Figure 1. Representation of the chemical structure of the most common polymers found in microplastic pollution, in sequence: Polyethylene (PE), Polypropylene (PP), Polystyrene (PS), and Polyethylene Terephthalate (PET).

LLMs as educational analysts: Transforming multimodal data traces into actionable reading assessment reports /valiant/2025/08/25/llms-as-educational-analysts-transforming-multimodal-data-traces-into-actionable-reading-assessment-reports/ Mon, 25 Aug 2025 20:58:52 +0000 /valiant/?p=5040 Davalos, Eduardo, Zhang, Yike, Srivastava, Namrata, Salas, Jorge Alberto, McFadden, Sara E., Cho, Sun-joo, Biswas, Gautam, & Goodwin, Amanda P. (2025). In Lecture Notes in Computer Science (Vol. 15878, pp. 191-204).

Reading assessments are important for improving students’ understanding, but many educational technology tools focus mostly on final scores, offering little insight into how students actually read and think. This study explores using multiple types of data—including eye-tracking, test results, assessment content, and teaching standards—to gain deeper insights into reading behavior. We use unsupervised learning techniques to identify distinct reading patterns, and then a large language model (LLM) summarizes this information into easy-to-read reports for teachers, simplifying the interpretation process. Both LLM experts and human educators evaluated these reports for clarity, accuracy, relevance, and usefulness in teaching. Our results show that LLMs can effectively act as educational analysts, turning complex data into insights that teachers find helpful. While automated reports are promising, human oversight is still necessary to ensure the results are reliable and fair. This work moves human-centered AI in education forward by connecting data-driven analysis with practical classroom applications.

Fig.1. Proposed Pipeline for LLM-Driven Assessment Report Generation: By instructing LLMs to role-play as an educational analyst and providing assessment context and data, we construct a prompt that is used to generate a teacher-oriented assessment report
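The prompt-construction step in Fig. 1 (role-play instruction plus assessment context plus data traces) could look roughly like this. All field names and the wording of the instruction are hypothetical, for illustration only.

```python
# Hypothetical sketch of the Fig. 1 prompt-construction step: combine a
# role-play instruction, assessment context, and per-student data traces into
# one LLM prompt. Field names and wording are illustrative assumptions.
def build_report_prompt(student_id, context, traces):
    lines = [
        "You are an educational analyst. Using the assessment context and the",
        "student's multimodal data traces below, write a concise, teacher-oriented",
        "reading assessment report with actionable recommendations.",
        "",
        f"Student: {student_id}",
        f"Assessment context: {context}",
        "Data traces:",
    ]
    lines += [f"- {key}: {value}" for key, value in traces.items()]
    return "\n".join(lines)

prompt = build_report_prompt(
    "S-042",
    "Grade 5 informational-text comprehension, aligned to state standards",
    {"mean fixation duration": "312 ms",
     "regression rate": "18% of saccades",
     "comprehension score": "6/10"},
)
```

In the study, the traces fed in at this point are not raw numbers but the reading patterns already identified by the unsupervised-learning stage, which keeps the prompt compact and interpretable.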

Leveraging large language models for accelerated learning and innovation in biogenic tissue-engineered vascular grafts /valiant/2025/05/21/leveraging-large-language-models-for-accelerated-learning-and-innovation-in-biogenic-tissue-engineered-vascular-grafts/ Wed, 21 May 2025 15:49:45 +0000 /valiant/?p=4388 Gomes, S. F., Jr.; Velasco, K.; Cunha, S.; Santos, J.; Aboelkheir, M. G.; Sumini, M.; Thiré, R.; Duarte, P. C., Jr.; Andrade, A. J. P.; Díaz-Martín, R. D.; Clebis, V. H.; Bhansali, S.; Pal, K.; Maranhão, F. Journal of Drug Delivery Science and Technology 108 (2025): 106935.

This study uses Large Language Models (LLMs)—a type of advanced artificial intelligence—to help speed up progress in developing tissue-engineered vascular grafts (TEVGs), which are lab-made blood vessels used to repair or replace damaged ones. TEVGs face several major challenges, including making sure they’re safe for the body (biocompatibility), allowing living cells to grow and integrate properly, and improving how the grafts are designed (scaffold optimization).

By using LLMs to review and analyze a large amount of scientific literature, the researchers were able to spot important trends and breakthroughs in materials and techniques used to build these grafts. This includes newer fabrication methods like hybrid 3D printing, electrospinning, and melt electrowriting, which help improve how blood flows through the grafts (hemodynamics) and make them mechanically stronger and more natural.

One promising material combination the study highlights is polycaprolactone (PCL) mixed with collagen. These scaffolds support healthy cell growth and tissue remodeling, closely mimicking real blood vessels. The approach also tackles ongoing problems like blood clotting (thrombosis), stiffness mismatches between the graft and the body’s vessels, and poor performance of small-diameter grafts.

By bringing AI into the research process, this study presents a new way to drive innovation in TEVGs and move closer to making them more effective and widely used in clinical settings.

Fig. 1. In vitro characterization of the rSMC-seeded ES-TIPS PEUU constructs (a-c and g-i) immediately after seeding and (d-f and j-l) after dynamic culture for 2 days in a spinner flask. (a, g and j) Nuclear staining (red = autofluorescence of scaffold, blue = nuclei); (b, e, h, k, n and o) H&E staining; (c, f, i and l) Masson’s trichrome staining; (d) F-actin staining (green = F-actin, blue = nuclei); (m-o) unseeded or poorly-seeded ES-TIPS scaffolds. Reprinted with permission. Copyright © 2017 Ulf Bertram et al. This is an open-access article distributed under the Creative Commons Attribution 4.0 License. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)

Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective /valiant/2025/05/21/towards-trustworthy-knowledge-graph-reasoning-an-uncertainty-aware-perspective/ Wed, 21 May 2025 15:36:05 +0000 /valiant/?p=4380 Ni, Bo; Wang, Yu; Cheng, Lu; Blasch, Erik; Derr, Tyler. Proceedings of the AAAI Conference on Artificial Intelligence 39, no. 12 (2025): 12417–12425.

Recently, researchers have combined two powerful tools—Knowledge Graphs (which organize information like a big map of facts) and Large Language Models (which generate human-like text)—to make these models smarter and reduce their mistakes. This approach helps the models reason better by checking facts from the knowledge graph.

However, these combined systems still struggle with knowing how confident they should be in their answers. This is important because in many real-world situations, making a wrong decision can be costly or harmful. It’s hard to add this “uncertainty awareness” because the system’s parts are complex and interact in complicated ways.

To solve this problem, a new method called UAG (Uncertainty Aware Knowledge-Graph Reasoning) was created. UAG helps the system not only give answers but also say how confident it is in those answers. It uses special techniques to keep errors under control while improving accuracy. Tests show that UAG can reliably meet confidence goals and make predictions more precise, reducing unnecessary guesses by about 40% compared to older methods.
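One standard way to "reliably meet confidence goals" with set-valued answers is split conformal prediction: calibrate a score threshold on held-out data so that prediction sets contain the true answer with probability at least 1 - alpha. The sketch below shows that generic mechanism, not UAG's actual implementation.

```python
# Generic split conformal prediction sketch (not UAG itself): calibrate a
# nonconformity-score threshold, then answer with a set of candidates that
# provably covers the truth with probability >= 1 - alpha.
import math

def calibrate_threshold(cal_scores, alpha=0.1):
    """cal_scores: nonconformity score of the TRUE answer on calibration data."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # conformal quantile index
    return sorted(cal_scores)[min(rank, n) - 1]

def prediction_set(candidate_scores, threshold):
    """Keep every candidate whose nonconformity score is within the threshold."""
    return {c for c, s in candidate_scores.items() if s <= threshold}

cal = [0.1, 0.3, 0.2, 0.5, 0.4, 0.25, 0.15, 0.35, 0.45, 0.05]
tau = calibrate_threshold(cal, alpha=0.2)
answers = prediction_set({"Paris": 0.1, "Lyon": 0.6, "Marseille": 0.4}, tau)
```

Smaller prediction sets at the same coverage level mean fewer "unnecessary guesses," which is the efficiency axis on which UAG reports its roughly 40% improvement.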

Evaluating Sex and Age Biases in Multimodal Large Language Models for Skin Disease Identification from Dermatoscopic Images /valiant/2025/04/23/evaluating-sex-and-age-biases-in-multimodal-large-language-models-for-skin-disease-identification-from-dermatoscopic-images/ Wed, 23 Apr 2025 14:08:02 +0000 /valiant/?p=4152 Wan, Zhiyu; Guo, Yuhang; Bao, Shunxing; Wang, Qian; Malin, Bradley A. Health Data Science 5 (2025): 256.

Multimodal large language models (LLMs), like ChatGPT-4, have shown promise in healthcare, especially in areas like identifying skin diseases. However, there are concerns about how reliable these models are and whether they have any biases. In this study, we tested two popular models, ChatGPT-4 and LLaVA-1.6, to see how well they could identify three types of skin conditions—melanoma, melanocytic nevi, and benign keratosis-like lesions—using a dataset with around 10,000 images. We wanted to check if the models performed differently based on a person’s sex or age.

When we compared the performance of ChatGPT-4 and LLaVA-1.6 with three other deep learning models, we found that ChatGPT-4 and LLaVA-1.6 did better overall than the traditional models, with ChatGPT-4 being 3% better and LLaVA-1.6 being 23% better. However, both models did worse than another advanced model called Swin-B. When it came to bias, ChatGPT-4 showed no bias toward any sex or age group, while LLaVA-1.6 was fair across all age groups but showed some bias with respect to sex. In contrast, Swin-B was found to be biased when identifying melanocytic nevi.

Overall, the study suggests that LLMs like ChatGPT-4 and LLaVA-1.6 can be useful and fair tools for helping doctors with diagnosing skin conditions and screening patients. However, to be sure about their reliability and fairness, future research needs to test these models on even larger and more varied datasets.

Fig. 1. A flowchart describing the fairness evaluation framework for multimodal large language models (LLMs) in skin disease identification. ChatGPT, Chat Generative Pre-trained Transformer; LLaVA, Large Language and Vision Assistant; VGG, Visual Geometry Group; ResNet, Residual Network.
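A per-group fairness check of the kind the Fig. 1 framework describes can be sketched as: compute accuracy separately for each demographic group and look at the gap. The record fields and the toy data are illustrative assumptions, not the paper's protocol or results.

```python
# Sketch of a per-group fairness check: accuracy by demographic group, plus
# the largest gap between groups. Fields and toy data are illustrative.
def group_accuracies(records, group_key):
    """records: dicts with 'pred', 'label', and demographic fields."""
    totals, correct = {}, {}
    for r in records:
        g = r[group_key]
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (r["pred"] == r["label"])
    return {g: correct[g] / totals[g] for g in totals}

def max_gap(accuracies):
    """Largest accuracy disparity across groups (0 means perfectly even)."""
    vals = list(accuracies.values())
    return max(vals) - min(vals)

records = [
    {"pred": "melanoma", "label": "melanoma", "sex": "F"},
    {"pred": "nevus",    "label": "melanoma", "sex": "F"},
    {"pred": "melanoma", "label": "melanoma", "sex": "M"},
    {"pred": "melanoma", "label": "melanoma", "sex": "M"},
]
acc = group_accuracies(records, "sex")
```

Real fairness audits, including this study's, additionally apply statistical tests to decide whether an observed gap is significant rather than sampling noise.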

Enhancing Physician Flexibility: Prompt-Guided Multi-class Pathological Segmentation for Diverse Outcomes /valiant/2025/04/23/enhancing-physician-flexibility-prompt-guided-multi-class-pathological-segmentation-for-diverse-outcomes/ Wed, 23 Apr 2025 13:56:47 +0000 /valiant/?p=4217 Cui, Can; Deng, Ruining; Guo, Junlin; Liu, Quan; Yao, Tianyuan; Yang, Haichun; Huo, Yuankai. BHI 2024 – IEEE-EMBS International Conference on Biomedical and Health Informatics, Proceedings (2024).

The Vision Foundation Model has recently shown promise in analyzing medical images, especially because it can often work right away without needing to be retrained—this is known as zero-shot learning. This makes it easier and faster to apply AI in healthcare. However, when it comes to segmenting (or outlining) different parts of medical images, like diseased tissues in pathology slides, things get tricky. A simple click on a large image might point to something small like a single cell, or something big like a full tissue layer—so the AI needs to be very flexible in what it can detect.

Most current models can predict general results, but they aren’t very good at adjusting to what a doctor specifically wants to look at. In this study, we tested whether using a Large Language Model (LLM) to guide the image analysis with different written instructions (called prompts) could make these models more adaptable, compared to traditional methods that rely on fixed task labels.

Our work includes four key contributions:

  1. We built an efficient system that uses customized language prompts to help the AI flexibly identify different structures in medical images.
  2. We compared how well the model performs when given fixed prompts versus more natural, free-text instructions.
  3. We created a special dataset of kidney pathology images along with a variety of free-text prompts tailored to those images.
  4. We tested how well the model could handle new and different cases during analysis.

Overall, our approach shows that allowing doctors to guide the AI with flexible language prompts could make medical image segmentation more accurate and user-friendly.
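The core idea of mapping a doctor's free-text instruction to a segmentation target can be caricatured with a keyword-overlap resolver. The paper uses an LLM for this step; the class names and synonym lists below are purely hypothetical stand-ins to show the interface.

```python
# Toy sketch of resolving a free-text prompt to a segmentation target class by
# keyword overlap. The paper's system uses an LLM here; class names and
# synonym lists are hypothetical.
CLASS_SYNONYMS = {
    "glomerulus": {"glomerulus", "glomeruli"},
    "tubule": {"tubule", "tubules", "tubular"},
    "whole tissue": {"tissue", "layer", "region"},
}

def resolve_prompt(prompt):
    """Return the class whose synonyms best overlap the prompt, else None."""
    words = set(prompt.lower().replace(",", " ").split())
    best, best_overlap = None, 0
    for cls, synonyms in CLASS_SYNONYMS.items():
        overlap = len(words & synonyms)
        if overlap > best_overlap:
            best, best_overlap = cls, overlap
    return best

target = resolve_prompt("Please outline all glomeruli in this slide")
```

An LLM-based resolver generalizes where this lookup fails, e.g. paraphrases ("the filtering units of the kidney") or scale cues ("just this one cell" versus "the whole layer"), which is the flexibility the study evaluates.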
