Research & Representative Work

Gene Expression Regulation Algorithms

Developing gene expression and functional enrichment analysis tools with machine learning methods, with the aim of improving the accuracy and efficiency of data mining in the life sciences.

KOBAS-i

Highly Cited Paper | Cited 1,410+

Intelligent Pathway Enrichment Analysis Platform | Nucleic Acids Research, 2021 | Corresponding Author

Built on knowledge bases and intelligent ranking algorithms; supports exploration of the biological functions of gene sets with interactive visualization.
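The over-representation idea behind pathway enrichment can be illustrated with a minimal sketch. It is a generic hypergeometric test, not the KOBAS-i implementation or its ranking algorithm; the function names, gene identifiers, and pathway sets below are hypothetical, and the query genes are assumed to belong to the background.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    X is the number of pathway genes seen when query_size genes are drawn
    without replacement from a background of background_size genes, of
    which pathway_size belong to the pathway.
    """
    if not 0 <= pathway_size <= background_size:
        raise ValueError("pathway_size must lie within the background")
    if not 0 <= query_size <= background_size:
        raise ValueError("query_size must lie within the background")
    start = max(overlap, 0)
    stop = min(pathway_size, query_size)
    total = comb(background_size, query_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, query_size - k)
        for k in range(start, stop + 1)
    )
    return tail / total


def rank_pathways(query_genes, pathways, background_size):
    """Return (pathway, overlap, p_value) tuples sorted by ascending p-value."""
    query = set(query_genes)
    results = []
    for name, members in pathways.items():
        member_set = set(members)
        overlap = len(query & member_set)
        p_value = enrichment_p_value(
            background_size, len(member_set), len(query), overlap
        )
        results.append((name, overlap, p_value))
    return sorted(results, key=lambda item: item[2])


# Hypothetical toy data: two pathways in a background of 20 genes.
toy_pathways = {"pathway_A": ["g1", "g2", "g3"], "pathway_B": ["g7", "g8", "g9"]}
ranked = rank_pathways(["g1", "g2", "g3"], toy_pathways, background_size=20)
```

In practice the p-values would also be corrected for multiple testing across pathways, which this sketch omits.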

CGPS

Integrated Gene Set Enrichment Analysis Algorithm | Journal of Genetics and Genomics, 2018 | Corresponding Author

Uses machine learning to integrate the results of multiple mainstream enrichment tools, improving the biological relevance of the top-ranked pathways.

High-Performance Computing for Life Sciences

Addressing the high-memory, high-I/O characteristics of life science computing through full-stack design and optimization, from system architecture and system software to evaluation methods.

BioProfile

An open computing performance evaluation framework for the life sciences that characterizes computational behavior from real software workloads to guide cluster design, hardware selection, and scheduling optimization.

Axon OS

An LLM-native cluster operating system for converged supercomputing and AI computing.

Large-Scale Parallel Cluster Architecture

Designed and built a petaflop-scale GPU double-precision computing cluster with an architecture optimized for life science workloads; the measured GPU Linpack efficiency is 74%, which is at an advanced level.
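For readers unfamiliar with the metric, Linpack efficiency is conventionally the ratio of sustained Linpack performance (Rmax) to theoretical peak performance (Rpeak). The sketch below shows that arithmetic only; the Rmax and Rpeak values are hypothetical placeholders chosen to reproduce a 74% ratio, not the cluster's actual figures.

```python
def linpack_efficiency(rmax_pflops, rpeak_pflops):
    """Return sustained-to-peak performance ratio as a fraction in [0, 1]."""
    if rpeak_pflops <= 0:
        raise ValueError("rpeak_pflops must be positive")
    if rmax_pflops < 0:
        raise ValueError("rmax_pflops must be non-negative")
    return rmax_pflops / rpeak_pflops


# Hypothetical values for illustration only (not measurements from the source).
example_rpeak = 1.00  # theoretical FP64 peak, PFLOPS
example_rmax = 0.74   # sustained Linpack result, PFLOPS
efficiency = linpack_efficiency(example_rmax, example_rpeak)
print(f"Linpack efficiency: {efficiency:.0%}")
```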

High-Quality Dataset Construction

To address data silos and uneven data quality in biomedicine, proposed a systematic data governance framework for building "readable, understandable, trustworthy" high-quality data assets.

🧠

Intelligent Applications

Multi-modal Representation | Intelligent Agents | Synthetic Data | Clinical Decision Support

Multi-modal fusion analysis, natural-language data querying, and privacy-preserving data generation.

🕸️

Knowledge Graph

Entity Recognition | Standard ID Mapping | Knowledge Fusion | Graph Reasoning

Links entities to UMLS/NCBI and fuses public knowledge bases such as PrimeKG to build semantic networks.

🗂️

Data Resourcing

Multi-source Integration | Data Lakehouse | Quality Monitoring | Data Lineage

Unified ingestion of multi-source heterogeneous data, standardized storage, automated quality validation, and lineage tracking.

"High-quality datasets do not exist naturally; they are the result of systematic engineering."