Evaluating Extrapolation Ability of Large Language Model in Chemical Domain

Abstract

Solving problems outside the training space, i.e., extrapolation, has been a long-standing problem in the machine learning community. The recent success of large language models (LLMs) demonstrates their ability to extrapolate to several unseen tasks. In line with this work, we evaluate the extrapolation ability of LLMs in the chemical domain. We construct a dataset of material properties of epoxy polymers measured across various raw materials and curing processes. The LLM must predict a material property when a novel raw material is introduced, using its chemical knowledge. In our experiments, the LLM tends to choose the right direction of adjustment but fails to determine the exact degree, resulting in a poor mean absolute error (MAE) on some properties. However, the LLM can successfully adjust the degree given only a one-shot example. The results show that the LLM can extrapolate to a new, unseen material using the chemical knowledge it learned through massive pre-training.
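The distinction the abstract draws between the direction and the degree of an adjustment can be illustrated with a minimal sketch. The function names, the baseline framing, and all numbers below are hypothetical and chosen only for illustration; they are not the paper's evaluation code or data.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between measured and predicted property values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def _sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def direction_accuracy(
    baseline: Sequence[float],
    y_true: Sequence[float],
    y_pred: Sequence[float],
) -> float:
    """Fraction of cases where the predicted change from a reference value
    has the same sign as the measured change (an assumed definition)."""
    if not (len(baseline) == len(y_true) == len(y_pred)) or len(baseline) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        _sign(t - b) == _sign(p - b)
        for b, t, p in zip(baseline, y_true, y_pred)
    )
    return hits / len(baseline)


# Hypothetical values: property of a known formulation (baseline),
# measured property after swapping in a new raw material (measured),
# and a model's prediction for the new formulation (predicted).
baseline = [100.0, 100.0, 100.0]
measured = [120.0, 90.0, 110.0]
predicted = [105.0, 98.0, 101.0]

direction = direction_accuracy(baseline, measured, predicted)
mae = mean_absolute_error(measured, predicted)
```

In this made-up case every prediction moves in the correct direction relative to the baseline, yet each change is underestimated, so the MAE stays large. That is the pattern the abstract describes.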

Publication
ACL 2024 Workshop Language + Molecules
Taehun Cha
Ph.D. Candidate

There’s a cafe with my name.

Donghun Lee
Assistant Professor

Bridging artificial intelligence and mathematics, in both directions.