This work presents a scalable framework for generating high-quality training datasets from sparse seismic observations, together with an accompanying dataset built from a publicly available seismographic data source, enabling deep generative models to produce realistic earthquake waveforms despite limited data.