The datafile containing the patent-paper pairs (PPPs) is patentpaper_pairs.tsv. The pairs cover USPTO patents only, through 2021. Each PPP has a confidence score and the number of days between the publication of the paper and the filing of the patent. (If the patent is a continuation of another patent, the filing date of the original patent is used.) In addition, when a paper is paired with multiple patents, an indicator variable reports whether those patents are continuations or otherwise identical.
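As a rough illustration of working with the file, the sketch below reads tab-separated rows and keeps only the pairs at or above a chosen confidence threshold. The column name `confidence` and the helper `read_pairs` are hypothetical placeholders, not names taken from the dataset; check the header row of patentpaper_pairs.tsv and the documentation for the actual column names and the scale of the confidence score before relying on it.

```python
import csv
from typing import Dict, Iterable, Iterator

# ASSUMPTION: "confidence" is a placeholder column name. Replace it with the
# actual header used in patentpaper_pairs.tsv.
CONFIDENCE_COL = "confidence"


def read_pairs(
    lines: Iterable[str],
    min_confidence: float,
    confidence_col: str = CONFIDENCE_COL,
) -> Iterator[Dict[str, str]]:
    """Yield rows of tab-separated text whose confidence is >= min_confidence.

    `lines` can be an open file or any iterable of text lines; the first
    line must be the header row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        # Values are read as strings, so convert before comparing.
        if float(row[confidence_col]) >= min_confidence:
            yield row


# Example usage (assumes the file is in the working directory and that
# `threshold` has been chosen to match the score's actual scale):
#
# with open("patentpaper_pairs.tsv", newline="", encoding="utf-8") as f:
#     kept = list(read_pairs(f, min_confidence=threshold))
```

Reading row by row with the standard `csv` module avoids loading the whole file into memory, which may matter if the file is large.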
Note that we link to papers captured by OpenAlex. We provide a redistribution of select OpenAlex data here.
A paper describing these data is coming soon. In the meantime, you can watch a video about the dataset or click on DOCUMENTATION for the presentation slides.