pubmed_searcher
Search PubMed and run a batch pipeline (download, images, refs).
pypaperretriever.pubmed_searcher.PubMedSearcher
Search PubMed and manage retrieved articles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_string | `str \| None` | Query used for PubMed search. | `None` |
df | `DataFrame \| None` | Existing table of articles. | `None` |
email | `str` | Email address required by Entrez. | `''` |
Attributes:
Name | Type | Description |
---|---|---|
df | `DataFrame` | Table of article metadata and processing flags. |
search_string | `str \| None` | Stored search query. |
email | `str` | Email address used for API calls. |
Initialize the searcher.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
search_string | `str \| None` | Query to submit to PubMed. | `None` |
df | `DataFrame \| None` | Existing table of articles. | `None` |
email | `str` | Email address required by Entrez. | `''` |
Source code in pypaperretriever/pubmed_searcher.py
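A minimal construction sketch, assuming the package is installed and the class is importable from the module path shown above. The `build_searcher` helper and its email check are hypothetical and not part of the library; the check is included only because `email` defaults to an empty string while Entrez is documented as requiring an address.

```python
def build_searcher(search_string, email):
    """Create a PubMedSearcher after checking that an email was supplied.

    Hypothetical helper: the library itself accepts email='' by default.
    """
    if not email or "@" not in email:
        raise ValueError("Entrez requires a contact email address")
    # Imported lazily so this module still loads where the package is absent.
    from pypaperretriever.pubmed_searcher import PubMedSearcher

    return PubMedSearcher(search_string=search_string, email=email)
```

A call might look like `build_searcher("deep brain stimulation", "you@example.org")`; both argument values are placeholders.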
search
search(
count: int = 10,
min_date: int | None = None,
max_date: int | None = None,
order_by: str = "chronological",
only_open_access: bool = False,
only_case_reports: bool = False,
) -> Self
Search PubMed for articles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
count | `int` | Number of articles to retrieve. | `10` |
min_date | `int \| None` | Minimum publication year. | `None` |
max_date | `int \| None` | Maximum publication year. | `None` |
order_by | `str` | | `'chronological'` |
only_open_access | `bool` | If … | `False` |
only_case_reports | `bool` | If … | `False` |
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | This instance. |
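The keyword arguments in the signature above can be combined as in the sketch below. It takes an already constructed searcher, so nothing here depends on how the instance was created. The helper name `search_open_access` is hypothetical, and the effect of `only_open_access=True` is inferred from the parameter name, since its description is cut off in this page.

```python
def search_open_access(searcher, start_year, end_year, count=10):
    """Run a year-bounded search with the open-access flag switched on."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    # Keyword names follow the documented search() signature.
    return searcher.search(
        count=count,
        min_date=start_year,
        max_date=end_year,
        only_open_access=True,
    )
```

Because `search` is documented as returning the instance, the return value can be chained directly into later steps.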
download_articles
download_articles(
allow_scihub: bool = False,
download_directory: str = "pdf_downloads",
max_articles: int | None = None,
) -> Self
Download full-text PDFs for articles in `df`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
allow_scihub | `bool` | Use Sci-Hub as a fallback source. | `False` |
download_directory | `str` | Directory to store downloaded PDFs. | `'pdf_downloads'` |
max_articles | `int \| None` | Maximum number of articles to process. | `None` |
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | The updated instance. |
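A short sketch of a capped download run, assuming an existing searcher whose `df` already holds search results. The helper name `download_batch` is hypothetical. Creating the target directory up front is a precaution added here; this page does not say whether `download_articles` creates it.

```python
from pathlib import Path


def download_batch(searcher, directory="pdf_downloads", limit=None):
    """Download PDFs into `directory`, creating it first if needed."""
    # Precaution only: whether the library creates this directory is not documented.
    Path(directory).mkdir(parents=True, exist_ok=True)
    return searcher.download_articles(
        allow_scihub=False,  # documented default: no Sci-Hub fallback
        download_directory=directory,
        max_articles=limit,
    )
```

Passing a small `limit` first is a cheap way to check the setup before processing the whole table.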
extract_images
Extract images from downloaded PDFs using `ImageExtractor`.
Only rows marked as successfully downloaded are processed.
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | The updated instance. |
fetch_references
Fetch references for each article in `df`.
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | The updated instance. |
fetch_cited_by
Fetch citing articles for each entry in `df`.
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | The updated instance. |
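Since `search`, `download_articles`, `extract_images`, `fetch_references`, `fetch_cited_by`, and `save` are each documented as returning the instance, the batch pipeline can be written as one chain. The sketch below is one possible ordering, not a prescribed one. The helper name `run_pipeline` is hypothetical, and the three middle steps are called without arguments only because no parameters are listed for them on this page.

```python
def run_pipeline(searcher, count=10, csv_path="master_list.csv"):
    """Chain the documented steps and persist the resulting table."""
    return (
        searcher.search(count=count)
        .download_articles(max_articles=count)
        .extract_images()      # processes only rows marked as downloaded
        .fetch_references()
        .fetch_cited_by()
        .save(csv_path=csv_path)
    )
```

`fetch_abstracts` is left out of the chain because no return value is documented for it; calling it as a separate statement avoids relying on one.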
fetch_abstracts
Retrieve abstracts for articles missing them in `df`.
get_abstract
Fetch the abstract for a PMID.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pmid | `str` | Identifier of the article. | required |
Returns:
Name | Type | Description |
---|---|---|
str | `str` | Abstract text. |
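For a handful of PMIDs, `get_abstract` can be called in a loop. The sketch below assumes it is an instance method, as its placement under the class suggests. The helper name `collect_abstracts` is hypothetical, and the broad `except` is a placeholder because the errors `get_abstract` may raise are not documented here.

```python
def collect_abstracts(searcher, pmids):
    """Map each PMID to its abstract, skipping PMIDs whose lookup fails."""
    abstracts = {}
    for pmid in pmids:
        key = str(pmid)  # get_abstract is documented as taking a str
        try:
            abstracts[key] = searcher.get_abstract(key)
        except Exception as exc:  # failure modes are not documented
            print(f"PMID {key}: lookup failed ({exc})")
    return abstracts
```

For a whole table, `fetch_abstracts` is the documented route, since it fills in the abstracts missing from `df`.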
download_xml_fulltext
Download XML full text for open-access articles.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
download_directory | `str` | Destination directory for XML files. | `'downloads'` |
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | The updated instance. |
save
Persist the internal DataFrame to CSV.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
csv_path | `str` | Output path for the CSV file. | `'master_list.csv'` |
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | This instance. |
save_abstracts_as_csv
Save only PMIDs and abstracts to a CSV file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | `str` | Output filename. | `'abstracts.csv'` |
Returns:
Name | Type | Description |
---|---|---|
Self | `Self` | This instance. |
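Both save methods return the instance, so the full table and the abstracts-only file can be written in one statement. The sketch below assumes that `filename` accepts a path containing a directory, which this page does not confirm. The helper name `save_outputs` and the output layout are hypothetical.

```python
from pathlib import Path


def save_outputs(searcher, out_dir):
    """Write the full table and the abstracts-only table under `out_dir`."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    master = out / "master_list.csv"
    abstracts = out / "abstracts.csv"
    # Assumption: `filename` may include a directory component.
    searcher.save(csv_path=str(master)).save_abstracts_as_csv(filename=str(abstracts))
    return master, abstracts
```

The saved master CSV can later be loaded into a DataFrame and passed back through the constructor's `df` parameter to resume work on an existing table.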