Find the first commit where a function is called in a GitHub repo
Source: R/get_first_call_github.R
Clones a GitHub repository into a local cache directory (if not already
present), checks out a target ref (branch or default), then scans commits
(oldest to newest) whose diffs under dir match a call-like pattern for
fname (e.g. foo( or pkg::foo(). For each candidate commit, it verifies
the call is present somewhere in the repository tree at that commit using
git grep.
Arguments
- github_user
Character scalar. GitHub username or organization.
- github_repo
Character scalar. GitHub repository name.
- fname
Character scalar. Symbol to search for (e.g. "ggplot").
- date_only
Logical; if TRUE, return only the Date of the first verified call commit. Defaults to FALSE.
- branch
Optional character scalar. Ref to check out before searching. If NULL or empty, the function uses origin/HEAD when available, otherwise HEAD.
- max_commits
Maximum number of candidate commits to verify (in order). Use to cap runtime on very large histories. Defaults to Inf (no cap).
- cache_dir
Directory used to cache cloned repositories. Defaults to getOption("ggext.git_cache", file.path(tempdir(), "gh_repo_cache")).
- dir
Directory within the repo to search. Defaults to "R".
Value
If a call is found:
- If date_only = TRUE, a Date.
- Otherwise, a one-row data.frame with columns: github_user, repo, fname, first_call, author, message, url, file.
If not found (or if the repository cannot be searched), returns NA when date_only = TRUE, otherwise NULL.
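Because the return shape depends on date_only and on whether a call was found, calling code usually needs to branch on it. The sketch below is one way to do that; describe_first_call() is a hypothetical helper, not part of the package, and the repository named in the guarded call is only an illustration. The type of the first_call column is not documented here, so the helper simply passes it through format().

```r
# Hypothetical helper (not part of the package): turn any of the documented
# return shapes into a one-line summary.
describe_first_call <- function(res) {
  # NULL (date_only = FALSE) or NA (date_only = TRUE) both mean "not found"
  if (is.null(res) || (length(res) == 1L && is.atomic(res) && is.na(res))) {
    return("no verified call found")
  }
  # date_only = TRUE and a call was found
  if (inherits(res, "Date")) {
    return(paste("first call on", format(res, "%Y-%m-%d")))
  }
  # Otherwise assume the documented one-row data.frame
  paste0("first call on ", format(res$first_call),
         " in ", res$file, " (", res$url, ")")
}

# Illustrative call; it needs the package, git, and network access, so it is
# guarded and only runs when the function is available.
if (exists("get_first_call_github", mode = "function")) {
  res <- get_first_call_github("tidyverse", "ggplot2", "ggplot",
                               date_only = TRUE)
  message(describe_first_call(res))
}
```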
Details
Unlike functions that track latest state, this function is concerned only with historical "firsts". Therefore, if a cached clone already exists, no network fetch is performed. The cached repository is assumed to contain a complete history.
If the cached clone is detected to be shallow or invalid, it is deleted and recloned to ensure correctness.
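How the function detects a shallow clone is not stated here. As a rough illustration only, a standard (non-bare) shallow clone carries a file named shallow inside its .git directory, so a check could look like the sketch below. is_shallow_clone() is a hypothetical name, and the demo directory is a stand-in built by hand rather than a real repository.

```r
# Hypothetical check (an assumption, not the package's implementation):
# a non-bare shallow clone has a `.git/shallow` file listing its cut-off commits.
is_shallow_clone <- function(repo_dir) {
  file.exists(file.path(repo_dir, ".git", "shallow"))
}

# Build a fake repository layout in a fresh temporary directory
demo_repo <- tempfile("shallow_demo_")
dir.create(file.path(demo_repo, ".git"), recursive = TRUE)

before <- is_shallow_clone(demo_repo)   # no marker file yet
invisible(file.create(file.path(demo_repo, ".git", "shallow")))
after <- is_shallow_clone(demo_repo)    # marker file now present

unlink(demo_repo, recursive = TRUE)     # clean up the demo directory
```

A caller that finds a shallow cache could then remove the directory and clone again, which matches the behaviour described above.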
The return value is either the commit date (when date_only = TRUE) or a
one-row data.frame with commit metadata and a GitHub URL.
This function emits diagnostic message() output when it cannot determine an
answer (e.g., repository not found/inaccessible, checkout failed, git errors)
or when fname is never called under dir.
Candidate commits are obtained with git log --reverse -G <regex> under
dir, using patterns that match fname calls like:
fname[[:space:]]*\\(
::fname[[:space:]]*\\(
Each candidate is then validated by searching the repository tree at that
commit with git grep -E for the same call-like pattern.
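To see what these call-like patterns do and do not match, they can be tried against a few sample lines in plain R. This is a sketch only: call_patterns() is a hypothetical helper, fname is pasted in without escaping regex metacharacters, and grepl() stands in for the git tooling.

```r
# Hypothetical helper (not part of the package): build the two call-like
# patterns described above for a given function name.
call_patterns <- function(fname) {
  c(
    bare       = paste0(fname, "[[:space:]]*\\("),
    namespaced = paste0("::", fname, "[[:space:]]*\\(")
  )
}

pats <- call_patterns("ggplot")

sample_lines <- c(
  "p <- ggplot(df, aes(x, y))",          # bare call
  "p <- ggplot2::ggplot (df)",           # namespaced call, space before "("
  "library(ggplot2)",                    # package name only, not a call
  "# ggplot is mentioned but not called" # plain mention
)

# One column per pattern, one row per sample line
hits <- vapply(pats, function(p) grepl(p, sample_lines),
               logical(length(sample_lines)))
print(hits)
```

Note that the bare pattern is not anchored on the left, so a name such as my_ggplot( would also match it.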
This function shares the same on-disk Git clone cache as
get_first_commit() via the ggext.git_cache option. Repositories
cloned by either function are reused by the other.
For best performance across sessions, set a persistent cache location:
options(ggext.git_cache = "~/Library/Caches/ggext_git")
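The effect of the option on the cache_dir default can be checked directly, since the default is just a getOption() lookup with a tempdir() fallback. The sketch below temporarily clears and sets the option and restores the previous value afterwards.

```r
# Save the current value and clear the option to see the fallback
old <- options(ggext.git_cache = NULL)
default_dir <- getOption("ggext.git_cache",
                         file.path(tempdir(), "gh_repo_cache"))

# With the option set, the same lookup returns the persistent location
options(ggext.git_cache = "~/Library/Caches/ggext_git")
custom_dir <- getOption("ggext.git_cache",
                        file.path(tempdir(), "gh_repo_cache"))

# Restore whatever was set before
options(old)

print(default_dir)
print(custom_dir)
```

The fallback lives under tempdir(), which is removed when the R session ends, so without the option each new session starts with an empty cache. The path shown is a macOS-style location; on other platforms any writable directory should serve the same purpose.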