Using the GitHub REST API

Author: 全棧筆記

Background

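1. I recently needed to use the GitHub REST API to fetch a repository's file list, file contents, and the commit information for each file.
2. The official documentation is here: https://docs.github.com/en/rest?apiVersion=2022-11-28
3. The following APIs are enough to cover all of this:
    a.  get_a_branch  : get information about a given branch
    b.  get_a_tree    : get the file list under the repository's tree structure
    c.  get_a_blob    : get the contents of a given file
    d.  list_commits  : get the commit information for a given file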

Preparation

  1. Register a GitHub account
  2. Create a repository on GitHub
  3. Generate a personal access token on GitHub for use with the REST API
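
Once the token exists, it is easier to keep it out of notebooks and version control if it is read from the environment. Below is a minimal sketch of that idea. The variable names GITHUB_OWNER, GITHUB_REPO and GITHUB_TOKEN and the load_github_config helper are my own assumptions for illustration; GitHub does not require them.

```python
import os


def load_github_config(env=None):
    """Read owner, repo and token from environment variables.

    The variable names are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    names = ("GITHUB_OWNER", "GITHUB_REPO", "GITHUB_TOKEN")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {
        "owner": env["GITHUB_OWNER"],
        "repo": env["GITHUB_REPO"],
        "token": env["GITHUB_TOKEN"],
    }
```

Failing early with a clear message tends to be friendlier than a 401 response several cells later.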

Calling the APIs

get_a_branch

  • Define the repository information
  • Wrap an http_get helper
import requests
import json
import base64

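# owner, repo and github_personal_access_token are expected to be defined already,
# e.g. in the IPython startup file .ipython/profile_default/startup/00-github_api_config.py:
# owner = '<owner>'
# repo = '<repo>'
# github_personal_access_token = '<github personal access token>'

# Outside such a session these self-assignments raise NameError; assign real values instead.
owner = owner
repo = repo
github_personal_access_token = github_personal_access_token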
domain = 'https://api.github.com'


def http_get(uri=None, **kargs):

    url = f"{domain}{uri}"
    print(f'get {url} ...')

    res = requests.get(url=url, headers={
        'Accept': 'application/vnd.github+json',
        'Authorization': f'Bearer {github_personal_access_token}',
        'X-GitHub-Api-Version': '2022-11-28'
    }, params=kargs)

    if res.status_code != 200:
        print(f"{res.status_code=}, {res.text=}")
        raise Exception(f"{res.status_code=}, {res.text=}")

    result = json.loads(res.text)
    # print(res.text)

    # print(f"{type(result)=}")
    return result
  • Define the get_a_branch function
  • Fetch the 'main' branch; the key value in the response is the tree SHA (tree_sha), which the next step, get_a_tree, needs
# curl -L \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer <YOUR-TOKEN>"\
#   -H "X-GitHub-Api-Version: 2022-11-28" \
#   https://api.github.com/repos/OWNER/REPO/branches/BRANCH

def get_a_branch(branch):
    uri = f"/repos/{owner}/{repo}/branches/{branch}"
    return http_get(uri=uri)


result = get_a_branch('main')
tree_sha = result.get('commit').get('commit').get('tree').get('sha')

print(f"{tree_sha=}")
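
The chained .get() calls above follow the nesting of the branch response (commit → commit → tree → sha), and they raise AttributeError if any level is missing. Below is a small sketch of a more forgiving lookup. The extract_tree_sha helper is hypothetical, and the sample dict is trimmed to the relevant keys with placeholder values, not real API output.

```python
def extract_tree_sha(branch_response):
    """Return commit.commit.tree.sha from a 'get a branch' response, or None."""
    node = branch_response
    for key in ("commit", "commit", "tree", "sha"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


# Trimmed, made-up sample showing only the keys that matter here.
sample_branch = {
    "name": "main",
    "commit": {
        "sha": "<commit sha>",
        "commit": {"tree": {"sha": "<tree sha>"}},
    },
}

print(extract_tree_sha(sample_branch))
```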

get_a_tree

  • Recursively fetch the file list of the tree that tree_sha points to
  • The key values in the response are path and sha, which the next step, get_a_blob, needs
# curl -L \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer <YOUR-TOKEN>"\
#   -H "X-GitHub-Api-Version: 2022-11-28" \
#   https://api.github.com/repos/OWNER/REPO/git/trees/TREE_SHA
def get_a_tree(tree_sha):
    uri = f"/repos/{owner}/{repo}/git/trees/{tree_sha}?recursive=true"
    return http_get(uri)


result = get_a_tree(tree_sha)

file_list = []
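for e in result.get('tree'):
    path = e.get('path')
    entry_type = e.get('type')  # renamed from `type` to avoid shadowing the builtin
    sha = e.get('sha')

    # print('\n---------------------------------')
    # print(f"{path=}, {entry_type=}, {sha=}")

    # keep files only; directories are reported with type 'tree'
    if entry_type == 'blob':
        file_list.append({
            'path': path,
            'sha': sha
        })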

print(file_list)
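
One thing worth checking on large repositories: the tree response carries a truncated flag, which GitHub sets to true when the recursive listing exceeded its size limit and the tree array is incomplete. Below is a sketch of the same filtering that also surfaces that flag. The blobs_from_tree helper and the sample data are made up for illustration.

```python
def blobs_from_tree(tree_response):
    """Return (files, truncated) from a 'get a tree' response dict."""
    files = [
        {"path": entry["path"], "sha": entry["sha"]}
        for entry in tree_response.get("tree", [])
        if entry.get("type") == "blob"
    ]
    return files, bool(tree_response.get("truncated", False))


# Made-up sample: one directory entry and one file entry.
sample_tree = {
    "sha": "<tree sha>",
    "tree": [
        {"path": "docs", "type": "tree", "sha": "<sha of docs>"},
        {"path": "docs/readme.md", "type": "blob", "sha": "<sha of readme>"},
    ],
    "truncated": False,
}

files, truncated = blobs_from_tree(sample_tree)
if truncated:
    print("warning: the file list is incomplete")
print(files)
```

If truncated comes back true, fetching the subtrees one at a time without the recursive parameter is the usual fallback.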

get_a_blob

  • Fetch the contents of a given file
  • The key value in the response is content, which must be base64-decoded to get the file contents
# curl -L \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer <YOUR-TOKEN>"\
#   -H "X-GitHub-Api-Version: 2022-11-28" \
#   https://api.github.com/repos/OWNER/REPO/git/blobs/FILE_SHA
def get_a_blob(file_sha):
    uri = f'/repos/{owner}/{repo}/git/blobs/{file_sha}'
    return http_get(uri=uri)


for file in file_list:
    result = get_a_blob(file.get('sha'))
    content = result.get('content')
    content = base64.b64decode(content).decode(encoding='utf-8')
    file.setdefault('content', content)

print(file_list)
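
The loop above assumes every file is UTF-8 text; for a binary file such as an image, .decode() raises UnicodeDecodeError. Below is a hedged sketch of a more tolerant decoder. The decode_blob helper is hypothetical, and it checks the encoding field that the blob response carries alongside content.

```python
import base64


def decode_blob(blob_response):
    """Return the blob's text, or None if the bytes are not valid UTF-8."""
    raw = blob_response.get("content", "")
    if blob_response.get("encoding") == "base64":
        # b64decode ignores the line breaks GitHub inserts into the payload
        data = base64.b64decode(raw)
    else:
        data = raw.encode("utf-8")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Made-up samples: a text blob and a blob whose bytes are not valid UTF-8.
text_blob = {"encoding": "base64", "content": base64.b64encode(b"hello\n").decode("ascii")}
binary_blob = {"encoding": "base64", "content": base64.b64encode(b"\xff\xfe").decode("ascii")}

print(decode_blob(text_blob))
print(decode_blob(binary_blob))
```

Returning None keeps the loop running, so callers can skip binary files or store them some other way.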

list_commits

  • Fetch the commit information for each file
# curl -L \
#   -H "Accept: application/vnd.github+json" \
#   -H "Authorization: Bearer <YOUR-TOKEN>"\
#   -H "X-GitHub-Api-Version: 2022-11-28" \
#   https://api.github.com/repos/OWNER/REPO/commits
def list_commits(path=None):
    uri = f'/repos/{owner}/{repo}/commits'
    return http_get(uri=uri, path=path)


for file in file_list:
    path = file.get('path')
    result = list_commits(path)
    result = map(lambda item: {
        'message': item.get('commit').get('message'),
        'committer': item.get('commit').get('committer')
    }, result)
    # print(list(result))
    file.setdefault('commit', list(result))


print(json.dumps(file_list, indent=4))
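
Note that list commits is paginated: by default it returns 30 commits per page, and per_page can be raised to at most 100. A file with a long history therefore needs several requests. Below is a minimal sketch of a page loop. The fetch_all_pages helper and its max_pages safety cap are my own assumptions, and fetch_page stands for any callable that accepts page and per_page.

```python
def fetch_all_pages(fetch_page, per_page=100, max_pages=50):
    """Collect items page by page until a short (or empty) page is returned."""
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page=page, per_page=per_page)
        items.extend(batch)
        if len(batch) < per_page:
            break  # last page reached
    return items


# Stand-in for the real API call so the sketch runs offline.
fake_commits = [{"n": i} for i in range(250)]


def fake_fetch(page, per_page):
    start = (page - 1) * per_page
    return fake_commits[start:start + per_page]


print(len(fetch_all_pages(fake_fetch)))
```

With the http_get helper from earlier, the callable could be something like lambda **p: http_get(uri=f'/repos/{owner}/{repo}/commits', path=path, **p), since http_get forwards extra keyword arguments as query parameters.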

Summary

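  1. With the steps above, the REST API gives you the full file list, file contents, and commit information for a given branch of a repository
  2. If you need to collect source-code data, you can build further processing on top of this
  3. Working in JupyterLab makes the whole thing even smoother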