GitHub is a web-based hosting service for version control using Git. It is mostly used for storing and sharing computer source code. It offers all of the distributed version control and source code management functionality of Git, as well as adding its own features.

GitHub stores more than 3 million repositories, with more than 1.7 million developers using it daily. With so much data, it can be quite daunting at first to find the information you need or to do repetitive tasks, and that is where the GitHub API comes in handy.

In this tutorial, you are going to learn how to use the GitHub API to search for repositories and files that match particular keyword(s) and retrieve their URLs using Python. You will also learn how to download files or a specific folder from a GitHub repository.

Project Setup

Personal Access Token

In order to access the GitHub API, you will need an access token to authorize API calls. Head over to your token settings page on GitHub. If you do not have a GitHub account, you will have to create one.

Click Generate New Token.

Enter the token description and check public_repo.

Scroll to the bottom and click Generate token.

Once your token is created, copy and save it somewhere for later use. Note that once you leave this page, you will not see that token again.

Client Setup

The only package you need to install for Python is PyGithub. Run pip install PyGithub in your terminal.

Note: PyGithub is a third-party library. GitHub only offers official client libraries for Ruby, Node.js and .NET.

Then, you need to import it.

GitHub API Test

With the access token obtained earlier, you need to test your connection to the API. First of all, create a constant to hold your token.

Then initialize the GitHub client.

You can then try getting your list of repositories to test the connection.
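The import, token constant, client and repository listing described above might be put together as in the sketch below. It assumes the classic `Github(token)` constructor from PyGithub (newer releases prefer passing an `auth` object and may warn about the positional form). The names `make_client`, `list_repo_names` and `main` are illustrative, and the token value is a placeholder.

```python
ACCESS_TOKEN = "put your token here"  # placeholder: paste your personal access token


def make_client(token):
    """Create a PyGithub client (requires `pip install PyGithub`)."""
    from github import Github  # imported here so the helpers below load without it

    return Github(token)


def list_repo_names(client):
    """Return the names of the authenticated user's repositories."""
    return [repo.name for repo in client.get_user().get_repos()]


def main():
    g = make_client(ACCESS_TOKEN)
    for name in list_repo_names(g):
        print(name)
```

Calling `main()` with a valid token should print one repository name per line.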

The result should be something similar to the following.

Good. Now you are all set up.

This tutorial covers the following topics:

  1. Searching GitHub repos using the GitHub API
  2. Searching *.po files using the GitHub API
  3. Downloading a folder from GitHub using svn

Before you proceed, make a copy of the script with the access token so that you have two separate scripts, one for each section.

Searching GitHub Repos

Capture Keywords

The first thing you need to do is capture keywords. Just add the following snippet at the bottom of your script:
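A minimal sketch of the prompt, wrapped in a function so it can be exercised without a terminal. The exact prompt wording and the `read` parameter are assumptions for illustration; in a plain script the body reduces to a single `input(...)` call.

```python
PROMPT = "Enter keyword(s) [e.g. python, flask, postgres]: "


def ask_for_keywords(read=input):
    """Show the prompt (with example input in brackets) and return the raw reply."""
    return read(PROMPT)
```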

Take note of the suggestions in between the square brackets. It is always good to guide the user on the kind of input you require so that you do not spend a lot of time trying to parse the input provided.

Once the user provides the input, you need to split it into a list:
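One way to write the split, assuming the raw input is a single comma-separated string (`split_keywords` is an illustrative name):

```python
def split_keywords(raw):
    """Split a comma-separated string and strip whitespace around each keyword."""
    return [keyword.strip() for keyword in raw.split(",")]


print(split_keywords("python, django ,  postgres"))
# ['python', 'django', 'postgres']
```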

Here, you are splitting the keywords provided and trimming them of any unnecessary whitespace. Python's list comprehensions enable you to perform all this in one line.

Search Repositories

Now you need to add a function that will receive the keywords and search GitHub for repos that match.
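A sketch of such a function, assuming `client` is a PyGithub `Github` instance. Passing the client in as a parameter is a choice made here so the function can be tried with a stand-in object; a script could equally use a module-level client.

```python
def search_github(client, keywords):
    """Search repositories matching the keywords and print their clone URLs."""
    # Keywords first, then the qualifiers restricting the match to README and description.
    query = "+".join(keywords) + "+in:readme+in:description"
    # Most-starred repositories first.
    result = client.search_repositories(query, sort="stars", order="desc")

    print(f"Found {result.totalCount} repo(s)")
    for repo in result:
        print(repo.clone_url)
```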

There are a couple of things happening in this function. First of all, you are taking the keywords and forming a GitHub search query. GitHub search queries take the following format.
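As a rough illustration, a query is a run of search keywords followed by a run of qualifiers. The helper below is hypothetical and joins the parts with `+` to mirror the qualifier string used in this tutorial; GitHub's own documentation writes the same structure with spaces between the parts.

```python
def build_query(keywords, qualifiers):
    """Join the parts as KEYWORD_1+KEYWORD_N+QUALIFIER_1+QUALIFIER_N."""
    return "+".join(list(keywords) + list(qualifiers))


print(build_query(["python", "django", "postgres"], ["in:readme", "in:description"]))
# python+django+postgres+in:readme+in:description
```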

In your function, '+in:readme+in:description' are the qualifiers. Once the query has been formed, you submit the query to GitHub, ordering the results by the number of stars in descending order. When you get the results, you print the total number of repos found and then print the clone URL for each one. At the bottom of your script, add the function call with keywords as the parameter and run the script.

When you submit python, django, postgres as the input to the script, you should end up with the following output.

To make the output more usable, you need to add the number of stars next to each URL. Make the following modification.
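One possible form of the modification, assuming each result is a PyGithub repository object with `clone_url` and `stargazers_count` attributes. The helper name and the exact output format are illustrative.

```python
def describe_repo(repo):
    """Format one search result as '<clone URL>, <N> stars'."""
    return f"{repo.clone_url}, {repo.stargazers_count} stars"
```

Inside the loop over the results, `print(repo.clone_url)` would then become `print(describe_repo(repo))`.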

Running the script with the same input as before will give the following output.

Searching GitHub Files

In this section, you will search for *.po files (translation files) that include the name of a specific language.

Capture Keyword

The first thing you need to do is capture the keyword. Simply add the following snippet at the bottom of your script:

Take note of the suggestions in between the square brackets. It is always good to guide the user on the kind of input you require so that you do not spend a lot of time trying to parse the input provided.

Search Files

Now you need to add a function that will receive the keyword and search GitHub for files that contain it.

There are a couple of things happening in this function. First of all, you are checking GitHub for the current API rate limit. In order to prevent blocking of future API calls, it is always good to check the current status of your limits before making any call. If your rate checks out, you take the keyword and form a GitHub search query.
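The rate-limit check could look roughly like this. It assumes `client.get_rate_limit()` from PyGithub; the attribute layout of the returned object has changed between releases, so the sketch tries both shapes. Treat that handling, the function name and the messages as assumptions.

```python
def search_calls_available(client):
    """Print the search API quota and return True if at least one call remains."""
    overview = client.get_rate_limit()
    # Assumption: newer PyGithub releases nest the per-API limits under `.resources`;
    # older ones expose `.search` directly on the returned object.
    limits = getattr(overview, "resources", overview)
    rate = limits.search

    if rate.remaining == 0:
        print(f"You have 0/{rate.limit} API calls remaining. Reset time: {rate.reset}")
        return False
    print(f"You have {rate.remaining}/{rate.limit} API calls remaining")
    return True
```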

In your function, 'in:file extension:po' are the qualifiers. You are only interested in *.po files which contain your keyword. Also note the max_size variable. It's used to limit the results returned to the first 100. Once the query has been formed, you submit the query to GitHub, ordering the results in descending order. When you get the results, you print the total number of files found and then print the download URL for each one. At the bottom of your script, add the function call with keyword as the parameter and run the script.
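A sketch of the query and result handling just described, assuming `client` is a PyGithub `Github` instance and that the rate limit has already been checked. The exact query string is an assumption built from the qualifiers named above, and `islice` is one way to cap the results at `max_size`.

```python
from itertools import islice


def search_files(client, keyword):
    """Search *.po files containing the keyword and print their download URLs."""
    query = f"{keyword} in:file extension:po"
    result = client.search_code(query, order="desc")

    max_size = 100  # only look at the first 100 results
    print(f"Found {result.totalCount} file(s)")
    # islice stops iterating after max_size items.
    for file in islice(result, max_size):
        print(file.download_url)
```

A call such as `search_files(client, "dutch")` would sit at the bottom of the script.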

When you submit dutch as the input to the script, you should end up with the following output.

There is so much that can be achieved with the GitHub API. You just need to take note of one important thing. When generating a personal access token, only check what you need. This is just an extra precaution in case your script falls into the wrong hands.

Download Files

To download the files returned by the previous script, you can use the Requests library.

After importing requests, the first line is simply the file URL. The second line sends a request to the URL. Finally, the last line writes the file content to a new file on the local machine.
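The three steps can be sketched as a small function. The `fetch` parameter is a hypothetical hook that defaults to `requests.get`, and the `raise_for_status()` call is an extra safety check not described in the text.

```python
def download_file(url, dest_path, fetch=None):
    """Download `url` and write the raw bytes to `dest_path`."""
    if fetch is None:
        import requests  # third-party: pip install requests

        fetch = requests.get
    response = fetch(url)        # send the request
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(dest_path, "wb") as f:
        f.write(response.content)  # write the file content locally
    return dest_path
```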

You can add this portion of code to the for file in result loop you created. In this case, you need to give each file a distinct name, perhaps by its index in the loop or by using filename = url[url.rfind("/") + 1:] to extract the filename from the URL.
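Both naming options are shown below as small helpers. The function names and the example URL are made up for illustration.

```python
def filename_from_url(url):
    """Return everything after the last '/' in the URL."""
    return url[url.rfind("/") + 1:]


def indexed_filename(index, url):
    """Prefix the name with the loop index so equal names do not overwrite each other."""
    return f"{index}_{filename_from_url(url)}"


print(filename_from_url("https://example.com/locale/nl/django.po"))
# django.po
print(indexed_filename(3, "https://example.com/locale/nl/django.po"))
# 3_django.po
```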

Downloading GitHub Folders

In the third section of this tutorial, you are going to learn how to download a single folder/directory from a GitHub repository. Please note that this section does not require the use of the GitHub API, so just create a blank Python script.

Capture URL

The first thing you need to do is to capture the URL of the folder you want to download. In the second script you created earlier, add the following.

When dealing with URLs, it's always good to validate them before doing anything with them. There are several methods of doing it. For this tutorial, you are going to use a library which focuses on validation. Run:

Once you have installed the package, add the validation logic at the bottom of the script.

Before adding the function for downloading the folder, you need to add one more dependency.
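The text does not name the validation library, so the sketch below substitutes a basic standard-library check using `urllib.parse`. It is a stand-in, not that library's API, and it only verifies that the input has an http(s) scheme and a host. The prompt wording and function names are assumptions.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Loose check: the URL must have an http(s) scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def read_folder_url(read=input):
    """Prompt for the folder URL and return it, or None if it fails the check."""
    url = read("Enter folder URL: ").strip()
    if not is_valid_url(url):
        print("Invalid URL")
        return None
    return url
```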

SVN (Subversion) is a version control system like Git, though it is centralized rather than distributed. Git does not have a native command for downloading a sub-directory from a repo. The only way to get all the files from a sub-directory is to download all the files individually. This can be really tedious, which is the reason to use svn.

Note: In order for the SVN Python package to work, you need to make sure svn is installed on your system and can be launched from Terminal/Command Prompt.
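One way to confirm from Python that the svn executable is reachable uses only the standard library (the function name is illustrative):

```python
import shutil


def svn_installed():
    """Return True if an `svn` executable can be found on the PATH."""
    return shutil.which("svn") is not None
```

A script could call `svn_installed()` up front and exit with a clear message when it returns False.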

Download the Folder

Once you have verified that svn is installed, add the function for downloading the folder.

In order to make svn work with the provided URL, you need to replace tree/master with trunk. Git and svn share a lot of features, but there are also a lot of differences between the two, the URL pattern being one of them.
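A sketch of the URL rewrite and the export step. It assumes the third-party `svn` package exposes `svn.remote.RemoteClient` with an `export` method; `client_factory` is a hypothetical hook for trying the function without svn. GitHub has announced the retirement of its Subversion support, so the export itself may fail against github.com today.

```python
def to_svn_url(url):
    """Rewrite a GitHub folder URL into its svn form (tree/master -> trunk)."""
    return url.replace("tree/master", "trunk")


def download_folder(url, dest="output", client_factory=None):
    """Export the folder behind `url` into `dest` using svn."""
    if client_factory is None:
        import svn.remote  # third-party `svn` package; also needs the svn executable

        client_factory = svn.remote.RemoteClient
    client = client_factory(to_svn_url(url))
    client.export(dest)
```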

Finally, add the function call at the bottom of the script.

Now, try running the script, providing https://github.com/pallets/flask/tree/master/examples as the URL. A folder called output should be created with the contents of the folder specified in the URL.

Full Project Code (Searching Repos)

Full Project Code (Searching Files)

Full Project Code (Downloading a Folder)
