How Do I Download A Folder From Github
GitHub is a web-based hosting service for version control using Git. It is mostly used for storing and sharing computer source code. It offers all of the distributed version control and source code management functionality of Git, as well as adding its own features.
GitHub stores more than 3 million repositories, with more than 1.7 million developers using it daily. With so much data, it can be quite daunting at first to find the information you need or to perform repetitive tasks, and that is where the GitHub API comes in handy.
In this tutorial, you are going to learn how to use the GitHub API to search for repositories and files that match particular keyword(s) and retrieve their URLs using Python. You will also learn how to download files or a specific folder from a GitHub repository.
Project Setup
Personal Access Token
In order to access the GitHub API, you will need an access token to authorize API calls. Head over to your token settings page on GitHub. If you do not have a GitHub account, you will have to create one.
Click Generate New Token.
Enter the token description and check public_repo.
Scroll to the bottom and click Generate token.
Once your token is created, copy and save it somewhere for later use. Note that once you leave this page, you will not see that token again.
Client Setup
The only package you need to install for Python is PyGithub. Run:

    pip install PyGithub

Note: PyGithub is a third-party library. GitHub only offers official client libraries for Ruby, Node.js and .NET.
Then, you need to import it.

    from github import Github

GitHub API Test
With the access token obtained earlier, you need to test your connection to the API. First of all, create a constant to hold your token:

    ACCESS_TOKEN = 'put your token here'

Then initialize the GitHub client.

    g = Github(ACCESS_TOKEN)

You can then try getting your list of repositories to test the connection.

    print(g.get_user().get_repos())

The result should be something similar to the following.

    <github.PaginatedList.PaginatedList object at ......>

Good. Now you are all set up.
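Hard-coding the token in ACCESS_TOKEN is fine for a quick test, but the constant can also be filled from an environment variable so that the secret never lands in the script itself. The following is a minimal sketch, assuming a variable named GITHUB_TOKEN and a helper called load_token; both names are chosen here for illustration and are not something PyGithub requires.

```python
import os


def load_token(env=None, name="GITHUB_TOKEN"):
    """Return the token stored under `name`, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes testing easy.
    `GITHUB_TOKEN` is an assumed variable name, not a PyGithub convention.
    """
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {name} environment variable first")
    return token


# Usage (assumes the variable was exported in your shell beforehand):
# ACCESS_TOKEN = load_token()
# g = Github(ACCESS_TOKEN)
```

Failing loudly when the variable is missing tends to be easier to debug than an authentication error from the API later on.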
This tutorial covers the following topics:
- Searching GitHub repos using the GitHub API
- Searching *.po files using the GitHub API
- Downloading a folder from GitHub using svn
Before you proceed, make a copy of the script with the access token so that you have two separate scripts, one for each search section.
Searching GitHub Repos
Capture Keywords
The first thing you need to do is capture keywords. Just add the following snippet at the bottom of your script:

    if __name__ == '__main__':
        keywords = input('Enter keyword(s)[e.g python, flask, postgres]: ')

Take note of the suggestions in between the square brackets. It is always good to guide the user on the kind of input you require so that you do not spend a lot of time trying to parse the input provided.
Once the user provides the input, you need to split it into a list:

    keywords = [keyword.strip() for keyword in keywords.split(',')]

Here, you are splitting the keywords provided and trimming them of any unnecessary white-space. Python's list comprehensions enable you to perform all this in one line.
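One case the one-line split above does not cover is empty entries, for example when the user types a trailing comma. A small sketch of a stricter variant follows; parse_keywords is a hypothetical helper name, not part of the original script.

```python
def parse_keywords(raw):
    """Split a comma-separated string into clean, non-empty keywords."""
    return [part.strip() for part in raw.split(",") if part.strip()]


print(parse_keywords("python, flask , ,postgres,"))
# prints ['python', 'flask', 'postgres']
```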
Search Repositories
Now you need to add a function that will receive the keywords and search GitHub for repos that match.

    def search_github(keywords):
        query = '+'.join(keywords) + '+in:readme+in:description'
        result = g.search_repositories(query, 'stars', 'desc')

        print(f'Found {result.totalCount} repo(s)')

        for repo in result:
            print(repo.clone_url)

There are a couple of things happening in this function. First of all, you are taking the keywords and forming a GitHub search query. GitHub search queries take the following format.

    SEARCH_KEYWORD_1+SEARCH_KEYWORD_N+QUALIFIER_1+QUALIFIER_N

In your function, '+in:readme+in:description' are the qualifiers. Once the query has been formed, you submit it to GitHub, ordering the results by the number of stars in descending order. When you get the results, you print the total number of repos found and then print the clone URL for each one. At the bottom of your script, add the function call with keywords as the parameter and run the script.
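To see exactly what string is sent as the query, the joining step can be pulled out into its own function and printed without touching the API. This is only a sketch: build_repo_query and its qualifiers parameter are illustrative names, and the '+' separator simply mirrors what the function above does.

```python
def build_repo_query(keywords, qualifiers=("in:readme", "in:description")):
    """Join keywords and qualifiers with '+', as the tutorial's function does."""
    return "+".join(list(keywords) + list(qualifiers))


print(build_repo_query(["python", "django", "postgres"]))
# prints python+django+postgres+in:readme+in:description
```

Keeping the qualifiers as a parameter makes it easy to experiment with other qualifiers without editing the search function.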

    keywords = [keyword.strip() for keyword in keywords.split(',')]
    search_github(keywords)

When you submit python, django, postgres as the input to the script, you should end up with the following output.

    Found 54 repo(s)
    https://github.com/citusdata/django-multitenant.git
    https://github.com/dheerajchand/ubuntu-django-nginx-ansible.git
    https://github.com/chenjr0719/Docker-Django-Nginx-Postgres.git
    https://github.com/nomadjourney/python-box.git
    https://github.com/laitassou/etherkar.git
    https://github.com/the-vampiire/medi_assessment.git
    https://github.com/mapes911/django-vagrant-box.git
    https://github.com/sathyaNarayanC/registration-form.git
    https://github.com/joshimiloni/AAM-Book-Exchange.git
    https://github.com/dxvxd/vagrant-py3-django-pgSQL.git
    https://github.com/desarroll0/lostItems.git
    https://github.com/cjroth/example-docker-cloud-project.git
    .....

To make the output more usable, you can add the number of stars next to each URL. Make the following modification.

    for repo in result:
        print(f'{repo.clone_url}, {repo.stargazers_count} stars')

Running the script with the same input as before will give the following output.

    Found 54 repo(s)
    https://github.com/citusdata/django-multitenant.git, 181 stars
    https://github.com/dheerajchand/ubuntu-django-nginx-ansible.git, 15 stars
    https://github.com/chenjr0719/Docker-Django-Nginx-Postgres.git, 6 stars
    https://github.com/nomadjourney/python-box.git, 4 stars
    https://github.com/laitassou/etherkar.git, 2 stars
    https://github.com/the-vampiire/medi_assessment.git, 2 stars
    https://github.com/mapes911/django-vagrant-box.git, 1 stars
    https://github.com/sathyaNarayanC/registration-form.git, 1 stars
    https://github.com/dxvxd/vagrant-py3-django-pgSQL.git, 1 stars
    https://github.com/joshimiloni/AAM-Book-Exchange.git, 1 stars
    https://github.com/desarroll0/lostItems.git, 1 stars
    .....

Searching GitHub Files
In this section, you will search for *.po files (translation files) that include the name of a specific language.
Capture Keyword
The first thing you need to do is capture the keyword. Simply add the following snippet at the bottom of your script:

    if __name__ == '__main__':
        keyword = input('Enter keyword[e.g french, german etc]: ')

Take note of the suggestions in between the square brackets. It is always good to guide the user on the kind of input you require so that you do not spend a lot of time trying to parse the input provided.
Search Files
Now you need to add a function that will receive the keyword and search GitHub for files that contain it.

    def search_github(keyword):
        rate_limit = g.get_rate_limit()
        rate = rate_limit.search
        if rate.remaining == 0:
            print(f'You have 0/{rate.limit} API calls remaining. Reset time: {rate.reset}')
            return
        else:
            print(f'You have {rate.remaining}/{rate.limit} API calls remaining')

        query = f'"{keyword} english" in:file extension:po'
        result = g.search_code(query, order='desc')

        max_size = 100
        print(f'Found {result.totalCount} file(s)')
        if result.totalCount > max_size:
            result = result[:max_size]

        for file in result:
            print(f'{file.download_url}')

There are a couple of things happening in this function. First of all, you are checking GitHub for the current API rate limit. In order to prevent blocking of future API calls, it is always good to check the current status of your limits before making any call. If your rate checks out, you take the keyword and form a GitHub search query.
In your function, 'in:file extension:po' are the qualifiers. You are only interested in *.po files which contain your keyword. Also note the max_size variable. It is used to limit the results returned to the first 100. Once the query has been formed, you submit it to GitHub, ordering the results in descending order. When you get the results, you print the total number of files found and then print the download URL for each one. At the bottom of your script, add the function call with keyword as the parameter and run the script.
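The max_size cap described above can also be expressed with itertools.islice, which works on any iterable and stops as soon as the limit is reached. The sketch below uses stand-in data rather than a real search result; first_n is a hypothetical helper name. If the result object fetches its pages lazily, this approach should also avoid requesting pages that are never used, although that is an assumption about the client library rather than something shown here.

```python
from itertools import islice


def first_n(items, n):
    """Return an iterator over at most `n` items from any iterable."""
    return islice(items, n)


# Stand-in data; in the tutorial this would be the search result object.
fake_results = (f"file_{i}.po" for i in range(250))
limited = list(first_n(fake_results, 100))
print(len(limited))  # prints 100
```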

    ....
    search_github(keyword)

When you submit dutch as the input to the script, you should end up with the following output.

    You have 28/30 API calls remaining
    Found 196 file(s)
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/ar/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/en_GB/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdei18n/d5c80ababe3d6a39dcde39605080ddf07856215e/en_GB/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdei18n/d5c80ababe3d6a39dcde39605080ddf07856215e/ar/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/af/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/az/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/br/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/bs/messages/kdeedu/klettres.po
    https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/cy/messages/kdeedu/klettres.po
    ..........

There is so much that can be achieved with the GitHub API. You just need to take note of one important thing. When generating a personal access token, only check the scopes you need. This is just an extra precaution in case your script falls into the wrong hands.
Download Files
To download the files returned by the previous script, you can use the Requests library.

    import requests

    url = "https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/ar/messages/kdeedu/klettres.po"
    r = requests.get(url)
    # Write the response body to a local file; the with block closes it afterwards.
    with open("file.po", "wb") as f:
        f.write(r.content)

After importing requests, the first line is simply the file URL. The second line sends a request to that URL. Finally, the last lines write the file content to a new file on the local machine.
You can add this portion of code to the for file in result loop you created earlier. In this case, you need to give each file a distinct name, perhaps by using its index number in the loop or by using filename = url[url.rfind("/") + 1:] to extract the filename from the URL.
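In the sample output above, many of the results share the same file name (klettres.po), so extracting the name from the URL alone would overwrite earlier downloads. A sketch that combines both suggestions, the loop index and the name taken from the URL, could look like this; filename_from_url is a hypothetical helper, and the URLs are copied from the sample output.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, index):
    """Build a local file name from a URL, prefixed with a loop index."""
    name = posixpath.basename(urlparse(url).path)
    return f"{index:03d}_{name}"


urls = [
    "https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/ar/messages/kdeedu/klettres.po",
    "https://raw.githubusercontent.com/iegor/kdesktop/d5dccbe01eeb7c0e82ac5647cf2bc2d4c7beda0b/kde-i18n/en_GB/messages/kdeedu/klettres.po",
]
for i, url in enumerate(urls):
    print(filename_from_url(url, i))
# prints 000_klettres.po then 001_klettres.po
```

Inside the download loop, the returned name would replace the fixed "file.po" passed to open().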
Downloading GitHub Folders
In the third section of this tutorial, you are going to learn how to download a single folder/directory from a GitHub repository. Please note that this section does not require the use of the GitHub API, so just create a blank Python script.
Capture URL
The first thing you need to do is capture the URL of the folder you want to download. In the script you created for this section, add the following.

    url = input('Enter folder url: ')

When dealing with URLs, it is always good to validate them before doing anything with them. There are several ways of doing it. For this tutorial, you are going to use a library which focuses on validation. Run:

    pip install validators

Once you have installed the package, add the validation logic at the bottom of the script.

    import validators
    ....
    if not validators.url(url):
        print('Invalid url')
    else:
        pass

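The check above only confirms that the input is a well-formed URL. Since the script expects a GitHub folder link specifically, a narrower check can be sketched with the standard library alone. The function name is_github_folder_url and the assumed URL shape (github.com/<owner>/<repo>/tree/<branch>/...) are illustrative choices, not part of the validators package.

```python
from urllib.parse import urlparse


def is_github_folder_url(url):
    """Return True for URLs shaped like github.com/<owner>/<repo>/tree/<branch>/..."""
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or parts.netloc != "github.com":
        return False
    segments = [s for s in parts.path.split("/") if s]
    # Expect at least: owner, repo, the literal "tree", and a branch name.
    return len(segments) >= 4 and segments[2] == "tree"


print(is_github_folder_url("https://github.com/pallets/flask/tree/master/examples"))  # prints True
print(is_github_folder_url("https://example.com/not/a/repo"))  # prints False
```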
Before adding the function for downloading the folder, you need to add one more dependency: the svn package for Python, which can be installed with pip install svn.
SVN (Subversion) is a version control system like Git, although it is centralized where Git is distributed. Git does not have a single native command for downloading just one sub-directory from a repo. Short of cloning the whole repository, the straightforward way to get all the files from a sub-directory is to download them individually. This can be really slow, which is the reason to use svn.
Note: In order for the SVN Python package to work, you need to make sure svn is installed on your system and can be launched from Terminal/Command Prompt. Also be aware that GitHub ended its Subversion support in January 2024, so the svn steps below describe how github.com behaved before that change.
Download the Folder
Once you have verified that svn is installed, add the function for downloading the folder.

    from svn.remote import RemoteClient
    ....
    def download_folder(url):
        if 'tree/master' in url:
            url = url.replace('tree/master', 'trunk')

        r = RemoteClient(url)
        r.export('output')

In order to make svn work with the provided URL, you need to replace tree/master with trunk. Git and svn share a lot of features, but there are also a lot of differences between the two, the URL pattern being one of them.
Finally, add the function call at the bottom of the script.

    if not validators.url(url):
        print('Invalid url')
    else:
        download_folder(url)

Now, try running the script, providing https://github.com/pallets/flask/tree/master/examples as the URL. A folder called output should be created with the contents of the folder specified in the URL.
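Where svn access to a repository is not available, a folder can also be fetched through the GitHub API by walking its contents. This is an alternative to the svn export above, not the method the tutorial uses. The sketch assumes a PyGithub Repository object (its get_contents method and the type, path and decoded_content attributes of the items it returns), a branch name without slashes, and files small enough for the API to return their content inline. The helper names split_folder_url and download_folder_via_api are hypothetical.

```python
import os


def split_folder_url(url):
    """Split https://github.com/<owner>/<repo>/tree/<branch>/<path> into parts.

    Assumes the branch name contains no '/' characters.
    """
    segments = url.split("github.com/", 1)[1].strip("/").split("/")
    owner, repo, _tree, branch = segments[:4]
    return f"{owner}/{repo}", branch, "/".join(segments[4:])


def download_folder_via_api(repo, path, ref, dest="output"):
    """Recursively write every file under the folder `path` into `dest`.

    `repo` is expected to behave like a PyGithub Repository object.
    """
    for item in repo.get_contents(path, ref=ref):
        if item.type == "dir":
            # Descend into sub-folders, keeping the same destination root.
            download_folder_via_api(repo, item.path, ref, dest)
        else:
            target = os.path.join(dest, item.path)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(item.decoded_content)


# Usage (needs network access and a valid token):
# from github import Github
# full_name, branch, path = split_folder_url(
#     "https://github.com/pallets/flask/tree/master/examples")
# download_folder_via_api(Github(ACCESS_TOKEN).get_repo(full_name), path, branch)
```

Each folder costs one API call, so large trees will consume the rate limit faster than a single svn export would have.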
Full Project Code (Searching Repos)

    from github import Github

    ACCESS_TOKEN = 'put your token here'
    g = Github(ACCESS_TOKEN)


    def search_github(keywords):
        query = '+'.join(keywords) + '+in:readme+in:description'
        result = g.search_repositories(query, 'stars', 'desc')

        print(f'Found {result.totalCount} repo(s)')

        for repo in result:
            print(f'{repo.clone_url}, {repo.stargazers_count} stars')


    if __name__ == '__main__':
        keywords = input('Enter keyword(s)[e.g python, flask, postgres]: ')
        keywords = [keyword.strip() for keyword in keywords.split(',')]
        search_github(keywords)

Full Project Code (Searching Files)

    from github import Github

    ACCESS_TOKEN = 'put your token here'
    g = Github(ACCESS_TOKEN)


    def search_github(keyword):
        rate_limit = g.get_rate_limit()
        rate = rate_limit.search
        if rate.remaining == 0:
            print(f'You have 0/{rate.limit} API calls remaining. Reset time: {rate.reset}')
            return
        else:
            print(f'You have {rate.remaining}/{rate.limit} API calls remaining')

        query = f'"{keyword} english" in:file extension:po'
        result = g.search_code(query, order='desc')

        max_size = 100
        print(f'Found {result.totalCount} file(s)')
        if result.totalCount > max_size:
            result = result[:max_size]

        for file in result:
            print(f'{file.download_url}')


    if __name__ == '__main__':
        keyword = input('Enter keyword[e.g french, german etc]: ')
        search_github(keyword)

Full Project Code (Downloading a Folder)

    import validators
    from svn.remote import RemoteClient


    def download_folder(url):
        if 'tree/master' in url:
            url = url.replace('tree/master', 'trunk')

        r = RemoteClient(url)
        r.export('output')


    if __name__ == '__main__':
        url = input('Enter folder url: ')
        if not validators.url(url):
            print('Invalid url')
        else:
            download_folder(url)

Software Engineer & Dancer. Or is it the other way around? 🙂
Posted by: entrekinswuzzy.blogspot.com