Office365-rest-python-client: How to recursively download all sharepoint doc files from folders and subfolders?

Created on 2 Apr 2019  ·  9 Comments  ·  Source: vgrem/Office365-REST-Python-Client

I have a requirement to recursively download all the files from the root folder, its subfolders, and their subfolders, down to the Nth level.

How can I go about it? Is there a method to list the folders inside a particular folder? Also, how can I list the folders in the root Document Library? @vgrem @Bachatero


Most helpful comment

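def get_folder_relativeUrl(context, folder_relativeUrl, folder_list=None):
    # Collect the ServerRelativeUrl of every folder below folder_relativeUrl.
    # A fresh list is created per top-level call, so repeated calls do not
    # accumulate results in a shared module-level list.
    if folder_list is None:
        folder_list = []

    libraryRoot = context.web.get_folder_by_server_relative_url(folder_relativeUrl)
    folders = libraryRoot.folders
    context.load(folders)
    context.execute_query()

    for cur_folder in folders:
        cur_url = cur_folder.properties["ServerRelativeUrl"]
        folder_list.append(cur_url)
        # Descend into the subfolder, appending to the same list
        get_folder_relativeUrl(context, cur_url, folder_list)

    return folder_list

This gives you a flat list of every folder under the starting folder, from its direct children down to the Nth-level subfolders.

However, it is slow, because it sends one request per folder.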
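The same flat list can also be built without recursion. The sketch below is one possible way to do it, assuming a hypothetical `list_subfolders` callable that takes a folder URL and returns the URLs of its child folders; with this library, that callable would wrap the `load`/`execute_query` calls shown above. It still makes one call per folder, so it does not address the performance concern; it only avoids deep recursion and shared state. The folder names are made up for illustration.

```python
from collections import deque


def walk_folders(root_url, list_subfolders):
    """Return every folder URL below root_url, breadth-first.

    list_subfolders is a hypothetical callable:
    folder URL -> iterable of child folder URLs.
    """
    found = []
    queue = deque([root_url])
    while queue:
        current = queue.popleft()
        for child in list_subfolders(current):
            found.append(child)
            queue.append(child)  # visit this child's subfolders later
    return found


# Stand-in for SharePoint: a dict mapping each folder to its children.
fake_tree = {
    "/sites/demo/Shared Documents": [
        "/sites/demo/Shared Documents/A",
        "/sites/demo/Shared Documents/B",
    ],
    "/sites/demo/Shared Documents/A": ["/sites/demo/Shared Documents/A/A1"],
}


def fake_list_subfolders(url):
    return fake_tree.get(url, [])


all_folders = walk_folders("/sites/demo/Shared Documents", fake_list_subfolders)
for folder_url in all_folders:
    print(folder_url)
```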

All 9 comments

I found a link that says the entire folder and file tree can be retrieved with a single query.

See the first answer here: https://sharepoint.stackexchange.com/questions/159105/with-rest-recursively-retrieve-file-and-folder-directory-structure

I am trying to replicate the request from that answer using your API: /_api/web/Lists/GetByTitle('Documents')/Items?$select=FileLeafRef,FileRef

But when I try it with the code below:

folder = ctx.web.lists.get_by_title('Documents')
folder = folder.get_items('$select=FileLeafRef,FileRef')

it fails with the error: "'str' object has no attribute 'payload'"

What should I do?
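The error message suggests that `get_items` expects a query object with a `payload` attribute rather than a raw string. For reference, the REST request from the linked answer can be written out as a plain URL. The sketch below only builds that URL with the standard library; the function name and the site URL are made-up placeholders, and actually sending the request (with authentication) is left out.

```python
from urllib.parse import quote


def build_items_url(site_url, list_title, fields):
    """Build the list-items REST URL with a $select clause."""
    # Double any single quotes for the OData string literal, then percent-encode.
    safe_title = quote(list_title.replace("'", "''"))
    return "{0}/_api/web/Lists/GetByTitle('{1}')/Items?$select={2}".format(
        site_url.rstrip("/"), safe_title, ",".join(fields)
    )


# Placeholder site URL, for illustration only.
url = build_items_url(
    "https://contoso.sharepoint.com/sites/demo",
    "Documents",
    ["FileLeafRef", "FileRef"],
)
print(url)
```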

Hi,

You could use a procedure that calls itself recursively, e.g.:

import sys

def printAllContents(ctx, relativeUrl):

    try:

        libraryRoot = ctx.web.get_folder_by_server_relative_url(relativeUrl)
        ctx.load(libraryRoot)
        ctx.execute_query()

        # List the subfolders and descend into each one
        folders = libraryRoot.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
            printAllContents(ctx, relativeUrl + '/' + myfolder.properties["Name"])

        # List the files directly inside this folder
        files = libraryRoot.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            #print("File name: {0}".format(myfile.properties["Name"]))
            print("File name: {0}".format(myfile.properties["ServerRelativeUrl"]))

    except Exception as e:
        # Catch Exception rather than using a bare except, so that the
        # SystemExit raised by a nested call is not caught again here
        print('Problem listing contents of {0}: {1}'.format(relativeUrl, e))
        sys.exit(1)

m.

... you may then, for instance, download each file using the ServerRelativeUrl that gets printed out ...
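When downloading by ServerRelativeUrl, files with the same name in different subfolders would overwrite each other if they were all saved into one local directory. Below is a minimal sketch of one way to mirror the folder structure locally, using only the standard library. The function name and the example paths are hypothetical.

```python
import os
import posixpath


def local_path_for(server_relative_url, library_root, output_dir):
    """Map a file's ServerRelativeUrl to a path under output_dir, keeping subfolders."""
    # Path of the file relative to the library root, e.g. "A/A1/report.docx"
    relative = posixpath.relpath(server_relative_url, library_root)
    if relative == ".." or relative.startswith("../"):
        raise ValueError(
            "{0} is not under {1}".format(server_relative_url, library_root)
        )
    return os.path.join(output_dir, *relative.split("/"))


dest = local_path_for(
    "/sites/demo/Shared Documents/A/A1/report.docx",
    "/sites/demo/Shared Documents",
    "output",
)
print(dest)
# Before writing the file, create its parent directory:
# os.makedirs(os.path.dirname(dest), exist_ok=True)
```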

I posted my query here: in a more structured way.

FYI: The JSON there is for illustration only.

I'm not sure what you are getting at. I think the example proc I listed does just that: it recursively lists all folders and subfolders, and the files within them.

Example of downloading the files as you go down the tree recursively...

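import os
import sys

# Raw string so the backslash is not treated as an escape character
outputDir = r"d:\output"

def printAllContents(ctx, relativeUrl):

    try:

        libraryRoot = ctx.web.get_folder_by_server_relative_url(relativeUrl)
        ctx.load(libraryRoot)
        ctx.execute_query()

        # List the subfolders and descend into each one
        folders = libraryRoot.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            #print("Folder name: {0}".format(myfolder.properties["Name"]))
            print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
            printAllContents(ctx, relativeUrl + '/' + myfolder.properties["Name"])

        # Download the files directly inside this folder
        files = libraryRoot.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            print("File name: {0}".format(myfile.properties["ServerRelativeUrl"]))
            pathList = myfile.properties["ServerRelativeUrl"].split('/')
            # All files land directly in outputDir, named after the last URL segment
            fileDest = os.path.join(outputDir, pathList[-1])
            # downloadFile is a helper you need to supply yourself
            downloadFile(ctx, fileDest, myfile.properties["ServerRelativeUrl"])

    except Exception as e:
        # Catch Exception rather than using a bare except, so that the
        # SystemExit raised by a nested call is not caught again here
        print('Problem processing {0}: {1}'.format(relativeUrl, e))
        sys.exit(1)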
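downloadFile is called above but not shown in the thread. Below is a minimal sketch of what such a helper might look like, under the assumption that the file's bytes come from a caller-supplied `fetch_bytes` callable (hypothetical; with this library it would wrap whatever download call your version provides). The signature therefore differs from the call above.

```python
import os
import tempfile


def download_file(fetch_bytes, server_relative_url, dest_path):
    """Write the bytes returned by fetch_bytes(server_relative_url) to dest_path."""
    content = fetch_bytes(server_relative_url)
    dest_dir = os.path.dirname(dest_path)
    if dest_dir:
        # Create missing parent directories before writing
        os.makedirs(dest_dir, exist_ok=True)
    with open(dest_path, "wb") as fh:
        fh.write(content)
    return len(content)


# Usage with a stand-in fetcher instead of a real SharePoint call.
def fake_fetch(url):
    return ("contents of " + url).encode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub", "report.docx")
    written = download_file(
        fake_fetch, "/sites/demo/Shared Documents/report.docx", target
    )
    saved_ok = os.path.exists(target)
    print(written, saved_ok)
```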

Greetings,

Since this question has been answered, I propose to close it.
