Office365-rest-python-client: How can I download SharePoint folder containing multiple files?

Created on 27 Mar 2019 · 15 Comments · Source: vgrem/Office365-REST-Python-Client

My Python 3 code:

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext

url = 'https://company.sharepoint.com/sites/abc'
ctx_auth = AuthenticationContext(url=url)
if ctx_auth.acquire_token_for_user(username='[email protected]', password='12345'):
    ctx = ClientContext(url, ctx_auth)
    lists = ctx.web.lists
    ctx.load(lists)
    ctx.execute_query()
    for l in lists:
        print(l.properties['Title'])

The above code lists the lists in the site. My plan is to run this whole module in AWS Lambda using Python, download files from SharePoint Documents, and store them in AWS S3.

A folder can contain multiple files, and I want to download the entire folder with all of its files. Has anyone done this? Working code would be a great help, as I am totally new to web scraping!

AakashBasu
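
For the S3 half of this plan, here is a minimal sketch, assuming boto3 is available in the Lambda runtime and the file bytes have already been fetched from SharePoint (for example as `response.content`). The helper name `store_in_s3`, the bucket name, and the key prefix are hypothetical; `put_object` with `Bucket`, `Key`, and `Body` is the standard boto3 S3 client call.

```python
def store_in_s3(s3_client, bucket, key_prefix, file_name, data):
    """Upload raw bytes to S3 and return the object key that was used.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    It is passed in as an argument so the function can be tried out
    without AWS access.
    """
    # Join prefix and file name into a key such as "sharepoint/file1.csv";
    # an empty prefix is simply dropped.
    key = "/".join(part.strip("/") for part in (key_prefix, file_name) if part)
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return key
```

Inside a Lambda handler this might be called as `store_in_s3(boto3.client("s3"), "my-bucket", "sharepoint", name, response.content)`, where `"my-bucket"` is a placeholder. Passing the bytes straight to S3 avoids writing to local disk at all.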

All 15 comments

Hi,
perhaps you could do it in a loop, e.g.:

First, return the contents of the SharePoint Documents library using a function:

import sys

from office365.sharepoint.file import File

listTitle = "Documents"
site = "abc"

def fncPrintLibraryContents(ctx, listTitle):
    """Return the files in the root folder of the library."""
    try:
        list_object = ctx.web.lists.get_by_title(listTitle)
        folder = list_object.root_folder
        ctx.load(folder)
        ctx.execute_query()

        files = folder.files
        ctx.load(files)
        ctx.execute_query()

        return files
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)

Then download each file by calling a procedure, e.g.:

def downloadFile(ctx, fileName):
    """Download one file from the library root into the current directory."""
    try:
        # The with block closes the file, so no explicit close() is needed.
        with open(fileName, "wb") as localFile:
            relativeUrl = '/sites/{0}/Shared%20Documents/{1}'.format(site, fileName)
            response = File.open_binary(ctx, relativeUrl)
            localFile.write(response.content)
    except Exception as e:
        print('Problem downloading file:', fileName, e)
        sys.exit(1)

myfiles = fncPrintLibraryContents(ctx, listTitle)

for myfile in myfiles:
    print("Downloading file: {0}".format(myfile.properties["Name"]))
    downloadFile(ctx, myfile.properties["Name"])

Bachatero on 28 Mar 2019

👍1

pls, indent last two lines in the for loop, I can't seem to do it.
m.

Bachatero on 28 Mar 2019

Hey,

Thanks for such a quick reply. I can download files successfully, provided I give the full path down to the file name. But to download all the files recursively, I first need to list the existing ones in a particular folder, and after several attempts I keep getting Not Found errors. Maybe I am going wrong somewhere because my understanding of Title is not right: whenever I try to list a subfolder by giving its name as the title, it fails. I will go through your code and see if I can make it work.

Meanwhile, here is my current code. Downloading works fine, and listing folders and files for the root works, but whenever I pass any folder name other than Documents as the title, it fails:

from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.file import File
from office365.sharepoint.file_creation_information import FileCreationInformation

def read_folder_and_files(context, list_title):
    """Read a folder example"""
    list_obj = context.web.lists.get_by_title(list_title)
    folder = list_obj.root_folder
    context.load(folder)
    context.execute_query()
    print("List url: {0}".format(folder.properties["ServerRelativeUrl"]))

    files = folder.files
    context.load(files)
    context.execute_query()
    for cur_file in files:
        print("File name: {0}".format(cur_file.properties["Name"]))

    # Note: this asks the web (site) for its folders, not the library's root folder.
    folders = context.web.folders
    context.load(folders)
    context.execute_query()
    for folder in folders:
        print("Folder name: {0}".format(folder.properties["Name"]))

def download_file(context):
    response = File.open_binary(context, "/sites/new/Shared Documents/2011-A/file1.csv")
    print(response)
    print(response.content)
    with open(r"C:\Users\aakashb\Downloads\test\file1.csv", "wb") as local_file:
        local_file.write(response.content)

ctx = None
url = 'https://company.sharepoint.com/sites/new'
ctx_auth = AuthenticationContext(url=url)
if ctx_auth.acquire_token_for_user(username='[email protected]', password='12345'):
    ctx = ClientContext(url, ctx_auth)
    read_folder_and_files(ctx, 'Documents')

    print('entering function')

    download_file(ctx)

    print('exiting function')

AakashBasu on 28 Mar 2019

1) Sorry for the broken structure of the code I gave you.
2) I just ran your code, and it does exactly what my code does in terms of listing: it lists the files in the root (not inside any folder). But I want to do the same for folders.
3) I also want to list the folders. When I use @vgrem's code for listing folders, it does not show me the folders in Documents, but folders like:

Folder name: SitePages
Folder name: Style Library
Folder name: _catalogs
Folder name: FormServerTemplates
Folder name: _private
Folder name: Sharing Links
Folder name: SiteAssets
Folder name: images
Folder name: Shared Documents
Folder name: Lists
Folder name: _cts

None of these are folders I have in the SharePoint document library.

So, in short, how can I list the document library's folders and their respective files so they can be downloaded?

AakashBasu on 28 Mar 2019

Hi,
please look at the issue here: https://github.com/vgrem/Office365-REST-Python-Client/issues/91
specifically at the line that goes like this:

folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])

If that doesn't help, I'll get back to you with more details.
m.

Bachatero on 28 Mar 2019

... what I meant was using get_folder_by_server_relative_url method instead of get_by_title, e.g.

import sys

app_settings = {'urlrel': '/sites/abc/Shared Documents/TEST'}

def printFolderContents(ctx, listTitle):
    """Print the names of the files in the folder at app_settings['urlrel']."""
    try:
        #list_object = ctx.web.lists.get_by_title(listTitle)
        folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])
        #folder = list_object.root_folder
        ctx.load(folder)
        ctx.execute_query()
        #print(folder.url)

        files = folder.files
        ctx.load(files)
        ctx.execute_query()

        for myfile in files:
            print("File name: {0}".format(myfile.properties["Name"]))
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)

Let me know if that helps ...

Bachatero on 28 Mar 2019

To download the files inside the TEST folder within the Shared Documents library, you can, for instance, turn the above code into a function:

import sys

from office365.sharepoint.file import File

# The folder named in app_settings['urlrel']; site and app_settings
# are defined in the earlier snippets.
yourFolder = 'TEST'

def fncGetFolderContents(ctx, listTitle):
    """Return the files in the folder at app_settings['urlrel']."""
    try:
        #list_object = ctx.web.lists.get_by_title(listTitle)
        folder = ctx.web.get_folder_by_server_relative_url(app_settings['urlrel'])
        #folder = list_object.root_folder
        ctx.load(folder)
        ctx.execute_query()
        #print(folder.url)

        files = folder.files
        ctx.load(files)
        ctx.execute_query()

        #for myfile in files:
        #    print("File name: {0}".format(myfile.properties["Name"]))

        return files
    except Exception as e:
        print('Problem printing out library contents:', e)
        sys.exit(1)

and alter the download function a little, e.g.:

def downloadFolderFile(ctx, fileName):
    """Download one file from yourFolder into the current directory."""
    try:
        # The with block closes the file, so no explicit close() is needed.
        with open(fileName, "wb") as localFile:
            relativeUrl = '/sites/{0}/Shared%20Documents/{1}/{2}'.format(site, yourFolder, fileName)
            #relativeUrl = app_settings['urlrel']
            response = File.open_binary(ctx, relativeUrl)
            localFile.write(response.content)
    except Exception as e:
        print('Problem downloading file:', fileName, e)
        sys.exit(1)

myfiles = fncGetFolderContents(ctx, listTitle)

for myfile in myfiles:
    print("Downloading file: {0}".format(myfile.properties["Name"]))
    downloadFolderFile(ctx, myfile.properties["Name"])

Bachatero on 28 Mar 2019

Thanks a lot, man! The two of you are really prompt with replies, and the API is absolutely awesome!

I will go through it ASAP and try to replicate it. But is there a way to list the folders? The latest code you gave works when I know the folder name. If I automate the process and a new folder is created with files in it, it won't work for the new folder, right? That's why I also wanted to list folders, just in case. Anyway, the present solution should work for my use case.

Many thanks to both of you. I will update here once I run the experiment.

AakashBasu on 28 Mar 2019

Don't thank me, @vgrem is to blame :) ... and I'm not sure, maybe there are other ways of achieving the same ....

right, to list all the folders inside Shared Documents document library you may try:

    list_object = ctx.web.lists.get_by_title(listTitle)
    folder = list_object.root_folder        
    ctx.load(folder)
    ctx.execute_query()

    folders = folder.folders
    ctx.load(folders)
    ctx.execute_query()

    for myfolder in folders:
        print("Folder name: {0}".format(myfolder.properties["Name"]))

Bachatero on 28 Mar 2019

🎉4 ❤1

Fantastic. Iterative folder content printing and download worked!

Thank you,

AakashBasu on 29 Mar 2019
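
The pieces shown in this thread (listing files, listing subfolders, and opening a folder by its server-relative URL) can be combined into one recursive walk, so that newly created subfolders are picked up automatically. The following is a minimal sketch, not code from the library or from any commenter: `iter_file_urls` and `skip_folders` are hypothetical names, and it assumes each file and folder exposes `Name` and `ServerRelativeUrl` in `properties` after loading. Skipping the `Forms` folder that document libraries contain is a choice made here for illustration.

```python
def iter_file_urls(ctx, folder_url, skip_folders=("Forms",)):
    """Yield the server-relative URL of every file under folder_url, recursively."""
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)

    # Files directly inside this folder.
    files = folder.files
    ctx.load(files)
    ctx.execute_query()
    for f in files:
        yield f.properties["ServerRelativeUrl"]

    # Then descend into each subfolder using the URL the server reports.
    subfolders = folder.folders
    ctx.load(subfolders)
    ctx.execute_query()
    for sub in subfolders:
        if sub.properties["Name"] in skip_folders:
            continue
        yield from iter_file_urls(ctx, sub.properties["ServerRelativeUrl"], skip_folders)
```

Each yielded URL could then be passed to `File.open_binary(ctx, url)` as in the download functions above. Note that this makes two round trips per folder, so a deep tree will take a while.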

This code downloads corrupted PDF files. They are effectively empty, only 156 bytes. Any ideas why?

mamonovayuliya on 9 Jan 2021

I am also getting corrupted PDF files of only 1 KB when using the above code. Any idea?

shivparashar1984 on 10 Jan 2021

I am also getting corrupted PDF files of only 1 KB when using the above code. Any idea?

I figured it out: for me the cause was the relative URL. When I list folder contents, I don't need to add /sites/sitename/library etc.; it just has to be /library. But when I download the files, I need to add /sites/sitename/folder/file.

This is really weird, because I can still access and download files without adding /sites/sitename/, but then the content is corrupted. At the same time, if I add /sites/sitename/ when getting folder contents, it throws an error; it only works if I start the relative URL with the library.

It is also odd that every single resource suggests adding /sites/sitename to the relative URL for both folder contents and file contents.

mamonovayuliya on 11 Jan 2021
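
A file of a few hundred bytes where a PDF was expected may well be an error response that was written to disk as if it were the document. One way to catch that before saving is sketched below. It assumes the object returned by `File.open_binary` behaves like a `requests` response with `status_code` and `content` attributes (the thread itself only shows `content` being used); `write_if_valid` and `expected_magic` are hypothetical names.

```python
def write_if_valid(response, local_path, expected_magic=None):
    """Write response.content to local_path only if the download looks successful.

    Returns the number of bytes written; raises IOError otherwise.
    """
    status = getattr(response, "status_code", None)
    if status != 200:
        raise IOError("Download failed with HTTP status {0}".format(status))

    data = response.content
    # Optionally check the leading bytes, e.g. b"%PDF" for PDF files.
    if expected_magic is not None and not data.startswith(expected_magic):
        raise IOError("Unexpected content, first bytes: {0!r}".format(data[:20]))

    with open(local_path, "wb") as fh:
        fh.write(data)
    return len(data)
```

For PDFs this could be called as `write_if_valid(response, "report.pdf", b"%PDF")`, where `report.pdf` is a placeholder. Printing `response.content` of a failed download usually shows the server's error message, which helps when working out which form of relative URL the site expects.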

Thanks for the suggestion. Can you share the final working code? If we want to download all the contents of a subfolder like /sites/sitename/Documents/somefolder, what would the final code be?

shivparashar1984 on 11 Jan 2021
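
No final code was posted in the thread. As a rough sketch of how the pieces above might fit together for a single subfolder: list the files in the folder, then download each one using the `ServerRelativeUrl` the server returns for it instead of formatting the URL by hand. The function name `download_folder` and its parameters are hypothetical, and `open_binary` is expected to be `File.open_binary` from `office365.sharepoint.file`. Whether `folder_url` needs the `/sites/sitename` prefix seems to vary, going by the comments above.

```python
import os


def download_folder(ctx, folder_url, dest_dir, open_binary):
    """Download every file directly inside folder_url into dest_dir.

    Returns the list of local paths that were written.
    """
    folder = ctx.web.get_folder_by_server_relative_url(folder_url)
    files = folder.files
    ctx.load(files)
    ctx.execute_query()

    os.makedirs(dest_dir, exist_ok=True)
    written = []
    for f in files:
        # Use the URL reported by the server for this file.
        response = open_binary(ctx, f.properties["ServerRelativeUrl"])
        local_path = os.path.join(dest_dir, f.properties["Name"])
        with open(local_path, "wb") as fh:
            fh.write(response.content)
        written.append(local_path)
    return written
```

A call might look like `download_folder(ctx, "/sites/sitename/Documents/somefolder", "downloads", File.open_binary)`. This handles one folder level only and does not check the response before writing.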

Thanks, guys. This helps solve a lot of the problems faced while using the SharePoint package.