(Post courtesy Partner Solution Consultant Andre Kieft)
It has been a while since I created a blog post, but recently I received a lot of questions and requests for advice on how to migrate file shares to SharePoint and use SkyDrive Pro (SDP). So I decided to write a post about the things you need to consider as a Small and Medium Business (SMB) partner when you plan to migrate file share content into SharePoint and want to use SDP to synchronize the SharePoint content offline.
Note that these steps are valid for both SharePoint 2013 on-premises (on-prem) and SharePoint Online (SPO).
Step 1 – Analyze your File Shares
As a first step, try to understand the data that resides on the file shares. Ask yourself the following questions:
- What is the total size of the file share data that the customer wants to migrate?
- How many files are there in total?
- What are the largest file sizes?
- How deep are the folder structures nested?
- Is there any content that is not being used anymore?
- What file types are there?
Let me try to explain why you should ask yourself these questions.
If the total size of the file shares is more than the storage capacity that you have in SharePoint, you need to buy additional storage (SPO) or increase your disk capacity (on-prem). To determine how much storage you will have in SPO, please check the Total available tenant storage in the tables in this article. Another issue that may arise in SharePoint is that you reach the capacity limit per site collection. For SPO that is 100 Gigabyte; for on-premises the recommended size per site collection is around 200 Gigabyte. This automatically means that the content database is also around 200 Gigabyte, which is the recommended size. Though you can stretch this number in on-prem, it is not recommended.
So, what should I do when my customer has more than 100 Gigabyte?
- Try to divide the file share content over multiple site collections when it concerns content which needs to be shared with others.
- If certain content is just for personal use, try to migrate that specific content into the personal site of the user.
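To get a first idea of how the content could be divided, here is a minimal Python sketch. It measures the size of each folder and then groups folders so that each group stays under the 100 Gigabyte SPO limit mentioned above. The function names and the greedy grouping strategy are my own assumptions for illustration, not an official tool or method.

```python
import os

GB = 1024 ** 3
SITE_COLLECTION_LIMIT = 100 * GB  # SPO limit per site collection quoted above


def folder_size(path):
    """Return the total size in bytes of all files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # unreadable file: skip it rather than abort the scan
    return total


def plan_site_collections(sizes, limit=SITE_COLLECTION_LIMIT):
    """Group folders (largest first) so that each group stays under limit.

    sizes maps a folder name to its size in bytes.
    Returns a list of groups, each a list of folder names.
    A folder larger than the limit ends up alone and must be split by hand.
    """
    groups = []  # each entry: [remaining_bytes, [folder names]]
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        for group in groups:
            if size <= group[0]:
                group[0] -= size
                group[1].append(name)
                break
        else:
            groups.append([limit - size, [name]])
    return [names for _remaining, names in groups]
```

You would build the sizes dictionary by calling folder_size on each top-level folder of the share. In practice you probably want to pass a lower limit than the hard maximum, so that each site collection keeps room to grow.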
How Many Files
The total number of files on the file shares is important because there are limits in both SharePoint and SDP. Exceeding them can leave a library or list in SharePoint in an unusable state, and you might also end up with missing files when using the SDP client.
First, in SPO we have a fixed limit of 5000 items per view, folder or query. The reasoning behind this 5000 limit comes down to how SQL works under the hood. If you would like to know more about it, please read this article. In on-prem there is a way to raise this limit, but we do not recommend it, as performance can decrease significantly when you do.
Secondly, for SDP there is also a limit of 5000 items for synchronizing team sites and 20000 for synchronizing personal sites. This means that if you have a team site document library that contains more than 5000 items, the remaining items will not be synchronized locally.
There is also a limit of 5 million items within a document library, but I guess that most customers in SMB won’t reach that limit very easily.
So, what should I do if the data that I want to migrate to a document library contains more than 5000 items in one folder?
- Try to divide the items over multiple subfolders or create additional views that limit the number of documents displayed.
But wait! If I already have 5000 items in one folder, doesn’t that mean that the rest of the documents won’t get synchronized when I use SDP?
Yes, that is correct. So if you would like to use SDP to synchronize documents offline, make sure that the total number of documents per library in a team site does not exceed 5000.
So, how do I fix that?
- Look at the folder structure of the file share content and see if you can divide that data across multiple sites and/or libraries. If there is a folder named Marketing, for example, it might make more sense to migrate that data into a separate site anyway, as this department probably wants to store additional information besides just documents (e.g. calendar, general info about the marketing team, site mailbox etc). An additional benefit of spreading the data over multiple sites/libraries is that it gives SDP users more granularity about what data they take offline. If you migrate everything into one big document library (not recommended), all users would need to synchronize everything, which can have a severe impact on your network bandwidth.
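To see where you would hit the 5000 limits, you could count the items on the share before you migrate. The Python sketch below is a rough illustration only. It assumes that every top-level folder of the share would become its own document library and that subfolders count as items as well; both are assumptions of mine that you should adapt to your own migration plan.

```python
import os
from collections import Counter

ITEM_LIMIT = 5000  # items per view/folder, and per synced team site library


def count_items(share_root):
    """Count items per folder and per top-level folder of the share.

    Returns (per_folder, per_library):
    per_folder maps each folder path to the number of items directly in it,
    per_library maps each top-level folder name to the total number of
    items (files and subfolders) anywhere below it.
    """
    per_folder = {}
    per_library = Counter()
    for root, dirs, files in os.walk(share_root):
        items_here = len(dirs) + len(files)
        per_folder[root] = items_here
        rel = os.path.relpath(root, share_root)
        if rel != os.curdir:
            library = rel.split(os.sep)[0]
            per_library[library] += items_here
    return per_folder, per_library


def over_limit(counts, limit=ITEM_LIMIT):
    """Return only the entries whose item count exceeds limit."""
    return {name: n for name, n in counts.items() if n > limit}
```

Calling over_limit on per_folder shows the folders that need subfolders or extra views, while over_limit on per_library shows the candidate libraries that would be too big to synchronize with SDP from a team site.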
Largest File Sizes
Another limit that exists in SPO and on-prem is the maximum file size. For both, the maximum size per file is 2 Gigabyte. In on-prem the default is 250 MB, but it can be increased to a maximum of 2 Gigabyte.
So, what if I have files that exceed this size?
- Well, they won’t fit in SharePoint, so you can’t migrate them. See what type of files they are and determine what they are used for in the organization. Examples could be software distribution images, large media files, training courses or other materials. If these are still being used and are not highly confidential, it is not a bad thing to keep them on alternative storage like a SAN, NAS or DVDs. If it concerns data that just needs to be kept for legal reasons and does not need to be retrieved instantly, you might just put it on a DVD or an external hard drive and store it in a safe, for example.
Another important aspect to look at on your file shares is the depth of nested folders and the length of file names. The recommended total length of a URL in SharePoint is around 260 characters. You would think that 260 characters is pretty lengthy, but remember that URLs in SharePoint often have encoding applied to them, which takes up additional space. E.g. a space is one character, but URL-encoded it becomes %20, which takes up three characters. You can run into issues when the URL becomes too large. More details about the exact limits can be found here, but as a best practice try to keep the URL length of a document under 260 characters.
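Finding these files by hand is tedious, so here is a small Python sketch that lists every file above a given size, largest first. The function name is hypothetical. The default threshold is the 2 Gigabyte maximum mentioned above; for an on-prem farm that still uses the default setting you would pass 250 MB instead.

```python
import os

MB = 1024 ** 2
GB = 1024 ** 3


def find_large_files(share_root, limit=2 * GB):
    """Return (path, size_in_bytes) for every file larger than limit, largest first."""
    hits = []
    for root, _dirs, files in os.walk(share_root):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is not readable: skip it
            if size > limit:
                hits.append((path, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

For example, find_large_files(share_root, limit=250 * MB) would list everything that does not fit with the on-prem default.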
So, what if I have files that will have more than 260 characters in total URL length?
- Make sure you keep your site URLs short (the site title name can be long though). E.g. don’t call the URL Human Resources, but call it HR. If you land on the site, you would still see the full name Human Resources as Site Title and URL are separate things in SharePoint.
- Shorten the document name (e.g. strip off …v.1.2, or …modified by Andre), as SharePoint has versioning built in. More information about versioning can be found here.
Migrating file shares into SharePoint is often also a good moment to clean up some of the information that the organization has been collecting over the years. If you find there is a lot of content which has not been accessed for a couple of years, what would be the point of migrating that data to SharePoint?
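To illustrate how encoding eats into the 260 characters, here is a minimal Python sketch that estimates the encoded URL length of a document before you migrate it. It uses the standard percent-encoding from the Python library, which is only an approximation of what SharePoint does, and the site URL and library name in the usage note are hypothetical.

```python
from urllib.parse import quote

URL_LIMIT = 260  # recommended maximum URL length mentioned above


def encoded_url_length(site_url, library, relative_path):
    """Estimate the length of the encoded URL a file would get after migration.

    relative_path is the path of the file below the folder that is being
    migrated into the library; both \\ and / are accepted as separators.
    """
    parts = [library] + relative_path.replace("\\", "/").split("/")
    # Encode each path segment separately so the slashes between them stay intact.
    encoded = "/".join(quote(part, safe="") for part in parts)
    return len(site_url.rstrip("/") + "/" + encoded)


def is_too_long(site_url, library, relative_path, limit=URL_LIMIT):
    """Return True when the estimated URL exceeds limit."""
    return encoded_url_length(site_url, library, relative_path) > limit
```

A call such as encoded_url_length("https://contoso.sharepoint.com/sites/HR", "Shared Documents", "Policies\\Travel policy.docx") counts every space as three characters, so long folder and file names with spaces add up quickly.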
So, what should I do when I come across such content?
- Discuss this with the customer and determine if it is really necessary to keep this data.
- If the data cannot be purged, you might consider storing it on a DVD or external hard drive and keep it in a safe.
- If the content has multiple versions, such as proposal 1.0.docx, proposal 1.1.docx, proposal final.docx, proposal modified by Andre.docx, you might consider moving just the latest version instead of migrating them all. This manual process might be time consuming, but it can save you lots of storage space in SharePoint. Versioning is also something that is built into SharePoint, which is optimized to store multiple versions of the same document. For example, SharePoint only stores the delta of the next version, saving more storage space that way. Note that this functionality is only available in SharePoint on-prem.
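To find candidates for such a clean-up, you could list the files that have not been touched for a number of years. The Python sketch below is one possible way to do this. The three-year default is just an example value, and keep in mind that the last access time is not reliably maintained on every file system, which is why the sketch looks at both the access time and the modified time.

```python
import os
import time

SECONDS_PER_YEAR = 365 * 24 * 3600


def find_stale_files(share_root, years=3, now=None):
    """Return the paths of files not modified or accessed in the last `years` years."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    stale = []
    for root, _dirs, files in os.walk(share_root):
        for name in files:
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is not readable: skip it
            # Use the most recent of the two timestamps as "last used".
            if max(info.st_mtime, info.st_atime) < cutoff:
                stale.append(path)
    return stale
```

The resulting list is a starting point for the discussion with the customer, not something to delete automatically.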
Types of Files
Determine what kind of files the customer has. Are they mainly Office documents? If so, then SharePoint is the best place to store such content. However, if you come across developer code, for example, it is not a good idea to move that into SharePoint. There are also file extensions that are not allowed in SPO and/or on-prem. A complete list of blocked file types for both SPO and on-prem can be found here.
So, what if I come across such file extensions?
- Well, you can’t move them into SharePoint, so ask yourself: do I still need these files? And if so, is there an alternative storage facility, such as a NAS, that I can store these files on? If it concerns developer code, you might want to store such code on a Team Foundation Service Server instead.
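A quick inventory of file types helps to answer these questions. The Python sketch below counts the files per extension and lists the files with a blocked extension. Note that the SAMPLE_BLOCKED set is only a small illustrative sample that I made up for this example; replace it with the complete list of blocked file types for your SPO or on-prem environment.

```python
import os
from collections import Counter

# Illustrative sample only, NOT the official list of blocked file types.
SAMPLE_BLOCKED = {".exe", ".bat", ".cmd", ".dll", ".msi"}


def extension_histogram(share_root):
    """Count the files per (lower-cased) extension below share_root."""
    counts = Counter()
    for _root, _dirs, files in os.walk(share_root):
        for name in files:
            counts[os.path.splitext(name)[1].lower()] += 1
    return counts


def find_blocked_files(share_root, blocked=SAMPLE_BLOCKED):
    """Return the paths of all files whose extension is in blocked."""
    blocked = {ext.lower() for ext in blocked}
    hits = []
    for root, _dirs, files in os.walk(share_root):
        for name in files:
            if os.path.splitext(name)[1].lower() in blocked:
                hits.append(os.path.join(root, name))
    return hits
```

Calling extension_histogram(share_root).most_common(10) gives you the ten most frequent file types, which quickly shows whether the share mainly holds Office documents or something else.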
Tools for analyzing and fixing file share data
In order to determine if you have large files or exceed the 5000 limit for example, you need to have some kind of tooling. There are a couple of approaches here.
- First off, there is a PowerShell script that has been enhanced by a German colleague, Hans Brender, which checks for blocked file types, bad characters in files and folders and finally for the maximum URL length. The script can even fix invalid characters and file extensions for you. It is a great script, but requires you to have some knowledge of PowerShell. Another alternative I was pointed at is a tool called SharePrep. This tool does a scan for URL length and invalid characters.
- Secondly there are other 3rd party tools that can do a scan of your file share content such as Treesize. However such tools do not necessarily check for the SharePoint limitations we talked about in the earlier paragraphs, but at least they will give you a lot more insight about the size of the file share content.
- Finally there are actual 3rd party migration tools that move the file share content into SharePoint and also check for invalid characters, extensions and URL length upfront. We will dig into these tools in Step 2 – Migrating your data.
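To give an idea of what such an invalid character check does, here is a minimal Python sketch. It is not the PowerShell script or any of the tools mentioned above. The character set reflects my understanding of the characters that SharePoint 2013 does not allow in file and folder names, so treat it as an assumption and verify it against the official documentation for your version.

```python
# Assumed set of characters not allowed in SharePoint 2013 file and folder
# names; verify against the official documentation before relying on it.
INVALID_CHARS = set('~"#%&*:<>?/\\{|}')


def invalid_chars_in(name):
    """Return the sorted list of invalid characters found in a file or folder name."""
    return sorted(set(name) & INVALID_CHARS)


def sanitize(name, replacement="_"):
    """Return name with every invalid character replaced by replacement."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)
```

For example, invalid_chars_in("Q&A #1.docx") reports the ampersand and the number sign, and sanitize turns the name into "Q_A _1.docx". A real tool would also have to deal with name clashes after renaming, which this sketch ignores.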
Step 2 – Migrating your data
So, now that we have analyzed our file share content, it is time to move it into SharePoint. There are a couple of approaches here.
Open with Explorer
If you are in a document library, you can open the library in Windows Explorer. That way you can simply copy and paste the files into SharePoint.
But there are some drawbacks to this scenario. First of all, I’ve seen lots of issues trying to open the library in Windows Explorer. Secondly, the technology that is used for copying the data into SharePoint is not very reliable, so keep that in mind when copying larger chunks of data. Finally, there is also drag & drop, but this is limited to files (no folders) and to a maximum of 100 files per drag. So if you have 1000 files, you need to drag them in 10 chunks of 100. More information can be found in this article. The Open with Explorer method also does not check for invalid characters, extensions and URL length upfront.
Pros: Free, easy to use, works fine for smaller amounts of data
Cons: Not always reliable, no metadata preservation, no detection upfront for things like invalid characters, file type restrictions, path lengths etc.
SkyDrive Pro
You could also use SDP to upload the data into a library. This is fine as long as you don’t sync more than 5000 items per library. Remember though that SDP is not a migration tool, but a sync tool, so it is not optimized for copying large chunks of data into SharePoint. Things like character and file type restrictions, path length etc. are on the list of the SDP team to address, but they are currently not there.
The main drawback of using either the Open with Explorer option or SDP is that these tools don’t preserve the metadata of the files and folders on the file shares. By this I mean that things like the modified date or owner field are not migrated into SharePoint. The owner will become the user that is copying the data and the modified date will be the timestamp of when the copy operation was executed. So if the metadata on the file shares is important, don’t use any of the methods mentioned earlier, but use one of the third party tools below.
Pros: Free, easy to use, works fine for smaller amounts of data (max 5000 per team site library or 20000 per personal site)
Cons: No metadata preservation, no detection upfront for things like invalid characters, file type restrictions, path lengths etc.
3rd party tools
Here are some of the 3rd party tools that will provide additional detection, fixing and migration capabilities that we mentioned earlier:
- http://www.syntergy.com/products/sharepoint/more/bulkloader (now Metalogix)
(Thx to Raoul for pointing me to additional tools)
The list above is in random order; some tools focus on SMB, while others are more focused on the enterprise segment. We can’t express a preference for one tool or the other, but most of the tools have a free trial version available, so you can try them out yourself.
So, when should I use what approach?
Here is a short summary of capabilities:
| |Open in Explorer|SkyDrive Pro|3rd party|
|---|---|---|---|
|Amount of data|Relatively small|No more than 5000 items per library|Larger data sets|
|Invalid character detection|No|No|Mostly yes [1]|
|URL length detection|No|No|Mostly yes [1]|
|Metadata preservation|No|No|Mostly yes [1]|
|Blocked file types detection|No|No|Mostly yes [1]|
[1] This depends on the capabilities of the 3rd party tool.
SDP gives me issues when synchronizing data
Please check that you have the latest version of SDP installed. There have been stability issues in earlier builds of the tool, but most of these should be fixed by now. You can check whether you are running the latest version by opening Word -> File -> Account and clicking Update Options -> View Updates. If your current version number is lower than the latest one listed there, click the Disable Updates button (click Yes if prompted), then click Enable Updates (click Yes if prompted). This forces a download of the latest version of Office and thus the latest version of the SDP tool.
If you are running the stand-alone version of SDP, make sure you have downloaded the latest version from here.
Why is the upload taking so long?
This really depends on a lot of things. It can depend on:
- The method or tool that is used to upload the data
- The available bandwidth for uploading the data. Tips:
  - Check your upload speed at http://www.speedtest.net and do a test for your nearest Office 365 data center. This will give you an indication of the maximum upload speed.
  - Often companies have less available upload bandwidth than people at home. If you have the chance, uploading from a home location might be faster.
  - Schedule the upload at times when there is more bandwidth available for uploading the data (usually at night).
  - Test your upload speed upfront by uploading maybe 1% of the data. Multiply the time it took by 100 and you have a rough estimate of the total upload time.
- The computers used for uploading the data. A slow laptop can become a bottleneck while uploading the data.
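The estimates from the tips above can be written down as a small calculation. This Python sketch shows both approaches: extrapolating from a test upload of a sample, and calculating from the measured upload speed. The function names are hypothetical, and the bandwidth formula ignores protocol overhead and throttling, so treat the outcome as a best case.

```python
def estimate_from_sample(sample_seconds, sample_percent=1):
    """Extrapolate the total upload time in seconds from a test upload.

    sample_seconds is how long it took to upload sample_percent of the data.
    """
    return sample_seconds * 100 / sample_percent


def estimate_from_bandwidth(total_bytes, upload_mbit_per_s):
    """Estimate the upload time in seconds from the measured upload speed.

    Ignores protocol overhead and throttling, so this is a best case.
    """
    bytes_per_second = upload_mbit_per_s * 1_000_000 / 8
    return total_bytes / bytes_per_second


def as_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600
```

For example, if uploading 1% of the data took 6 minutes (360 seconds), estimate_from_sample(360) gives 36000 seconds, so as_hours returns 10 hours for the full set.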
If you feel that there are things missing here, please let me know and I’ll try to add them to this blog post.