
New module: Sitecore Shrink

I have just released the first version of my new Sitecore module! Shrink is a Sitecore utility module that gives you insight into the usage of your media library, pretty much like a disk usage statistics viewer for your hard drives.

[Image: insight into the usage of your media library items]

But next to that, it also shows you which items are actually being used and published, so you can easily find media items that unnecessarily take up space in your database. And last but not least, it offers you multiple ways of cleaning up your media library!

[Image: the Shrink dashboard]

For a live demo of this module check out this video on YouTube: https://www.youtube.com/watch?v=qH4gbNJcXU4 !

Why?

I have been working on Sitecore projects for many years, and we have multiple customers who saw their databases grow over more than five years of using their constantly evolving and upgraded Sitecore implementation. It’s great that we can keep those platforms evolving and that we have sites running on Sitecore for this long, but the downside is that this database growth has a negative impact on the agility of your platform. This shows both in speed and performance and in the Continuous Integration process, for example when making a database backup, rolling back deployments or rolling back content to your QA or development environment.

Of course, other solutions exist to keep your database small, like storing your media on disk and/or exposing it via a CDN, but I figured that more Sitecore users or implementation partners might face the same issue on existing implementations. Or they might just want to do the annual spring cleaning on their not so huge databases to keep ’em nice and organized!

This blog post mainly focuses on the design and development process. Please check out the README file of my GitHub repository if you are looking for details on how to install and configure this module. If you want to dive into the code itself and have suggestions on how to improve or extend my module, I would be very happy to hear from you! The best way to contact me in that case is via my Twitter account @rhabraken. This also goes for reporting issues: since this module is rather new, it may behave differently in situations that are new to me.

A Sitecore Job

I started out by developing all the features my module would require against the Sitecore API, without any graphical user interface at all. Because all of the actions can potentially take a long time to execute, I then rewrote them to be executed as a Sitecore Job. I did several test runs: a vanilla Sitecore database took a few seconds or less to scan, while a rather well-aged Sitecore database (master only!) of 31 GB took up to 6 hours. That database was a copy of the production database running in Azure, scaled to only an S1 database, and I accessed it remotely from my laptop. And it immediately proved me right: over half of the 24.5 GB media library wasn’t used at all. So there’s your potential improvement in database portability, indexing times, index sizes, index lookup times and more.

Running code as a Sitecore job is surprisingly easy and straightforward:

Having finished the heart of my module, beating healthily next to my Sitecore installation, I took the plunge to transform this into a useful module using SPEAK. First by designing my user interface, then by thinking out how to handle the collected data and then by challenging myself to get more out of SPEAK than I have ever done before :).

For me, when designing user interfaces, nothing beats the old pen and paper technique:

[Image: pen and paper design sketch of the Shrink GUI]

JSON file storage

Since scanning the media library and generating the report can take a very long time, depending on the size of your database, I needed to find a way to store my data persistently. I considered a custom SQL database or table, but that felt clunky and cumbersome to deploy and maintain. Someone suggested using MongoDB, and this seemed a very elegant solution to me, but it would’ve taken me quite some time to implement and it would also have tied me to a minimum supported Sitecore version of 7.5. And most databases that have grown too large over time are older than that.

So I moved on and then thought: why not serialize my objects into a JSON format and write them to disk? A very quick solution and the best one in terms of portability too. This worked so well that I even switched to using these JSON files as a communication proxy between the C# code of my module and the JavaScript code of the SPEAK application. There is no direct data flow from my compiled code towards the SPEAK application: the JavaScript code uses WebAPI calls to request different types of actions, the C# library generates the report, serializes it to disk in JSON format and the JavaScript components will pick up the new files from disk subsequently. This turned out to be a nice separation and makes the application easier to maintain too. And the humongous database I mentioned earlier only generated 4.8 MB of JSON (uncompressed).
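The round trip through a JSON file can be sketched roughly as follows. This is only an illustration in JavaScript (Node.js), not the module's code: in Shrink the writing side is C#, and the file name, the report shape and the function names below are all made up for the example.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical report shape; the real Shrink report format may differ.
const report = {
  items: [
    { id: "a", path: "/sitecore/media library/Images/a", size: 1024, referenced: true },
    { id: "b", path: "/sitecore/media library/Images/b", size: 2048, referenced: false }
  ]
};

// Writer side: serialize the report and write it to disk.
function writeReport(file, data) {
  fs.writeFileSync(file, JSON.stringify(data), "utf8");
}

// Reader side: pick the file up from disk and parse it again.
function readReport(file) {
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Hypothetical file location, chosen only so the example can run anywhere.
const reportFile = path.join(os.tmpdir(), "shrink-report-example.json");
writeReport(reportFile, report);
const loaded = readReport(reportFile);
console.log(loaded.items.length + " items loaded");
```

The writer and the reader only share the file and its format, which is what keeps the two sides decoupled.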

Custom SPEAK components

I did not use default SPEAK components for either the charts or the treeview, because they didn’t meet my requirements. The more complex chart types like the interactive and zoomable sunburst chart and the interactive donut chart are simply not available. The DoughnutChart would have come close to the donut chart I am using but still lacks functionality I’m looking for, like a click event and the highlighted slice data. And the ItemTreeView component doesn’t let you filter items and has some quirks I wanted to get around. So that’s why I created a custom SPEAK component for my charts, using the same D3 visualization library as Sitecore does, and why I copied and extended the default ItemTreeView component of Sitecore.

Component 1: MediaUsageCharts

This is a rather straightforward JqueryUiUserControl that contains some simple HTML and an initialize function, together with functions to build up the actual charts, but that’s a ‘default’ D3 task. I have added WebAPI calls to help me out with some item related tasks (like figuring out the path or selecting a subset of the total list of items), which are far easier to do in C# using LINQ statements.

The challenge came when I wanted to actually alter attributes of other SPEAK components on the same page, triggered by a user interaction within my custom component. That wasn’t documented anywhere I could find, and reverse engineering existing SPEAK stuff didn’t help me out either. I have an onclick event attached to my D3 charts, but within that scope, you cannot reach the other components (and it would be bad design if you could, because that would make my custom component dependent on other components). I really wanted to define an event within my custom SPEAK component and add a callback to it from my PageCode, but I couldn’t find how default SPEAK solves this issue and I’m still looking for that. If you happen to know this and can point me in the right direction, that would be great!

At long last I came up with a simple solution, that still somehow feels like a workaround, but it works like a charm. I have defined a custom attribute within my component, called appContext:

And upon initialization of the PageCode, I pass a reference to myself (the PageCode) to my custom chart component via this attribute:

I then added a method in my PageCode to handle the click event and do things with other components (so the PageCode is still the only one accessing all different components and the custom chart component doesn’t have to know about the other components):

Lastly, within the custom chart component, I can call this method via the custom attribute that stores a reference to my PageCode context:

The funny thing is that, despite the app context being available within all other events in the PageCode file, calling it from the chart component just loses this context, so I had to send it back and forth. I find this solution somewhat elegant, but I am open to changing it if anyone comes up with a better solution. Or maybe it helps you out when you hit the same issue.
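As a rough illustration of the appContext pattern described above, here is a minimal sketch in plain JavaScript. It is not the module's actual code: the chart component, the PageCode and the tree view are simple stand-in objects, and names such as createChartComponent, handleSliceClick and onChartClick are hypothetical.

```javascript
// Stand-in for the custom chart component. It knows nothing about other
// components; it only holds a reference to whoever owns the page.
function createChartComponent() {
  return {
    appContext: null, // custom attribute, set by the PageCode
    // In a real component this would be wired to the D3 click event.
    handleSliceClick: function (category) {
      if (this.appContext) {
        // Hand the context back so the handler does not depend on "this".
        this.appContext.onChartClick(category, this.appContext);
      }
    }
  };
}

// Stand-in for the PageCode, the only object that touches every component.
function createPageCode(chart, treeView) {
  return {
    chart: chart,
    treeView: treeView,
    initialized: function () {
      // Pass a reference to the PageCode into the chart component.
      this.chart.appContext = this;
    },
    onChartClick: function (category, context) {
      context.treeView.filter = category;
    }
  };
}

const treeView = { filter: null };
const chart = createChartComponent();
const pageCode = createPageCode(chart, treeView);
pageCode.initialized();
chart.handleSliceClick("unreferenced");
console.log(treeView.filter); // prints "unreferenced"
```

The chart component only ever calls one method on the object it was given, so it stays independent of the tree view and any other component on the page.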

Component 2: SelectableTreeView

I wanted my treeview to be able to do two things: select a subset of items starting at a certain point in my media library folder structure when clicking on a folder or item in the sunburst chart, and filter the items based on the different analysis categories displayed by the donut charts. But neither pre-selecting items nor filtering items is possible with the default component. Luckily, I stumbled upon this blog where Nikola Gotsev shows how to copy and modify the default treeview component: https://sitecorecorner.com/2014/12/18/speak-ui-filterable-treeview/. Filtering on templates looks a lot like filtering on specific items, so this was a very valuable resource for me!

I ended up adding an extra field to the template to store the items to filter on (“itemsToDisplay”) and used that for filtering the items returned by the component. This filter replaces the hiddenItems filter, because I never want to hide any items, as that would put my sunburst chart out of sync:
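The idea of such a filter can be sketched in plain JavaScript as below. This is an illustration under assumptions, not the modified ItemTreeView code: the pipe-separated format of itemsToDisplay and the "empty filter shows everything" rule are guesses made for the example, and filterItems is a hypothetical helper.

```javascript
// Keep only the items whose id appears in itemsToDisplay.
// Assumption: itemsToDisplay is a pipe-separated list of item ids,
// and an empty or missing value means "show everything".
function filterItems(items, itemsToDisplay) {
  if (!itemsToDisplay) return items;
  const allowed = new Set(itemsToDisplay.split("|").filter(Boolean));
  if (allowed.size === 0) return items;
  return items.filter(function (item) {
    return allowed.has(item.id);
  });
}

const returnedItems = [{ id: "a" }, { id: "b" }, { id: "c" }];
const visible = filterItems(returnedItems, "a|c");
console.log(visible.map(function (item) { return item.id; })); // prints [ 'a', 'c' ]
```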

Furthermore, the default component displays strange behavior in my opinion when selecting items up front: when you select a folder that is folded (collapsed), a checkmark is shown, not a filled square indicating a partial selection of the item and its children. And when you just fold it out, the UI shows you all children as being selected, but upon reading the “checkedItemIds” attribute, it turns out those items aren’t actually selected. Quite confusing for the end user I’d say. To prevent any confusion and to let the user decide what to select, I disabled the following line of code in the “appendLoadedChildren” function, which cancels the behavior of falsely copying the checked state onto the child items of the previously selected parent node:

Lastly, I used this helpful post on the community site to help me out with pre-selecting the correct root from my JavaScript PageCode file: https://community.sitecore.net/developers/f/8/t/799.

So in the end, I got my treeview component to do exactly what I wanted, although copying the code didn’t feel very robust and I might have some work to do upon future Sitecore upgrades, but at least I’m still using 99% default SPEAK stuff and my changes are documented very well.

Oh, and one more thing… the pre-selecting of the correct root as mentioned in the above community article only works once, because dynatree can only have one active node, and switching the active node (finding it and deactivating it) turned out to be buggy, even more so with possible intervention from the user. So I decided to reload the tree before navigating to a completely different part of the tree. But the tree.reload() function of dynatree doesn’t work for the ItemTreeView implementation of Sitecore: it adds an invalid root node upon each call. So I created my own simple reload function, removing all children from the root and re-initializing the tree. Quick and reliable:
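The approach of emptying the root and rebuilding the first level can be sketched as follows. This is a simplified stand-in, not the module's reload function: createNode mimics a tree node with removeChildren and addChild methods (names modelled on dynatree's node API), and reloadTree and its loadChildren callback are hypothetical.

```javascript
// Minimal stand-in for a tree node with the two methods the sketch relies on.
function createNode(title) {
  return {
    title: title,
    children: [],
    addChild: function (data) {
      const child = createNode(data.title);
      this.children.push(child);
      return child;
    },
    removeChildren: function () {
      this.children = [];
    }
  };
}

// Reload by emptying the root and building the first level again,
// instead of asking the tree component to reload itself.
function reloadTree(root, loadChildren) {
  root.removeChildren();
  loadChildren().forEach(function (data) {
    root.addChild(data);
  });
  return root;
}

const root = createNode("media library");
root.addChild({ title: "Images" });
root.addChild({ title: "Files" });

reloadTree(root, function () {
  return [{ title: "Images" }, { title: "Files" }, { title: "Videos" }];
});
console.log(root.children.length); // prints 3
```

Because the root node itself is kept and only its children are replaced, repeated reloads do not add extra root nodes.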

Application flow: switching screens

Because all processes that you can start, like scanning, archiving or downloading, take some time and also invalidate the current usage report, I’ve added a switch to a progress view. I simply implemented this using multiple border components and switching visibility between them. This is simpler and easier to maintain than redirecting between different SPEAK pages.
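Switching between views by toggling visibility could look roughly like this. The sketch uses stand-in objects with a Backbone-style set() and get(); the "isVisible" attribute name, the view names and the showView helper are assumptions for illustration, not taken from the module.

```javascript
// Stand-in for a border component with a Backbone-style set() and get().
function createBorder() {
  const attrs = { isVisible: false };
  return {
    set: function (key, value) { attrs[key] = value; },
    get: function (key) { return attrs[key]; }
  };
}

// Hypothetical views: one for the usage report, one for job progress.
const views = {
  report: createBorder(),
  progress: createBorder()
};

// Show exactly one view and hide all the others.
function showView(name) {
  Object.keys(views).forEach(function (key) {
    views[key].set("isVisible", key === name);
  });
}

showView("progress"); // a scan, archive or download has started
showView("report");   // the job has finished and a fresh report is available
```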

Auto-update

If you delete items from the media library using Shrink, it automatically updates its JSON data store. So as long as you are using the module when cleaning up, you do not have to execute a re-scan of the whole media library to learn what the effect of your actions is. However, if you delete items in any other way, Shrink doesn’t get notified (yet). So once in a while you need to perform a total re-scan to keep your report up-to-date and accurate. One of my ideas for future releases is to hook into the item:deleted event to update my JSON data store continuously.

Features & functionality

The following paragraphs give a quick overview of the functionality my module offers.

Sunburst chart

The sunburst chart shows you the relative file sizes of the different folders in your media library. You can hover over folders and files to see their item name and file size, and you can zoom in on folders too. Clicking on a folder not only zooms in your view, but also opens up the corresponding folder in the tree view, to show you the actual items and folder structure for that part.
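The numbers behind such a chart come down to summing file sizes per folder. A minimal sketch in plain JavaScript, with a made-up folder tree and hypothetical helper names (totalSize, shares), could look like this; in the module itself D3 takes care of the layout.

```javascript
// Hypothetical folder tree: folders have children, media items have a size in bytes.
const library = {
  name: "media library",
  children: [
    { name: "Images", children: [
      { name: "header.jpg", size: 100 },
      { name: "footer.jpg", size: 300 }
    ] },
    { name: "Files", children: [
      { name: "brochure.pdf", size: 600 }
    ] }
  ]
};

// Sum the sizes of all files below a node.
function totalSize(node) {
  if (!node.children) return node.size || 0;
  return node.children.reduce(function (sum, child) {
    return sum + totalSize(child);
  }, 0);
}

// Relative share of each direct child, as a fraction of the parent's total.
function shares(node) {
  const total = totalSize(node);
  return node.children.map(function (child) {
    return { name: child.name, share: total === 0 ? 0 : totalSize(child) / total };
  });
}

console.log(totalSize(library)); // prints 1000
console.log(shares(library));
```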

Donut charts

The donut charts show you the most important metrics of the analysis: how many of your media items are not being used (not referenced by any items at all) or not being published to any of your publishing targets. They also give you an overview of how many items have old versions within your media library.

Clicking on one of the slices of the donut charts filters the tree view to only show you the items of that category. So if you want to browse through all unreferenced items, just click on that slice of the corresponding donut chart!
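Counting items per category and filtering on a clicked slice can be sketched as below. The item flags (referenced, published, versionCount) and the category names are assumptions made for the example; the real report may store this information differently.

```javascript
// Hypothetical item flags; the real report may store these differently.
const items = [
  { id: "a", referenced: true, published: true, versionCount: 1 },
  { id: "b", referenced: false, published: false, versionCount: 3 },
  { id: "c", referenced: true, published: false, versionCount: 2 }
];

// One predicate per donut slice.
const categories = {
  unreferenced: function (item) { return !item.referenced; },
  unpublished: function (item) { return !item.published; },
  hasOldVersions: function (item) { return item.versionCount > 1; }
};

// Numbers behind the donut charts: how many items fall into each category.
function countByCategory(list) {
  const counts = {};
  Object.keys(categories).forEach(function (name) {
    counts[name] = list.filter(categories[name]).length;
  });
  return counts;
}

// Clicking a slice: keep only the items of that category for the tree view.
function itemsInCategory(list, name) {
  return list.filter(categories[name]);
}

console.log(countByCategory(items));
console.log(itemsInCategory(items, "unreferenced").map(function (i) { return i.id; }));
```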

Archive, Recycle, Delete

The best explanation on the different options for cleaning up your items comes from John West in his blog post https://community.sitecore.net/technical_blogs/b/sitecorejohn_blog/posts/archiving-recycling-deleting-and-restoring-items-and-versions-in-the-sitecore-asp-net-cms: “Archive data that you want to keep (for example, for audit purposes); recycle data that you may want to restore; delete data that you want to remove. For optimal performance and usability, recycle or remove as much data as you can, and archive whatever else you do not need in the Master database.”

Please keep in mind, that archiving and recycling only speeds up your queries, because it makes your indexes smaller, but if you really want to cut down the size of your database, only deleting items is going to help you out.

Download

This is actually the best way to clean up your media library without the risk of losing any data. If you choose to download the selected media items, Shrink will download all of them to your Content Management server in the exact same folder structure as in the media library, creating a local backup of your media items. If you select the deletion checkbox, all items will be deleted directly after downloading them, cleaning up your media library as you go.

Delete old versions

This task deletes any version other than the latest and the active one.
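The selection rule can be sketched in a few lines of JavaScript. This only illustrates the rule as stated; versionsToDelete is a hypothetical helper, and how the module determines the active version is not covered here.

```javascript
// Return the version numbers that would be removed: everything except
// the latest version and the active one.
function versionsToDelete(versionNumbers, activeNumber) {
  const latest = Math.max.apply(null, versionNumbers);
  return versionNumbers.filter(function (n) {
    return n !== latest && n !== activeNumber;
  });
}

console.log(versionsToDelete([1, 2, 3, 4, 5], 3)); // prints [ 1, 2, 4 ]
```

When the active version is also the latest one, only that single version survives.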

Expand

Because the default behavior of the treeview component is to only select items that are visible (i.e. not those within folded folders), it is a lot of work to open all of those folders. I’ve added an expand button that expands all folders currently visible. This makes it quicker to expand the tree in steps.

Treeview

The treeview displays the items and folder structure of your media library and lets you select the items you want to archive, recycle, download or delete. It also responds to the selected category in the donut charts, displaying only those items that are being referenced, or are not published at all for example. This helps you in analyzing your media library.

After using Sitecore Shrink

Mind that you need to publish (parts of) the media library after cleaning up to also clean up your web database or other publishing targets. And next to that, you might want to clean up your database by removing orphaned blobs (via the Sitecore Control Panel or using a SQL query) and do a database file shrink from your SQL Server Management Studio to release the freed and now unallocated space within your SQL database.

Future releases

For future releases, next to improving the existing media library usage statistics and clean up actions, I may extend this module with additional utilities to enhance the performance of your Sitecore implementations, by offering additional analyzing or clean up features like optimizing the templates, content structure or maybe even the size of your rendered Sitecore pages.

Important notes!

  • Please note that this module is rather new and only tested on a small number of databases and Sitecore instances. Please back up your original database before using this module and preferably test the module for your situation on an environment other than production, to make sure no data is lost unintentionally upon using it.
  • Please do not install this module on a Content Delivery server. It is intended for the Content Management server role only and the different Web API calls to delete items could be potentially harmful on an exposed Content Delivery server.

Edit: my module is now available from Sitecore’s Marketplace too via https://marketplace.sitecore.net/Modules/S/Sitecore_Shrink.aspx

Comments


Andre

Hi Rob,

(I originally wrote this message on the Sitecore marketplace but hopefully you will see this here)

I have a question about your Sitecore Shrink utility. What happens if your CD and CM are on the same server? We have a non-production server that runs both roles and I would like to test out Shrink there. Would there be any anticipated issues or caveats in doing so? Please let me know, thanks.

Andre

Rob Habraken

Hi Andre,

great that you want to test the Shrink module in your environment! And sorry I didn’t reply to the message on the Marketplace, must have missed the email notification.

Running it on a combined CD/CM or single server setup is perfectly possible, although I would consider it a security vulnerability, since it contains WebAPI calls to actually delete content based on the Sitecore item GUID. So, if you run this module on an exposed server, please make sure the folders of the WebAPI calls are protected, for example via IP security. If you are running it on a non-production server that isn’t publicly exposed anyway, then that would be of no concern of course.

Let me know how the module works out for you and if you run into any issues!

Regards, Rob

Andre

Hi Rob,

Thanks for the response, I hope to try this out soon with my team and let you know the result.

Andre



Rob Habraken

Software Engineer, Technical Manager, Consultant, Sitecore MVP and overall technology addict. Specialist in web development, Microsoft technology and, of course, Sitecore.

https://www.robhabraken.nl