The data scientists behind the scenes and how they put a spotlight on dark data
Over the past 18 months I have learned a lot about analytics and big data, especially as applied to the workforce. I spend a fair amount of my time speaking to customers, analysts and the media. Besides the most common question, “do you have any examples you can share?”, I get asked about the people involved and the process of developing these applications. In the spirit of “people are the most important resource a company has”, I want to showcase that side of big data.
For timekeeping and scheduling, the dark data (to save you a quick trip to your favorite search engine, I mean data that is collected and must be retained but is not typically used) is, in this case, the audit trail of every change made to a timecard or schedule.
In Kronos there are sixteen different types of edits you can make to a timecard or schedule. Each of these edits represents a tiny trade of time and money between an employee and the company. Individually most are inconsequential, but in aggregate they represent tens or hundreds of millions of dollars. By and large, these changes are necessary transactions that everyone agrees to: the employee forgot to clock in, so the supervisor adds an “in punch”, or an employee calls in sick and the supervisor changes a paycode from regular to sick in the schedule.
Occasionally, however, the changes are indicative of an issue. For example, a supervisor moves a couple of minutes around during the week on an employee’s timecard and eliminates premium pay. Or a supervisor changes a schedule after the fact so that it shows the employee worked only the hours they were scheduled.
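To make the first example concrete, here is a minimal sketch of a rule that flags small edits which pull a week's hours back under a premium-pay threshold. Everything in it is an assumption for illustration: the `TimecardEdit` record, its field names, the 40-hour threshold and the half-hour cutoff are hypothetical and are not the Kronos audit-trail format or the logic used in our product.

```python
from dataclasses import dataclass

# Both constants are illustrative assumptions, not values from any real pay policy.
OVERTIME_THRESHOLD_HOURS = 40.0  # weekly hours above which premium pay applies
SMALL_EDIT_HOURS = 0.5           # largest reduction still treated as "a couple of minutes"


@dataclass(frozen=True)
class TimecardEdit:
    """Hypothetical audit-trail record: weekly hours before and after one edit."""
    supervisor: str
    employee: str
    week: str
    hours_before: float
    hours_after: float


def removes_premium_pay(edit: TimecardEdit) -> bool:
    """True when a small edit moves the week from above the threshold to at or below it."""
    crossed = edit.hours_before > OVERTIME_THRESHOLD_HOURS >= edit.hours_after
    small = (edit.hours_before - edit.hours_after) <= SMALL_EDIT_HOURS
    return crossed and small


def flag_edits(edits):
    """Return only the edits worth a closer look."""
    return [e for e in edits if removes_premium_pay(e)]


sample_edits = [
    TimecardEdit("sup_a", "emp_1", "week_10", 40.2, 40.0),  # trims 12 minutes of overtime
    TimecardEdit("sup_a", "emp_2", "week_10", 38.0, 38.5),  # adds a missed punch
    TimecardEdit("sup_b", "emp_3", "week_10", 44.0, 36.0),  # large change, e.g. a day recoded
]

flagged = flag_edits(sample_edits)
for e in flagged:
    print(e.supervisor, e.employee, e.week)
```

A fixed rule like this only catches the one pattern you already thought of, which is part of why such edits tend to slip past ordinary reports.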
These small changes are usually lost among the millions of transactions that occur each year. And because they are so small, they are missed by most reports and audit teams. Only when the affected employee has the courage to speak up does a company become aware of them. By this time the consequences for all involved are significant, from degraded morale on the part of the employee to unnecessary cost in terms of productivity, turnover and financial impact for the company.
As the economy improves and companies feel the pain of turnover and lost performance when employee engagement sags, we have been engaged by companies to understand how they can identify these situations. The companies know the answer is in the data because when someone files a grievance and points out the specific situation and dates, the HR department can immediately see what happened in the transactions.
The challenge is seeing these changes sooner, ideally before someone is so frustrated that they file a grievance or the behavior becomes obvious to all. One of our data scientists, who has a PhD in computer science, realized that this is very similar to the challenge retailers face when they try to understand what millions of customer clicks on a website represent. The customers aren’t telling them why they click the way they do, and only a fraction of the clicks result in an order.
So the data scientist applied the same machine learning techniques to timekeeping data that retailers use when they analyze their web server logs. The result of his work, however, was very difficult to interpret unless you understood machine learning and clustering techniques. To simplify this, we had one of our visualization experts re-imagine the output in a way that a layperson could understand. Her interpretation was amazing in its simplicity!
The second challenge was that the data scientist had created a very flexible tool. The first prototype had a number of tuning parameters that required the user to take output from past results and feed it back in to weight certain parameters for future analysis. We recognized that we couldn’t expect anyone other than a data scientist, and certainly not a typical business user, to perform this tuning. So we narrowed what the tool could do and eliminated the tuning parameters.
An example of the machine learning dashboard in Workforce Auditor
We were very nervous and excited about analyzing our first data set (we went in without knowing anything about the customer or their practices to ensure we didn’t bias the analysis). When we researched the results, there it was: we had found an issue that was previously unknown to that company. We tried it a second, third and fourth time. Each time we found something important to the customer that they either suspected but couldn’t prove or were completely unaware of. These small changes were indicative of looming issues worth a million dollars or more that these companies had now avoided. Very exciting stuff.
We found supervisors gaming schedules to improve their own bonuses (the company has since tweaked the rules of the bonus). We found a store manager working extremely hard to rebuild her schedule each week because the forecast and automated schedule she received were off (the company immediately re-tuned the forecast for her store). There were many more examples, and we realized that we had developed quite a versatile tool. Its power is that, in a matter of minutes, it can evaluate the actions of thousands of employees and narrow them down to just a handful of situations that require further investigation.
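For readers who want a feel for how that narrowing down might work, here is a deliberately simplified sketch. It is not the clustering method used in Workforce Auditor. It swaps in a much plainer idea: describe each supervisor by the share of each edit type they make, then rank supervisors by how far their profile sits from the average one. The edit-type names, the supervisor IDs and the sample logs are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical subset of edit types; the real product distinguishes sixteen.
EDIT_TYPES = ["add_punch", "move_punch", "change_paycode", "edit_schedule"]


def profile(edit_log):
    """Share of each edit type in one supervisor's log (all zeros for an empty log)."""
    counts = Counter(edit_log)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(EDIT_TYPES)
    return [counts[t] / total for t in EDIT_TYPES]


def rank_by_unusualness(edit_logs):
    """Order supervisors from most to least unusual, by distance from the mean profile."""
    profiles = {sup: profile(log) for sup, log in edit_logs.items()}
    n = len(profiles)
    centroid = [sum(p[i] for p in profiles.values()) / n
                for i in range(len(EDIT_TYPES))]

    def distance(p):
        # Euclidean distance between one profile and the average profile.
        return sqrt(sum((a - b) ** 2 for a, b in zip(p, centroid)))

    return sorted(profiles, key=lambda sup: distance(profiles[sup]), reverse=True)


# Made-up logs: three supervisors mostly add missed punches, one mostly moves punches.
sample_logs = {
    "sup_a": ["add_punch"] * 6 + ["change_paycode"] * 3 + ["edit_schedule"],
    "sup_b": ["add_punch"] * 5 + ["change_paycode"] * 4 + ["edit_schedule"],
    "sup_c": ["add_punch"] * 6 + ["change_paycode"] * 2 + ["edit_schedule"] * 2,
    "sup_d": ["add_punch"] + ["move_punch"] * 8 + ["edit_schedule"],
}

ranked = rank_by_unusualness(sample_logs)
print(ranked[0])  # the supervisor whose edit mix differs most from the rest
```

An unusual profile is only a prompt for investigation, not proof of a problem: in the store manager example above, the unusual behavior turned out to be a hard-working manager compensating for a bad forecast.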
With so many positive results, we fast-tracked the technology. It’s now available as Workforce Auditor and is included with our Workforce Analytics platform.
Takeaways from this experience?
1) Skills and experience really count in developing big data applications. No one is going from “excel guru” to building a machine learning application overnight, and it takes multiple people to get it right.
2) Involving (internal or external) customers and their data is essential. No one could ever build this without deep domain knowledge and many different data sets to trial.
3) By focusing on the business problem rather than the technology, we created something that was streamlined and easy to use rather than a feature-laden product showcasing the power of machine learning.
When I have a little more time to write, I want to share how the newest member of our team used scheduling data and a network map to uncover undisclosed relationships in a company and what they were costing it. Stay tuned!