
4 Ways to Avoid Being a Foolish Leader

April Fools’ Day is just one day a year, but there are common mistakes an insight leader is prone to (and that could end up making them look like a fool) all year ’round.

This isn’t surprising when you consider the breadth of responsibility within the customer insight leadership role. Such leaders have multi-disciplinary technical teams to manage and an increasing demand from across all areas of a modern business to improve decisions and performance.

Like most of the lessons I’ve learned over the years, the following has come from getting it wrong myself first. So, there’s no need for any of my clients or colleagues to feel embarrassed.

Beyond the day of pitfalls for the gullible, then, here are four common — but foolish — mistakes I see customer insight leaders still making.

1. Leaving data access control with IT

Data ownership and data management are not the sexiest responsibilities up for grabs in today’s organizations. To many, they appear to come with a much greater risk of failure (or at least blame) than any potential reward. However, doing this work well is often one of the strongest predictors of insight team productivity.

Ask any data scientist or customer analyst what they spend most of their time doing, and the consistent answer (over my years of asking such questions) is “data prep.” Most of the time, significant work is needed to bring together the data needed and explore, clean and categorize it for any meaningful analysis.

But, given the negative PR and the historical role of IT in this domain, it can be tempting for insight leaders to leave control of data management with IT. In my experience, this is almost always a mistake. Over decades (of often being unfairly blamed for anything that went wrong and that involved technology), IT teams and processes have evolved to minimize risk. Such a controlled (and, at times, bureaucratic) approach is normally too slow and too restrictive for the demands of an insight team.

I’ve lost count of how many capable but frustrated analysts I have met over the years who were prevented from making a difference because of a lack of access to the data they needed. Sometimes the rationale is data protection, security or even operational performance. At the root, customer insight or data science work is, by nature, exploratory and innovative, and it requires a flexibility and a level of risk that run counter to IT processes.

See also: 3 Skills Needed for Customer Insight

To avoid this foolish mistake, I recommend insight leaders take on the responsibility for customer data management. Owning flexible provision of the data needed for analysis, modeling, research and database marketing is worth the headaches that come with the territory. Plus, the other issues that come to light are well worth insight leaders knowing well — whether they be data quality, data protection, or something regulation- or technology-related. Data leadership is often an opportunity to see potential issues for insight generation and deployment much earlier in the lifecycle.

2. Underestimating the cultural work needed to bring a team together

Data scientists and research managers are very different people. Data analysts, working on data quality challenges, see the world very differently from database marketing analysts, who are focused on lead performance and the next urgent campaign. It can be all too easy for a new insight leader to underestimate these cultural differences.

Over more than 13 years, I had the challenge and pleasure of building insight teams from scratch and integrating previously disparate technical functions into an insight department. Although team structures, processes and workflows can take considerable management time to get working well, I’ve found they are easy compared with the cultural transformation needed.

This should not be a surprise. Most research teams have come from humanities backgrounds and are staffed by “people people” who are interested in understanding others better. Most data science or analysis teams have come from math and science backgrounds and are staffed by “numbers people” who are interested in solving hard problems. Most database marketing teams have come from marketing or sales backgrounds and are more likely to be motivated by business success and interested in proving what works and makes money. Most data management teams have come from IT or finance backgrounds and are staffed by those with strong attention to detail, who are motivated by technical and coding skills and who want to be left alone to get on with their work.

As you can see, these types of people are not natural bedfellows. Although their technical expertise is powerfully complementary, they tend to approach each other with natural skepticism. Prejudices that are common in society and education often fuel both misunderstanding and a reluctance to give up any local control to collaborate more. Many math and science grads have grown up poking fun at “fluffy” humanities students. Conversely, those with a humanities background and strong interest in society can dismiss data and analytics folk as “geeky” and as removed from the real world.

So, how can an insight leader avoid this foolish oversight and lead cultural change? There really is no shortcut to listening to your teams, understanding their aspirations/frustrations/potential and sharing what you learn to foster greater understanding. As well as needing to be a translator (between technical and business languages), the insight leader also needs to be a bridge builder. It’s worth remembering classic leadership lessons such as “you get what you measure/reward,” and “catch people doing something right.” So, ensure you set objectives that require cooperation and recognize those who pioneer collaboration across the divides. It’s also important to watch your language as a leader — it should be inclusive and value all four technical disciplines.

3. Avoiding commercial targets because of lack of control

Most of us want to feel in control. It’s a natural human response to avoid creating a situation where we cannot control the outcome and are dependent on others. However, that is often the route to greater productivity and success in business.

The myth still peddled by testosterone-fueled motivational speakers is that you are the master of your own destiny and can achieve whatever you want. In reality, collaboration, coordination and communication are key to making progress in the increasingly complex networks of today’s corporations. For that reason, many executives are looking for future leaders who show a willingness to partner with others and to take risks to do so.

Perhaps it is particularly the analytical mindset of many insight leaders that makes them painfully aware of how often a target or objective is beyond their control. When a boss or opportunity suggests taking on a commercial target, what strikes many of us (at first) is the implied dependency on other areas to deliver, if we are to achieve it.

See also: The Science (and Art) of Data, Part 1

For that reason, many people stress wanting objectives that “measure what they can control.” Citing greater accountability and transparency for their own performance can be an exercise in missing the point. In business life, what customer insight teams can produce on their own is a far smaller prize than what can be achieved commercially by working with other teams. Many years ago, I learned the benefit of “stepping forward” to own sales or marketing targets as an insight leader. Although many of the levers might be beyond my control, the credibility and influence needed were not.

Many insight leaders find they have greater influence with leaders in other functions after taking such a risk. Being seen to be “in this together” or “on the spike” can help break down cultural barriers that have previously prevented insights from being acted upon, insights that could generate more profit or improve more customers’ experiences.

4. Not letting something fail, even though it’s broken

A common gripe I hear from insight leaders (during coaching or mentoring sessions) is a feeling of suffering for “not dropping the ball.” Many are working with disconnected data, antiquated systems, under-resourced teams and insufficient budgets. Frankly, that is the norm. However, as aware as they are of how much their work matters (because of commercial, customer and colleague impact), they strive to cope. Sometimes, for years, they and their teams work to manually achieve superhuman delivery from sub-human resources.

But there is a sting in the tale of this heroic success. Because they continue to “keep the show on the road,” their pleas for more funds, new systems, more staff or data projects often fall on deaf ears. From a senior executive perspective (used to all their reports needing more), the evidence presents another “if it ain’t broke, don’t fix it” scenario. They may empathize with their insight leader, but they also know the team is still managing to deliver what’s needed. So, requests get de-prioritized.

In some organizations, this frustration can turn to resentment when insight leaders see other, more politically savvy leaders get investment instead. Why were they more deserving? They just play the game! Well, perhaps it’s time for insight leaders to wake up and smell the coffee. Many years ago, I learned that you have to choose your failures as well as your successes. With the same caution with which you choose any battles in business, it’s worth insight leaders carefully planning when and where to “drop the ball.”

How do you avoid this foolish mistake? Once again, it comes back to risk taking. Let something fail. Drop that ball when planned. Hold your nerve. If you’ve built a good reputation, chances are it will also increase the priority of getting the investment or change you need. You might just be your own worst enemy by masking the problem!

Phew, a longer post than I normally publish here or on Customer Insight Leader. But I hope those leadership thoughts helped.

Please feel free to share your own insights. Meanwhile, be kind to yourself today. We can all be foolish at times….

Data Science: Methods Matter (Part 2)

What makes data science a science?

Methodology.

When data analytics crosses the line into simple formulas, heavy conjecture and arbitrary methodology, it often fails at what it was designed to do — give accurate answers to pressing questions.

So at Majesco, we pursue a proven data science methodology in an attempt to lower the risk of misapplying data and to improve predictive results. In Methods Matter, Part 1, we provided a picture of the methodology that goes into data science. We discussed CRISP-DM and the opening phase of the life cycle, project design.

In Part 2, we will discuss the heart of the life cycle — the data itself. To do that, we’ll take an in-depth look at two central steps: building a data set and exploratory data analysis. These two steps make up a phase that is critical to project success, and they illustrate why data analytics is more complex than many insurers realize.

Building a Data Set

Building a data set, in one way, is no different than gathering evidence to solve a mystery or a criminal case. The best case will be built with verifiable evidence. The best evidence will be gathered by paying attention to the right clues.

There will also almost never be just one piece of evidence used to build a case, but a complete set of gathered evidence — a data set. It’s the data scientist’s job to ask, “Which data holds the best evidence to prove our case right or wrong?”

Data scientists will survey the client or internal resources for available in-house data, and then discuss obtaining additional external data to complete the data set. This search for external data is more prevalent now than previously. The growth of external data sources and their value to the analytics process has ballooned with an increase in mobile data, images, telematics and sensor availability.

See also: The Science (and Art) of Data, Part 1

A typical data set might combine, for example, external sources such as credit file data from credit reporting agencies with internal policy and claims data. This type of information is commonly used by actuaries in pricing models and is contained in state filings with insurance regulators. Choosing which features go into the data set is the result of dozens of questions and some close inspection. The task is to find the elements or features of the data set that have real value in answering the questions the insurer needs answered.

In-house data, for example, might include premiums, number of exposures, new and renewal policies and more. The external credit data may include information such as number of public records, number of mortgage accounts and number of accounts that are 30+ days past due, among others. The goal at this point is to make sure that the data is as clean as possible. A target variable of interest might be something like frequency of claims, severity of claims or loss ratio. This step is often performed by in-house resources (insurance data analysts familiar with the organization’s available data) or by external consultants such as Majesco.
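
As a minimal sketch of what assembling such a data set can look like (all file and column names here are hypothetical illustrations, not from any actual project), the internal and external sources are joined on a shared key and candidate target variables are derived:

```python
import numpy as np
import pandas as pd

# Hypothetical extracts: internal policy and claims data plus external credit data
policies = pd.read_csv("policies.csv")  # policy_id, premium, exposures, is_renewal
claims = pd.read_csv("claims.csv")      # policy_id, claim_count, claim_cost
credit = pd.read_csv("credit.csv")      # policy_id, public_records, mortgage_accounts, past_due_30

# Join internal and external sources on a shared key
df = (policies
      .merge(claims, on="policy_id", how="left")
      .merge(credit, on="policy_id", how="left"))

# Policies with no claims get zero counts/costs rather than missing values
df[["claim_count", "claim_cost"]] = df[["claim_count", "claim_cost"]].fillna(0)

# Candidate target variables: claim frequency, severity and loss ratio
df["frequency"] = df["claim_count"] / df["exposures"]
df["severity"] = df["claim_cost"] / df["claim_count"].replace(0, np.nan)
df["loss_ratio"] = df["claim_cost"] / df["premium"]
```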

At all points along the way, the data scientist is reviewing the data source’s suitability and integrity. An experienced analyst will often quickly discern the character and quality of the data by asking questions such as: Does the number of policies look correct for the size of the book of business? Does the average number of exposures per policy look correct? Does the overall loss ratio seem correct? Does the number of new and renewal policies look correct? Is there an unusually high number of missing or unexpected values in the data fields? Is there an apparent reason for something to look out of order? If not, how can the data fields be corrected? If they can’t be corrected, are the data issues so important that these fields should be dropped from the data set? Some whole records may so clearly contain bad data that they should be dropped from the data set. Even further, is the data so problematic that the whole data set should be redesigned, or the whole analytics project scrapped?
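
Several of those sanity checks lend themselves to a few lines of code. Continuing the hypothetical data set sketched above:

```python
# Headline sanity checks on the assembled data set
print("Policies:", len(df))
print("Average exposures per policy:", df["exposures"].mean())
print("Overall loss ratio:", df["claim_cost"].sum() / df["premium"].sum())
print("New/renewal split:\n", df["is_renewal"].value_counts(normalize=True))

# Flag fields with an unusually high share of missing values (threshold is illustrative)
missing_share = df.isna().mean()
print("Fields over 20% missing:\n", missing_share[missing_share > 0.20])

# Drop whole records that clearly contain bad data, e.g. non-positive premiums
df = df[df["premium"] > 0]
```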


Once the data set has been built, it is time for an in-depth analysis that steps closer toward solution development.

Exploratory Data Analysis

Exploratory data analysis takes the newly minted data set and begins to do something with it — “poking it” with measurements and variables to see how it might stand up in actual use. The data scientist runs preliminary tests on the “evidence.” The data set is subjected to a deeper look at its collective value. If the percentage of missing values is too large, the feature is probably not a good predictor variable and should be excluded from future analysis. In this phase, it may make sense to create more features, including mathematical transformations for non-linear relationships between the features and the target variable.
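
A hedged sketch of those two moves — screening weak candidates and deriving transformed features — again using the hypothetical columns from earlier:

```python
import numpy as np

# Screen out candidate predictors whose missing share is too large
# (the 30% threshold is a judgment call, not a fixed rule)
candidates = [c for c in ["public_records", "mortgage_accounts", "past_due_30"]
              if df[c].isna().mean() < 0.30]

# Derived feature: a log transform can straighten out a skewed,
# non-linear relationship between a predictor and the target
df["log_premium"] = np.log1p(df["premium"])
```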

For non-statisticians, marketing managers and non-analytical staff, the details of exploratory data analysis can be tedious and uninteresting. Yet they are the crux of the genius involved in data science project methodology. Exploratory Data Analysis is where data becomes useful, so it is a part of the process that can’t be left undone. No matter what one thinks of the mechanics of the process, the preliminary questions and findings can be absolutely fascinating.

Questions such as these are common at this stage (a short sketch of how a couple might be checked in code follows the list):

  • Does frequency increase as the number of accounts that are 30+ days past due increases? Is there a trend?
  • Does severity decrease as the number of mortgage trades decreases? Do these trends make sense?
  • Is the number of claims per policy greater for renewal policies than for new policies? Does this finding make sense? If not, is there an error in the way the data was prepared or in the source data itself?
  • If younger drivers have lower loss ratios, should this be investigated as an error in the data or an anomaly in the business? Some trends will not make any sense, and perhaps these features should be dropped from analysis or the data set redesigned.
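
One simple way to put such questions to the data (sticking with the hypothetical columns above) is to group by a predictor and watch how the target moves:

```python
# Does frequency increase as accounts 30+ days past due increase?
print(df.groupby("past_due_30")["frequency"].mean())

# Are claims per policy higher for renewal than for new business?
print(df.groupby("is_renewal")["claim_count"].mean())
```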

See also: The Science (and Art) of Data, Part 2

The more we look at data sets, the more we realize that the limits to what can be discovered or uncovered are few and growing fewer. Thinking of relationships between personal behavior and buying patterns, or between credit patterns and claims, can fuel the interest of everyone in the organization. As the details of the evidence begin to gain clarity, the case also begins to come into focus. An apparent “solution” begins to appear, and the data scientist is ready to build that solution.

In Part 3, we’ll look at what is involved in building and testing a data science project solution and how pilots are crucial to confirming project findings.

How to Resist Sexy Analytics Software

Who’s made the mistake of buying apps or sexy analytics software just based on appearance?

Go on, own up. I’m sure at one time or another, we have all succumbed to those impulse purchases.

It’s the same with book sales. Although it should make no difference to the reading experience, an attractive cover does increase sales.

But if you approach your IT spending based on attractiveness, you’re heading for trouble.

Now you may be thinking: Hold on, that’s what my IT department is there to protect against. That may be the case in your business, but, as Gartner has predicted, by 2017 the majority of IT spending in companies is expected to be made by the CMO, not the CIO.

There are advantages to that change. Software will need to be more accessible for business users and able to be configured without IT help, and the purchasers are likely to be closer to understanding the real business requirements. But, as insight teams increase their budgets, there are also risks.

This post explores some of the pitfalls I’ve seen business decision makers fall into. Given our focus as a blog, I’ll be concentrating on the purchase of analytics software on the basis of appearance.

1. The lure of automation and de-skilling:

Ever since the rise of BI tools in the ’90s, vendors have looked for ways to differentiate their MI or analytics software from so many others on the market. Some concentrated on “drag and drop” front ends, some on the number of algorithms supported, some on their ease of connectivity to databases, and a number began to develop more and more automation. This led to a few products (I’ll avoid naming names) creating what were basically “black box” solutions that you were meant to trust to do all the statistics for you. They became a genre of “trust us, look, the models work” solutions.

Such solutions can be very tempting for marketing or analytics leaders struggling to recruit or retain the analysts/data scientists they need. Automated model production seems like a real cost saving. But if you look more deeply, there are a number of problems. First, auto-fitted models rarely last as long as “hand-crafted” versions and tend to degrade faster, as it is much harder to avoid overfitting the data provided. Related to this, such an approach does not benefit from real understanding of the domain being modeled (which is also a pitfall of outsourced analysts). Robust models benefit from variable and algorithm selection that is appropriate to the business problem and informed by the meaning of the data items, as well as any likely future changes. Lastly, automation almost always excludes meaningful “exploratory data analysis,” which is a huge missed opportunity, as that stage more often than not adds to knowledge of the data and provides insights itself. There is not yet a real alternative to a trained statistical eye during the analytics and model-building process.
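
To make the overfitting point concrete, here is a minimal, self-contained sketch (synthetic data and illustrative model settings, not any vendor’s product) of the gap an over-flexible, automatically fitted model opens up between training and holdout error:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 features, only the first two truly predictive
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=2.0, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# An over-flexible model fitted with no statistical restraint
model = GradientBoostingRegressor(n_estimators=500, max_depth=5).fit(X_train, y_train)

# A wide gap between training and holdout error is the classic overfitting signature
print("Train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```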

2. The quick fix of local installation:

Unlike all the work involved in designing a data architecture and an appropriate data warehouse/staging/connectivity solution, analytics software is too often portrayed as a simple matter of install and run. This, too, can be delusory. It is not just the front end that matters with analytics software. Yes, you need that to be easy to navigate and intuitive to work with (but that is becoming a hygiene factor these days). But there is more to consider round the back end. Even if the supplier emphasizes its ease of connectivity with a wide range of powerful database platforms, and even if you know investment has gone into making sure your data warehouse is powerful enough to handle all those queries, none of that will protect you from a lack of analytics grunt.

See also: Analytics and Survival in the Data Age

The problem, all too often, is that business users are originally offered a surprisingly cheap solution that will just run locally on their PCs or Macs. Now, that is very convenient and mobile if you simply want to crunch low volumes of data from spreadsheets or data on your laptop. But the problem comes when you want to use larger data sources and have a whole analytics team trying to do so with just local installations of the same analytics software (probably paid for per install/user). Too many current-generation cheaper analytics solutions will in that case be limited to the processing power of the PC or Mac. Business users are not warned of the need to consider client-server solutions, both for collaboration and to have a performant analytics infrastructure (especially if you also want to score data for live systems). That can lead to wasted initial spending, as a costly server and reconfiguration, or even new software, are needed in the end.

3. The drug of cloud-based solutions:

With any product, it’s a sound consumer maxim to beware of anything that looks too easy or too cheap. Surely, such alarm bells should have rung earlier in the ears of many a marketing director who has ended up being stung by a large final “cost of ownership” for a cloud-based CRM solution. Akin to the lure of fast-fix local installation, cloud-based analytics solutions can promise even better: no installation at all. Beyond perhaps needing firewall changes to gain access to the solution, it offers the business leader the ultimate way to avoid those pesky IT folk. No wonder licenses have sold.

But anyone familiar with the history of the market leaders in cloud-based solutions (and even the big boys who have jumped on the bandwagon in recent years) will know it’s not that easy. Like providing free or cheap drugs at first to create an addict, cloud-based analytics solutions have a sting in the tail. Check out the licensing agreement and what you will need to scale. As use of your solution becomes more embedded in an organization, especially if it becomes the de facto way to access a cloud-based data solution, your user numbers, and thus license costs, will gather momentum. Now, I’m not saying the cloud isn’t a viable solution for some businesses. It is. But beware of the stealth sales model that is implicit.

4. Oh, abstraction, where are you now that I need you more than ever?

Back in the ’90s, the original BusinessObjects product created the idea of a “layer of abstraction,” or what it called a “universe.” This was configurable by the business (probably by an experienced power user or insight analyst who knew the data), but more often than not it benefited from the involvement of a DBA from IT. The universe looked like a visual representation of a database schema diagram and defined not just all the data items the analytics software could use but also the allowed joins between tables, etc. Beginning to sound rather too techie? Yes, obviously software vendors thought so, too. Such a definition has gone the way of metadata: perceived as a “nice to have” and in reality avoided via flashy-looking workarounds.
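
For illustration only, here is a toy sketch of what such a “universe” amounts to: a declared list of tables, columns and sanctioned joins that the front end must respect (all names hypothetical):

```python
# A toy "universe": the data items exposed to the analytics tool
# and the allowed joins between tables
SEMANTIC_LAYER = {
    "tables": {
        "policy": ["policy_id", "premium", "inception_date"],
        "claims": ["claim_id", "policy_id", "claim_cost"],
    },
    "allowed_joins": [
        ("policy", "claims", "policy.policy_id = claims.policy_id"),
    ],
}

def sanctioned_join(left: str, right: str) -> str:
    """Return the approved join condition, or refuse the query outright."""
    for l, r, condition in SEMANTIC_LAYER["allowed_joins"]:
        if {left, right} == {l, r}:
            return condition
    raise ValueError(f"No join defined between {left} and {right} in the universe")
```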

The most worrying recent cases I have seen of this missing layer of abstraction are today’s most popular data visualization tools. These support a wide range of visualizations and appear to make it as easy as “drag and drop” to create any you want from the databases to which you point the software (using more mouse action). So far, so good. Regular readers will know I’m a data visualization evangelist. The problem is that, without any agreed (or controlled, to use that unpopular term) definition of data access and optimal joins, analytics queries can run amok. I’ve seen too many business users end up confused and suffering very slow response times, basically because the software has abdicated this responsibility. Come on, vendors: in a day when Hadoop et al. are making data access ever more complex, there is need for more protection, not less!

Well, I hope those observations have been useful. If they protect you from making an impulse purchase without a pre-planned analytics architecture, then my time was worthwhile.

If not, well, I’m old enough to enjoy a good grumble, anyway. Keep safe! 🙂

Helping Data Scientists Through Storytelling

Good communication is always a two-way street. Insurers that employ data scientists or partner with data science consulting firms often look at those experts much like one-way suppliers. Data science supplies the analytics; the business consumes the analytics.

But as data science grows within the organization, most insurers find the relationship is less about one-sided data storytelling and more about the synergies that occur in data science and business conversations. We at Majesco don’t think it is overselling data science to say these conversations and relationships can have a monumental impact on the organization’s business direction. So, forward-thinking insurers will want to take some initiative in supporting both data scientists and business data users as they work to translate their efforts and needs for each other.

In my last two blog posts, we walked through why effective data science storytelling matters, and we looked at how data scientists can improve data science storytelling in ways that will have a meaningful impact.

In this last blog post of the series, we want to look more closely at the organization’s role in providing the personnel, tools and environment that will foster those conversations.

Hiring, supporting and partnering

Organizations should begin by attempting to hire and retain talented data scientists who are also strong communicators. They should be able to talk to their audience at different levels—very elementary levels for “newbies” and highly theoretical levels if their customers are other data scientists. Hiring a data scientist who only has a head for math or coding will not fulfill the business need for meaningful translation.

Even data scientists who are proven communicators could benefit from access to in-house designers and copywriters for presentation material. Depending on the size of the insurer, a small data communication support staff could be built to include a member of in-house marketing, a developer who understands reports and dashboards and the data scientist(s). Just creating this production support team, however, may not be enough. The team members must work together to gain their own understanding. Designers, for example, will need to work closely with the analyst to get the story right for presentation materials. This kind of scenario works well if an organization is mass-producing models of a similar type. Smooth development and effective data translation will happen with experience. The goal is to keep data scientists doing what they do best—using less time on tasks that are outside of their domain—and giving data’s story its best possibility to make an impact.

Many insurers aren’t yet large enough to employ or attract data scientists. A data science partner provides more than just added support. It supplies experience in marketing and risk modeling, experience in the details of analytic communications and a broad understanding of how many areas of the organization can be improved.

Investing in data visualization tools

Organizations will need to support their data scientists, not only with advanced statistical tools but also with visualization tools. There are already many data mining tools on the market, but many are designed with outputs that serve a theoretical perspective, not necessarily a business perspective. To fill that gap, you’ll want to employ tools such as Tableau, QlikView and Yellowfin, which are all excellent data visualization tools that are key to business intelligence but not central to advanced analytics. These tools are especially effective at showing how models can be used to improve the business, using overlaid KPIs and statistical metrics. They can slice and dice the analytical populations of interest almost instantaneously.
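
As a flavor of the kind of business-facing view meant here, a hedged sketch (synthetic data, not output from any of those tools) that overlays a loss ratio KPI on model score deciles, the sort of chart such tools produce interactively:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic scored book: a model score and an observed loss ratio per policy
rng = np.random.default_rng(1)
scored = pd.DataFrame({"score": rng.uniform(0, 1, 5000)})
scored["loss_ratio"] = 0.4 + 0.5 * scored["score"] + rng.normal(0, 0.1, 5000)

# Slice the population into score deciles and overlay the KPI per decile
scored["decile"] = pd.qcut(scored["score"], 10, labels=False) + 1
kpi = scored.groupby("decile")["loss_ratio"].mean()

kpi.plot(kind="bar")
plt.axhline(scored["loss_ratio"].mean(), linestyle="--", label="book average")
plt.xlabel("Model score decile")
plt.ylabel("Observed loss ratio")
plt.legend()
plt.tight_layout()
plt.show()
```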

When it comes to data science storytelling, one tool normally will not tell the whole story. Storytelling will require a variety of tools, depending on the various ideas the data scientist is trying to convey. To implement the data and model algorithms into a system the insurer already uses, a number of additional tools may be required. (These normally aren’t major investments.)

In the near future, I think data mining/advanced analytics tools will evolve to include data visualization capabilities superior to those currently available. Insurers shouldn’t wait, however, to test and use the tools that are available today. Experience today will improve tomorrow’s business outcomes.

Constructing the best environment

Telling data’s story effectively may work best if the organization can foster a team management approach to data science. This kind of strategic team (different from the production team) would manage the traffic of incoming and current data projects. It could include a data liaison from each department, a project manager assigned by IT to handle project flow and a business executive whose role is to make sure priority focus remains on areas of high business impact. Some of these ideas, and others, are dealt with in John Johansen’s recent blog series, Where’s the Real Home for Analytics?

To quickly reap the rewards of the data team’s knowledge, a feedback vehicle should be in place. A communication loop will allow the business to comment on what is helpful in communication; what is not helpful; which areas are ripe for current focus; and which products, services and processes could use (or provide) data streams in the future. With the digital realm in a consistent state of fresh ideas and upheaval, an energetic data science team will have the opportunity to grow together, get more creative and brainstorm more effectively on how to connect analytics to business strategies.

Equally important in these relationships is building adequate levels of trust. When the business not only understands the stories data scientists have translated for them but also trusts the sources and the scientists themselves, a vital shift has occurred. The value loop is complete, and the organization should become highly competitive.

Above all, in discussing the needs and hurdles, do not lose the excitement of what is transpiring. An insurer’s thirst for data science and data’s increased availability is a positive thing. It means complex decisions are being made with greater clarity and better opportunities for success. As business users see results that are tied to the stories supplied by data science, its value will continue to grow. It will become a fixed pillar of organizational support.

This article was written by Jane Turnbull, vice president – analytics for Majesco.

Leveraging the Power of Data Insights

The vast majority of insurance companies lack the infrastructure to mobilize around a true prescriptive analytics capability, and small- and medium-sized insurers are especially at risk in terms of leveraging data insights into a competitive advantage. They are constrained in the following key resource categories:

    • Access and ability to manage experienced data scientists
    • Ability to acquire or develop data visualization, machine learning and artificial intelligence capability
    • Experience and staff to manage extensive and complex data partnerships
    • Access to modern core insurance systems and data and analytics technology to leverage product innovation insights and new customer interactions

Changing customer behaviors, non-traditional competition and internal operational constraints are putting many traditional insurance companies—especially the smaller ones—at risk from a retention and growth perspective. These marketplace drivers create several pain points or constraints for small and medium-sized insurers, as illustrated in a graphic in the full report.

This is excerpted from a research report from Majesco. To read the full report, click here.