This article was first published on February 10, 2018 on my LinkedIn page: https://www.linkedin.com/pulse/art-science-leonardos-lesson-todays-data-scientists-gyasi-dapaa/
If there is one piece of wisdom conveyed by Leonardo’s Vitruvian Man, it is the belief that humans can be whatever they desire; call that bio-plasticity. A second, and the subject of this article, is the recipe for creativity: Science plus Art! This is as true of Data Science as of any other science, whether Engineering, Economics, Physics, Psychology, or Biology. I like to say, just for the sake of a shoddy wit, that Data Science is 90% science and the other half is art, to emphasize the equal importance of the two. This article describes how science and art interact to produce creative data insights, and offers a framework for managing data scientists based on their scientific and artistic aptitudes.
The Science, and Why it’s Necessary
The importance of knowing the science is usually taken for granted, since no data solution can be fashioned without such technical know-how. A data scientist is only as empowered as his basket of data mining techniques is large: linear regression models, non-linear regression models, clustering analysis, classification tree analysis, principal component analysis, neural networks, genetic algorithms, memory-based algorithms, and discriminant analysis, just to mention a handful. Even though the list is practically limitless, a data scientist need not know them all, thank goodness, to be useful, as many of the methods are substitutes for each other. The prudent data scientist will therefore first master the popular ones, such as regression models, classification trees, and clustering analysis, and continuously build on them in a Pareto fashion. Even better, he should strive to build a diversified skill set by learning at least one technique from each of the five major bodies of analytics:
Reports/Visualizations: Leverages historic data to inform us about the state of our world or organization as of the day of analysis. Examples are dashboards, pie charts, histograms, maps, etc.
Classification: Describes the defining characteristics of an event of interest such as customer retention, revenue, fraud, etc. Example techniques are classification trees, clustering analysis, and regression models.
Prediction/Estimation: Estimates the magnitude of things such as sales, or the likelihood of an event such as a customer terminating a relationship with a vendor. Example techniques are neural networks, linear regression models, non-linear regression models, survival analysis, multinomial logistic regressions, etc.
Forecasting: Estimates the magnitude and likelihood of future events. This is predominantly Time Series Analysis Models even though there have been occasions when non-traditional techniques like neural networks have been modified to forecast future events.
Optimization: Given what we know today (from our reports), our expectations of the future (from our forecasts), and a sophisticated understanding of correlational and causal factors (from our classification and predictive analytics), what should we do to optimally achieve our objectives? This is typically a quantitative (mathematical) exercise.
The data scientist who patiently builds such a diversified portfolio of skills (I call him the Octopus Dataist) will be able to solve a myriad of business problems and will indubitably be of immense value to the enterprises he serves. This is why I advise upcoming data scientists to invest their time in mastering new techniques rather than new languages. For let’s face it, a clustering analysis in SAS will yield the same results in R; however, if you don’t know clustering analysis, you are handicapped in creating clusters of anything, regardless of how multilingual you are. It is for the same reason that I advise analytics leaders to support a flexible infrastructure of tools that meets the eclectic preferences of their teams, rather than forcing their analysts to be fluent in only one language: the cumulative cost of learning a new language, or of working with a second-choice tool, far exceeds the cost of supporting the preferred tool, even a commercial one.
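To make the point about techniques versus languages concrete, here is a minimal sketch of one clustering technique, k-means, written in Python with NumPy. The choice of k-means, the function name kmeans, and the simulated customer data are my own assumptions for illustration only; the same few steps (assign each point to its nearest center, recompute the centers, repeat) could be written in any language.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two well-separated groups of customers,
# described by (monthly spend, monthly visits). Simulated, not real.
data_rng = np.random.default_rng(42)
group_a = data_rng.normal(loc=[20.0, 2.0], scale=1.0, size=(50, 2))
group_b = data_rng.normal(loc=[80.0, 10.0], scale=1.0, size=(50, 2))
customers = np.vstack([group_a, group_b])

labels, centroids = kmeans(customers, k=2)
print("cluster sizes:", np.bincount(labels))
```

Nothing in the logic depends on Python itself; what matters is knowing the steps of the technique.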
The Art and Why it’s Important
A helpful picture to keep in mind when creating data solutions, especially for the business, is to think of business problems as round holes and your statistical techniques as square pegs. When faced with a business problem, the data scientist first reaches into his toolbox to find the best square peg: the Science. The richer his toolkit, the more flexible he is in finding a workable option. However, there is rarely a perfect match, and so the data scientist will need to chisel the chosen peg to fit snugly into the hole: the Art.
To demonstrate concretely, suppose you are a data scientist for a health company interested in evaluating the risk factors for heart attack among its patients. The merely scientific data miner will retreat into his shop and exert himself in running the most accurate logistic regression, with no consideration of the perspectives or technical aptitude of the stakeholders. The artful data scientist will instead seek to understand the motive behind the question, relevant prior work and context if any, the analytical maturity of the stakeholders, the sources of data, the intended applications of the solution, and other pertinent factors.
Having assessed the technical acumen of his business partners, the artful data scientist may use a linear probability model rather than a logistic one: this makes his model transparent and its effects easy to interpret. Many variables in his model may be statistically significant, but he would retain only the four to six most predictive for simplicity’s sake. He rightly reckons that since this is one of the company’s first predictive models, it’s better to start small than big. As he obtains results from his analysis, he discusses them, even if preliminary, with his business partners to check for business sense and to gauge their emotional connection to the emerging product. And instead of merely presenting point estimate predictions, he correctly conjectures that the point estimates may be mistaken for God’s calls; thus he also presents confidence intervals to describe the range of plausible outcomes and to reinforce the uncertainty in his model predictions!
All of these behaviors seem small, but they go a long way to bolster the business’s confidence in the analytics product and enhance its probability of being assimilated into operations. Science is all the things you do to produce an accurate solution; art is all the things you do to produce a tailored solution.
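The heart attack example can be sketched in a few lines. What follows is only an illustration under stated assumptions: the patient data are simulated, the two predictors (age and smoker) and the "true" risk formula are hypothetical, and the intervals use classical least-squares standard errors with a normal critical value of 1.96. A linear probability model has non-constant error variance, so in practice robust standard errors would be a safer choice.

```python
import numpy as np

# Hypothetical patient data, simulated purely for illustration.
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(30, 80, n)            # years
smoker = rng.integers(0, 2, n)          # 1 = smoker, 0 = non-smoker
# Assumed "true" risk, used only to simulate the 0/1 outcomes.
true_risk = np.clip(0.02 + 0.004 * (age - 30) + 0.15 * smoker, 0, 1)
heart_attack = rng.binomial(1, true_risk).astype(float)

# Linear probability model: ordinary least squares on a 0/1 outcome.
X = np.column_stack([np.ones(n), age, smoker])
beta, *_ = np.linalg.lstsq(X, heart_attack, rcond=None)

# Classical OLS covariance of the coefficients (an approximation here,
# because the errors of a linear probability model are heteroskedastic).
residuals = heart_attack - X @ beta
sigma2 = residuals @ residuals / (n - X.shape[1])
cov_beta = sigma2 * np.linalg.inv(X.T @ X)
std_err = np.sqrt(np.diag(cov_beta))

# Approximate 95% confidence intervals for each effect.
lower = beta - 1.96 * std_err
upper = beta + 1.96 * std_err
for name, b, lo, hi in zip(["intercept", "age", "smoker"], beta, lower, upper):
    print(f"{name:>9}: {b:+.4f}  (95% CI {lo:+.4f} to {hi:+.4f})")

# Estimated risk for a hypothetical 60-year-old smoker, with an interval
# around the estimate rather than a bare point prediction.
x0 = np.array([1.0, 60.0, 1.0])
pred = x0 @ beta
pred_se = np.sqrt(x0 @ cov_beta @ x0)
pred_low, pred_high = pred - 1.96 * pred_se, pred + 1.96 * pred_se
print(f"estimated risk: {pred:.3f}  (95% CI {pred_low:.3f} to {pred_high:.3f})")
```

Each coefficient reads directly as a change in probability (for example, the smoker coefficient is the estimated difference in risk between smokers and non-smokers of the same age), which is the transparency the artful data scientist is after.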
Breeding a High-Performance Data Science Team
If people are an organization’s greatest asset, then cultivating a well-skilled data science team should be the primary responsibility of the chief data science officer (CDO). As a general rule, he or she should not hire anyone who lacks a solid level of comfort with analytic techniques and is not willing to learn them. Given the importance of both science and art to the success of the department, the CDO should also maintain two career development tracks: Technical and Business. The technical track allows analysts whose interest is skewed toward the science to keep moving up the career ladder as they grow in technical understanding, ability, and sophistication. The business track allows analysts with a good mix of science and art, and a competency in managing relationships between the data science team and its typically non-technical stakeholders, to move up as well. While most data science functions have a business track, technical tracks are rare; this has resulted in teams that have good business managers but lack technical managers with the scientific sophistication to create products nuanced enough to thrill business partners and clients.
In Closing…
Analytics remains a powerful yet foreign utility in business. Its ultimate power resides not in its theoretical elegance, but in its usefulness to, and applications in, business. The burden therefore rests on data scientists to produce analytics products that win the rational and emotional buy-in of their business partners and stakeholders. We need to painstakingly craft technical solutions that solve the business problem, and work harder to showcase their beauty so that even the most technically aloof mind can appreciate it. When we do this, our value and status will rise wherever we serve.