Data analytic software designed to assist audit professionals in detecting material misstatements and fraud has dramatically improved its capabilities in recent years. Growing numbers of practices have invested substantial resources in training professionals to use such software, including ACL, ActiveData for Excel, and IDEA. The primary benefit of these tools is that 100% of the “structured” data, such as revenue cycle customer sales orders, bills of lading, sales invoices, and subsequent cash receipt records, can be more efficiently tested for the presence of rules-based red flags, such as duplicate sales invoices and accounts receivable aging anomalies. These tests are rarely performed manually because of the high expense, and testing small samples of the relevant data is less likely to detect rules-based red flags that may be indicative of material fraud schemes.
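To make the idea of a rules-based red flag concrete, the following is a minimal Python sketch of two such tests run over every record rather than a sample. It is not how ACL, ActiveData, or IDEA implement these tests. The invoice records, the matching rule (same customer, amount, and date), and the 90-day aging threshold are all illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical open (unpaid) invoices: (invoice_id, customer, amount, invoice_date)
invoices = [
    ("INV-001", "Acme", 1200.00, date(2023, 1, 5)),
    ("INV-002", "Acme", 1200.00, date(2023, 1, 5)),
    ("INV-003", "Birch", 450.00, date(2023, 3, 20)),
    ("INV-004", "Cedar", 980.00, date(2022, 9, 1)),
]

def find_duplicates(records):
    """Group invoices sharing customer, amount, and date; return groups of two or more."""
    groups = defaultdict(list)
    for inv_id, customer, amount, inv_date in records:
        groups[(customer, amount, inv_date)].append(inv_id)
    return [ids for ids in groups.values() if len(ids) > 1]

def find_overdue(records, as_of, max_age_days=90):
    """Return IDs of invoices older than max_age_days on the as_of date."""
    return [r[0] for r in records if (as_of - r[3]).days > max_age_days]

duplicates = find_duplicates(invoices)
overdue = find_overdue(invoices, as_of=date(2023, 4, 1))
print(duplicates)  # groups of invoice IDs that look like duplicates
print(overdue)     # invoice IDs past the assumed aging threshold
```

Because both functions pass over the whole list, the cost of testing 100% of the records grows only linearly with the number of invoices.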
On the other hand, a picture is worth a thousand words. This is especially true for the estimated 65% of the population who are visual learners. Visual analytics is an exploratory and iterative process involving the creative and dynamic discovery of potential fraud schemes. It builds on humans’ natural ability to absorb and comprehend greater amounts of information through the use of distinctive patterns, shapes, and shadings than through analysis of numerous columns of numeric data. Instead of creating static, simplistic bar charts and scatter plots and relying on a finite number of embedded rules-based red flags for potential fraud, visual analytics can create customised, multidimensional, or layered graphics, resulting in more granular analyses of all of the structured data. As a result, visual analytic software may more easily uncover otherwise hidden relationships between data elements, enabling the discovery of new fraud red flags. Visual analytic software may therefore be an attractive alternative to IDEA and ACL for auditors of larger, publicly held entities seeking to reduce the risk of undetected material misstatements, including fraud.
Senior management of publicly held entities and their auditors have a greater incentive to employ this software since the SEC announced its increased use of cutting-edge visual analytic software to improve the speed with which it can identify financial statement fraud and audit failures. Unfortunately, many audit practices have not embraced the opportunity to adopt this software due to perceived cost-effectiveness issues. Visual analytics’ intuitive functionality and ability to efficiently conceptualise complex data may nonetheless result in lower training costs than those for IDEA and ACL, with comparable licensing fees.
In addition, unstructured text data has exploded in growth and can provide additional clues as to the existence of material fraud schemes. Such data is not created and stored in a predefined, standardised format and thus, to a large extent, cannot be analysed by IDEA, ACL, and visual analytic software. For example, many individuals appear willing to reveal sensitive and incriminating narrative and pictorial data in supposedly private communications and social network postings that they would not consider disclosing in a financial report or a business meeting. Unstructured data also includes corporate memos and e-mails, PDF files, social media postings, and audio and video files.
Forensic practices should seriously consider implementing text analytic capabilities. Audit practices also may benefit from an awareness of text analytic capabilities and consider applying them to high-risk engagements. Text analytic licensing costs, however, may be substantially higher than those for IDEA, ACL, and visual analytic software.
IDEA and ACL Software
It is worth noting that IDEA and ACL have recently developed basic unstructured text analytic capabilities, such as the ability to import PDF and plain text files and search them for rules-based keywords that have been found in the past to be highly correlated with fraud schemes. IDEA’s “looping search” add-on and ACL’s core software package offer this capability. Searching massive unstructured databases for such keywords may be a fruitful initial application of text analytics.
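A rules-based keyword search of this kind can be sketched in a few lines of Python. This illustrates the general technique only, not the IDEA or ACL implementation. The keyword list and the sample text are hypothetical.

```python
import re

# Hypothetical keyword list; a real list would reflect past fraud investigations.
FRAUD_KEYWORDS = ["off the books", "backdate", "cover up", "write off"]

def keyword_hits(text, keywords):
    """Count case-insensitive, whole-phrase occurrences of each supplied keyword."""
    hits = {}
    for kw in keywords:
        pattern = r"\b" + re.escape(kw) + r"\b"
        count = len(re.findall(pattern, text, flags=re.IGNORECASE))
        if count:
            hits[kw] = count
    return hits

sample = "Please backdate the invoice and keep it off the books. Do not Backdate the PO."
print(keyword_hits(sample, FRAUD_KEYWORDS))
```

The word-boundary match is deliberately strict, so a variant such as "backdated" would not be counted unless it were added to the list. That is one limitation of searching only for keywords supplied in advance.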
IDEA and ACL also have the ability to employ a more advanced approach to detecting incriminating keywords within PDF and plain text files, called concept extraction, through their respective “word list maker” and “scripthub” utilities. Unlike in rules-based searches, users do not supply keywords that the software must search for. Instead, concept extraction asks the software to rank the most frequently occurring words; auditors then apply professional judgment to the list in order to detect previously unknown incriminating words.
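The ranking step described above can be approximated with a simple word-frequency count. The sketch below makes no claim about how the “word list maker” or “scripthub” utilities work internally. The stop-word list and the memo text are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; practical lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "we", "for", "on"}

def rank_words(text, top_n=5):
    """Return the top_n most frequent words, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

memo = "We need to adjust the reserve. Adjust the reserve again and adjust the accrual."
print(rank_words(memo, top_n=2))
```

No keywords are supplied here. The software only produces the ranked list, and judging whether a frequent word such as "adjust" is incriminating is left to the auditor.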
Visual Analytic Software
Recent advances have made visual analytic software from leading providers such as Tableau and Qlik more accessible to auditors by both embedding complex macros and presenting them in an easy-to-use format. Tableau and Qlik’s visual displays allow multiple “measures” and “dimensions” to be clicked and dragged to a column or row. Measures are continuous metrics that are normally the focus of the analysis, such as initial sales and purchasing data and related per-unit prices and returns. Dimensions define the granularity of the analysis, such as time period (e.g., yearly, quarterly, monthly, or weekly data), type of product or service, region of the country, subsidiary within the parent organisation, or employees previously flagged as potential suspects.
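The relationship between measures and dimensions can be illustrated outside any particular product. In the Python sketch below, a measure is summed at whatever granularity the chosen dimensions define. The records and field names are hypothetical, and this is not Tableau or Qlik code.

```python
from collections import defaultdict

# Hypothetical records: "region" and "quarter" are dimensions, "sales" is a measure.
rows = [
    {"region": "East", "quarter": "Q1", "sales": 100.0},
    {"region": "East", "quarter": "Q2", "sales": 150.0},
    {"region": "West", "quarter": "Q1", "sales": 80.0},
    {"region": "West", "quarter": "Q1", "sales": 20.0},
]

def summarise(records, dimensions, measure):
    """Sum a measure for each distinct combination of the chosen dimensions."""
    totals = defaultdict(float)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        totals[key] += rec[measure]
    return dict(totals)

by_region = summarise(rows, ["region"], "sales")
by_region_quarter = summarise(rows, ["region", "quarter"], "sales")
print(by_region)
print(by_region_quarter)
```

Adding a second dimension splits each regional total into finer cells, which is the sense in which dimensions define the granularity of the analysis.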
Leading visual analytic software also has the ability to convert street, city, county, state, and country locations into latitude and longitude coordinates that can enhance an auditor’s ability to pinpoint the location of likely fraud schemes. Tableau and Qlik automatically assign geographic roles and coordinates to fields with common geographical names; these roles can also be manually assigned to fields that are not automatically recognised. Financial transactions, asset information, customer data, and contracts are among the records that may contain such references to locations.
Structured data also may be converted into circles, lines, and colors to make them easier to distinguish. For example, differently sized circles may denote cities with different population sizes, while lines may suggest different streets connecting two or more cities. Differently colored circles may represent disparate regions of the country or subsidiaries within a parent company.
A heat map is a graphical representation of data wherein distinct data elements, such as the perceived level of fraud risk within client business processes based on the likelihood of occurrence and financial impact, are represented by different colors. A low fraud likelihood and financial impact might be colored green, a moderate fraud likelihood and financial impact might be yellow, and a high fraud likelihood and financial impact might be red.
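The color assignment can be expressed as a simple scoring rule. The sketch below assumes likelihood and impact are each rated from 1 (low) to 3 (high) and combined by multiplication. Both the scale and the thresholds are illustrative assumptions, not a prescribed method, and the process names are hypothetical.

```python
def risk_color(likelihood, impact):
    """Map likelihood and impact ratings (each 1 to 3) to a heat-map color."""
    score = likelihood * impact  # ranges from 1 to 9
    if score <= 2:
        return "green"
    if score <= 4:
        return "yellow"
    return "red"

# Hypothetical business processes with (likelihood, impact) ratings.
processes = {
    "Payroll": (1, 2),
    "Procurement": (2, 2),
    "Revenue recognition": (3, 3),
}

heat_map = {name: risk_color(*ratings) for name, ratings in processes.items()}
print(heat_map)
```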
An additional layer of data may be added to a heat map to reflect the extent of internal control resources devoted to mitigating fraud risks. If this layer shows that processes with high fraud risk are under-controlled and processes with low fraud risk are over-controlled, auditors may offer a value-added recommendation to reallocate scarce control resources by eliminating unnecessary controls within low-risk processes and redirecting these resources to high-risk processes. These important differences may be less distinguishable if the data is presented in numeric columns and rows.
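The comparison between fraud risk and control effort reduces to a mismatch test. The following sketch assumes both are rated on the same 1 (low) to 3 (high) scale, which is a simplification. The process names and ratings are hypothetical.

```python
# Hypothetical ratings on an assumed 1 (low) to 3 (high) scale.
processes = {
    "Payroll": {"risk": 1, "control": 3},
    "Procurement": {"risk": 2, "control": 2},
    "Revenue recognition": {"risk": 3, "control": 1},
}

def control_gaps(procs):
    """Split processes into those with less, and those with more, control than risk."""
    under, over = [], []
    for name, p in procs.items():
        if p["control"] < p["risk"]:
            under.append(name)
        elif p["control"] > p["risk"]:
            over.append(name)
    return under, over

under_controlled, over_controlled = control_gaps(processes)
print("Redirect control resources to:", under_controlled)
print("Review for unnecessary controls:", over_controlled)
```

On a layered heat map the same mismatch appears visually, for instance as a red cell with a small control marker next to a green cell with a large one.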
Leading visual analytic software also can import and analyse structured data from multiple, complex enterprise resource planning (ERP) relational databases and legacy mainframes, as well as the data cloud.
Text Analytic Software
Leading software from SAS, SAP, and IBM uses natural language processing (NLP) rules and advanced statistics to reveal hidden meanings in virtually any type of unstructured data, including PDF and plain text files, memo explanations of general journal entries, e-mails, social network postings, and publicly available websites. The objective is to identify corrupt intent and thereby enhance high-risk audit engagements. Through advanced concept extraction and link analysis, text analytics complement visual analytics.
Text analytic software also can analyse structured data in ways that are not possible with IDEA, ACL, or visual analytic software, such as cluster analysis and its market segmentation and nearest neighbor applications. Core text analytic results can then be input into visual analytic software and integrated with preexisting visuals to provide a deeper view into potential fraud schemes.
Advanced concept extraction involves not only identifying potentially incriminating keywords within unstructured plain text and PDF files, but also potentially incriminating key phrases within unstructured e-mails and social network postings, which may provide more robust insights into the nature of fraud schemes.
Link analysis involves determining who is talking to whom, about what, and when. It is used to evaluate relationships between individuals and organisations. For example, link analysis can be used to pinpoint the most common recipients of, and responders to, a primary suspect’s unstructured e-mail and social network communications, thus identifying likely co-conspirators. These communications may also include keywords associated with similar past fraud schemes. These links can be more easily identified by visualising the results. Fortunately, leading text analytic software also has some embedded visual capabilities.
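The "who is talking to whom" part of link analysis can be sketched as a count of messages exchanged with a given person. The message log and names below are hypothetical. Commercial text analytic software also considers message content and timing, which this sketch leaves out.

```python
from collections import Counter

# Hypothetical message log: (sender, recipient)
messages = [
    ("suspect", "alice"),
    ("suspect", "bob"),
    ("alice", "suspect"),
    ("suspect", "alice"),
    ("carol", "bob"),
    ("alice", "suspect"),
]

def top_contacts(log, person, top_n=3):
    """Count messages between `person` and each counterpart, in either direction."""
    counts = Counter()
    for sender, recipient in log:
        if sender == person:
            counts[recipient] += 1
        elif recipient == person:
            counts[sender] += 1
    return counts.most_common(top_n)

print(top_contacts(messages, "suspect"))
```

The counterpart with the most traffic in both directions is only a lead for further inquiry. Frequent contact alone does not establish a co-conspirator.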
Cluster analysis uses various statistical algorithms to identify groups of similar records and label them according to the group to which they belong. Instead of distinguishing between dependent and independent variables, cluster analysis examines interdependent relationships across all records. Two practical applications of cluster analysis are market segmentation and nearest neighbor analyses.
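As an illustration of grouping and labelling similar records, the sketch below uses k-means, one common clustering algorithm. The text does not say which algorithms the software uses, so this choice is an assumption. The vendor data and the fixed starting centroids are also hypothetical. Practical implementations usually pick starting points randomly and standardise the variables first.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        new_centroids = []
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids

# Hypothetical vendor records: (invoices per month, average invoice amount in thousands)
vendors = [(2, 1.0), (3, 1.2), (2, 0.9), (20, 9.5), (22, 10.0), (21, 9.8)]
labels, centers = kmeans(vendors, centroids=[(2, 1.0), (20, 9.5)])
print(labels)  # the cluster label assigned to each vendor record
```

No variable is treated as dependent here. Every record is compared with every centroid on all of its attributes, which matches the interdependent view described above.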
Market segmentation analysis can apply clustering techniques to structured socio-demographic data, such as income, education, and type of housing, to identify distinct clusters of potential customers who are more likely to purchase certain products and services. From a marketing perspective, more disadvantaged clusters, such as those with lower income and education levels and more multifamily housing, may not receive as many expensive product and service advertisements as more advantaged clusters within a given city, county, or state. Nearest neighbor analysis uses an advanced computer algorithm to measure the distance between dissimilar groups or clusters. The core results of market segmentation and nearest neighbor analyses can then be input into visual analytic software and integrated with preexisting visuals to provide a deeper view into potential fraud schemes.
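Measuring the distance between clusters can be sketched with Euclidean distance between cluster centroids, together with a lookup of the cluster nearest to a new record. The centroids, variables, and cluster names are hypothetical, and Euclidean distance is only one possible measure. Because the variables are not standardised here, income dominates the result, which a practical analysis would normally correct.

```python
import math

# Hypothetical cluster centroids: (median income in thousands, years of education)
centroids = {
    "cluster_a": (35.0, 11.0),
    "cluster_b": (60.0, 14.0),
    "cluster_c": (110.0, 17.0),
}

def centroid_distance(name_1, name_2):
    """Euclidean distance between the centroids of two named clusters."""
    return math.dist(centroids[name_1], centroids[name_2])

def nearest_cluster(record):
    """Return the name of the cluster whose centroid is closest to the record."""
    return min(centroids, key=lambda name: math.dist(record, centroids[name]))

print(round(centroid_distance("cluster_a", "cluster_b"), 2))
print(nearest_cluster((58.0, 13.0)))
```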
Leading text analytic software can import and analyse unstructured and structured data from multiple, complex ERP relational databases and legacy mainframes, as well as the data cloud.