Transforming Analytics with Large Language Models: Design Principles and Limitations

by Ramesh Panuganty, Founder & CEO

In recent months, large language models (LLMs) have gained immense popularity, raising expectations of a transformative impact on various enterprise use-cases and workflows. In this technical blog, we will explore the potential of LLMs in the context of handling analytical queries. To begin, let's provide an introduction to LLMs.

Introduction to Large Language Models (LLMs)

Large language models (LLMs) have emerged as powerful AI models, fueled by massive datasets and advanced natural language processing techniques. Models such as GPT-3 and BERT can interpret human language, and generative models like GPT-3 can also produce human-like text, making them versatile tools for a wide range of applications.

Design Principles of an Analytical Platform

Before we delve into the role of LLMs in analytics, let's establish the fundamental design principles of an analytical platform:

1. Data Understanding

An analytical platform must be able to comprehensively understand the data it handles in order to provide accurate and meaningful insights. This includes understanding the following:

Data layout: The way that the data is structured and organized. For example, is it stored in relational databases, data lakes, or data warehouses?
Data schema: The definitions of the data types and structures, such as tables, columns, and relationships.
Data catalog: The metadata about the data, such as descriptions, lineage, and ownership.

By understanding the data, an analytical platform can:

Interpret queries correctly and return accurate results.
Generate visualizations that are relevant and easy to understand.
Help users to understand the data and its limitations.
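
To make the three kinds of understanding concrete, here is a minimal sketch of how layout, schema, and catalog metadata might sit together in one structure. The class names, fields, and sample values (Column, Table, "finance-team", and so on) are assumptions made for illustration, not part of any particular platform.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str              # physical column name in the table
    dtype: str             # declared data type, e.g. "INTEGER"
    description: str = ""  # catalog metadata: what the column means


@dataclass
class Table:
    name: str
    layout: str  # where the data lives, e.g. "warehouse" or "lake"
    columns: list[Column] = field(default_factory=list)
    owner: str = ""  # catalog metadata: ownership
    # schema relationships: local column -> "other_table.column"
    foreign_keys: dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        """Render a plain-text summary that a query interpreter could consult."""
        cols = ", ".join(f"{c.name} ({c.dtype})" for c in self.columns)
        return f"{self.name} [{self.layout}, owner={self.owner}]: {cols}"


# Hypothetical example table.
sales = Table(
    name="sales",
    layout="warehouse",
    columns=[
        Column("product_id", "INTEGER", "Product sold"),
        Column("amount", "REAL", "Sale amount in USD"),
    ],
    owner="finance-team",
    foreign_keys={"product_id": "products.id"},
)
summary = sales.describe()
print(summary)
```

A summary like this is the kind of context a platform could hand to a query interpreter so that a question about "sales amount" resolves to the right table and column.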

2. User Interface

An analytical platform should have a user-friendly interface that makes it easy for users to query the data and explore the results. The interface should be flexible enough to accommodate different user needs, such as:

Data scientists who need to perform complex queries and analysis.
Business analysts who need to generate reports and dashboards.
Executives who need to quickly access key insights.

The interface should also be responsive and scalable, so that it can handle a large number of users and queries.

3. Empowering Users

An effective analytical platform should empower users to explore the data and discover insights independently. This means providing users with the tools and resources they need to:

Write and execute queries.
Create and share visualizations.
Collaborate with other users.
Access and use analytical models.

The platform should also be designed to be easy to learn and use, so that users can quickly get started and start generating value.

4. Uncovering Hidden Insights

Analytics should go beyond answering known questions. It should also help users to discover new insights and patterns that they may not have been aware of. This can be done by using a variety of techniques, such as:

Machine learning algorithms to identify patterns in the data.
Natural language processing to extract insights from text data.
Statistical analysis to identify correlations and trends.

By uncovering hidden insights, analytics can help users to make better decisions and improve their performance.

5. Presentation of Outcomes

The presentation of analytical outcomes should be clear and understandable, so that users can easily interpret and act on them. This means using visualizations, tables, and charts that are appropriate for the audience and the type of insights being presented.

The platform should also provide users with the ability to customize the presentation of results to meet their specific needs. For example, users should be able to filter the results, drill down into specific areas, and export the results to other applications.

6. Data Governance

Strict data governance controls are essential to ensure the security and compliance of analytical data. These controls should manage access to the data and ensure that it is used only for authorized purposes.

The data governance controls should also be integrated with existing authorization and authentication systems, so that users can easily access the data they need without having to manage multiple credentials.

The Role of LLMs in Analytics

Now, let's explore how LLMs can contribute to analytics within the framework of these design principles:

Text Generation

LLMs can be used to generate text in a variety of formats, including reports, articles, and even creative content. This can be useful for analysts who need to communicate their findings to a variety of audiences.

An analyst could use an LLM to generate a report summarizing the results of a data analysis. The report could be tailored to the specific audience, such as executives or other analysts. The LLM could also be used to generate charts and graphs to illustrate the results.

However, it is important to note that LLMs are not perfect. They can generate text that is factually correct but lacks contextual understanding.

For example, an LLM might generate a report that describes "last month sales were awesome at 2.3 million dollars", without considering the broader context, such as whether the sales figure is truly awesome or below expectations.

Search Interface

LLMs can be used to create a natural language search interface for querying data. This can make it easier for users to find the data they need, even if they are not familiar with the underlying database or data warehouse.

However, the effectiveness of LLM-powered search interfaces depends on the complexity of the queries and the customization required for specific environments. For example, a query like "how was the sales growth last quarter compared to the previous quarter" may require significant testing and customization to function optimally.
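
To see why such a query needs care, consider what has to be resolved before any number can be returned: "last quarter" and "the previous quarter" must be mapped to concrete periods, and "growth" must be given a formula. The sketch below shows one reasonable interpretation in Python. The function names, the reference date, and the sales figures are all assumptions made for illustration.

```python
from datetime import date


def quarter_of(d: date) -> tuple[int, int]:
    """Return (year, quarter) for a date, with quarters numbered 1 to 4."""
    return d.year, (d.month - 1) // 3 + 1


def previous_quarter(year: int, quarter: int) -> tuple[int, int]:
    """Step back one calendar quarter, rolling over the year boundary."""
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


def growth_pct(current: float, prior: float) -> float:
    """Percentage growth of current over prior."""
    if prior == 0:
        raise ValueError("growth is undefined when the prior period is zero")
    return (current - prior) / prior * 100.0


# Hypothetical quarterly sales totals keyed by (year, quarter).
sales_by_quarter = {(2023, 1): 2_000_000.0, (2023, 2): 2_300_000.0}

today = date(2023, 8, 15)                      # assumed "now", falls in Q3 2023
last_q = previous_quarter(*quarter_of(today))  # "last quarter"
prior_q = previous_quarter(*last_q)            # "the previous quarter"
growth = growth_pct(sales_by_quarter[last_q], sales_by_quarter[prior_q])
print(f"Q{last_q[1]} {last_q[0]} vs Q{prior_q[1]} {prior_q[0]}: {growth:.1f}%")
```

Even this small example embeds choices, such as calendar quarters rather than fiscal quarters, that differ between organizations. That is where the testing and customization effort goes.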

Additionally, LLMs can sometimes misinterpret natural language queries or return irrelevant results, so the results of LLM-powered searches should be reviewed carefully before they are used.

Discovering Insights

LLMs can potentially uncover insights in specific scenarios. However, they operate on the principle of "you only get what you ask for": an LLM responds to the question it is given and does not surface insights on its own initiative. Providing insights beyond user queries therefore means operationalizing LLM-generated insights, which presents a challenge. Ranking and presenting meaningful insights from a potentially large pool of results requires careful consideration.

One way to operationalize LLM-generated insights is to use a machine learning model to rank the insights by their importance. The model could be trained on a dataset of insights that have been manually rated by human experts.

For example, imagine two insights: "last week hot Cappuccino sales went up by 2% over the weekly average because of a marketing campaign" and "this month sales of chai tea latte dipped by 3.8% for mobile app purchases." Ranking them and picking one as a headline notification is not possible in a generic way, because the business impact of each is not understood.
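
The sketch below illustrates the point with the two insights above. The ranking only becomes possible once business context is supplied from outside, here as a made-up revenue share per product. The scoring formula and the numbers are assumptions chosen for illustration, not a recommended ranking method.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    text: str
    product: str
    change_pct: float  # signed percentage change reported by the insight


# Hypothetical business context: share of total revenue per product.
# A generic model cannot know these numbers; they have to be supplied.
revenue_share = {"hot cappuccino": 0.05, "chai tea latte": 0.30}


def impact(insight: Insight, shares: dict[str, float]) -> float:
    """Score = size of the change weighted by the product's revenue share."""
    return abs(insight.change_pct) * shares.get(insight.product, 0.0)


insights = [
    Insight("Hot Cappuccino sales up 2% over the weekly average", "hot cappuccino", 2.0),
    Insight("Chai tea latte sales down 3.8% on mobile", "chai tea latte", -3.8),
]
ranked = sorted(insights, key=lambda i: impact(i, revenue_share), reverse=True)
headline = ranked[0]
print(headline.text)
```

With a different revenue mix, the same two insights would rank the other way around, which is why a context-free ranking cannot be trusted to pick the headline.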

Another way to operationalize LLM-generated insights is to present them to users in a way that makes it easy for them to identify and explore the most important insights. For example, the insights could be presented in a dashboard or report that highlights the most important trends and patterns in the data.

Overall, LLMs have the potential to revolutionize the way that we do analytics. However, it is important to be aware of the limitations of LLMs and to use them responsibly.

Limitations of LLMs in Analytics

While LLMs offer valuable capabilities, certain critical aspects of analytics fall outside their purview:

Data Governance: Managing data access, metadata, and chart management for governance purposes must be addressed separately from LLMs. These functions are crucial for ensuring data security, privacy, and compliance with regulations.
Exploratory Visualizations: The creation of exploratory visualizations, which allow users to interactively explore data, requires specialized tools and platforms that are distinct from LLMs. For example, suppose you have a monthly sales chart for this year and would like to ‘exclude all coffee products’. How do you do that in the visualization that the LLM gives you?

In summary, LLMs hold significant promise for revolutionizing analytics by enhancing text generation, enabling natural language search, and potentially uncovering insights. However, organizations should recognize that they are not a one-size-fits-all solution. Integrating LLMs into analytical platforms should be a strategic decision, considering both their strengths and limitations, while addressing critical data governance and visualization aspects separately.

How MachEye Enhances LLMs for Analytics

MachEye has spearheaded the integration of LLMs into its modern analytics platform architecture, pioneering an analytics experience known as SearchAI. Our blog on Copilot for data analytics goes deeper into this topic. Here's a breakdown of how MachEye's SearchAI leverages LLMs to elevate the analytics landscape:

User Access Layer

MachEye enables more users than ever to ask questions and get accurate answers, even when a question is ambiguous. Intelligent Search provides a fast, intuitive, AI-powered search experience that handles incomplete searches, corrects ambiguity, and returns instant, accurate answers.

Headlines are brief snippets of personalized insights that are identified as they happen in data and presented in crisp text and audio-visuals. They offer a true click-less analytics experience. As part of MachEye’s click-less intelligence, they are generated automatically, without requiring search. Headlines are personalized for every user, and are generated for different insights such as anomalies, analogies, segmentation, growth rate, trends, distribution, seasonalities, and correlations found in data.

The suggested searches and suggested attributes are generated from usage patterns, both active and passive feedback, and catalog updates, so that the suggestions are always personalized and relevant.

MachEye's suggested searches help you overcome the challenge of an empty search bar by offering starter questions that guide your exploration of the data. After picking a suggested search, users can easily ask follow-up questions or apply further suggestions to explore more deeply.

Modelling / Catalog

A typical enterprise data store contains several fact tables and dimension tables. These tables would have relationships, cardinality and join conditions. It is important to have these entities joined in the right way to fetch the right results. This way, you can view and use combined data for your analysis.
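
As a small illustration of why the join condition matters, the following sketch builds a tiny fact table and dimension table in an in-memory SQLite database and joins them to total sales by category. The table names, columns, and rows are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table: one row per product.
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    -- Fact table: many sales rows per product (many-to-one relationship).
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        amount REAL
    );
    INSERT INTO products VALUES (1, 'Cappuccino', 'Coffee'), (2, 'Chai Latte', 'Tea');
    INSERT INTO sales (product_id, amount) VALUES (1, 4.0), (1, 5.0), (2, 6.0);
    """
)

# Join the fact table to its dimension on the declared key, then aggregate.
rows = conn.execute(
    """
    SELECT p.category, SUM(s.amount) AS total
    FROM sales AS s
    JOIN products AS p ON p.id = s.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(rows)
conn.close()
```

If the join were written on the wrong column, or omitted so that every sale paired with every product, the totals would be wrong without any error being raised. This is the reason the relationships need to be modelled once and reused.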

A data catalog is an organized inventory of data assets in the workspace. It contains the data properties or metadata that describes the entities and attributes that are part of the data warehouse. MachEye’s automated data catalog not only ensures a fast onboarding of your data store, but also enriches metadata using proprietary language models.

Business metrics (or KPIs) track the key drivers of your business that are aligned to your business objectives. Since objectives vary from organization to organization, businesses need to define these metrics according to their own requirements. MachEye lets you define business metrics to track these drivers and identify how they affect performance.

MachEye uses LLMs to help generate friendly names that are better than what custom language models alone could produce.

For example, a column named “value” can be given the friendly name “User Rating”, which greatly improves the consumption of analytics for business users.

MachEye empowers data analysts to create data models that serve the search needs of every business user. MachEye has always had custom language models to generate alternative uses for common business attributes, metrics, and KPIs.

Data Governance

Data governance is important to maintain secure access and usage of data. It is implemented to ensure that the right users get access to the right data. MachEye ensures that data governance can be enforced seamlessly even at granular levels, without writing any code.

You can create and apply secure data governance and access policies for all members and data in your workspace. Policies help you manage the data that members can access to search and find answers to their questions. These policies ensure data governance even beyond search results. They can be enforced to manage disambiguation suggestions, recommendations, headlines, and insights received by the members.

With MachEye, you can create granular access control policies for specific entities, attributes, and data attributes, and enforce them so that they apply to search, insights, headlines, and generative analytics.
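
To illustrate what granular enforcement means in practice, here is a minimal sketch of a policy that limits both which attributes and which rows a user can see. It is a generic illustration under our own assumptions, not MachEye's implementation; the Policy class, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy: the attributes and row values a role may see."""
    allowed_attributes: set[str]
    # attribute name -> set of values the role is allowed to see
    row_filters: dict[str, set[str]] = field(default_factory=dict)


def apply_policy(rows: list[dict], policy: Policy) -> list[dict]:
    """Drop rows the policy forbids, then drop attributes it does not allow."""
    visible = []
    for row in rows:
        if all(row.get(attr) in allowed for attr, allowed in policy.row_filters.items()):
            visible.append({k: v for k, v in row.items() if k in policy.allowed_attributes})
    return visible


rows = [
    {"region": "West", "product": "Iced Coffee", "revenue": 1200, "margin": 0.31},
    {"region": "East", "product": "Iced Coffee", "revenue": 900, "margin": 0.28},
]
west_analyst = Policy(
    allowed_attributes={"region", "product", "revenue"},
    row_filters={"region": {"West"}},
)
result = apply_policy(rows, west_analyst)
print(result)
```

Applying the same filter to every output path (search answers, suggestions, headlines, and insights) is what keeps restricted data from leaking through a side channel.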

MachEye's data governance empowers users to perform intelligent searches with enterprise controls.

Technology Stack

Beyond answering the question, MachEye identifies a series of signals that are relevant to the context of the user query and runs the additional data through AI models on infrastructure that is deployed on the fly. Keeping a human in the loop while leveraging AI for faster and more reliable answers, within the guardrails of enterprise controls, gives immense control and ease of governance.

Understanding “What” is not enough. Answering the “Why” behind business changes is crucial to find root causes and make informed actions. Advanced configurable AI models answer the “Why” question by discovering anomalies, outliers, segments, trends and more from data. Actionable insights are provided to business users based on their job role, interest areas, and behavior for a personalized data experience.

LLMs are utilized to improve feedback, relevance, and deduplication in generating actionable insights. The system identifies context-relevant signals, deploys AI models, and presents insights with suitable visualizations, text, and audio narratives, ensuring users receive the most pertinent information.

Any discovered insights go through de-duplication and rank ordering, so that the user sees the most business-relevant insights presented with best-fit visualizations, text, and audio narratives.
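
The de-duplicate-then-rank step can be pictured with a deliberately simple sketch. It assumes each candidate insight already carries a relevance score and treats two insights as duplicates only when their text matches after lowercasing and whitespace cleanup. A production system would need semantic matching. The function names and scores are hypothetical.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def dedupe_and_rank(
    insights: list[tuple[str, float]], top_n: int = 3
) -> list[tuple[str, float]]:
    """Keep the highest-scoring copy of each insight, then sort by score."""
    best: dict[str, tuple[str, float]] = {}
    for text, score in insights:
        key = normalize(text)
        if key not in best or score > best[key][1]:
            best[key] = (text, score)
    return sorted(best.values(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical candidate insights with made-up relevance scores.
candidates = [
    ("Iced Coffee sales declined in Northern California", 0.92),
    ("iced coffee sales  declined in northern california", 0.80),
    ("Mobile app orders grew week over week", 0.41),
]
top = dedupe_and_rank(candidates)
print(top)
```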

Presentation Layer

Using the art of data storytelling, search results and insights are presented in the form of audio-visuals. Text summaries and audio narrations are created on-the-fly using natural language generation, making business insights easy and interesting to consume. These interactive stories hide the complexities and encourage users to dive deep into insights and recommendations.

Delivering the appropriate visualizations along with accompanying text narrations significantly enhances the user experience for enterprise users. MachEye takes it a step further by presenting answers, insights, and why-analysis in both text and audio formats, resulting in an eight-fold improvement in data consumption and making analytics more accessible and engaging for users.

Imagine a user receiving a headline from MachEye, as an interactive audio-visual, saying “The Iced Coffee sales last week declined by 12.42% in Northern California compared to the prior week, as the marketing campaign X ended two weeks ago.”

This provides users with an experience akin to having an analyst personally guide them through understanding the insights. Each of your business users will have a dedicated co-pilot to assist them with their everyday analytics needs.

How does MachEye leverage LLMs and shape the modern analytics stack?

By harnessing the strengths of our foundational and proven technologies in intelligent search, actionable insights, and interactive audio-visuals, combined with the capabilities of Large Language Models (LLMs), MachEye's AI-powered enterprise search provides answers, insights, why analysis, and interactive stories. This is all achieved while maintaining 100% accuracy, trust, data governance, and enterprise security. Thanks to human-in-the-loop machine learning, the SearchAI system continually enhances itself, all while keeping your data securely managed.