If a fintech firm with text data at its disposal is not using it to employ natural language processing models – a branch of artificial intelligence that teaches machines to understand, analyze, and generate human language – it is missing out.
Natural language processing (NLP) models can and should be employed regularly to assess a firm's internal and external text material, gauging the sentiments of customers and employees alike. They can also be used to identify important themes or business trends for the company to assess and integrate into its business strategy.
This is particularly true given the emergence of generative AI, which has made natural language processing capabilities more powerful than ever.
That is the clear message from data scientist Sumedha Rai in an interview with Fintech Nexus, as well as in presentations at two recent conferences in New York City this spring – the AI in Finance Summit and MLConf 2024, a gathering of AI and machine learning experts.
Sentiment and theme analysis, however, are just two of the results that firms can get out of ongoing text analysis via NLP models.
Rai adds that such NLP tools, used together with other machine learning and AI solutions, can also be used to rapidly summarize and translate documents, understand important tags in text data, personalize interactions with customers, and catch fraudsters by picking up anomalies in their communications.
Rai is a senior data scientist at a micro-investment firm in New York City, where she spends a great deal of time analyzing user sentiment and themes, reviewing data to assist in investment decisions, and creating fraud prevention models. She also conducts research with the Center for Data Science and other affiliated departments at New York University.
She notes that perhaps the most important benefit of regular text analysis via NLP – aside from greater efficiency – is that "people (employees) will have far more time to think about the creative stuff" related to product development and business strategy, which is a distinct competitive advantage.
Text relevant for NLP analysis or summarization includes everything from customer feedback, postings, complaints, social media comments, emails and survey results to transaction data, company website and internal data, employee communications, claims calls, agent feedback, regulatory, compliance, and legal data.
The benefit of quarterly or ongoing assessment of such texts via NLP, Rai says, is that fintech firms can more easily customize services, build better chatbots, detect fraud, summarize and translate global compliance and regulatory documents, and gain a better understanding of employee satisfaction levels.
One type of text analysis – using NLP for topic modeling – can track the topics that are uppermost in the minds of one's customers, including what they like or don't like about a product. It is an activity that Rai believes may be underutilized by many fintech firms.
Using this technique, “Fintech firms should consider all of their problems and challenges and see how much signal they have received for these problems in the form of text. They should then leverage NLP analysis of text data to help solve many of these issues,” Rai says.
NLP models that can assist with this exercise include Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), LDA2vec, and BERTopic and its variations. For fintech firms in particular, FinBERT – a transformer model pretrained specifically on financial text – is also a great choice.
Among these model choices, however, Rai is particularly partial to the BERT models because their bidirectional design lets them capture context from both directions.
“They (BERT models) also have contextual embeddings, which enable the models to understand a word by considering all other words around it and take into account the context for each occurrence of a given word,” Rai says.
She adds: “Additionally, we now have access to powerful word embeddings from GenAI models, some of which are freely downloadable. However, BERT is a great choice for establishing a baseline when working with LLMs, particularly when working with financial text.”
Rai also highlighted the importance of making full use of Named Entity Recognition (NER), a subfield of NLP that pertains to tagging text so that named entities – individual words, phrases, or sequences of words – can be easily categorized.
“NER is a base technology that is very underused but, in fact, can be employed in multiple ways to better understand what entities customers are most interested in, allowing you to better tailor your communications with them,” Rai says.
She notes that NER analysis offers a way to extract critical information from a large body of text much faster, and that it can be used to flag risky interactions or anomalies that may indicate potential fraud. In this way, it plays a pivotal role in one's ongoing sentiment analysis and text classification.
One particularly helpful feature, says Rai, is NER’s ability to help one “eyeball compliance documents really fast,” so that one can quickly extract key information from lengthy documents and review it later in an efficient manner.
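The sketch below only illustrates the kind of tagged output NER produces. A real system would use a trained NER model (spaCy, a BERT-based tagger, or similar) rather than the hand-made entity list used here; the entity names, labels, and sample sentence are all invented for illustration.

```python
# Toy illustration of NER-style output using a hand-made entity list.
# A production system would use a trained statistical NER model instead.
import re

ENTITIES = {
    "Acme Bank": "ORG",      # illustrative organization
    "New York": "LOC",       # illustrative location
    "Jane Doe": "PERSON",    # illustrative person
}

def tag_entities(text):
    """Return (entity, label, start_offset) for each known entity in text."""
    found = []
    for name, label in ENTITIES.items():
        for m in re.finditer(re.escape(name), text):
            found.append((name, label, m.start()))
    return sorted(found, key=lambda t: t[2])

msg = "Jane Doe asked Acme Bank about wire fees from New York."
print(tag_entities(msg))
# → [('Jane Doe', 'PERSON', 0), ('Acme Bank', 'ORG', 15), ('New York', 'LOC', 46)]
```

The (entity, label, offset) triples are the raw material for the uses Rai describes: counting which entities customers mention most, or flagging messages that reference unexpected entities.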
With the introduction of generative AI models, Rai says, fintech firms now have access to a powerful text-analysis tool that involves minimal coding when the out-of-the-box solution is used directly. The tradeoff, however, may be a loss of accuracy compared with fine-tuning a model for specific tasks.
"Generative AI models are pre-trained and so, for a simple text analysis, a pre-trained model can often do the job," Rai says. With multiple generative AI models to choose from, she favors the ease of use of ChatGPT, which continues to improve in accuracy and offers easily accessible APIs for integrating the GPT models into code.
She also finds Meta's Llama models – Llama 3 in particular – to be powerful and helpful, and they are free to use.
However, Rai warns that fintech firms do have to keep in mind that there are risks in using out-of-the-box generative AI models.
"No sensitive or customer data should be fed to these models. These are hosted systems and the data goes out of your local machines to a server where the model resides," Rai says, noting that data from interactions can be analyzed by the companies making the LLMs to improve the performance and reliability of their systems.
“Even if you are using the enterprise version of these models, I would still make sure that your data has been stripped of all personally identifiable information (PII) before it is fed into a model or used to query the model,” Rai says.
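The PII-stripping step Rai recommends can be sketched as a simple pre-processing pass run before any text leaves the local machine. The regex patterns and placeholder tokens below are assumptions for illustration – a real deployment would use a dedicated PII-detection library, since regexes alone miss many PII forms.

```python
# Illustrative regex-based PII scrub applied before text is sent to a
# hosted model. Patterns are assumptions, not a production-grade detector.
import re

PII_PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),   # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),           # US SSN format
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[CARD]"),         # card-like digit runs
]

def scrub_pii(text):
    """Replace matched PII spans with placeholder tokens, in order."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

raw = "Contact jane.doe@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
print(scrub_pii(raw))
# → Contact [EMAIL], SSN [SSN], card [CARD].
```

Running the scrub locally, before the query is sent, keeps the sensitive values off the LLM provider's servers entirely, which is the point of Rai's caution even for enterprise deployments.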
Evaluating models for bias, discrimination, data security, data privacy, hallucinations, and respectful content creation is also key, Rai says. That starts with examining what sort of data is being ingested into the model, making sure all classes, genders, and geographies are represented, and employing a diverse team of people to work on models rather than only one person.
Increasingly, Rai says, some fintech firms are hiring red teams from outside the company to conduct a thorough assessment and to ensure that a firm's working models have been "de-biased" and are not generating results that could lead to discriminatory practices.
One Gen AI time saver that Rai particularly liked involved asking ChatGPT to create a logo, tagline, and launch press release for a fantasy fintech firm.
"The results were impressive," Rai said, noting that on an ongoing basis, ChatGPT continues to improve and to impress.