Despite the risks, trading desks can leverage sensitive information to their advantage while complying with regulations and protecting that data.
That is one takeaway from New Edges: How to Make the Most of Trading Flow, a study commissioned by LeapYear and produced by Aite-Novarica Group. It analyzed current data sharing techniques and offered new ones.
In late 2021, Aite-Novarica Group surveyed market participants to glean their opinions on sharing data flow within firms. Two-thirds of participants were from front-office sell-side desks across asset classes. Most of the remainder were front-office personnel at asset management firms.
LeapYear enables responsible analysis and sharing of sensitive data. It achieves this through fundamental advances in machine learning and cryptography and through partnerships with financial institutions and healthcare companies.
Companies want to monetize through data sharing but see risk
Founder Ishaan Nerurkar has a background in working with complex systems and applying mathematical techniques to interpret data in complicated environments. His early work showed how those efforts could help solve problems for government agencies, non-profits, and multinationals.
Along the way, he learned that many organizations are interested in monetizing their data internally and externally but are reluctant due to privacy concerns. Cloud technology, including storage and new analytical capabilities, makes that data easier to leverage. It can provide new insights into market movements and who is driving changes.
But challenges remain.
“Say you’re a bank, and you see trading information from your clients,” Nerurkar said. “You want to make that available to your traders and clients as a resource. It’s a valuable opportunity for you to pursue.
“But the challenge is that you can’t reveal the trades of any of your clients because they’re trusting you as their bank with their data.”
Some institutions anonymize the data by removing names. Others aggregate the data or otherwise obfuscate it. Both approaches have problems. Even when a client’s name is erased, the trading information itself remains visible, and a motivated party could work out who is behind it.
Demand for data is high; protections are low
Nerurkar highlighted two main points from the study. The demand for trading data is high. Many internal trading groups want that data for demand forecasting and risk management. External clients of major broker-dealers look to that data to understand market trading activity.
The results also confirmed what Nerurkar has consistently heard from buy-side participants – current data protection approaches are insufficient.
“All involve either aggregating the data or redacting it, or aggregating at an industry or sector level, rather than at the trade or security level,” Nerurkar said. “All the approaches attempting to solve this problem can be compromised to leak information about competitors’ trades. Or they aggregate the data so much that it is no longer valuable.
“The high-level takeaway is there’s a significant opportunity in this space because the data is extremely valuable. The only reason it’s not commercialized is that there’s been no good way to retain its value while protecting the privacy of the individual clients.”
That is where LeapYear comes in, Nerurkar said. The company acts as a trusted third party that specializes in protecting data while retaining its value and its ability to be shared.
Problems with current data sharing approaches
When institutions aggregate data, they look at every trade and assign it to an industry or sector. A client would not see that someone specifically traded Apple but would view the cumulative movement of all tech companies.
That protects privacy by hiding specific trades but devalues the data. Someone on the other side may be evaluating a short position, so seeing sector-wide demand is useless, especially if a company is performing counter to the sector as a whole.
“Aggregation is probably the most common, and it’s also why clients today don’t find flow data from their brokers to be that useful,” Nerurkar noted. “It’s so highly aggregated that it does not help investors understand the performance of any particular company or the interest in a specific company.
“Its main shortcoming is that it devalues the data to the point where it’s effectively useless.”
Issues with aggregation
Aggregation rules, a form of statistical disclosure control, stipulate when data may be shared. One rule might set a minimum number of clients that must hold positions in a company before any data is shared. Another might cap the percentage of activity in a stock that any single client can account for.
“If you had 100 clients trading a stock, but one of them is responsible for 90%, you don’t want to disclose it. You’d be effectively revealing their trades,” Nerurkar explained. “Conversely, if you only had one client trading the stock, you don’t want to disclose that either.
“The important thing about this approach is that rules can be broken. Conversely, some brilliant people are incentivized to figure out how to get around these restrictions to determine what’s going on in the market.
“You could institute arbitrary rules, but they’re easily broken. We’ve done demonstrations for banks showing how to break these rules easily. If you aggregate the data, you devalue it, and putting in these types of rules puts your clients at risk.”
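The two rules described above, and one well-known way around them, can be sketched in a few lines of Python. This is an illustration only, not LeapYear's demonstration: the function name `releasable_volumes`, the thresholds `min_clients` and `max_share`, and the sample trades are all assumptions made for the example. It shows a so-called differencing attack, in which two releases that each pass the rules are subtracted to isolate one client.

```python
from collections import defaultdict


def releasable_volumes(trades, min_clients=5, max_share=0.5):
    """Return {ticker: total volume} for tickers that pass both rules.

    trades: iterable of (client, ticker, volume) tuples.
    min_clients and max_share are illustrative thresholds, not industry standards.
    """
    per_ticker = defaultdict(lambda: defaultdict(float))
    for client, ticker, volume in trades:
        per_ticker[ticker][client] += volume

    released = {}
    for ticker, by_client in per_ticker.items():
        total = sum(by_client.values())
        if total <= 0:
            continue
        if len(by_client) < min_clients:
            continue  # too few clients: a release would point straight at them
        if max(by_client.values()) / total > max_share:
            continue  # one client dominates the total
        released[ticker] = total
    return released


# Hypothetical data: five clients trade 100 shares of XYZ each, client F trades 120.
everyone = [(c, "XYZ", 100) for c in "ABCDE"] + [("F", "XYZ", 120)]
without_f = [t for t in everyone if t[0] != "F"]

# Each release passes both rules on its own...
full = releasable_volumes(everyone)["XYZ"]
partial = releasable_volumes(without_f)["XYZ"]

# ...but subtracting one from the other isolates client F's volume.
leaked = full - partial
print(leaked)  # prints 120.0
```

In practice the second release might come from a slightly different filter, such as a client segment or date range that happens to exclude one party, which is why rules applied release by release are hard to make safe.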
Internal demand for data sharing is insatiable, but risks need to be understood
Many groups within institutions are pushing for broader access to quality data. Research teams, for example, want to derive granular insights to understand company performance and general market trends better. Market maker desks wish to reduce risk and lower trading costs.
One hindrance is that information barriers are put in place for good reasons, Nerurkar said. They help ensure banks act in their clients’ best interests.
“You never want a scenario where your bank is trading against you,” he said. “And also for regulatory reasons as well as to manage headline risk as well. You don’t want scenarios where the public believes you are looking at cardholder data and learning something personally identifiable or that you’re monetizing data about individual people.”
“That’s where LeapYear helps. Our software ensures that insights can be gleaned about the data without disclosing something about an individual cardholder, merchant or trading party in the process.”
New data sharing measures offer promise
The best data-sharing technique for your institution depends on your goals, Nerurkar said. Differential privacy is designed so that an observer looking at the output cannot tell, beyond a small and measurable margin, whether the original data set includes a specific individual’s data. Other benefits include defenses against future privacy attacks and a privacy budget that limits how much privacy can be lost over multiple queries.
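As a rough illustration of differential privacy and a privacy budget, the sketch below answers counting queries with Laplace noise and refuses to answer once the budget is spent. It is a minimal sketch, not LeapYear's implementation: the class `PrivateQuerier`, the exception `BudgetExceeded`, and the sample data are hypothetical, and it assumes one record per client so that adding or removing a client changes any count by at most 1. Simple floating-point noise like this is not hardened for production use.

```python
import random


class BudgetExceeded(Exception):
    """Raised when a query would spend more epsilon than remains."""


class PrivateQuerier:
    def __init__(self, total_epsilon, rng=None):
        self.remaining = total_epsilon  # privacy budget left to spend
        self.rng = rng if rng is not None else random.Random()

    def _laplace(self, scale):
        # The difference of two exponentials with mean `scale`
        # is Laplace-distributed with that scale.
        rate = 1.0 / scale
        return self.rng.expovariate(rate) - self.rng.expovariate(rate)

    def noisy_count(self, records, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExceeded("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for r in records if predicate(r))
        # A count has sensitivity 1, so the noise scale is 1 / epsilon.
        return true_count + self._laplace(1.0 / epsilon)


# Hypothetical data: one record per client.
clients = [{"id": i, "bought_xyz": i % 3 == 0} for i in range(300)]
querier = PrivateQuerier(total_epsilon=1.0)
estimate = querier.noisy_count(clients, lambda c: c["bought_xyz"], epsilon=0.5)
print(round(estimate), querier.remaining)
```

A smaller `epsilon` per query means more noise but leaves more budget for later queries; that trade-off is the practical meaning of the privacy budget.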
K-anonymity groups similar records so that each contributor is indistinguishable from at least k-1 others, hiding contributors’ identifying information. Data masking substitutes data elements at the transaction level with random characters. Pseudonymization replaces data elements at the transaction level with similar-looking stand-in values.
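To make k-anonymity concrete, the sketch below uses the standard definition: every combination of potentially identifying attributes must be shared by at least k records. The helper names, the `size_bucket` threshold, and the sample rows are assumptions made for the example, not details from the study.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)


def is_k_anonymous(rows, quasi_identifiers, k):
    return smallest_group(rows, quasi_identifiers) >= k


def size_bucket(shares):
    # Hypothetical generalization: replace an exact size with a coarse bucket.
    return "large" if shares >= 10_000 else "small"


# Hypothetical trade records with names already removed.
raw = [
    {"sector": "tech", "shares": 12_500},
    {"sector": "tech", "shares": 48_000},
    {"sector": "tech", "shares": 300},
    {"sector": "tech", "shares": 750},
]
generalized = [{"sector": r["sector"], "size": size_bucket(r["shares"])} for r in raw]

print(is_k_anonymous(raw, ["sector", "shares"], k=2))        # False
print(is_k_anonymous(generalized, ["sector", "size"], k=2))  # True
```

The exact share counts make every raw row unique, so removing names alone does not help. Bucketing the sizes restores anonymity at the cost of detail, the same trade-off between privacy and value that the aggregation approaches face.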
Synthetic data uses machine learning to analyze real data sets while producing fake ones with similar characteristics and diversity. Multiparty computation deploys cryptography so parties with limited trust can share and analyze data without uncovering the underlying data set.
“These approaches are designed to allow you to do computing on encrypted data without decrypting. They’re precious when you’re trying to solve the problem, and you don’t have trusted infrastructure on which to compute,” Nerurkar explained. “If you were trying to store data in the cloud but didn’t trust your cloud provider, these encryption techniques are useful for that.
“Or you’re trying to get two or three different banks to analyze their shared data jointly, but you don’t have a trusted execution environment to run that calculation.”
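The multi-bank scenario can be illustrated with additive secret sharing, one simple building block of multiparty computation. The sketch below simulates all parties in a single process and assumes every party follows the protocol honestly; the function names, the modulus, and the sample volumes are hypothetical. Each bank splits its private number into random-looking shares, so no single share reveals anything, yet the shares still add up to the joint total.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value


def split_into_shares(value, n_parties):
    """Split value into n_parties shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate parties computing a joint total without revealing their inputs.

    Assumes non-negative integer inputs whose true total is below MODULUS.
    """
    n = len(private_values)
    # shares[i][j] is the share that party i sends to party j.
    shares = [split_into_shares(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Hypothetical daily volumes at three banks.
print(secure_sum([1200, 340, 975]))  # prints 2515
```

A real deployment would run each party on separate infrastructure and would need protections against parties that deviate from the protocol, which this sketch does not attempt.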
When faster isn’t better in data sharing
Your need for speed depends on your use case, Nerurkar said. Getting new data in real-time isn’t helpful in sectors with less activity. You’re wasting resources on unnecessarily complex systems. Aggregating these types of data will give you more significant insights.
“If you tried to share (this) data real-time, you’d have to obfuscate it so much that you’d learn nothing useful,” Nerurkar said. “But if you (aggregate it) at the daily level, when you share the data, you wouldn’t have to obfuscate it as much because it has many more data points that you can use.”
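A little arithmetic shows why sparse real-time data needs proportionally more obfuscation. The sketch below assumes the obfuscation is Laplace noise added to a count, as in differential privacy, which the article does not state is the method in use here; the function name and the sample counts are hypothetical. The expected size of the noise is fixed by the privacy setting, so it swamps a small count but barely moves a large one.

```python
def expected_relative_noise(true_count, epsilon):
    """Expected absolute Laplace noise divided by the true count.

    Assumes a count with sensitivity 1, so the noise scale is 1 / epsilon,
    and the expected absolute value of Laplace noise equals its scale.
    """
    if true_count <= 0 or epsilon <= 0:
        raise ValueError("true_count and epsilon must be positive")
    scale = 1.0 / epsilon
    return scale / true_count


# Hypothetical thinly traded name: 4 trades in one minute, 2,000 over the day.
minute_bucket = expected_relative_noise(true_count=4, epsilon=0.1)
daily_bucket = expected_relative_noise(true_count=2000, epsilon=0.1)
print(f"{minute_bucket:.1%} vs {daily_bucket:.1%}")  # prints 250.0% vs 0.5%
```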
Institutional banks are the biggest winners
Perhaps the biggest beneficiaries of these new approaches are institutional banks, Nerurkar suggested. They can receive aggregated data from their retail arms.
It’s a significant step forward from the status quo of even two years ago, when there was a clear line between a bank’s institutional and retail arms. They either didn’t share data at all, or the information they shared wasn’t valuable.
LeapYear helps its clients share that data across their various divisions in ways that respect the privacy and confidentiality of cardholders and merchants.
“All parties that have any relationship with the bank are fully protected from their data being disclosed,” Nerurkar said. “Yet the data can be made available in a very rich way for analytics to the institutional parts of the business and their clients.”
Just say yes
While most of the industry knows there are opportunities to share data internally and externally, many firms say “no” because they don’t know the art of the possible, Nerurkar explained. Invest the time to learn the ins and outs of these different techniques so you can generate maximum benefit and differentiate yourselves from competitors.
“Sometimes business leaders will say, ‘I’m not sharing my agency trading data at all because I can’t compromise my clients’, which makes perfect sense,” Nerurkar said. “Or they’ll say, ‘I’m not going to share my credit card data because I can’t compromise my merchants’.”
“Only part of that statement is true. You can’t compromise your merchants. You can’t compromise your paying clients. But it doesn’t mean you can’t get insights from the data. The most important thing we’ve done at LeapYear is come up with a separation… We enable you to share insights from the data without compromising the parties in the data.”