Andre Franca, PhD
United Kingdom
3K followers
500+ connections
Other similar profiles
- Yoshiyuki Hamajima (Greater London)
- Hendrik Brackmann (London)
- Michael Russell (United Kingdom)
- Douggie Melville-Clarke (Edinburgh)
- Somita Y., Chief Data Officer at Animal Friends Insurance (Greater London)
- Alejandro Ortega Ancel (London)
- Sorush Lajevardi (London Area, United Kingdom)
- Ben Houghton (London)
- Matthew Sattler, Partner, Pilot Wave Holdings (Miami, FL)
- Ahmed Khamassi (London)
- Syed Sameer Rahman (Greater Cardiff Area)
- Huy V. (London)
- Martin Sewell (Cambridge)
- Andreea-Ingrid Cross (London)
- Charles Fouquet, ED at Morgan Stanley (London)
- Rob Otter (Greater London)
- Sean Durkin (Manchester Area, United Kingdom)
- Petr Vaclav (United Kingdom)
- Krzysztof Osiewalski (Greater London)
- Himanshu Bhararra, Executive Director specializing in Analytics, Risk Management, Customer and Product Strategy (London)
Explore more posts
-
Jim Dowling
RAG using tabular data with function calling: my talk from @pydatalondon is now available on YouTube. The talk is scattered with references to Monty Python and the Holy Grail (tabular data is the holy grail of RAG for enterprises). https://lnkd.in/dAkSUMiE
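A minimal sketch of the pattern the talk describes: function calling over a table as the retrieval step of RAG. The table, function name and tool schema below are hypothetical and purely illustrative (stdlib only); a real setup would pass TOOLS to an LLM API and let the model emit the tool call.

```python
# Minimal sketch: the model retrieves facts by calling a declared function over a
# table instead of searching text chunks. Table, function and schema are hypothetical.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id TEXT, total REAL, ts TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("c1", 42.0, "2024-05-01"), ("c1", 13.5, "2024-05-03")],
)

def get_customer_orders(customer_id: str) -> list:
    """Tool the model is allowed to call; returns rows as plain dicts."""
    rows = conn.execute(
        "SELECT total, ts FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    return [{"total": t, "ts": ts} for t, ts in rows]

# JSON schema advertised to the LLM as a callable tool.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_customer_orders",
        "description": "Return all orders for a customer.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

# Pretend the model emitted this tool call; execute it and feed the result back
# as context for the final answer -- that is the retrieval step of tabular RAG.
tool_call = {"name": "get_customer_orders",
             "arguments": json.dumps({"customer_id": "c1"})}
result = get_customer_orders(**json.loads(tool_call["arguments"]))
print({"role": "tool", "content": json.dumps(result)})
```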
-
Andrei Lopatenko 🇺🇦
Research from Harvard on the impact of human preference alignment methods on model trustworthiness. Many recent papers focus on trade-offs and comparisons between alignment methods. For a good review of this area, with many insights about RLHF and other methods, listen to Cohere researcher Arash Ahmadian on TalkRL, the reinforcement learning podcast: https://lnkd.in/gSbduVBh In this paper, the authors focus on the trustworthiness properties of models under various alignment training regimes. The evaluation methods are also interesting: they look at worst-case behaviors rather than average cases. https://lnkd.in/g2cBgDYa
-
Vincent Moens
I was (rhetorically) asked why anyone should care about anything other than single-file implementations (SFI) in RL research, so here is my take. First, let me acknowledge that SFIs such as CleanRL are great, and that lib in particular has contributed to RL in an unprecedented manner. I could only wish I had the skills to make something as impactful as that! Having it all in one file that you can put side by side with the paper is super useful: it helps you understand what is going on in the algorithm, and lets you identify very precisely the line you think needs improvement. It's not only clear but also efficient: since each file only considers a specific scenario (limited environments and hyperparameters), you can safely over-optimise for that one. So if your research is about changing a bunch of lines in the DQN algorithm and showing that you get an improved SOTA result on Atari, go for it! Same if you want to show that your task can be solved by a regular SAC implementation that looks more or less like the one in the example file. Now, one scenario where SFIs fall short is whenever you need anything that is not strictly accounted for by the implementation: e.g. you need a stateless environment, or some specific replay buffer. Another issue is that the "proper" way of recycling these scripts for research is to copy-paste part of the script. That means you are isolating a bunch of undocumented code and taking responsibility for whatever you do with it. Comparatively, a modular framework provides documented and tested chunks of code (classes or functions) whose responsibility and coverage are clearly defined. If the maintainer has done their job properly, you should also have an example of how to run each primitive. (In fact, SFIs are mostly not unit-tested; the macro-level guarantees they offer come mostly from proven, reproducible learning curves.) Regarding efficiency, sure, an SFI is very fast at doing what it does, but you won't be able to scale it much: if you need hugely distributed solutions, or highly engineered ways of offloading RAM/GPU usage to disk, these repos will provide limited help and guidance. Finally, I think this isn't a black-or-white situation: one can perfectly well recycle an SFI and replace the buffer or data-collection part with a modular implementation (such as TorchRL's). #reinforcementlearning #research
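As a concrete illustration of that last point, here is a minimal sketch (assuming torchrl and tensordict are installed) of swapping a script's ad-hoc buffer for TorchRL's documented ReplayBuffer primitive; the tensor shapes and field names are arbitrary.

```python
# Minimal sketch of swapping a hand-rolled buffer for TorchRL's modular one.
import torch
from tensordict import TensorDict
from torchrl.data import LazyTensorStorage, ReplayBuffer

buffer = ReplayBuffer(storage=LazyTensorStorage(max_size=10_000))

# A batch of fake transitions, as they might come out of a single-file script's loop.
batch = TensorDict(
    {
        "observation": torch.randn(32, 4),
        "action": torch.randint(0, 2, (32, 1)),
        "reward": torch.randn(32, 1),
        "next_observation": torch.randn(32, 4),
    },
    batch_size=[32],
)

buffer.extend(batch)           # replaces the script's ad-hoc list/array bookkeeping
sample = buffer.sample(16)     # documented, tested sampling primitive
print(sample["reward"].shape)  # torch.Size([16, 1])
```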
-
Neil Ashton
Do you agree with Florian Menter that RANS models for #CFD will never go away? I tend to agree with him. While there have been clear advances in scale-resolving methods such as hybrid RANS-LES and WMLES (thanks in part to greater availability of #HPC and GPUs), there are some problems where RANS does just as well, and it would simply be a waste of money to do anything more. If this is true, should we not put more investment into continuing to develop these models? I'm told by many of my academic colleagues that it's near impossible to get funding for RANS development these days. The reality is that in industry most people use either the Spalart-Allmaras RANS model or the k-omega SST RANS model, and both of these were developed in the 1990s!
-
Antonio Montano 🪄
Apollo Research released a new mechanistic interpretability approach. They use the loss landscape to identify computationally relevant features and interactions. Then, they build a full interaction graph and interpret it. Theory: arxiv.org/abs/2405.10927 Experimental: arxiv.org/abs/2405.10928 In the theory paper, they modify singular learning theory to study the loss landscape for interpretability. They find 3 ways that parts of the network can end up being irrelevant to performance. Networks that generalize well are likely to have lots of this irrelevant structure! They introduce the Local Interaction Basis (LIB) as a way to remove this irrelevant structure and to find the most computationally relevant features. They also connect the loss landscape to modularity, deriving a formula for finding modules inside a network. In the second paper, they then empirically test LIB. LIB is a coordinate transform into a basis where interactions are as sparse as possible. Intuitively, it’s like a fancy PCA where we orthogonalize the features and their interactions at the same time. After identifying the LIB features, they use integrated gradients to build a graph of feature interactions. They use the derived modularity metric to search for modules in the graph. On small models, e.g. a modular addition transformer and a CIFAR MLP, LIB graphs are smaller, more sparsely activated and more interpretable than a PCA baseline, with identifiable modules that correspond to separate computation. Code 👉 https://lnkd.in/dksJWb-X #machinelearning
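For intuition only, here is the plain-PCA baseline that the post contrasts LIB with: rotate one layer's activations into decorrelated directions and drop the near-zero-variance ones. LIB's real construction goes further (it uses the loss landscape and sparsifies interactions between layers); the data below is random and purely illustrative.

```python
# Plain-PCA rotation of one layer's activations: find decorrelated directions and
# discard those the layer barely uses. A crude stand-in for "removing irrelevant
# structure" -- LIB itself is derived from the loss landscape, not raw variance.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 64))        # (samples, hidden units), random stand-in

centered = activations - activations.mean(axis=0)
cov = centered.T @ centered / len(centered)
eigvals, eigvecs = np.linalg.eigh(cov)           # PCA basis of the activation covariance

order = np.argsort(eigvals)[::-1]                # sort directions by variance
keep = eigvals[order] > 1e-6                     # drop (near-)zero-variance directions
features = centered @ eigvecs[:, order[keep]]
print(features.shape)
```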
-
Lewis Cole
This year's Nobel Prize in Physics has been jointly awarded to John Hopfield and Geoff Hinton. Much has been written about the latter's contributions, while the Hopfield Network is largely forgotten. In my blog I wrote about the Hopfield Network, and in particular how you can view it as an application of a spin glass model (one of the most significant classes of models in modern complexity theory). You can learn to code one from scratch to create a (very) rudimentary and (very) limited image recognition system here: https://lnkd.in/ehkG4kxc The work that Hinton won the prize for is an extension of the Hopfield Network idea to create the Boltzmann Machine. This extension largely amounted to adding stochasticity to the determinism of the Hopfield Network. #NobelPrize #Physics #AI #Complexity #Mathematics #SpinGlass #MachineLearning #HopfieldNetwork #NeuralNetwork
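For readers who want the flavour without the blog, here is a minimal, self-contained Hopfield network sketch (Hebbian storage plus synchronous recall); it is illustrative and not the code from the post.

```python
# A (very) minimal Hopfield network: store +/-1 patterns with the Hebbian rule,
# then recover a stored pattern from a corrupted cue.
import numpy as np

def train_hopfield(patterns: np.ndarray) -> np.ndarray:
    """Hebbian rule: sum of outer products of stored patterns, zero diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W: np.ndarray, state: np.ndarray, steps: int = 10) -> np.ndarray:
    """Iterate s <- sign(W s) until the state settles into a stored pattern."""
    s = state.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1,  1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted copy of the first pattern
print(recall(W, noisy))                  # converges back to [1, -1, 1, -1, 1, -1]
```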
-
Pratyush Lohumi
🔍 Google's Gemma Scope: Illuminating the Inner Workings of Large Language Models 🧠 Google researchers have developed a powerful new tool called Gemma Scope, designed to shed light on how each layer in Gemma 2 large language models responds to input tokens. 📊 Key insights:
- Sparse autoencoders (SAEs) can transform embeddings into interpretable representations
- Each index in the transformed embedding corresponds to a distinct concept
- SAE weights indicate the strength of each concept in the input
Gemma Scope enables:
- Manual and automatic labeling of concepts in each layer
- Steering the model by adjusting SAE outputs to generate concept-specific text
This groundbreaking tool paves the way for answering critical questions about LLMs, such as how fine-tuning and chain-of-thought prompting influence a model's internal representations. 🤔 #GemmaScope #LargeLanguageModels #TransformerInterpretability #GoogleAI #MachineLearningResearch
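A toy sparse autoencoder along the lines described above (reconstruct an activation vector while keeping the feature activations sparse). The layer sizes and penalty weight are made up; this is not Gemma Scope's actual implementation.

```python
# Minimal sparse-autoencoder sketch: each index of `features` plays the role of a
# "concept", and the L1 term pushes most of them to zero for any given input.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)   # activation -> concept strengths
        self.decoder = nn.Linear(d_features, d_model)   # concept strengths -> activation

    def forward(self, x):
        features = torch.relu(self.encoder(x))
        reconstruction = self.decoder(features)
        return features, reconstruction

sae = SparseAutoencoder(d_model=256, d_features=4096)
x = torch.randn(8, 256)                                  # a batch of layer activations
features, recon = sae(x)

# Training objective: reconstruct the activation while keeping features sparse.
loss = ((recon - x) ** 2).mean() + 1e-3 * features.abs().mean()
loss.backward()
```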
-
Hai Huang
This paper https://lnkd.in/eGaCaD7s shows that calibrating LLMs to the distribution of their training set always leads to hallucination, at a probability lower-bounded by the Good-Turing frequency estimate. Good-Turing frequency estimation https://lnkd.in/ewrh3RZC was developed to estimate the probability of discovering a new species. Assuming discoveries are i.i.d., that probability can be reasonably estimated by the historical discovery rate N_1 / N, where N_1 is the number of species observed exactly once (i.e. new discoveries) and N is the total number of observations. Calibrating LLMs forces them to emit only in-distribution facts (known species) and zeros out the probability of out-of-distribution facts (unknown species). Therefore, calibrated LLMs will hallucinate on OOD questions at a rate of N_1 / N. And there is no cure, as non-calibrated LLMs can be even worse. #artificialintelligence #machinelearning #deeplearning
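A toy illustration of the Good-Turing missing-mass estimate N_1 / N that the argument rests on; the "facts" below are made up.

```python
# Good-Turing missing-mass estimate: the fraction of items seen exactly once
# estimates the probability that the next observation is something unseen (OOD).
from collections import Counter

observations = ["fact_a", "fact_b", "fact_a", "fact_c", "fact_d", "fact_a", "fact_b"]
counts = Counter(observations)

N = len(observations)                               # total observations
N_1 = sum(1 for c in counts.values() if c == 1)     # facts seen exactly once
missing_mass = N_1 / N                              # estimated rate of unseen facts

print(f"N={N}, N_1={N_1}, estimated OOD (hallucination) rate ~ {missing_mass:.2f}")
```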
-
Arjun Bansal
Evaluation of LLM applications is a big buzzword right now... but when do you really need new tools? Our latest blog post from Wenzhe Xue cuts through the noise and shows, with examples, how built-in libraries such as pytest are often enough to get started. As complexity grows along the following dimensions, we're here to help!
- Metric-based ➡ Human review
- Off-the-shelf eval or LLM-as-a-judge ➡ Custom eval models trained on your data
- Offline ➡ Online, real-time
https://lnkd.in/gGb3Sjah
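As a sketch of the "pytest is often enough" starting point, here is a minimal metric-based eval; the application function and test cases are hypothetical, not taken from the blog post.

```python
# Minimal pytest-based eval: parametrize over (input, expected-fact) cases and
# assert a simple metric on the application's output.
import pytest

def summarize(text: str) -> str:
    """Stand-in for the LLM application under test (here: keep the first sentence)."""
    return text.split(".")[0] + "."

CASES = [
    ("The meeting is at 3pm. Bring the report.", "3pm"),
    ("Revenue grew 12% in Q2. Costs were flat.", "12%"),
]

@pytest.mark.parametrize("document,must_contain", CASES)
def test_summary_keeps_key_fact(document, must_contain):
    # Metric-based check: the summary must retain the key fact.
    assert must_contain in summarize(document)
```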
-
Matthew Long
Needs review: https://lnkd.in/gpEiHaT3 To add this to arXiv, the following is needed: to endorse another user to submit to the math-ph (Mathematical Physics) archive, an arXiv submitter must have submitted 4 papers to math-ph, earlier than three months ago and less than five years ago.
-
Anjor Kanekar
Spent some time today experimenting with GraphRAG - https://lnkd.in/eVqcVr3A It's a really interesting idea. Essentially, by employing a better data structure (like a graph in this case), one can improve the model's semantic understanding of the data, because it now has easy access to abstractions it didn't have before. I indexed my favourite plasma physics paper, one I read thousands of times during grad school, and built two simple RAGs: one the standard way, and one using GraphRAG. The difference is tremendous! To the question "what are the main themes of this article?", baseline RAG picked out different phrases and gave an answer that sounds reasonable on the surface but isn't quite there; it also mentioned one theme the model got completely wrong. GraphRAG, on the other hand, did manage to extract a few relevant themes. Caveat: I hacked this together very quickly and haven't spent much time improving the data extraction. I reckon GraphRAG's performance will improve a lot with better data. Planning to find some time in the coming days to (1) improve the data quality and (2) build a nicer UX to easily compare the two applications. The code and screenshots of results are on GitHub under an MIT license, as always: https://lnkd.in/eVkvJrcM
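To make the "better data structure" point concrete, here is a hand-rolled toy of graph-structured retrieval using networkx. This is not the GraphRAG library; the passages and entities are invented and picked by hand rather than extracted by an LLM.

```python
# Illustrative graph-based retrieval: index passages into a passage-entity graph so
# retrieval can follow relations (shared entities) instead of only matching phrases.
import networkx as nx

passages = {
    "p1": "Landau damping suppresses plasma waves without collisions.",
    "p2": "Plasma waves can transfer energy to resonant particles.",
    "p3": "Resonant particles have velocities close to the wave phase velocity.",
}
entities = {
    "p1": ["Landau damping", "plasma waves"],
    "p2": ["plasma waves", "resonant particles"],
    "p3": ["resonant particles", "phase velocity"],
}

G = nx.Graph()
for pid, ents in entities.items():
    for e in ents:
        G.add_edge(pid, e)    # bipartite passage-entity graph

def retrieve(query_entity: str, hops: int = 3) -> list:
    """Return passages within `hops` of the query entity in the graph."""
    nodes = nx.single_source_shortest_path_length(G, query_entity, cutoff=hops)
    return [n for n in nodes if n in passages]

# Reaches p1 directly and p2 via the shared "plasma waves" entity.
print(retrieve("Landau damping"))
```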
-
Laszlo Sragner
Apparently, someone (maybe KPMG) is running supply chain due diligence on pydantic on behalf of NASA. The thread is interesting. I link it in the comments, but I'll summarise it in a few points:
- "This is so dumb. Do they know what OSS is?" Dumbness is not an issue in regulations. There was literally a supply chain attack on Linux recently, and had it succeeded, a state actor would have gained access to half the internet (which makes you wonder what else is happening out there). You can read more about this if you search "hacker news xz"; it's a fascinating story, and I might write about it.
- "KPMG ripping off NASA and the taxpayers." Maybe the astronauts and the rocket scientists don't want to spend their time on this? Outsourcing this kind of work is pretty standard; spare a thought for the poor people whose entire job is just this.
- "Can you do it for free for NASA?" Why would he? They have an enormous budget, and Pydantic must have a good use for that money.
- "Haven't OSS maintainers suffered enough?" They can ignore the request, and NASA will have no choice but to either fork pydantic or recreate it from scratch. NASA is not doing this because they have nothing better to do.
- "AI will replace these people, can't wait for it." That's just mean.
- "This is so dumb." They plan to send people to the Moon, not exactly a "move fast and break things" mission. Look at what happened when this principle was ignored at Boeing. NASA wants to ensure that if there is a bug or issue with pydantic, someone will be there to fix it. They also want to ensure that no nefarious actors have a chance to take down anything serious, or at least that the blast radius is contained.
This is a negotiation. How much can the maintainer charge for this? What is the alternative? Recreate pydantic from scratch? How many lines of code? How many engineers, and how much time, to recreate it? How much time does it take to review a fork and maintain compatibility? This gives you a BATNA ("Best Alternative to a Negotiated Agreement") to use as a baseline. On the other hand, you have your own costs to actually fulfil the agreement. Some quoted lowball numbers, which suggest the entire deal could be done in 1-2 weeks. It won't be, and you will regret accepting the deal. There is a huge crisis in OSS, especially with single-person maintainers. Huge companies rely on this software, but because there is no business behind it, the individual probably can't, or doesn't want to, deal with the commercial side. They started doing it in their spare time, and now things have got out of hand. [continued in comments]
-
Andrew Ng
Much has been said about many companies’ desire for more compute (and data) to train large foundation models. I think it’s under-appreciated that we also have nowhere near enough compute available for inference on foundation models. Years ago, when I was leading teams at Google, Baidu, and Stanford that focused on scaling up deep learning, many semiconductor makers, data center operators, and researchers asked me if AI would continue to make good use of more compute if they kept delivering it. For many desktop workloads, like running a web browser, a faster CPU doesn’t help much beyond a certain point. So do we really need faster and faster AI processors? Each time, I confidently replied “yes!” and encouraged them to keep scaling up compute. (Sometimes, I added half-jokingly that I had never met a machine learning engineer who felt like they had enough compute. 😀) Fortunately, this prediction has been right so far. However, beyond training, we are also far from exhausting the benefits of faster and higher volumes of inference. Today, a lot of LLM output is for human consumption. A human might read around 250 words per minute, which is around 6 tokens per second (250 words/min / (0.75 words/token) / (60 secs/min)). So it might seem there’s little value to generating tokens much faster than this. But in an agentic workflow, an LLM might be prompted repeatedly to reflect on and improve its output, use tools, plan and execute multiple steps, or implement multiple agents that collaborate. So, we might generate hundreds of thousands of tokens or more before showing any output to a user. This makes fast token generation very desirable and makes slower generation a bottleneck to taking better advantage of existing models. That’s why I’m excited about the work of companies like Groq, which can generate hundreds of tokens per second. Recently, SambaNova also showed it can hit hundreds of tokens per second. Incidentally, faster, cheaper token generation will also help make running evaluations (evals), which can be slow and expensive today since it involves iterating over many examples, more palatable. Fortunately, both training and inference are rapidly becoming cheaper. I spoke with Cathie Wood and Charles Roberts of the investment firm ARK, which is famous for its bullish predictions on tech. They estimate that AI training costs are falling 75% a year. If they are right, a foundation model that costs $100M to train this year might cost $25M to train next year. Further, they report that for “enterprise scale use cases, inference costs seem to be falling at an annual rate of ~86%, even faster than training costs.” I don’t know how accurate these specific predictions will turn out to be, but with progress in both semiconductors and algorithms, I do see training and inference costs falling rapidly. This will be good for application builders and help AI agentic workflows lift off! [Original text: https://lnkd.in/dJ9tVGh7 ]
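For reference, the reading-speed arithmetic in the post works out as follows:

```python
# Reading speed expressed in tokens per second, as in the post.
words_per_minute = 250
words_per_token = 0.75
seconds_per_minute = 60

tokens_per_second = words_per_minute / words_per_token / seconds_per_minute
print(round(tokens_per_second, 1))   # ~5.6, i.e. roughly 6 tokens per second
```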
-
Saurabh Sarkar
LoLCATs: Demystifying Linearized Attention in Large Language Models
In their recent blog post "How to Linearize LLMs for Me and You", researchers from Hazy Research introduced LoLCATs, a method for efficiently scaling large language models by linearizing their attention mechanisms. Large language models (LLMs) like GPT have achieved great success, but they can be extremely computationally expensive, primarily because the cost of self-attention grows quadratically with input size. Enter LoLCATs, a new approach to make LLMs faster and more scalable. Let's break down LoLCATs: how they linearize attention, why this matters, and how it works in practice.
Self-Attention: Why It's Powerful but Expensive
Traditional self-attention works by comparing every word in a sequence with every other word. In mathematical terms, this creates an N×N attention matrix, where N is the length of the sequence. If you have 1,000 words in a sentence, this requires 1,000,000 comparisons. As sequences get longer, the model has to perform quadratically more work, making it slow and resource-intensive.
Linearized Attention: Making Attention Faster
LoLCATs linearize attention by approximating this full pairwise comparison process. Instead of calculating all N^2 comparisons, LoLCATs reduce the complexity to O(N). Here's how:
- Low-rank approximation: Rather than calculating the full attention matrix, LoLCATs decompose it into smaller, more manageable matrices. This low-rank approximation allows the model to compute relationships between tokens without comparing every pair.
- Factorized attention: LoLCATs compute attention scores in two steps. First, the inputs are projected into the query and key spaces using efficient (often kernel-based) transformations. Then, instead of computing the full QK^T product, an approximate dot product between queries and keys is computed, reducing the number of operations. Instead of building a full attention map, LoLCATs compress the attention process, preserving the essential relationships without all the heavy computation.
- Learnable linear attention: LoLCATs introduce a learnable linear attention mechanism, allowing the model to improve its approximations through training and fine-tune its understanding of the relationships between words as it sees more data.
Why Is This Important?
By reducing attention complexity to linear time, LoLCATs make it feasible to handle much larger models and longer sequences. In traditional models, scaling to sequences of 10,000 or 100,000 tokens becomes practically impossible due to resource demands. LoLCATs, by comparison, handle these sequences much more efficiently, reducing both memory usage and processing time.
Reference: https://lnkd.in/eJ9XGBmQ
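A generic linear-attention sketch of the O(N^2) to O(N) trick described above. The feature map here is a fixed ELU-based one rather than the learned, low-rank transfer that LoLCATs actually use; shapes and sizes are arbitrary.

```python
# Generic linear attention: replace softmax(QK^T)V with phi(Q) (phi(K)^T V).
# Computing phi(K)^T V first gives a (d, d) matrix, so no N x N map is ever built.
import torch

def feature_map(x):
    # A simple positive feature map; LoLCATs instead learn this mapping.
    return torch.nn.functional.elu(x) + 1

def linear_attention(Q, K, V):
    Qf, Kf = feature_map(Q), feature_map(K)            # (N, d)
    KV = Kf.T @ V                                      # (d, d): linear in N
    normalizer = Qf @ Kf.sum(dim=0, keepdim=True).T    # (N, 1)
    return (Qf @ KV) / (normalizer + 1e-6)

N, d = 1024, 64
Q, K, V = (torch.randn(N, d) for _ in range(3))
out = linear_attention(Q, K, V)
print(out.shape)    # torch.Size([1024, 64])
```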
-
Ajay S.
Understanding causality is a formidable challenge for Large Language Models (LLMs) and is considered a pivotal benchmark for assessing their proximity to human cognitive abilities. Despite research efforts employing prompt-based training on causal relationships, achieving satisfactory results remains elusive. The absence of a standardized causal dataset for benchmarking exacerbates this issue. In light of this, a recent study proposes a solution in the form of CausalBench, a comprehensive benchmark designed to evaluate LLMs' understanding of causality. This benchmark addresses the limitations of existing evaluation methodologies, offering a diverse set of tasks derived from the causal research community. By encompassing three distinct causal learning tasks, CausalBench enables a thorough comparison of LLM performance against traditional causal learning algorithms. https://lnkd.in/dvV4B7FS
-
William W Collins
10 new citations to articles by Maja Pantic (via Google Scholar alerts; [Global] Quantum Computing) URL: https://ift.tt/lBWeTm3
- [PDF] High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model. W Zhong, J Lin, P Chen, L Lin, G Li - arXiv preprint arXiv:2408.05416, 2024. Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmarks) as intermediaries. This multi-stage approach better preserves the appearance details but suffers from ... Cites: Diffused heads: Diffusion models beat GANs on talking-face ...
- [PDF] Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation. S Lei, X Cheng, M Lyu, J Hu, J Tan, R Liu, L Xiong... - Proceedings of the 62nd ..., 2024. In the field of speech synthesis, there is a growing emphasis on employing multimodal speech to enhance robustness. A key challenge in this area is the scarcity of datasets that pair audio with corresponding video. We employ a methodology that incorporates modality alignment during the pre-training phase on multimodal datasets, uniquely facilitating zero-shot generalization through the process of freezing the video modality feature extraction component and the encoder ... Cites: Auto-AVSR: Audio-visual speech recognition with automatic labels
- [PDF] A Comprehensive Survey on EEG-Based Emotion Recognition: A Graph-Based Perspective. C Liu, X Zhou, Y Wu, Y Ding, L Zhai, K Wang, Z Jia... - arXiv preprint arXiv ..., 2024. Compared to other modalities, electroencephalogram (EEG) based emotion recognition can intuitively respond to emotional patterns in the human brain and, therefore, has become one of the most focused tasks in affective computing. The nature of emotions is a physiological and psychological state change in response to brain region connectivity, making emotion recognition focus more on the dependency between brain regions instead of specific brain regions. A significant ... Cites: Estimation of continuous valence and arousal levels from faces in ...
- Emotion recognition using hierarchical spatial-temporal learning transformer from regional to global brain. C Cheng, W Liu, L Feng, Z Jia - Neural Networks, 2024. Emotion recognition is an essential but challenging task in human-computer interaction systems due to the distinctive spatial structures and dynamic temporal dependencies associated with each emotion. However, current approaches fail to accurately capture the intricate effects of electroencephalogram (EEG) signals across different brain regions on emotion recognition. Therefore, this paper des...
-
Shubrashankh Chatterjee
Quality of retrieval has a huge impact on how good a RAG system is. Without this background, lots of teams start thinking that the LLM generation is at fault, when it is actually the context being added during retrieval. The example below is again from the wonderful paper https://buff.ly/4exxRtJ