
As the Northern Hemisphere’s Spring Equinox is almost here (Wednesday 20 March 2024 at 1106 SGT), it is becoming clearer to me that the crescendo around generative AI unleashed in November 2022 is peaking and entering the “trough of disillusionment” (a term coined by one of those hype-generating, pay-to-play entities).
There is clearly value in the idea of foundation models, whether they are trained on 1 million tokens or 100 billion. What is really needed is to ensure that we rapidly enable the 8 billion humans on earth to benefit from all of this, and to have agency over their data and how it is used.
For-profit entities have their place, but driven as they are by quarterly earnings, they cannot and should not be the ones to capitalize on this innovation at the expense of individual data sovereignty and agency.
In 1987, when I was in grad school at Oregon State University, I was doing a CS minor in AI. The instructor was Bruce D’Ambrosio, and I thoroughly enjoyed the course. It was all about autonomous agents, expert systems, LISP and so on. In hindsight, I was studying AI during the second AI winter, and I could not find anything meaningful and useful to do in the field. Instead, I dove headlong into VLSI design, which later prompted me to pivot to software, and more precisely to Free Software (later reframed under the marketing label of Open Source Software).
Many more things had to happen over the next thirty-seven years to bring us to 2024: the move from ARPAnet to the Internet, Linux, the Web, the commoditization of hardware (computers, networking, storage), the explosion of content on the Web via blogs, digitized documents and forums, the laying of optical fibre and enormous lengths of undersea cable (here’s an interesting history of undersea cables and Singapore), and the evolution of “smart” mobile devices. All of this progress was underpinned by open source tools and techniques, and it happened as quickly as it did precisely because of the democratization of software through the Free and Open Source movement.
All that content, combined with sufficiently fast connections, meant that one could scrape/harvest/“steal” it (“steal” if the content was not under a Creative Commons or Open Source license, or not in the public domain). Together with the “big data” of the late 2000s and early 2010s and the brute-force method of predicting the “next word” (the Transformer architecture introduced in 2017’s “Attention Is All You Need”), this has brought us to today’s generative AI hype and technology bubble.
Generative AI is a great idea. When it is used in a manner that respects data privacy, with data provenance enabled and data governance in place, and deployed in specific, narrow areas of use, generative AI can be very good. Take an open source foundation model (one that properly honours Open Source licensing terms), train it on YOUR private corpus of information/data/documents, and you have an excellent tool to help raise your productivity.
I have Ollama models running on my own systems (fully decked-out Intel NUCs and a Lenovo M910q; I will document the setup, but in the meantime, check out: https://mobiarch.wordpress.com/2024/02/19/run-rag-locally-using-mistral-ollama-and-langchain/). I am not interested in funneling my content and data to a provider who might say one thing about privacy today and sing another tune tomorrow: “it’s a business decision”.
All of this preamble is to bring to your attention two projects: Kwaai.ai and the AI Verify Foundation.
This is all about empowering both you and me to be custodians of our own data, and ensuring that we are not hijacked by those who seek to monetize our data at our expense.
The AI Verify Foundation, whose raison d’être is to build open source tools that test AI solutions for fairness, freedom from bias and the like, is seeking global collaboration on building these tools.
The work both Kwaai.ai and AI Verify Foundation are doing is very important.
It is, in many ways, akin to the launch of the Free Software Foundation in 1985 and the Open Source Initiative in 1998.
There are many initiatives globally, some in overlapping areas, some in areas that are novel and interesting. Let’s all come together to do this right for all of us.
[The picture above is under a CC0 Public Domain license from Peter Griffin, who has released this “Circle” image into the public domain. This means you can use and modify it for personal and commercial projects. If you intend to use the image commercially, be aware that some photos require a model or property release, and pictures featuring products should be used with care.]
It is very important that we ensure AI development takes shape through open source communities.
There has been a lot of pushback against this approach.
I have been writing about the need for an open source model in AI.
Sharing a link to my latest blog post, which has links to my other posts.
It is important that we voice our strong support for developing AI projects through open source communities.
Thanks, Sachin, for your comment. There will always be pushback and FUD whenever anything is done in an open source manner. Just follow the money and it will be obvious where the concerns are coming from.
This is a very thoughtful and balanced article with a nice historical perspective. I fully agree that rather than focusing on the hype, we will be better served by focusing on responsible use of AI with the factors you outlined: “data privacy, with data provenance enabled, along with data governance, and deployed in specific, narrow areas of use”. Nicely done!
Thanks, Namee. We need to keep repeating these points so that people better appreciate what is at stake.