Introduction

The growing use of generative AI in content creation has led to the widespread dissemination of AI-generated content across blogs, articles, and newsletters. Some people use it as a source of side income, particularly by placing Google Ads on blogs that feature autogenerated content. However, AI-generated content can mislead readers, because generative AI systems are predictive models and have no ethical judgment of their own. They may present harmful or false information learned from their training data, and even detailed prompts do not reliably mitigate bias and existing discrimination. To address this issue, I propose a project that applies my knowledge to classify content and label it as either AI-generated or human-written. These labels will let readers exercise caution when reading and analysing content, bearing in mind that content labelled as AI-generated is based solely on a mathematical model.

My Plan

In this project, I plan to conduct a literature review on the progression of Large Language Models: identifying the types of models, their advantages and disadvantages, and how more “advanced” models have replaced the “older” ones over the years. Then, I will generate the dataset for this project using different service providers’ APIs. Finally, I will use the National Supercomputing Center’s resources to fine-tune LLMs and compare the results. Beyond this usual flow, I would also like to develop a hybrid model and see whether this approach yields better performance.

While researching similar work on classifying AI-generated content, I came across a paper titled “Arabic ChatGPT Tweets Classification Using RoBERTa and BERT Ensemble Model”. One interesting aspect of its approach is the use of a hybrid transformer-based model. The model is built by combining the hidden outputs of the RoBERTa and BERT models through a concatenation layer, then adding dense layers with ReLU activation as hidden layers to introduce non-linearity, followed by a softmax activation function for multiclass classification.
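To make the idea concrete, here is a minimal sketch of that fusion head in plain Python. It is my own illustration, not the paper's code. Random vectors stand in for the RoBERTa and BERT hidden outputs, and the sizes (ENC_DIM, HIDDEN_DIM, NUM_CLASSES), the single hidden layer, and the helper names are all assumptions chosen to keep the example small. It only shows the forward pass: concatenate, apply a dense layer with ReLU, then softmax.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise ReLU, the non-linearity used in the hidden layer."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax that turns scores into probabilities."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(rng, n_out, n_in):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def hybrid_head(roberta_vec, bert_vec, hidden_layer, output_layer):
    """Concatenate both encoder outputs, then dense + ReLU, then softmax."""
    fused = roberta_vec + bert_vec  # list concatenation = concatenation layer
    hidden = relu(dense(fused, *hidden_layer))
    return softmax(dense(hidden, *output_layer))


# Hypothetical toy sizes; real encoder outputs are much larger.
ENC_DIM, HIDDEN_DIM, NUM_CLASSES = 8, 4, 2

rng = random.Random(0)
# Stand-ins for the hidden outputs of the two encoders (assumption).
roberta_vec = [rng.gauss(0, 1) for _ in range(ENC_DIM)]
bert_vec = [rng.gauss(0, 1) for _ in range(ENC_DIM)]

hidden_layer = init_layer(rng, HIDDEN_DIM, 2 * ENC_DIM)
output_layer = init_layer(rng, NUM_CLASSES, HIDDEN_DIM)

probs = hybrid_head(roberta_vec, bert_vec, hidden_layer, output_layer)
print(probs)  # one probability per class, summing to 1
```

In a real setup, the two input vectors would come from the pretrained encoders, and the dense layers would be trained together with them in a deep learning framework. I use two classes here (AI-generated and human-written) to match my project, whereas the paper describes a multiclass setup.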

I find their approach very inspiring! If you’re ready, let’s explore together!