WORLD JOURNAL OF INNOVATION AND MODERN TECHNOLOGY (WJIMT)

E-ISSN 2504-4766
P-ISSN 2682-5910
VOL. 8 NO. 1 2024
DOI: https://doi.org/10.56201/wjimt.v8.no1.2024.pg62.71


Customized Web Application for Addressing Language Model Misalignment through Reinforcement Learning from Human Feedback

Anietie Uwah, Anthony Edet


Abstract


Recent years have seen tremendous progress in artificial intelligence, which has sparked the creation of cutting-edge tools such as OpenAI's ChatGPT. ChatGPT is built on the OpenAI GPT-3 family of large language models and is enhanced through supervised and reinforcement learning methodologies. Its goal is to produce text that cannot be distinguished from human-written content, and it can hold conversations with users in a way that is surprisingly clear and uncomplicated. The technique employed is Reinforcement Learning from Human Feedback (RLHF): the model is trained with human input alongside machine learning methods (supervised learning), and this feedback is used during the training phases to reduce biased, harmful, and false outputs. The resulting InstructGPT models are much better at following instructions than GPT-3. Above all, we present a customized ChatGPT web application that can fine-tune a given input and generate text that is high quality, harmless, truthful, and appropriate, without biased outputs. A key motivation for our work is to increase the helpfulness and truthfulness of outputs while mitigating the harms and biases of language models. In conclusion, our results show that Reinforcement Learning from Human Feedback (RLHF) techniques are effective at significantly improving the alignment of general-purpose AI systems with human intentions.
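
To make the RLHF procedure summarized above concrete, the following minimal Python sketch illustrates its reward-modelling stage: a reward model is fitted to pairwise human preference labels using the standard Bradley-Terry comparison loss that RLHF pipelines employ. Toy feature vectors stand in for real language-model embeddings, and all names here (reward, chosen, rejected, true_w) are illustrative assumptions for this sketch, not details taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each candidate response is a feature vector, and the
# reward model is linear. Real RLHF uses transformer representations.
dim = 8
w = np.zeros(dim)                      # reward-model parameters

def reward(x, w):
    # Scalar reward r(x) = w . x for a response with features x.
    return w @ x

# Simulated human preference data: pairs (chosen, rejected) where a
# labeller preferred the first response over the second.
true_w = rng.normal(size=dim)          # hidden "human preference" direction
pairs = []
for _ in range(500):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if true_w @ a > true_w @ b else (b, a))

# Fit the reward model with the pairwise Bradley-Terry loss used in RLHF:
#   L = -log sigmoid(r(chosen) - r(rejected))
lr = 0.1
for _ in range(200):
    grad = np.zeros(dim)
    for chosen, rejected in pairs:
        margin = reward(chosen, w) - reward(rejected, w)
        p = 1.0 / (1.0 + np.exp(-margin))     # sigmoid of the margin
        grad += (p - 1.0) * (chosen - rejected)   # dL/dw for this pair
    w -= lr * grad / len(pairs)

# The learned reward should now rank responses the way the labels do;
# in a full pipeline this reward would then guide policy optimization.
agree = sum(reward(c, w) > reward(r, w) for c, r in pairs) / len(pairs)
print(f"agreement with human preferences: {agree:.2%}")

In a complete RLHF system, this learned reward model would be followed by a reinforcement learning step (e.g., policy-gradient fine-tuning of the language model against the reward), which is outside the scope of this sketch.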


Keywords:

Artificial Intelligence; ChatGPT; OpenAI; Reinforcement Learning; Human Feedback; InstructGPT models

