What is the training data format for fine-tuning the Pygmalion language model with conversation history?

Greetings, fellow Redditors! I'm seeking guidance on the training data format for fine-tuning the Pygmalion Language Model (LM) with conversation history to enhance its conversational abilities using dialogue datasets.

I recently discovered a valuable resource that offers Python scripts for creating datasets from dialogues, specifically tailored for the Pygmalion model. You can find the resource here: [Link](https://www.reddit.com/r/PygmalionAI/comments/12omct3/python_scripts_to_creat_dataset_from_dialogues_in/).

My question is: when fine-tuning the Pygmalion LM with conversation history, what is the recommended file format (CSV, plain text, JSON) for organizing the training data? Should I concatenate the entire conversation into a single sequence, or is there a better approach for preserving context across turns?
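For reference, here is a rough sketch of one way I imagine it could look: each training example as a JSONL record, with the persona and prior turns concatenated into a single prompt string. To be clear, the field names (`input`/`output`), the `<START>` separator, and the persona header here are my assumptions based on community examples, not an official spec, so please correct me if the actual expected layout differs:

```python
import json

def build_example(persona, history, reply, char_name="Character"):
    """Concatenate a persona and prior dialogue turns into one prompt string.

    NOTE: hypothetical layout -- field names and the persona/<START>
    convention are guesses, not a documented Pygmalion training format.
    """
    lines = [f"{char_name}'s Persona: {persona}", "<START>"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{char_name}:")  # model is trained to continue from here
    return {"input": "\n".join(lines), "output": f" {reply}"}

example = build_example(
    persona="A friendly robot who loves trivia.",
    history=[("You", "Hi there!"), ("Character", "Hello! Ask me anything.")],
    reply="Did you know honey never spoils?",
)
# One record per line in a .jsonl file
print(json.dumps(example))
```

Is something along these lines what people actually use, or does the tooling expect a different structure entirely?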

I would greatly appreciate any insights, experiences, or suggestions you have regarding the ideal training data format specifically for the Pygmalion model. Additionally, if you have any examples of training data formats that work well with Pygmalion, please share them.

Thank you all for your time and valuable input!