How to find out what you can read from data – Data Scientist?
Hello,
I'm just wondering if, with our advancing technology, there already exists a generic algorithm or ML model that allows one to derive real knowledge from a large amount of data without being specifically trained or programmed for those datasets (in the sense of domain knowledge).
As far as I know, algorithms CAN'T be that generic (please correct me if I'm wrong), and ML/AI models would need to be trained for that.
I'm also wondering how data scientists work with data that is outside their domain of expertise. Or do they only work in one domain? Even for data visualization (unless it's a very simple one), I would need some knowledge of the domain, or an expert to guide me, to make it possible.
Is this correct, or am I thinking about these issues the wrong way?
I put your question to ChatGPT unchanged; here is the result:
“Generic algorithms for knowledge from data:
Current algorithms and ML models are generally not completely generic. They need specific training on domain data to extract meaningful patterns and knowledge. There is progress in transfer learning and few-shot learning, but these models also require some adaptation to specific tasks.
Data scientists without domain knowledge:
Data scientists can also work with data from other domains, but they often need the support of domain experts to interpret the data correctly and formulate relevant questions. In some cases a data scientist can apply general modeling techniques to different fields, but an understanding of the data remains important.
Data visualization without domain knowledge:
For complex visualizations, domain knowledge is often required to interpret the data correctly and to select the right visualizations. For simple visualizations (e.g. histograms, scatter plots), technical knowledge is sufficient, but for deeper insights, contextual knowledge is crucial.”
—
So, from here on it's a human again 😉
LLMs like ChatGPT can already be quite generic across knowledge areas when they are trained on wide-ranging data, but in the end they only produce text. Other models produce images, music, etc. AI models differ considerably from one another because they are tailored to the type of input and output data.
Whether ChatGPT's output meets your definition of “real knowledge” is for you to decide.
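To make the "simple visualizations need only technical knowledge" point from the quoted answer a bit more concrete, here is a minimal sketch of what a domain-agnostic first look at a dataset could look like. It uses only the Python standard library. The helper names (`profile_column`, `text_histogram`) and the sample columns `x` and `label` are made up for illustration and are not taken from any real dataset or library.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Summarise one column without knowing what it means (hypothetical helper)."""
    present = [v for v in values if v is not None]
    summary = {"count": len(values), "missing": len(values) - len(present)}
    is_numeric = bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_numeric:
        summary["kind"] = "numeric"
        summary["min"] = min(present)
        summary["max"] = max(present)
        summary["mean"] = statistics.mean(present)
        summary["median"] = statistics.median(present)
    else:
        summary["kind"] = "categorical"
        # The three most frequent values with their counts.
        summary["top"] = Counter(present).most_common(3)
    return summary


def text_histogram(values, bins=5, width=30):
    """Return a crude text histogram of a numeric column, one line per bin."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = [0] * bins
    for v in present:
        # The maximum value would land in bin index `bins`, so clamp it.
        idx = min(int((v - lo) / span * bins), bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return [
        f"{lo + i * span / bins:8.2f} | " + "#" * round(c / peak * width)
        for i, c in enumerate(counts)
    ]


# Made-up sample data; the column names carry no domain meaning on purpose.
data = {
    "x": [1.0, 2.0, 2.5, None, 10.0],
    "label": ["a", "b", "a", None, "a"],
}

for name, column in data.items():
    print(name, profile_column(column))

print("\n".join(text_histogram(data["x"], bins=3)))
```

This kind of profiling works on any table, but it only describes the data. Whether the value 10.0 in `x` is a measurement error or the most interesting observation in the set is exactly the question that needs domain knowledge or a domain expert.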