Synthetic Data Evaluation II: Compare the generated time-series synthetic data with its original.

Hi there. I understand it's been a long journey through the various ways to generate synthetic data, from classical machine learning methods to the deep learning approach, and we have also seen how to evaluate our newly generated synthetic data. Before moving to our last topic, “Evaluate Time Series Synthetic Data”, I will highlight some key areas where we can use a data synthesizer.

First of all, as mentioned before, it can be used at a production level where we don't have access to the export. Secondly, a year back there was a huge demand for data in the field of computer vision, for example in autonomous cars to train object detection models. Similarly, it is used in medical, bio and life science fields to create simulations, and so on.


Read the article:

https://blogs.nvidia.com/blog/2021/06/08/what-is-synthetic-data/

Now that we have an idea of how widely a data synthesizer can be used, let’s complete our walkthrough with the Time Series Synthetic Data Evaluation method.

# Load the dataset; in our case we can use the built-in demo dataset
from sdv.metrics.demos import load_timeseries_demo
real_data, synthetic_data, metadata = load_timeseries_demo()

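The metadata is a dictionary that describes each column of the table. The dictionary printed below is the representation of the student_placements metadata and is shown only to illustrate the layout. It has no 'region' field, so the metadata returned by load_timeseries_demo() will list that dataset's own columns; print metadata to see them.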

{'fields': {'start_date': {'type': 'datetime', 'format': '%Y-%m-%d'},
            'end_date': {'type': 'datetime', 'format': '%Y-%m-%d'},
            'salary': {'type': 'numerical', 'subtype': 'integer'},
            'duration': {'type': 'categorical'},
            'student_id': {'type': 'id', 'subtype': 'integer'},
            'high_perc': {'type': 'numerical', 'subtype': 'float'},
            'high_spec': {'type': 'categorical'},
            'mba_spec': {'type': 'categorical'},
            'second_perc': {'type': 'numerical', 'subtype': 'float'},
            'gender': {'type': 'categorical'},
            'degree_perc': {'type': 'numerical', 'subtype': 'float'},
            'placed': {'type': 'boolean'},
            'experience_years': {'type': 'numerical', 'subtype': 'float'},
            'employability_perc': {'type': 'numerical', 'subtype': 'float'},
            'mba_perc': {'type': 'numerical', 'subtype': 'float'},
            'work_experience': {'type': 'boolean'},
            'degree_type': {'type': 'categorical'}},
 'constraints': [],
 'model_kwargs': {},
 'name': None,
 'primary_key': 'student_id',
 'sequence_index': None,
 'entity_columns': [],
 'context_columns': []}
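
The last three keys of the dictionary are the ones that describe time series structure. As a rough sketch, assuming (as in SDV's time series models) that 'entity_columns' identify which sequence a row belongs to, 'sequence_index' gives the order of rows inside a sequence, and 'context_columns' hold values that stay constant within a sequence, the snippet below splits a long table into one sequence per entity. The rows, the column names and the split_sequences helper are made up for illustration and are not part of the SDV package.

```python
from collections import defaultdict

# Hypothetical long-format rows: several rows per store, one per date.
rows = [
    {"store_id": "A", "date": "2020-01-02", "region": "north", "total_sales": 12},
    {"store_id": "B", "date": "2020-01-01", "region": "south", "total_sales": 7},
    {"store_id": "A", "date": "2020-01-01", "region": "north", "total_sales": 10},
    {"store_id": "B", "date": "2020-01-02", "region": "south", "total_sales": 9},
]

# Hypothetical values for the three time-series keys of the metadata.
metadata = {
    "entity_columns": ["store_id"],
    "sequence_index": "date",
    "context_columns": ["region"],
}


def split_sequences(rows, metadata, value_column):
    """Group rows by entity and order each group by the sequence index."""
    sequences = defaultdict(list)
    # ISO-formatted date strings sort chronologically as plain strings.
    ordered = sorted(rows, key=lambda row: row[metadata["sequence_index"]])
    for row in ordered:
        entity = tuple(row[col] for col in metadata["entity_columns"])
        sequences[entity].append(row[value_column])
    return dict(sequences)


sales_by_store = split_sequences(rows, metadata, "total_sales")
print(sales_by_store)
```

Each store ends up with its own ordered list of sales, which is the unit the time series metrics compare between the real and the synthetic table.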

The Time Series evaluation is further divided into a few different types of metrics.

1. Detection Metrics: These metrics train a Machine Learning classifier that learns to distinguish the real data from the synthetic data, and report a score of how successful this classifier is.

The output is 1 minus the average ROC AUC score across all the cross-validation splits, so a higher score means the classifier found it harder to tell the synthetic data from the real data.

from sdv.metrics.timeseries import LSTMDetection, TSFCDetection
LSTMDetection.compute(real_data, synthetic_data, metadata)
TSFCDetection.compute(real_data, synthetic_data, metadata)
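
To make the scoring rule concrete, here is a minimal pure-Python sketch of the idea behind a detection metric. It is not what LSTMDetection or TSFCDetection do internally: as a stand-in it assumes a very simple nearest-centroid classifier on two summary features per sequence, and a fixed three-fold split. All function names here are hypothetical.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Chance that a random real item scores above a random synthetic one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(sequence):
    """Reduce a sequence to a fixed-size feature vector: (mean, range)."""
    return (mean(sequence), max(sequence) - min(sequence))


def centroid(feature_rows):
    return tuple(mean(column) for column in zip(*feature_rows))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detection_score(real_sequences, synthetic_sequences, n_splits=3):
    """Return 1 minus the average ROC AUC of a real-vs-synthetic classifier.

    Assumes each class has more than n_splits sequences.
    """
    real = [summarize(s) for s in real_sequences]
    synthetic = [summarize(s) for s in synthetic_sequences]
    aucs = []
    for k in range(n_splits):
        # Hold out every n_splits-th sequence of each class, so both classes
        # are present in every test fold.
        test_real, test_synth = real[k::n_splits], synthetic[k::n_splits]
        train_real = [f for i, f in enumerate(real) if i % n_splits != k]
        train_synth = [f for i, f in enumerate(synthetic) if i % n_splits != k]
        real_c, synth_c = centroid(train_real), centroid(train_synth)
        test = [(f, 1) for f in test_real] + [(f, 0) for f in test_synth]
        # Higher score = closer to the real centroid than to the synthetic one.
        scores = [squared_distance(f, synth_c) - squared_distance(f, real_c)
                  for f, _ in test]
        aucs.append(roc_auc([y for _, y in test], scores))
    return 1 - mean(aucs)


real_demo = [[100 + i, 101 + i, 102 + i] for i in range(6)]
bad_synthetic = [[i, i + 1, i + 2] for i in range(6)]  # obviously different
print(detection_score(real_demo, bad_synthetic))
print(detection_score(real_demo, real_demo))
```

Synthetic data that is trivially easy to spot drives the score towards 0, while data the classifier cannot separate from the real data lands around 0.5 in this sketch.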

2. Machine Learning Efficacy Metrics: These metrics evaluate whether it is possible to replace the real data with synthetic data when solving a Machine Learning problem. They train a Machine Learning model on the synthetic data and then report the score the model obtains when it is evaluated on the real data.

from sdv.metrics.timeseries import TSFClassifierEfficacy
TSFClassifierEfficacy.compute(real_data, synthetic_data, metadata, target='region')
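
The "train on synthetic, test on real" idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses a hypothetical nearest-centroid classifier instead of the time series forest classifier, made-up sequences with a 'region'-style label, and plain accuracy as the score, which may differ from how TSFClassifierEfficacy computes its result.

```python
from statistics import mean


def summarize(sequence):
    """Reduce a sequence to a fixed-size feature vector: (mean, range)."""
    return (mean(sequence), max(sequence) - min(sequence))


def fit_centroids(sequences, labels):
    """Learn one feature centroid per label from the training sequences."""
    grouped = {}
    for sequence, label in zip(sequences, labels):
        grouped.setdefault(label, []).append(summarize(sequence))
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in grouped.items()}


def predict(centroids, sequence):
    """Return the label whose centroid is closest to the sequence features."""
    features = summarize(sequence)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2
                              for a, b in zip(features, centroids[label])),
    )


def efficacy_score(real_seqs, real_labels, synth_seqs, synth_labels):
    """Train on synthetic sequences, then report accuracy on real ones."""
    centroids = fit_centroids(synth_seqs, synth_labels)
    hits = sum(predict(centroids, seq) == label
               for seq, label in zip(real_seqs, real_labels))
    return hits / len(real_seqs)


# Made-up data: "north" sequences have high values, "south" low ones.
synth_seqs = [[50, 55, 60], [52, 58, 61], [5, 6, 7], [4, 6, 9]]
synth_labels = ["north", "north", "south", "south"]
real_seqs = [[49, 54, 63], [6, 7, 8]]
real_labels = ["north", "south"]

print(efficacy_score(real_seqs, real_labels, synth_seqs, synth_labels))
```

If the synthetic data preserves the relationship between the sequences and the target, a model trained on it should still label the real sequences correctly; if that relationship is lost, the score drops.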

That’s it.

I hope you enjoyed the article. Again, I tried my best to replicate the approach and make it simpler with my intuition, to bring more innovative solutions from across.

Here is the documentation link if you wish to explore more about the package: https://sdv.dev/SDV/user_guides/evaluation/timeseries_metrics.html

If you find this article useful, do browse my other techniques like Bagging Classifier, Voting Classifier, Stacking, and more. I guarantee you will like them too. See you soon with another interesting topic.

Some of my alternative internet presences: Facebook, Instagram, Udemy, Blogger, Issuu, and more.

Also available on Quora @ https://www.quora.com/profile/Rupak-Bob-Roy

Have a good day.
