
pandas-profiling

Overview

pandas-profiling is a Python library that helps you profile any pandas DataFrame. It automatically generates insights about the following aspects of your DataFrame:

  • each of your columns
  • correlations between them
  • missing values
  • alerts

You can think of it as an extension of pandas' native describe method.
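For comparison, here is a minimal sketch of what plain pandas gives you through describe. The sample DataFrame and its column names are made up for illustration only.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "price": [10.0, 12.5, None, 9.0],
        "category": ["a", "b", "a", None],
    }
)

# describe() summarises numeric columns by default:
# count, mean, std, min, quartiles and max.
numeric_summary = df.describe()

# include="all" also covers non-numeric columns (unique, top, freq).
full_summary = df.describe(include="all")

# With plain pandas, missing values are counted in a separate step.
missing_per_column = df.isna().sum()

print(numeric_summary)
print(full_summary)
print(missing_per_column)
```

A profiling report collects this kind of per-column summary, together with correlations, missing values and alerts, into a single document.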

A pandas-profiling report is most often saved to an HTML file so that the data team, and potentially data consumers, can inspect it. Below is a view of an example pandas-profiling output:

pandas_profiling_example

Uploading to re_cloud

To collaborate on your pandas-profiling reports, you can upload them to re_cloud. Generating a pandas-profiling report takes only a few lines:

from pandas_profiling import ProfileReport

# ... (load your pandas DataFrame into `df` here)

# Build the report and write it to an HTML file.
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("report.html")

Once the report is generated, you can upload it to the cloud:

re_cloud upload pandas-profiling --report-file report.html

re_cloud command

Below are all the currently supported options for uploading a pandas-profiling report to re_cloud:

re_cloud upload pandas-profiling --name TEXT --report-file TEXT

Options:
  --channel-name-or-id TEXT  The slack channel name to send the report
                             uploaded message if a slack account is connected
                             to the re_cloud account. It could be a channel
                             name, channel id or member id.
  --name TEXT                Name of the upload used for identification
  --config-dir TEXT          Path to the directory containing re_data.yml
                             config file
  --report-file TEXT         Pandas profiling file with html report
                             [required]
  --help                     Show this message and exit.

For pandas-profiling, --report-file is a required parameter. re_data will then upload your docs in the uncommitted/data_docs/local_site/ path.
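If you generate reports from a Python script, you may want to trigger the upload from the same script. The sketch below assembles the command line from the options listed above and runs it with the standard library. The helper names build_upload_command and upload_report, and the example upload name, are hypothetical and not part of re_cloud. The sketch also assumes the re_cloud executable is available on your PATH.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def build_upload_command(
    report_file: str,
    name: Optional[str] = None,
    channel_name_or_id: Optional[str] = None,
    config_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `re_cloud upload pandas-profiling`."""
    # --report-file is the only required option.
    command = ["re_cloud", "upload", "pandas-profiling", "--report-file", report_file]
    # Optional flags are appended only when a value was given.
    if name is not None:
        command += ["--name", name]
    if channel_name_or_id is not None:
        command += ["--channel-name-or-id", channel_name_or_id]
    if config_dir is not None:
        command += ["--config-dir", config_dir]
    return command


def upload_report(report_file: str, **options: Optional[str]) -> None:
    """Run the upload; raises CalledProcessError on a non-zero exit code."""
    # Fail early with a clear error if the report was never written.
    if not Path(report_file).is_file():
        raise FileNotFoundError(f"Report file not found: {report_file}")
    subprocess.run(build_upload_command(report_file, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(shlex.join(build_upload_command("report.html", name="weekly-profile")))
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems when a report path or upload name contains spaces.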

Next steps

If you would like to jump into uploading data, you can create your free account here 😊 If you have more questions for us, don't hesitate to join our Slack! 😊