vd

Differences

This shows you the differences between two versions of the page.

vd [2021/06/18 13:02] dblume
vd [2024/05/13 11:23] (current) – [Process Data] dblume
Line 25: Line 25:
 | ; | Extract regex to new column. Ex, ''(video|audio)'', ''(^..)'' or ''(^([STUVW]%%...%%|..))'' |
 | %%^%% | rename the column. Might have to be "product_id" or "platform_id" |
 +| = | Use a Python expression to create a new column. Ex, hex to dec: ''int(curcol,16)'' |
 +| : | Split column by regex |
 | - | Hide column |
 | S | Go to "Sheets" sheet, to select another sheet to format. |
Line 33: Line 35:
 | , | Select all rows that match this column's value |
 | " | Open duplicate sheet with only selected rows |
 +
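As a rough plain-Python analogue of the ";", "=" and ":" keys above (a sketch only, not VisiData's internals): the sample cell values and the helper name ''extract'' are hypothetical, and returning only the first capture group is a simplifying assumption.

```python
import re

# Hypothetical sample cell values, not taken from any real sheet.
cells = ["video/mp4", "audio/ogg", "text/plain"]


def extract(pattern, value):
    """Analogue of ';': return the first capture group, or None if no match."""
    match = re.search(pattern, value)
    return match.group(1) if match else None


# ';' with (video|audio) and with (^..)
kinds = [extract(r"(video|audio)", c) for c in cells]
prefixes = [extract(r"(^..)", c) for c in cells]

# '=' with int(curcol,16): curcol stands for the current column's value per row.
hex_cells = ["ff", "10", "7B"]
decimals = [int(curcol, 16) for curcol in hex_cells]

# ':' splits each cell on a regex; here the separator is a literal slash.
parts = [re.split(r"/", c) for c in cells]
```

Cells that do not match the regex come back as ''None'' in this sketch; how the real sheet shows non-matching rows may differ.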
 +===== Inspecting Columnar Data =====
 +
 +^ Key ^ Meaning ^
 +| I | Describe all columns, errors, distinct, mode, mean, median, stdev, etc. |
 +| i | Add a column of incrementing numbers (useful for '.' charts) |
 +| . | Make a chart of the current numeric column. Requires an "important" numeric column for the row data. |
 +| O | Open the Options sheet to enable "numeric_binning" and set the number of "histogram_bins" (use 'e' to edit a value) |
 +| F | Frequency table of row counts, or histogram if numeric_binning is true |
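To make the difference between a plain frequency table and a binned histogram concrete, here is a minimal Python sketch. The sample values and the ''histogram'' helper are hypothetical, and equal-width bins are an assumption; VisiData may choose its bin edges differently.

```python
from collections import Counter

# Hypothetical numeric column.
values = [1, 2, 2, 3, 8, 9, 9, 9]

# Frequency table: one row per distinct value, with its row count.
freq = Counter(values)


def histogram(data, bins):
    """Count values into `bins` equal-width intervals between min and max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values identical: everything lands in the first bin.
        counts[0] = len(data)
        return counts
    for x in data:
        # Clamp so the maximum value falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


binned = histogram(values, 4)
```

With few distinct values the frequency table is usually enough; binning helps once nearly every value is unique.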
 +
 +Calculating a percentage-of-total column for a numeric column:
 +
 +^ Key ^ Meaning ^
 +| # | Set column type to "int". |
 +| I | Describe all columns. (Highlight the "sum" cell for the column of interest.) |
 +| ~ | Convert that column to "text" so it'll be copied correctly for pasting later. |
 +| zy | Yank the value of the sum. |
 +| q | Quit the Describe sheet. |
 +| = | Create a new column. Enter ''curcol/'', then use Ctrl+y to paste the yanked column sum value. |
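The same calculation in plain Python, as a sketch with hypothetical rows and column names ("name", "count", "pct"). Note that ''curcol/<sum>'' as entered above yields a fraction between 0 and 1; multiplying by 100 gives a percentage.

```python
# Hypothetical rows; "count" plays the role of the numeric column of interest.
rows = [
    {"name": "a", "count": 5},
    {"name": "b", "count": 15},
    {"name": "c", "count": 30},
]

# The Describe sheet's "sum" cell corresponds to this total.
total = sum(row["count"] for row in rows)

# Equivalent of the new '=' column, scaled to a percentage.
for row in rows:
    row["pct"] = row["count"] * 100 / total

percentages = [row["pct"] for row in rows]
```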
 +====== Case Study Link: Exported CSV from PG&E ======
 +
 +Visit [[vd-pge]].
 +
 +====== Case Study: Merging Two Tables, logs and metadata ======
  
 ==== Protip: Use column view to set multiple columns at once ====
Line 102: Line 128:
   $ vd --play=my_cmdlog.vd --replay-wait=0.5
  
 +====== Lists in Cells for Frequency Tables ======
  
 +Sometimes you want one of the columns in a Frequency Table to be a list of unique values. Let's say the column title is "my_column", then:
 +
 +^ Key ^ Meaning ^
 +| + | Set the aggregator to "List" |
 +| F | Make a Frequency Table for the selected column. (gF for selected columns) |
 +| = | Create a new column from a Python expression. Enter ''%%','.join(set(my_column))%%'' to get a comma-delimited string of the unique cell entries. |
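What the steps above compute can be sketched in plain Python. The row data and the "method"/"path" column names are hypothetical, and ''sorted'' is added here as an assumption to make the output order stable, since a Python set has no guaranteed order.

```python
from collections import defaultdict

# Hypothetical rows: (method, path). "path" plays the role of my_column.
rows = [
    ("GET", "/a"),
    ("GET", "/b"),
    ("POST", "/a"),
    ("GET", "/a"),
]

# The "List" aggregator: collect every my_column value per frequency-table row.
groups = defaultdict(list)
for method, path in rows:
    groups[method].append(path)

# Row counts, as the Frequency Table would show them.
counts = {method: len(paths) for method, paths in groups.items()}

# The '=' expression: de-duplicate with set(), then join with commas.
summary = {method: ",".join(sorted(set(paths))) for method, paths in groups.items()}
```

Without ''sorted'', the joined order of entries can vary between runs, which matters if the resulting column is compared or diffed later.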
vd.1624046559.txt.gz · Last modified: 2023/04/12 20:44 (external edit)