Data Import And Preparation
Supported Data Entry Paths
SCCalc supports the following data entry paths for common desktop analysis workflows:
- Import CSV data through File -> Import Data.
- Open built-in sample datasets through File -> Open Sample Data.
- Generate practice data through Data -> Generate Random Dataset.
- Create an empty CSV through Data -> Create Empty CSV.
- Paste tabular values into the data editor.
- Use SCL commands for repeatable import and preparation workflows.
For support, teaching, and first-run verification, sample data is usually the safest starting point. For real research or operational work, keep the original source file unchanged and work from an imported project copy.
CSV Import Checklist
CSV is the primary interchange format. For best results:
- Export the source data as UTF-8 CSV.
- Keep the first row as column names when possible.
- Use one consistent delimiter throughout the file.
- Avoid merged headers, titles, subtotal rows, footnotes, and report-style formatting.
- Remove blank trailing rows and columns if the source system adds them.
- Prefer one observation per row and one variable per column.
- Review inferred variable types immediately after import.
If characters appear garbled, convert the source file to UTF-8 and import it again. SCCalc attempts common legacy encodings, but UTF-8 is the most reliable path for multilingual data.
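Several checklist items can be verified before the file ever reaches SCCalc. The following is a minimal Python sketch, not part of SCCalc or SCL; the function name precheck_csv and its messages are hypothetical. It assumes the delimiter is already known and checks only for valid UTF-8, blank or duplicate column names, blank records, and inconsistent field counts.

```python
import csv
import io


def precheck_csv(raw: bytes, delimiter: str = ",") -> list[str]:
    """Return human-readable problems found in raw CSV bytes (hypothetical helper)."""
    try:
        # Strict decode; "utf-8-sig" also accepts a leading UTF-8 byte order mark.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8; convert it and try again"]

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    if not rows:
        return ["file is empty"]

    problems = []
    header = [name.strip() for name in rows[0]]
    if any(name == "" for name in header):
        problems.append("header row contains blank column names")
    if len(set(header)) != len(header):
        problems.append("header row contains duplicate column names")

    # Record numbers count parsed CSV records, with the header as record 1.
    for record_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"record {record_no} is blank")
        elif len(row) != len(header):
            problems.append(
                f"record {record_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems
```

An empty result means only that these particular checks passed. Inferred variable types still need review after import.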
Variable Metadata
After import, open Variable View and check:
- Variable names are unique, meaningful, and stable enough to reference in SCL.
- Labels describe survey items or measured constructs.
- Measurement levels match analysis requirements.
- Missing-value conventions are documented.
- Numeric variables were imported as numeric values, not text.
- Categorical variables have expected labels and category counts.
Use neutral names and labels when you plan to share examples outside your team. Remove names, IDs, account numbers, email addresses, locations, free-text comments, and other direct or indirect identifiers before creating support examples.
Data Management Tools
The Data menu contains preparation tools that are usually run before analysis:
- Define Variable Properties for names, labels, types, and metadata.
- Named Ranges for reusable groups of cells or variables.
- Sort Cases for review and ordered workflows.
- Restructure -> Transpose, Pivot Wider, and Pivot Longer for reshaping data.
- Merge Files -> Add Cases or Add Variables for combining compatible data.
- Aggregate for summarized datasets.
- Split File, Select Cases, and Weight Cases for scoped analysis.
- Transform -> Transform Variables, Compute Variable, and Create Scale for derived variables and survey scales.
- Data Cleaning -> Clean Data and Identify Duplicates for quality review.
- Validate Data for a structured validation summary.
- Data Exploration -> Explore Data and Statistical Dashboard for preliminary inspection.
Run these tools intentionally. A filter, split, weight, or transformation can change every analysis that follows. Save the project or duplicate it before large changes.
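To see why an active filter or weight matters, consider the same three scores summarized three ways. This is a generic Python illustration with made-up numbers, not SCCalc output or SCL syntax; the helper name mean is hypothetical.

```python
def mean(values, weights=None):
    """Weighted mean; with no weights, every case counts once."""
    if weights is None:
        weights = [1] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [2, 4, 9]
weights = [1, 1, 2]                       # the third case counts twice
selected = [s for s in scores if s < 9]   # a filter that drops the third case

plain_mean = mean(scores)                 # 5.0
weighted_mean = mean(scores, weights)     # 6.0
filtered_mean = mean(selected)            # 3.0
```

The data never changed, yet the summary moved from 5.0 to 6.0 or 3.0 depending on which setting was active.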
Missing Data
Inspect missing data before analysis. Different procedures may use listwise deletion, pairwise deletion, imputation, or procedure-specific handling. Record the choice in your notes, SCL script, or project documentation so exported output can be interpreted later.
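The difference between listwise and pairwise deletion is easiest to see by counting which cases survive. The sketch below uses the general statistical meaning of the two terms and represents missing values as None; it does not describe how any particular SCCalc procedure handles them, and the function names are hypothetical.

```python
def listwise_cases(rows, columns):
    """Keep only cases that are complete on every listed column."""
    return [row for row in rows if all(row[c] is not None for c in columns)]


def pairwise_cases(rows, col_a, col_b):
    """Keep cases that are complete on just the two columns being compared."""
    return [row for row in rows if row[col_a] is not None and row[col_b] is not None]


rows = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 4, "y": 5, "z": None},
    {"x": None, "y": 6, "z": 7},
]

complete = listwise_cases(rows, ["x", "y", "z"])  # 1 case survives
xy_pairs = pairwise_cases(rows, "x", "y")         # 2 cases survive
yz_pairs = pairwise_cases(rows, "y", "z")         # 2 cases survive
```

Under pairwise deletion each statistic can rest on a different subset of cases, which is one reason to record the choice alongside the output.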
Before choosing an approach, check:
- Whether missingness is expected by design.
- Whether missing values use one convention or several conventions.
- Whether placeholder values such as 999, -99, or “N/A” were imported as real data.
- Whether missingness differs by group, time, site, or condition.
- Whether the planned procedure reports omitted cases or warnings.
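Two of the checks above, placeholder values and missingness by group, can be approximated outside SCCalc with a few lines of Python. This is a hedged sketch: the placeholder set simply reuses the examples named above, the rows are assumed to be dictionaries of text values (as csv.DictReader would produce), and both function names are hypothetical.

```python
from collections import Counter

# Assumed placeholder codes; adjust to the conventions of your own data.
PLACEHOLDERS = frozenset({"999", "-99", "N/A"})


def count_placeholders(rows, placeholders=PLACEHOLDERS):
    """Count, per column, the cells whose text matches a placeholder code."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if str(value).strip() in placeholders:
                counts[column] += 1
    return dict(counts)


def missing_rate_by_group(rows, group_col, value_col, placeholders=PLACEHOLDERS):
    """Share of placeholder cells in value_col within each group."""
    totals, missing = Counter(), Counter()
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        if str(row[value_col]).strip() in placeholders:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}
```

A code such as 999 can also be a legitimate value in some variables, so treat the counts as prompts for review rather than as confirmed missing data.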
Data Quality Checks
Before inferential analysis, check for:
- Outliers and impossible values.
- Miscoded categories.
- Unexpected text in numeric columns.
- Duplicated identifiers.
- Empty or constant variables.
- Reverse-coded items that have not been reversed.
- Scale items with inconsistent direction.
- Date or time fields imported as plain text.
- Labels that reveal confidential study or organizational context.
Use Data -> Validate Data and Data -> Data Exploration for broad review, then inspect suspicious variables directly in Data Editor and Variable View.
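A few of the checks listed above are simple enough to script as a cross-check on an exported CSV. The following Python sketch is illustrative only and is unrelated to how Validate Data works internally; it assumes rows are dictionaries of text values, and all three function names are hypothetical.

```python
from collections import Counter


def duplicated_ids(rows, id_col):
    """Return identifier values that occur more than once."""
    counts = Counter(row[id_col] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def constant_columns(rows):
    """Return columns that hold a single distinct value across all rows."""
    if not rows:
        return []
    return [col for col in rows[0] if len({row[col] for row in rows}) <= 1]


def non_numeric_cells(rows, col):
    """Return (row index, value) pairs that Python cannot parse as a number.

    Note: float() accepts strings such as "nan" and "inf", so those pass here.
    """
    bad = []
    for index, row in enumerate(rows):
        try:
            float(row[col])
        except (TypeError, ValueError):
            bad.append((index, row[col]))
    return bad
```

Row indexes are zero-based and count data rows only, so add two to find the matching line in a CSV file with a header row.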
Preparing Shareable Examples
When you need to demonstrate an issue, start with SCCalc sample data whenever possible. If the issue only appears in your own dataset, create a small synthetic file with the same shape:
- The same number of relevant columns.
- Similar variable types and measurement levels.
- The same kind of missing-data pattern.
- Similar category labels, but neutral names.
- Enough rows to reproduce the issue, not the whole original dataset.
Do not include real names, IDs, raw responses, confidential categories, private comments, secrets, passwords, tokens, private file paths, or unreleased results in files shared for support.
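A synthetic file with the same shape can be produced with a short script instead of editing real data by hand. The sketch below is one possible approach in Python, not an SCCalc feature; the column names, value ranges, and the function make_synthetic_csv are all hypothetical placeholders to replace with neutral equivalents of your own variables. Missing values are written as empty cells.

```python
import csv
import io
import random


def make_synthetic_csv(n_rows, missing_rate=0.1, seed=0):
    """Return CSV text with made-up values and randomly blanked cells."""
    rng = random.Random(seed)  # fixed seed so the example file is reproducible
    columns = {
        "group": lambda: rng.choice(["A", "B", "C"]),   # categorical
        "item_1": lambda: rng.randint(1, 5),            # 1-5 survey item
        "score": lambda: round(rng.gauss(50, 10), 1),   # continuous measure
    }
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case", *columns])
    for case in range(1, n_rows + 1):
        row = [case]
        for make_value in columns.values():
            # Blank the cell with probability missing_rate, else draw a value.
            row.append("" if rng.random() < missing_rate else make_value())
        writer.writerow(row)
    return buffer.getvalue()
```

For example, writing the result of make_synthetic_csv(30) to a file gives a 30-row dataset that contains no real responses and can be imported through File -> Import Data. If the issue depends on a specific missing-data pattern, blank those cells deliberately rather than at random.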